First International Workshop on Population Informatics for Big Data
Social genomes are the digital footprints of individuals. They consist of records about people's interactions with governments, businesses, and other individuals, as collected and linked from many data sources. Social genomes are the basis of Population Informatics, the emerging discipline of studying populations by analyzing large population databases that contain detailed information about people, such as the health, education, financial, census, location, shopping, employment, or social networking records of a large proportion of individuals in a population.
Population Informatics is a crucial enabling technology to understand our rapidly changing dynamic societies. It is transforming how researchers in many domains address the global challenges we face today, and how businesses and governments make decisions. Population Informatics can realize the potential of Big Data by employing methods such as data mining, data integration, visualization, health informatics, statistics, computational social science, and privacy technologies, on the increasingly large digital traces of individuals. It will provide fresh insights into many domains, for example the social sciences, public health, and demographics, to inform government policies and improve business processes.
PopInfo'15 will be an interdisciplinary workshop, and we call for papers on data mining algorithms and techniques for Population Informatics as well as papers on applications of Population Informatics in diverse areas.
New: The workshop program is now available.
Registration is via the main SIGKDD registration system.
We are calling for papers, both research and applications, from both academia and industry, for presentation at the workshop. All papers will go through peer review by a program committee of international experts, and accepted papers will be published on the workshop website, with selected papers to be invited for extension and inclusion in a possible special issue of a relevant journal.
PopInfo'15 invites contributions addressing current research in Population Informatics, as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:
Workshop paper submissions | Friday 5 June 2015 |
Workshop paper notifications | Tuesday 30 June 2015 |
Final submission of accepted papers | Wednesday 15 July 2015 |
Workshop date | Monday 10 August 2015 |
We invite two types of submissions for PopInfo’15:
Paper submissions are required to follow the standard double-column ACM Proceedings Template (http://www.acm.org/sigs/publications/proceedings-templates). We accept papers of length between 4 and 10 pages, including references, diagrams, and appendices. LaTeX styles and Word templates may be found on the above site. LaTeX is the recommended typesetting package.
We encourage shorter papers that describe ongoing work relevant to Population Informatics, or initial results of larger projects. As per KDD tradition, reviews are not double-blind, and author names and affiliations should be listed.
Electronic submissions must be in PDF format only and must be made through the PopInfo'15 Submission system.
We are pleased to announce that Assoc Prof Hye-Chung Kum will be giving the keynote at PopInfo’15.
Title : Social Genome: Putting Big Data to Work for Population Informatics
Abstract : Population informatics is the burgeoning field at the intersection of social sciences, health sciences, computer science, and statistics that applies quantitative methods and computational tools to answer questions about human populations. It relies on using distributed, federated, person-level datasets, our social genome, in near real time to transform social, behavioral, economic, and health sciences, but issues around privacy, confidentiality, access, and data integration have slowed progress in this area. The social genome represents a core set of data that information scientists can use to explore connections, build theories, and propel breakthroughs in managing a society. When technology is properly used to manage both privacy concerns and uncertainty, big data technology will help move the growing field of population informatics forward. This will enable big data to be used for the benefit of society in areas like population health, just as it has been used for intelligence and marketing. We will touch on the knowledge base platform required for the social genome data infrastructure, secure data access, privacy-preserving data integration, and privacy-preserving data analysis.
Biography : Dr. Hye-Chung Kum is an associate professor at the School of Public Health at Texas A&M. She holds a joint appointment in the Department of Computer Science at the University of North Carolina at Chapel Hill (UNC-CH). She received her Ph.D. (2004) in Computer Science and MSW (1998) in Policy and Management from UNC-CH. She is the founder and co-lead of the Population Informatics Research Group which applies informatics, data science, and computational methods to the increasingly large digital traces available about people to advance public health, social science, and population research by bringing together domain experts and computer science students. Her vision paper on population informatics and social genome was published in the IEEE Computer Special Outlook Issue in January 2014.
To provide an application perspective for Population Informatics, we are pleased to announce that Dr James Farrow will be speaking on his work designing and implementing next generation health data linkage applications.
Title : Are relational databases the right tool for data linkage?
Abstract : Betteridge’s Law of Headlines would tell us, ‘No!’ Record linkage and linked data management is all about relationships between records, yet the dominant paradigm is to store and manipulate data using tools which are great for storing record data but suboptimal for querying the relationships between records.
Graph databases improve on this situation. Graph databases, in addition to storing record-level data, allow the relationships between data to be explicitly and efficiently described and managed as first-order objects. Emergent patterns of the nodes (records) and edges (relationships) and their properties can therefore be explored. SA-NT DataLink has built a system using graph databases and algorithms to store, manipulate and query linked record data. This next-generation link management system will be described along with technical descriptions and benefits of the approach.
Biography : Dr James Farrow is a computer scientist and software engineer working with SA-NT DataLink to develop techniques based on graph theory and using graph databases for the better management and exploration of linked data to enhance research outcomes. He has worked in the areas of machine learning and text classification for NSW Health and ASIC, mapping and visualisation of historical and near real-time geocoded information for NSW Health, and biomedical record linkage for SA-NT DataLink. He helped design and prototype SURE, a secure research environment for linked data for the Population Health Research Network (PHRN). He has recently developed a new technique for the anonymisation of geospatial data which removes location information but preserves the ability to make distance comparisons.
07:30 - 09:00 | Arrival Coffee / Registration Location: Level 2 & Level 4 Pre-Function Areas |
09:15 - 09:30 | Workshop Opening and Welcome Note Location: Level 1 Meeting Room 5 |
09:30 - 10:30 | Invited Talk – Are relational databases the right tool for data linkage? James Farrow, SA-NT DataLink |
10:30 - 11:00 | Morning Break Location: Level 2 & Level 4 Pre-Function Areas |
11:00 - 11:30 | Historical Population Informatics: Studying Migration using Big Data of Family (Full paper) D Guo, A B Kasakoff, C Koylu, Y Huang and J Grieve |
11:30 - 12:00 | Towards population reconstruction: extraction of family relationships from historical documents (Full paper) J Efremova, A M García, J Zhang and T Calders |
12:00 - 12:30 | Minimizing Dissemination in a Population While Maintaining its Community Structure (Full paper) C Zhang and T Eliassi-Rad |
12:30 - 13:30 | Lunch Break Location: Level 2 & Level 4 Pre-Function Areas |
13:30 - 14:30 | Invited Keynote – Social Genome: Putting Big Data to Work for Population Informatics Hye-Chung Kum, School of Public Health Texas A&M |
14:30 - 15:00 | Privacy preserving record linkage using homomorphic encryption (Full paper) S Randall, A Brown, A Ferrante, J Boyd and J Semmens |
15:00 - 15:30 | Afternoon Break Location: Level 2 & Level 4 Pre-Function Areas |
15:30 - 16:00 | Grouping methods for ongoing record linkage (Full paper) S Randall, J Boyd, A Ferrante, A Brown and J Semmens |
16:00 - 16:20 | Modelling the spread of influenza in Western Australia (Short paper) A Saavedra, S Wood, J Geoghegan, E Holmes and H Durrant-Whyte |
16:20 - 16:40 | Social genome mining for crisis prediction (Short paper) P Wlodarczak, J Soar and M Ally |
16:40 - 17:00 | Understanding and Improving Measurement of Quality of Residential Care in Australian Aged Care Audit Reports (Short paper) P Yu, S Qian and T Jiang (Please email the authors to get a copy of the paper.) |
17:00 - 17:15 | Closing remarks |
Note:
- Each full paper is allocated 25 minutes for the presentation plus 5 minutes for Q&A
- Each short paper is allocated 15 minutes for the presentation plus 5 minutes for Q&A
Peter Christen | The Australian National University, Canberra |
Erhard Rahm | University of Leipzig, Germany |
Qing Wang | The Australian National University, Canberra |
Dinusha Vatsalan | The Australian National University, Canberra |
Thilina Ranbaduge | The Australian National University, Canberra (Web master) |