First International Workshop on Population Informatics for Big Data

Co-located with the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Social genomes are the digital footprints of individuals. They consist of records about people's interactions with governments, businesses, and other individuals, as collected and linked from many data sources. Social genomes are the basis of Population Informatics, the emerging discipline of studying populations by analyzing large population databases that contain detailed information about people, such as the health, education, financial, census, location, shopping, employment, or social networking records of a large proportion of individuals in a population.

Population Informatics is a crucial enabling technology to understand our rapidly changing dynamic societies. It is transforming how researchers in many domains address the global challenges we face today, and how businesses and governments make decisions. Population informatics can realize the potential of Big Data by employing methods such as data mining, data integration, visualization, health informatics, statistics, computational social science, and privacy technologies, on the increasingly large digital traces of individuals. It will provide fresh insights into many domains, for example the social sciences, public health, and demographics, to inform government policies and improve business processes.

PopInfo'15 will be an interdisciplinary workshop. We call for papers on data mining algorithms and techniques for Population Informatics, as well as papers about applications of Population Informatics in diverse areas.

New: The workshop program is now available.


Registration is via the main SIGKDD registration system.

We are calling for papers, both research and applications, from both academia and industry, for presentation at the workshop. All papers will be peer reviewed by a program committee of international experts. Accepted papers will be published on the workshop website, and selected papers will be invited for extension and inclusion in a possible special issue of a relevant journal.

PopInfo'15 invites contributions addressing current research in Population Informatics, as well as experiences, novel applications and future challenges. Topics of interest include, but are not restricted to:

  • Algorithms and techniques for managing, processing, analyzing, and mining large population databases
  • Requirements analysis for Population Informatics
  • Models and algorithms for Population Informatics
  • Scalable algorithms for dealing with dynamic and temporal population databases
  • Scalable algorithms for dealing with uncertain and probabilistic population databases
  • Scalable algorithms for dealing with privacy aspects in Population Informatics
  • Parallel and distributed algorithms for large-scale Population Informatics
  • Visualization, visual analytics, and user-interfaces for Population Informatics
  • Architectures and frameworks for Population Informatics
  • Large-scale Population Informatics in the cloud
  • Research case studies of Population Informatics in health, demographics, ecology, economics, the social sciences, and other research domains
  • Applications of Population Informatics in governments and businesses
  • Policy issues around population databases and Population Informatics, and using Population Informatics to manage public resources
  • Ethical, social, privacy, and confidentiality aspects when dealing with population databases

Workshop paper submissions: Friday 5 June 2015
Workshop paper notifications: Tuesday 30 June 2015
Final submission of accepted papers: Wednesday 15 July 2015
Workshop date: Monday 10 August 2015

We invite two types of submissions for PopInfo’15:

  • Research submissions:
    Normal academic submissions reporting on research progress, novel algorithms and techniques relevant to Population Informatics.

  • Application / case study submissions:
    Papers of this type report on example implementations, experiences, and case studies of Population Informatics from a diverse range of application areas. We encourage submissions from researchers outside of the main KDD/computer science community, as well as from practitioners working in government and industry.

Paper submissions are required to follow the standard double-column ACM Proceedings Template. We accept papers of between 4 and 10 pages, including references, diagrams, and appendices. LaTeX styles and Word templates are available from the ACM. LaTeX is the recommended typesetting package.

We encourage shorter papers that describe ongoing work relevant to Population Informatics, or initial results of larger projects. As per KDD tradition, reviews are not double-blind, and author names and affiliations should be listed.

Electronic submissions must be in PDF format and made through the PopInfo'15 submission system.

We are pleased to announce that Assoc Prof Hye-Chung Kum will be giving the keynote at PopInfo’15.

Title: Social Genome: Putting Big Data to Work for Population Informatics

Abstract: Population informatics is the burgeoning field at the intersection of social sciences, health sciences, computer science, and statistics that applies quantitative methods and computational tools to answer questions about human populations. It relies on using distributed, federated, person-level datasets, our social genome, in near real time to transform social, behavioral, economic, and health sciences, but issues around privacy, confidentiality, access, and data integration have slowed progress in this area. The social genome represents a core set of data that information scientists can use to explore connections, build theories, and propel breakthroughs in managing a society. When technology is properly used to manage both privacy concerns and uncertainty, big data technology will help move the growing field of population informatics forward. This will enable big data to be used for the benefit of society in areas like population health, just as it has been used for intelligence and marketing. We will touch on the knowledge base platform required for the social genome data infrastructure, secure data access, privacy-preserving data integration, and privacy-preserving data analysis.

Biography: Dr. Hye-Chung Kum is an associate professor at the School of Public Health at Texas A&M. She holds a joint appointment in the Department of Computer Science at the University of North Carolina at Chapel Hill (UNC-CH). She received her Ph.D. (2004) in Computer Science and MSW (1998) in Policy and Management from UNC-CH. She is the founder and co-lead of the Population Informatics Research Group which applies informatics, data science, and computational methods to the increasingly large digital traces available about people to advance public health, social science, and population research by bringing together domain experts and computer science students. Her vision paper on population informatics and social genome was published in the IEEE Computer Special Outlook Issue in January 2014.

To provide an application perspective for Population Informatics, we are pleased to announce that Dr James Farrow will be speaking on his work designing and implementing next generation health data linkage applications.

Title: Are relational databases the right tool for data linkage?

Abstract: Betteridge’s Law of Headlines would tell us, ‘No!’ Record linkage and linked data management are all about relationships between records, yet the dominant paradigm is to store and manipulate data using tools that are great for storing record data but suboptimal for querying the relationships between records.

Graph databases improve on this situation. Graph databases, in addition to storing record-level data, allow the relationships between data to be explicitly and efficiently described and managed as first order objects. Emergent patterns of the nodes (records) and edges (relationships) and their properties can therefore be explored. SA-NT DataLink has built a system using graph databases and algorithms to store, manipulate and query linked record data. This next-generation link management system will be described along with technical descriptions and benefits of the approach.
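To make the idea of records as nodes and links as edges concrete, here is a minimal Python sketch. It is not the SA-NT DataLink system and does not use a real graph database; the class `LinkGraph`, the record identifiers, and the `score` edge property are all hypothetical names chosen for illustration. It only shows how storing links explicitly lets a query follow relationships directly.

```python
class LinkGraph:
    """Toy in-memory graph: records are nodes, links are edges with properties.

    Hypothetical illustration only; a real system would use a graph database.
    """

    def __init__(self):
        self.records = {}  # record id -> record attributes
        self.edges = {}    # record id -> {neighbour id: edge properties}

    def add_record(self, rec_id, **attrs):
        self.records[rec_id] = attrs
        self.edges.setdefault(rec_id, {})

    def add_link(self, a, b, **props):
        # Links are undirected, so store the edge in both directions.
        self.edges.setdefault(a, {})[b] = props
        self.edges.setdefault(b, {})[a] = props

    def linked_group(self, start, min_score=0.0):
        """Return every record reachable from `start` over links
        whose (assumed) `score` property is at least `min_score`."""
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr, props in self.edges.get(node, {}).items():
                if nbr not in seen and props.get("score", 0.0) >= min_score:
                    seen.add(nbr)
                    stack.append(nbr)
        return seen


# Made-up example records from three hypothetical data sources.
g = LinkGraph()
g.add_record("hosp-1", name="J Smith", source="hospital")
g.add_record("birth-7", name="John Smith", source="births")
g.add_record("death-3", name="Jon Smith", source="deaths")
g.add_record("hosp-9", name="A Jones", source="hospital")
g.add_link("hosp-1", "birth-7", score=0.95)
g.add_link("birth-7", "death-3", score=0.6)

group = g.linked_group("hosp-1")                  # follow all links
strict = g.linked_group("hosp-1", min_score=0.9)  # follow only strong links
```

Because each link carries its own properties, the same stored data can answer both the permissive and the strict query without re-running the linkage.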

Biography: Dr James Farrow is a computer scientist and software engineer working with SA-NT DataLink to develop techniques based on graph theory and using graph databases for the better management and exploration of linked data to enhance research outcomes. He has worked in the areas of machine learning and text classification for NSW Health and ASIC, mapping and visualisation of historical and near real-time geocoded information for NSW Health, and biomedical record linkage for SA-NT DataLink. He helped design and prototype SURE, a secure research environment for linked data for the Population Health Research Network (PHRN). He has recently developed a new technique for the anonymisation of geospatial data which removes location information but preserves the ability to make distance comparisons.

07:30 – 09:00 Arrival Coffee / Registration
Location: Level 2 & Level 4 Pre-Function Areas
09:15 - 09:30 Workshop Opening and Welcome Note
Location: Level 1 Meeting Room 5
09:30 - 10:30 Invited Talk – Are relational databases the right tool for data linkage?
James Farrow, SA-NT DataLink
10:30 – 11:00 Morning Break
Location: Level 2 & Level 4 Pre-Function Areas
11:00 - 11:30 Historical Population Informatics: Studying Migration using Big Data of Family (Full paper)
D Guo, A B Kasakoff, C Koylu, Y Huang and J Grieve
11:30 - 12:00 Towards population reconstruction: extraction of family relationships from historical documents (Full paper)
J Efremova, A M García, J Zhang and T Calders
12:00 - 12:30 Minimizing Dissemination in a Population While Maintaining its Community Structure (Full paper)
C Zhang and T Eliassi-Rad
12:30 – 13:30 Lunch Break
Location: Level 2 & Level 4 Pre-Function Areas
13:30 - 14:30 Invited Keynote – Social Genome: Putting Big Data to Work for Population Informatics
Hye-Chung Kum, School of Public Health, Texas A&M
14:30 - 15:00 Privacy preserving record linkage using homomorphic encryption (Full paper)
S Randall, A Brown, A Ferrante, J Boyd and J Semmens
15:00 – 15:30 Afternoon Break
Location: Level 2 & Level 4 Pre-Function Areas
15:30 - 16:00 Grouping methods for ongoing record linkage (Full paper)
S Randall, J Boyd, A Ferrante, A Brown and J Semmens
16:00 - 16:20 Modelling the spread of influenza in Western Australia (Short paper)
A Saavedra, S Wood, J Geoghegan, E Holmes and H Durrant-Whyte
16:20 - 16:40 Social genome mining for crisis prediction (Short paper)
P Wlodarczak, J Soar and M Ally
16:40 - 17:00 Understanding and Improving Measurement of Quality of Residential Care in Australian Aged Care Audit Reports (Short paper)
P Yu, S Qian and T Jiang (Please email the authors to get a copy of the paper.)
17:00 - 17:15 Closing remarks

- Each full paper is allocated 25 minutes for the presentation plus 5 minutes for Q&A
- Each short paper is allocated 15 minutes for the presentation plus 5 minutes for Q&A

Robert Ackland, The Australian National University, Australia
Luiza Antonie, University of Guelph, Canada
Rohan Baxter, Australian Taxation Office, Australia
Elisa Bertino, Purdue University, USA
Gerrit Bloothooft, Utrecht University, The Netherlands
James Caverlee, Texas A&M University, USA
Ahmed Elmagarmid, Qatar Computing Research Institute, Qatar
Ross Gayler, Connected Analytics, Australia
Ashok Krishnamurthy, RENCI / University of North Carolina, USA
Ashwin Machanavajjhala, Duke University, USA
Brad Malin, Vanderbilt University, USA
Norman Mohammed, University of Manitoba, Canada
Christine O’Keefe, CSIRO, Australia
Rainer Schnell, University of Duisburg-Essen, Germany
Vassilios Verykios, Hellenic Open University, Greece
Jim Warren, University of Auckland, New Zealand
William Winkler, US Bureau of the Census, USA
Ping Yu, University of Wollongong, Australia

Peter Christen, The Australian National University, Canberra
Erhard Rahm, University of Leipzig, Germany
Qing Wang, The Australian National University, Canberra
Dinusha Vatsalan, The Australian National University, Canberra
Thilina Ranbaduge, The Australian National University, Canberra (Web master)
