Teams

Linking Censuses

The goal of our project is to create longitudinal data linking persons across Canadian, American and British population censuses and military enlistment databases from 1851 to 1911. Large databases of this kind help us answer a wide range of questions about society, the economy and population health. On its own, however, each database provides only a static snapshot of the population at one moment in time. Linking the databases by tracking individuals across sources will greatly enhance their research value and reconstitute, for the first time, the life trajectories of large numbers of individuals before 1980. The infrastructure will provide historians, social scientists and population health researchers interested in long-term change with a source similar in principle to the modern longitudinal data produced by Statistics Canada and other statistical agencies.
Linking the records requires building a custom record-linkage system that can identify the same person in two or more sources. Matching records is a challenging and complex problem: one must rely on the attributes describing each person (name, age, marital status, birthplace, etc.) to decide whether two (or more) records refer to the same individual. The computational cost is considerable. A typical Canadian census has more than 4 million records and an American census more than 60 million, so comparing every record in one against every record in the other would mean roughly 240 trillion (4 million × 60 million) candidate pairs. Further challenges add to the complexity, including differing database formats, typographical errors, missing data and misreported data, to name only a few. Our project aims to develop a record-linkage system that incorporates a supervised learning module for classifying pairs of records as matches or non-matches.
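As an illustration only, here is a minimal Python sketch of how such a pipeline can be organised; it is not the project's actual system. It assumes a hypothetical Person record (surname, given name, age, birthplace) and uses blocking, a standard record-linkage technique, so that only records sharing a hypothetical key (surname initial plus birthplace) are compared rather than every possible pair. Each candidate pair is turned into a comparison vector using difflib string similarity and an age check adjusted for the years between censuses, and a small logistic-regression classifier, trained on invented hand-labelled vectors, decides match or non-match.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real census schemas differ by country and year.
    rec_id: str
    surname: str
    given: str
    age: int
    birthplace: str

def block_key(p):
    # Hypothetical blocking key: first letter of surname plus birthplace.
    return (p.surname[:1].upper(), p.birthplace.upper())

def candidate_pairs(a, b):
    """Yield only pairs that share a blocking key, not all len(a) * len(b) pairs."""
    index = defaultdict(list)
    for q in b:
        index[block_key(q)].append(q)
    for p in a:
        for q in index.get(block_key(p), []):
            yield p, q

def similarity(x, y):
    # difflib ratio in [0, 1]; tolerates small spelling variants such as Smith/Smyth.
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def features(p, q, years_between):
    """Comparison vector for one candidate pair (p comes from the earlier census)."""
    age_gap = abs(p.age + years_between - q.age)
    return [
        similarity(p.surname, q.surname),
        similarity(p.given, q.given),
        1.0 / (1.0 + age_gap),  # 1.0 when the ages agree exactly
        1.0 if p.birthplace.upper() == q.birthplace.upper() else 0.0,
    ]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # w holds one weight per feature followed by the bias term.
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w

# Hypothetical hand-labelled comparison vectors (1 = same person, 0 = different).
X_train = [
    [1.0, 1.0, 1.0, 1.0], [0.7, 1.0, 1.0, 1.0], [0.9, 1.0, 1.0, 1.0], [0.9, 0.8, 0.5, 1.0],
    [0.3, 0.0, 0.0, 1.0], [0.5, 0.0, 0.2, 1.0], [0.2, 0.4, 0.05, 0.0], [0.6, 0.3, 0.1, 1.0],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
weights = train_logistic(X_train, y_train)

# Invented toy records standing in for two censuses taken ten years apart.
census_1881 = [
    Person("1881-1", "Smith", "John", 30, "Ontario"),
    Person("1881-2", "Tremblay", "Marie", 25, "Quebec"),
]
census_1891 = [
    Person("1891-7", "Smyth", "John", 40, "Ontario"),
    Person("1891-8", "Scott", "William", 49, "Ontario"),
    Person("1891-9", "Tremblay", "Marie", 35, "Quebec"),
]

pairs = list(candidate_pairs(census_1881, census_1891))
links = [
    (p.rec_id, q.rec_id)
    for p, q in pairs
    if score(weights, features(p, q, years_between=10)) > 0.5
]
print(len(pairs), links)  # prints: 3 [('1881-1', '1891-7'), ('1881-2', '1891-9')]
```

Here blocking reduces six possible pairs to three, and the classifier links Smith/Smyth despite the spelling variant. Blocking trades recall for speed: a true match whose surname initial or birthplace was mis-recorded never becomes a candidate pair, which is why linkage systems often run several blocking passes with different keys.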

A couple of lines about us:
Dr. Luiza Antonie is a Postdoctoral Fellow in the Department of Economics (Historical Data Research Unit) at the University of Guelph. She provides technical expertise in system design and development and leads the data processing and classification aspects of the project.
http://www.uoguelph.ca/~lantonie/

Dr. J. Andrew Ross is a Postdoctoral Fellow in the Historical Data Research Unit of the Departments of History and Economics at the University of Guelph. Dr. Ross provides expertise in the design, construction, and interpretation of longitudinal data sets.
http://www.uoguelph.ca/~jaross/