What is Record Linkage?

Record linkage allows researchers and policy makers to study trends and patterns in whole population groups, by bringing together routinely collected information from different sources. When linkage is conducted, an individual’s records from different agencies are brought together by a third party and provided to the researchers in a way that protects anonymity (that is, without personally identifying any individual). It provides a safe and secure way to bring together relevant information that enables researchers to examine relationships between events at a population (not an individual) level.

Because of the power provided by large population samples, record linkage is often used to answer research questions that cannot be studied accurately in smaller samples (where outcomes of interest may be rare, and therefore require large numbers of individuals for them to occur at all). Of equal importance, administrative records can be useful when self-reported information (i.e., collected via interviews or questionnaires) raises problems of feasibility or accuracy. For example, people can’t always accurately remember events that happened a long time ago; records that were made at the time of the event (e.g., at a child’s birth) are likely to be more accurate than recollections of detailed information years later.

As an example, using record linkage, health services information for an entire population can be combined with information from other departments or agencies, such as education, to study questions that could not be answered accurately using any other method. Researchers could, for instance, see whether birth weight has an influence on children’s readiness to learn at school, or whether better social and emotional functioning at school entry is related to later scholastic achievement.

Record linkage can also bring together a child’s records with those of his/her parents to provide information about important influences in the child’s life: for example, to find out how significant events in the parents’ lives (such as hospitalisation for serious illness or a court appearance) might affect children’s health and wellbeing. Information about these possible influences on the lives of children could not be gathered accurately for a population cohort in any other way. We are fortunate that the NSW government has provided the infrastructure to enable this research to be undertaken in a way that protects the privacy of the people involved.

To see an example of how Middle Childhood Survey (MCS) data could be linked to other datasets, visit Principles of Record Linkage.

Who gives permission for records to be linked?

Before any record linkage projects can be undertaken, multiple approvals are required to ensure that all Commonwealth and State privacy and security regulations and laws are upheld, that the research is of significant scientific merit (that is, it asks important and relevant questions), and that there is no risk that an individual may be personally identified at any stage of the research.

The following approvals must be obtained before a record linkage project can be undertaken:

  • An authorised Human Research Ethics Committee (e.g., the NSW Population and Health Services Research Ethics Committee) MUST approve the linkage project, and will set an expiry date by which the record linkage project must be completed;
  • Data Custodians within the organisation responsible for each dataset MUST approve the use of their data in the linkage project;
  • The Record Linkage Integrating Authority (e.g., in NSW this is the Centre for Health Record Linkage) MUST approve the feasibility of the linkage project;
  • All government-owned data MUST be approved for linkage by the relevant Department’s own ethics committee.

How is Record Linkage done to ensure that privacy is protected?

In Australia, there are several third-party record linkage providers for data owned by State/Territory or Commonwealth Agencies. One of these linkage providers is the Centre for Health Record Linkage (CHeReL). This is the agency in NSW that provides linkage services for the NSW Child Development Study (NSW-CDS). It is important to note that the anonymous linkage of data requires cooperation from the CHeReL in direct liaison with Data Custodians within the organisations that are providing data for linkage, as follows.

For each research linkage study, the CHeReL receives a set of personal identifiers (information such as Name, Date of Birth, Postcode) from each Data Custodian for all the individual records to be linked. Using these sets of personal identifiers from different datasets, the CHeReL computer finds personal identifiers that ‘match’ (that is, belong to the same person), and generates a new, Project-specific Linkage ID code (‘Linkage IDs’) for these ‘matches’. The CHeReL pairs Linkage IDs with the unique ‘Record ID’ in each Data Custodian’s original file, and then sends these paired Linkage IDs and Record IDs (for all matched cases) back to the Data Custodians of each organisation. This is the end of the process for the CHeReL.

From here, each Data Custodian extracts the research data (the information needed by the researchers) from their database, and removes the Record IDs from the research data files, leaving only the research data and the Linkage IDs for provision to researchers. Each Data Custodian then sends the researchers the Linkage IDs and their corresponding record content (data) without any personal identifying information. This is the end of the process for the Data Custodian.

Next, the Research Team receives from each Data Custodian the research data and Project-specific Linkage IDs, and links together data from different sources (e.g., Education, Health, etc.) that relate to the same person, but without being able to identify anyone personally. Further information about how the CHeReL operates record linkage is available from the CHeReL.

This process of linkage ensures that the research databases for the NSW Child Development Study do not contain any information that could identify a child, parent/caregiver, or school. Only non-identifiable data is provided to the NSW-CDS Research Team.
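
The separation of roles described above can be illustrated with a minimal Python sketch. It is not the CHeReL's actual software: the function names, field names, and example records are all hypothetical, and exact matching on name, date of birth, and postcode is assumed purely for simplicity (the matching method is not described here). The point of the sketch is only that the linkage provider sees identifiers but no research data, while the Research Team sees research data but no identifiers.

```python
import secrets
from collections import defaultdict


def normalise(rec):
    """Build a comparison key from personal identifiers (exact match is an assumption)."""
    return (rec["name"].strip().lower(), rec["dob"], rec["postcode"])


def linkage_unit(identifier_files):
    """Role of the linkage provider: sees identifiers only, never research data.

    identifier_files: {custodian: [{"record_id", "name", "dob", "postcode"}, ...]}
    Returns {custodian: {record_id: linkage_id}} for records matched across files.
    """
    seen_in = defaultdict(set)
    for custodian, records in identifier_files.items():
        for rec in records:
            seen_in[normalise(rec)].add(custodian)

    # One random, project-specific Linkage ID per person found in two or more files.
    linkage_ids = {key: secrets.token_hex(8)
                   for key, custodians in seen_in.items() if len(custodians) > 1}

    return {
        custodian: {rec["record_id"]: linkage_ids[normalise(rec)]
                    for rec in records if normalise(rec) in linkage_ids}
        for custodian, records in identifier_files.items()
    }


def custodian_extract(research_rows, id_map):
    """Role of a Data Custodian: replace the Record ID with the Linkage ID before release.

    research_rows: {record_id: {field: value}}
    """
    return [{"linkage_id": id_map[rid], **fields}
            for rid, fields in research_rows.items() if rid in id_map]


def researcher_merge(*extracts):
    """Role of the Research Team: join de-identified extracts on the Linkage ID."""
    merged = defaultdict(dict)
    for extract in extracts:
        for row in extract:
            fields = {k: v for k, v in row.items() if k != "linkage_id"}
            merged[row["linkage_id"]].update(fields)
    return dict(merged)


# Hypothetical example data, invented for illustration only.
health_ids = [
    {"record_id": "H1", "name": "Alex Lee", "dob": "2005-03-01", "postcode": "2000"},
    {"record_id": "H2", "name": "Sam Wu", "dob": "2004-07-12", "postcode": "2031"},
]
education_ids = [
    {"record_id": "E9", "name": "alex lee ", "dob": "2005-03-01", "postcode": "2000"},
]
health_data = {"H1": {"birth_weight_g": 3400}, "H2": {"birth_weight_g": 2900}}
education_data = {"E9": {"reading_score": 512}}

maps = linkage_unit({"health": health_ids, "education": education_ids})
linked = researcher_merge(
    custodian_extract(health_data, maps["health"]),
    custodian_extract(education_data, maps["education"]),
)
print(linked)  # one linked record, keyed by a random Linkage ID
```

In this sketch the merged record contains only the birth weight and reading score under a random Linkage ID; no name, date of birth, postcode, or original Record ID ever reaches the researcher's side.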

How was privacy protected for those who completed the Middle Childhood Survey?  

The NSW-CDS team worked with a third party during the collection of the Middle Childhood Survey (MCS) questionnaire to ensure that the research team receives only the child’s responses (research data), while personal identifying information used in linkage processes is stored separately (held by separate organisations), so that no individual who participated in the MCS can ever be identified. This was done by coding the MCS data with a unique MCS-ID number.

Only the researchers named on the appropriate ethical approvals have access to the data collected during the MCS, and the University of New South Wales remains the Data Custodian for any future linkage of MCS data. On behalf of the University of New South Wales, the CHeReL holds only the MCS-ID and personal identifiers (without having any access to the MCS data).

As with all record linkage projects, the processes conducted by the third-party linkage provider (the CHeReL) ensure that researchers cannot re-identify study participants, because the researchers do not hold the personal identifiers for the research participants. They access only the research data that is coded by the unique MCS-ID number (linked to other record sets via the Project-specific Linkage ID that is generated by the CHeReL, as described in the section above: How is Record Linkage done to ensure that privacy is protected?). Staff at the CHeReL access only the personal identifiers associated with each MCS-ID number, and have no access to research data.
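
The idea of coding survey data with an MCS-ID can be sketched in the same spirit. The snippet below is an assumption-laden illustration, not the procedure actually used for the MCS: the identifier fields, the question names, and the way the ID is generated are all hypothetical. It shows one submission being split into an identifier record and a response record that share nothing except the MCS-ID.

```python
import secrets

# Assumed set of identifying fields; the real MCS field list is not given here.
IDENTIFIER_FIELDS = {"name", "dob", "postcode"}


def split_submission(submission):
    """Split one survey submission into two records sharing only an MCS-ID.

    Returns (identifiers, responses): the first is what a linkage provider
    would hold, the second is what the research team would hold.
    """
    mcs_id = secrets.token_hex(6)  # random code, not derived from the person
    identifiers = {"mcs_id": mcs_id}
    responses = {"mcs_id": mcs_id}
    for field, value in submission.items():
        target = identifiers if field in IDENTIFIER_FIELDS else responses
        target[field] = value
    return identifiers, responses


# Hypothetical submission, invented for illustration only.
ids, resp = split_submission(
    {"name": "Alex Lee", "dob": "2005-03-01", "postcode": "2000", "q1": 3, "q2": 1}
)
print(sorted(ids))   # identifier fields plus mcs_id
print(sorted(resp))  # survey answers plus mcs_id
```

Because the two records are held by separate organisations, neither holder alone can connect a named individual to their survey answers.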

