
Research Data Management

What is Sensitive Research Data?

Sensitive Research Data is information that must be safeguarded against unwarranted access or disclosure, and may include:

  • Personal information
  • Personal health information
  • Educational records
  • Customer records
  • Financial information
  • Criminal information
  • Geographic information
  • Confidential personnel information
  • Information that is deemed confidential: information entrusted to a person, organization or entity with the intent that it be kept private and that access be controlled or restricted
  • Information that is protected by institutional policy from unauthorized access

Sensitive data requires careful handling and protection, and is often not suitable for sharing. However, there may be ways to share sensitive data legally and ethically, such as anonymizing, aggregating or restricting access to the data. The FRDR Controlled Access Management Program is an example of a Canadian project for sharing sensitive research data.

Sensitive Data Toolkit

The Sensitive Data Toolkit helps you identify the risk level of your data and manage it throughout the research lifecycle based on that level. It also offers sample language for informed consent.

Sensitive Data Toolkit for Researchers Part 1: Glossary of Terms for Sensitive Data used for Research Purposes

Sensitive Data Toolkit for Researchers Part 2: Human Participant Research Data Risk Matrix

Sensitive Data Toolkit for Researchers Part 3: Research Data Management Language for Informed Consent

Indigenous Data Management

As a university dedicated to Indigenous post-secondary education, we want to make special note of data sovereignty for Indigenous data. We recognize the First Nations Information Governance Centre’s principles of OCAP (Ownership, Control, Access and Possession), which affirm that First Nations own and control how data about their communities are collected, accessed and used. (https://fnigc.ca/ocap-training/)

The Tri-Agencies include specific language on Indigenous data in their new Research Data Management Policy. CBU recommends that researchers follow these practices for all research by and with First Nations, Métis and Inuit communities, collectives and organizations, as well as TCPS 2, Chapter 9: Research Involving the First Nations, Inuit and Métis Peoples of Canada.

Regarding Data Management plans: “For research conducted by and with First Nations, Métis and Inuit communities, collectives and organizations, DMPs must be co-developed with these communities, collectives and organizations, in accordance with RDM principles or DMP formats that they accept. DMPs in the context of research by and with First Nations, Métis and Inuit communities, collectives and organizations should recognize Indigenous data sovereignty and include options for renegotiation of the DMP.” (https://www.science.gc.ca/eic/site/063.nsf/eng/h_97610.html )

De-Identification

De-identification is the process of removing or modifying information in a dataset that might be used to identify someone. By doing this, researchers can share their data without disclosing sensitive information. However, de-identification is not foolproof: there is always some possibility of re-identification, so researchers need to be aware of the risks and challenges involved.

Methods of de-identification (reused from UBC Library)

Anonymization
  • Description: the strictest form, where all identifying information is removed from the dataset and cannot be restored
  • Pros: ensures a high level of privacy protection
  • Cons: may reduce the usefulness and quality of the data

Pseudonymization
  • Description: identifying information is replaced with artificial identifiers, such as codes or numbers
  • Pros: allows the data to be linked across different sources/datasets or over time
  • Cons: increases the risk of re-identification if the codes are exposed or cracked

Aggregation
  • Description: individual data points are grouped together into categories or ranges
  • Pros: preserves some statistical properties and patterns
  • Cons: reduces the level of detail and variability in the data

Masking
  • Description: identifying information is hidden or obscured using techniques such as encryption, hashing, blurring or noise addition
  • Pros: makes the data harder to read or interpret
  • Cons: introduces errors or distortions in the data

Generalization
  • Description: identifying information is replaced with more general or vague terms; for example, dates can be replaced with years, addresses with regions, or names with initials
  • Pros: preserves some semantic meaning and context
  • Cons: makes the data less specific and more ambiguous
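To make the pseudonymization and generalization methods above concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not a vetted de-identification tool: the field names, the example key, and the truncation rules (date to year, postal code to its first three characters) are assumptions made for this example, and real projects should follow institutional and ethics-board guidance.

```python
import hmac
import hashlib

# Hypothetical secret key for pseudonymization. In practice, store the key
# separately from the data and restrict access to it, since anyone holding
# the key can regenerate the codes.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable artificial code.

    A keyed HMAC (rather than a plain hash) makes the codes hard to
    reverse by guessing names, unless the key itself is exposed.
    """
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:12]

def generalize(record: dict) -> dict:
    """Generalize quasi-identifiers: full birth date -> year,
    full postal code -> first three characters (an assumed rule)."""
    out = dict(record)  # work on a copy; leave the original untouched
    out["birth_year"] = out.pop("birth_date")[:4]   # "1987-03-21" -> "1987"
    out["region"] = out.pop("postal_code")[:3]      # "B1P 6L2" -> "B1P"
    return out

def deidentify(record: dict) -> dict:
    """Apply generalization, then swap the name for a pseudonymous code."""
    out = generalize(record)
    out["participant_id"] = pseudonymize(out.pop("name"))
    return out

record = {"name": "Jane Doe", "birth_date": "1987-03-21",
          "postal_code": "B1P 6L2", "score": 42}
print(deidentify(record))
```

Because the same name always maps to the same code, pseudonymized records can still be linked across datasets, which is exactly the trade-off the table notes: linkage is preserved, but re-identification becomes possible if the key leaks.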