Using de-identified health data: Why is training crucial to overcome the challenges?

December 2, 2022

Data is one of the most valuable assets that an organization has. It’s required in order to run all business activities, to innovate the services that we offer to customers and to add value for all. While data is one of the greatest assets, it can also be a source of vulnerability; so it is vital that organizations protect their data whilst using it to its full potential.

In this blog, Helen Smith, an experienced analyst in the health sector, talks about the sort of data she has used and what for, the challenges she has faced using de-identified data and how she overcame them, and shares her top three tips for newcomers to using de-identified data.

We want to say a huge thanks to Helen for sharing these insights with us.

What sort of data do you use and what for?

I have worked with de-identified data for many years, across a range of health organisations.

Much of the data I analyse is about people’s stays in hospital. It can be aggregate data or person-level data. All identifiers, such as name, address and detailed location, have been removed, and some fields have been transformed to reduce the data’s identifiability. I also use linked data sometimes. All use of data is highly controlled and subject to significant security and governance processes; data can only be used for legitimate purposes, on a need-to-know basis.

In my job, I identify trends in health that can be used to improve health services and health outcomes. Over my career, I have been closely involved in, and passionate about, using health data to address health inequalities and to help ensure that our health services meet the needs of our whole population.

What are the challenges you have faced using de-identified data?

There are several challenges I work around.

  1. Awareness of the data: knowledge of how the data was collected, its limitations and its quality is not always passed on to organisations that do secondary analysis of data collected at multiple hospitals. This increases the chance of the data being interpreted inappropriately.
  2. The level of data transformation: data in essential fields such as ethnicity can be aggregated to suppress small numbers, in order to reduce the identifiability of the data. Many factors affect ethnic coding, including whether people feel that their ethnicity fits the coding structure used to categorise them. Aggregation can prevent analysis of trends for specific ethnic groups that may be suffering from different types of diseases at different times and that may benefit from more targeted interventions. Data quality and completeness can also be worse for some ethnic groups, which further exacerbates the situation. The impact on our population is that we often cannot use hospital stay data to identify the specific health trends for these minority groups, which makes it harder to put proactive interventions in place to support them. As a result, some minority groups may miss out on earlier interventions that could prevent them from developing more serious conditions.
  3. The quality of identifiers: the data quality of identifiers in health records has a big impact on the quality of data linkage, which in turn needs to be taken into account in how analysis is produced and interpreted. This is a particularly big problem for newly collected datasets. Not all source systems produce data in the same way, even though they are meant to. Identifier quality can vary and can be worse for certain population groups for many reasons, including where people have common or hard-to-spell names. It is important that care is taken to interpret and caveat analysis of linked data, to avoid bias and flawed decision-making.

If you’re keen to learn more about how to build individual capability to use data responsibly, or to learn more about the different purposes you can use data for while maintaining legality, fairness, and transparency, the Responsible Data Usage Course is for you.

How do you overcome these challenges?

I personally train all the staff I work with at induction, so that they understand the challenges and solutions I have come across and can be more effective when using data.

I make staff aware of the rules that must be followed for handling small numbers and for sharing and publicising data, and I ensure that a data usage declaration is signed.
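As a rough illustration of what a small-number rule can look like in practice, the sketch below hides any count that is greater than zero but below a threshold. The threshold of 5, the function name `suppress_small_counts` and the counts themselves are assumptions made for this example, not rules from any particular organisation; real disclosure rules vary and should always be taken from your own governance documentation.

```python
from typing import Dict, Optional

# Hypothetical threshold: the real value comes from your organisation's disclosure rules.
SUPPRESSION_THRESHOLD = 5


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of counts with values from 1 to threshold - 1 replaced by None."""
    return {
        category: (None if 0 < n < threshold else n)
        for category, n in counts.items()
    }


# Made-up counts, for illustration only.
stays_by_group = {"Group A": 120, "Group B": 3, "Group C": 0, "Group D": 5}
published = suppress_small_counts(stays_by_group)
for category, n in published.items():
    print(category, "suppressed" if n is None else n)
```

This sketch only covers primary suppression. If totals are published alongside the table, a hidden value can sometimes be worked out by subtraction, so further suppression or rounding may be needed.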

I encourage analysts to be curious about how data was collected, to ask questions to determine whether there is any potential for bias in the data, and to consider the impact of fields that are self-reported or independently collected.

Mandatory training on the responsible use of data before staff get access is a great way to ensure that everyone knows what they are doing with data, and it helps our organisations to be assured that their staff know their responsibilities when using health data. This is important, as traditional privacy courses do not always provide the practical advice needed.

When working with transformed or aggregated data, all you can really do is make the best of the data you have got. It is, however, important to caveat the data and explain, for example, that the trends for grouped ethnic categories may disguise very different trends that exist for the individual populations, and that further work may be needed to investigate these.
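The effect described above can be seen with a toy example. In the sketch below, two groups have opposite trends that cancel out once they are combined into one category. All numbers and group names are invented purely for illustration.

```python
from typing import Dict, List

# Invented yearly counts for two groups that would be reported as one category.
stays_by_group: Dict[str, List[int]] = {
    "Group A": [40, 50, 60],  # rising
    "Group B": [60, 50, 40],  # falling
}


def combine(series_by_group: Dict[str, List[int]]) -> List[int]:
    """Add the groups together year by year, as an aggregated category would."""
    return [sum(year_values) for year_values in zip(*series_by_group.values())]


def change(series: List[int]) -> int:
    """Difference between the last and the first year."""
    return series[-1] - series[0]


combined = combine(stays_by_group)
print("Combined category:", combined, "change:", change(combined))
for group, series in stays_by_group.items():
    print(group, series, "change:", change(series))
```

An analyst who only sees the combined category would see a flat trend, which is why the caveat matters.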

We are starting to profile data before linking it, to understand population structures, and then to profile the population structures of the linked data. This isn’t a precise method, but it does help to identify if certain age groups or population groups are less represented in the linked data.
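One simple way to do this kind of profiling is to compare, for each group, how many records were in the source data and how many made it into the linked data. The sketch below is only an assumption about how such a check might be written: the function names, the age bands, the 10 percentage point margin and the record counts are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def linkage_rates(source_groups: Iterable[str], linked_groups: Iterable[str]) -> Dict[str, float]:
    """Share of source records in each group that appear in the linked data."""
    source_counts = Counter(source_groups)
    linked_counts = Counter(linked_groups)
    return {group: linked_counts[group] / n for group, n in source_counts.items()}


def under_represented(rates: Dict[str, float], overall: float, margin: float = 0.10) -> List[str]:
    """Groups whose linkage rate is more than `margin` below the overall rate."""
    return sorted(group for group, rate in rates.items() if rate < overall - margin)


# Made-up age bands for records before and after linkage.
source = ["0-17"] * 10 + ["18-64"] * 10 + ["65+"] * 10
linked = ["0-17"] * 9 + ["18-64"] * 9 + ["65+"] * 6

rates = linkage_rates(source, linked)
overall_rate = len(linked) / len(source)
flagged = under_represented(rates, overall_rate)
print(rates, flagged)
```

A flagged group is a prompt to investigate and to caveat the analysis, not proof of a problem on its own.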

If you’re keen to learn more about how to protect against the chance of a personal-data breach and/or loss of confidential business-sensitive data, or need to balance the responsibility for protecting privacy and building and maintaining public and consumer trust, the Quality Private Data Course is for you.

What are your top three tips for newcomers to using de-identified data?

  1. Firstly, understand your data: how it was collected, the rules that apply to it (including disclosure rules) and any known issues with it. Compare your analysis with any analysis that has been done on that data before.
  2. Secondly, where data is too aggregated, you have to accept the data and communicate its limitations. Make sure you don’t overinterpret the data, and ensure that you caveat limitations appropriately.
  3. Finally, make sure you know your responsibilities by completing appropriate training, so that you can use data effectively, both protecting it and getting accurate results from it.

©2007-2022 IT Training Zone Ltd – a Peoplecert, EXIN and APMG accredited training organisation.
ITSM Zone is a trading name of IT Training Zone Ltd.
