Data is one of the most valuable assets that an organization has. It’s required in order to run all business activities, to innovate the services that we offer to customers and to add value for all. While data is one of the greatest assets, it can also be a source of vulnerability, so it is vital that organizations protect their data whilst using it to its full potential.
In this blog, Helen Smith, an experienced analyst in the health sector, talks about the sort of data she has used and what for, the challenges she has faced using de-identified data, and how she overcame them. She also shares her top three tips for newcomers to using de-identified data.
And we want to say a huge thanks to Helen for sharing these insights with us.
What sort of data do you use and what for?
I have worked with de-identified data for many years across a range of health organisations.
Much of the data I analyse is about people’s stays in hospital. It can be aggregate data or person-level data. All identifiers, such as name, address and detailed location, have been removed, and some fields have been transformed to reduce the data’s identifiability. I also use linked data sometimes. All use of data is highly controlled and subject to significant security and governance processes, and data can only be used for legitimate purposes, on a need-to-know basis.
Within my job, I identify trends in health that can be used to improve health services and health outcomes. Over my career, I have been closely involved in, and passionate about, using health data to address health inequalities and to help ensure that our health services meet the needs of our whole population.
What are the challenges you have faced using de-identified data?
There are several challenges I work around.
If you’re keen to learn more about how to build individual capability to use data responsibly, or to learn more about the different purposes you can use data for while maintaining legality, fairness, and transparency, the Responsible Data Usage Course is for you.
How do you overcome these challenges?
At induction, I personally train all the staff I work with on the challenges and solutions I have come across, so they can be more effective using data.
I make staff aware of the rules that must be followed when handling small numbers and when sharing and publicising data, and I ensure that a data usage declaration is signed.
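To illustrate what a small-numbers rule can look like in practice, here is a minimal Python sketch. It is not the rule used by any particular organisation: the threshold of 5, the choice to leave zero counts visible, the function name suppress_small_counts and the ward counts are all assumptions made up for the example. Real rules come from your own organisation's governance policy.

```python
from typing import Dict, Optional


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = 5
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with small non-zero values suppressed.

    Any count greater than 0 and below `threshold` is replaced with None.
    The threshold and the treatment of zeros are illustrative assumptions.
    """
    return {
        group: (None if 0 < n < threshold else n)
        for group, n in counts.items()
    }


# Made-up counts, for illustration only.
admissions = {"Ward A": 42, "Ward B": 3, "Ward C": 0, "Ward D": 17}
published = suppress_small_counts(admissions)
print(published)
```

In this sketch only Ward B is suppressed. Some policies also require secondary suppression, so that a hidden value cannot be worked out from row or column totals. That is not covered here.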
I encourage analysts to be curious about how data was collected. I encourage them to ask questions to determine whether there is any potential for bias in the data, and to consider the impact of fields that are self-reported or independently collected.
Having access to mandatory training on the responsible use of data before staff get access is a great way to ensure that everyone knows what they are doing with data, and it helps our organisations give assurance that their staff know their responsibilities when using health data. This is important as traditional privacy courses do not always provide the practical advice needed.
When working with transformed or aggregated data, all you can really do is make the best of the data you have got. It is, however, important to caveat data and explain, for example, that the trends for grouped ethnic categories may disguise very different trends for the individual populations, and that further work may be needed to investigate these.
We are starting to profile data before linking it to understand population structures and then profiling linked population structures. This isn’t a precise method, but it does help to identify if certain age groups or population groups are less represented in the linked data.
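As a rough illustration of this kind of profiling, the Python sketch below compares each age band's share of the records before and after linkage. It is only one possible approach, not a description of the method used in practice: the field name age_band, the helper names, the 0.8 cut-off and the record counts are all hypothetical.

```python
from collections import Counter


def age_band_shares(records):
    """Return each age band's share of the records as a fraction of the total."""
    counts = Counter(r["age_band"] for r in records)
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


def representation_ratio(source, linked):
    """Divide each band's share in the linked data by its share in the source.

    A ratio well below 1 suggests the band is less represented after linkage.
    """
    src = age_band_shares(source)
    lnk = age_band_shares(linked)
    return {band: lnk.get(band, 0.0) / share for band, share in src.items()}


# Made-up records, for illustration only.
source = (
    [{"age_band": "0-17"}] * 20
    + [{"age_band": "18-64"}] * 60
    + [{"age_band": "65+"}] * 20
)
linked = (
    [{"age_band": "0-17"}] * 10
    + [{"age_band": "18-64"}] * 60
    + [{"age_band": "65+"}] * 20
)

ratios = representation_ratio(source, linked)
# The 0.8 cut-off is an arbitrary assumption for this example.
under_represented = [band for band, r in ratios.items() if r < 0.8]
print(under_represented)
```

With these made-up numbers, only the 0-17 band falls below the cut-off. The same comparison could be repeated for other fields, such as ethnicity or region.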
If you’re keen to learn more about how to protect against the chance of a personal-data breach and/or loss of confidential business-sensitive data, or need to balance the responsibility for protecting privacy and building and maintaining public and consumer trust, the Quality Private Data Course is for you.
What are your top three tips for newcomers to using de-identified data?