In healthcare data research, accuracy and completeness are paramount. Researchers and analysts need to know that the data they rely on is not only comprehensive but also trustworthy. At Veritas Data Research, we understand that over-cleaning data before delivery can unintentionally filter out unique and valuable record-level information.

That’s why our approach is different: we deliver all sourced records so that customers can apply their own downstream cleaning and consolidation methods. This preserves flexibility for each use case and gives users full transparency into how data is transformed before it enters their analytics.

However, we don’t believe in simply shifting the hard work of data cleaning and curation to our customers. Therefore, as part of our data transparency efforts, each record we deliver includes a confidence score to help users understand how robust the source data behind the record is. These Veritas record confidence scores are a critical tool for ensuring data quality and helping clients make informed decisions about which records to include. 

How does Veritas score mortality records for robustness?

The Veritas Fact of Death confidence score is a quantitative, dynamic measure of a death record’s accuracy and reliability. It reflects the quality and integrity of all the underlying sources that contributed to the record, allowing clients to assess a record’s robustness at a glance and streamline data selection for specific analytic projects.

How We Calculate Confidence for Death Records 

The mortality confidence score is derived from a sophisticated weighting model based on three principal factors: 

  1. Source Type: This variable refers to the origin of the source records. Not all sources are created equal. We classify them by their proximity to ‘gold standard’ records, which are typically those closely aligned with government-sourced death reports (e.g., Social Security Administration (SSA) or state databases). Lower-weighted sources include interment sites, memorial sites, obituaries, and funeral home notifications.
  2. Diversity of Sources: This variable measures the variety of source types contributing to a record. A record supported by a combination of high-reliability government data and corroborating memorial or funeral home data will generally receive a higher confidence score than a record supported by only one type of source.
  3. Source Count: This variable represents the raw number of contributing sources per record. While a high count is beneficial, the weighting model prioritizes the quality and diversity of the sources over sheer volume.
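Veritas’s weighting model is proprietary, so the following is only an illustrative Python sketch of how three such factors could be combined into a single score. Every name and number in it (`SOURCE_TYPE_WEIGHTS`, `FACTOR_WEIGHTS`, the five source-type labels, and the saturating count term) is a hypothetical assumption, not a Veritas value. It simply encodes the ordering described above: source quality counts most, diversity next, and raw count least.

```python
import math
from dataclasses import dataclass

# Hypothetical reliability weights per source type (NOT Veritas's actual values).
# Government-sourced reports sit closest to the "gold standard".
SOURCE_TYPE_WEIGHTS = {
    "government": 1.0,
    "funeral_home": 0.5,
    "obituary": 0.4,
    "memorial": 0.3,
    "interment": 0.3,
}

# Hypothetical factor weights: quality and diversity outweigh raw count.
FACTOR_WEIGHTS = {"quality": 0.6, "diversity": 0.3, "count": 0.1}


@dataclass(frozen=True)
class Source:
    source_id: str
    source_type: str  # must be a key of SOURCE_TYPE_WEIGHTS


def confidence_score(sources: list[Source]) -> float:
    """Return an illustrative confidence score in [0, 1] for one death record."""
    if not sources:
        return 0.0
    types = {s.source_type for s in sources}
    # 1. Source type: the most reliable source type present sets the quality term.
    quality = max(SOURCE_TYPE_WEIGHTS[t] for t in types)
    # 2. Diversity: share of the known source types represented on the record.
    diversity = len(types) / len(SOURCE_TYPE_WEIGHTS)
    # 3. Source count: saturating, so each extra source adds less than the last.
    count = 1 - math.exp(-len(sources) / 3)
    return (
        FACTOR_WEIGHTS["quality"] * quality
        + FACTOR_WEIGHTS["diversity"] * diversity
        + FACTOR_WEIGHTS["count"] * count
    )
```

With these made-up weights, a record backed by one government report and one obituary (about 0.77) outscores one backed by the government report alone (about 0.69), and both far outrank a record backed by ten obituaries (about 0.40).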

Our proprietary system dynamically weights sources based on their demonstrated correlation and alignment with government-sourced data. This continuous re-alignment ensures that the mortality confidence score remains robust, evidence-based, and highly reliable for research and operational applications. 
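One simple way to approximate weighting sources by their alignment with government data is to measure, for each source type, how often its reported date of death agrees with the linked government record. The sketch below assumes that approach; `recalibrate_weights`, the tuple layout of its input, and the Laplace smoothing are illustrative choices, not a description of Veritas’s actual system.

```python
from collections import defaultdict
from datetime import date


def recalibrate_weights(comparisons):
    """Estimate a reliability weight for each non-government source type.

    comparisons: iterable of (source_type, source_dod, government_dod) tuples,
    one per source record that has been linked to a government death record.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for source_type, source_dod, government_dod in comparisons:
        total[source_type] += 1
        if source_dod == government_dod:
            agree[source_type] += 1
    # Laplace smoothing, (agree + 1) / (total + 2), pulls sparse types toward 0.5.
    weights = {t: (agree[t] + 1) / (total[t] + 2) for t in total}
    weights["government"] = 1.0  # the reference standard keeps full weight
    return weights


example = recalibrate_weights([
    ("obituary", date(2023, 1, 5), date(2023, 1, 6)),
    ("obituary", date(2023, 2, 1), date(2023, 2, 1)),
    ("funeral_home", date(2023, 3, 3), date(2023, 3, 3)),
])
print(example)  # {'obituary': 0.5, 'funeral_home': 0.6666666666666666, 'government': 1.0}
```

Smoothing keeps a source type with only a handful of comparisons from jumping straight to a weight of 0 or 1, and rerunning the function as new government data arrives is one way to realize continuous re-alignment.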

Managing Variations: The Likely Duplicate Indicator 

In complex, real-world data, variations are inevitable. Potential duplicate records arise when the underlying sources report slightly different details about the same person’s death.

Veritas assigns a unique traceability number to each death record, defined by four key attributes: first name, last name, date of birth, and date of death. When all four attributes match across the underlying sources, those sources are consolidated into a single death record. When any one of them differs (e.g., an obituary reports the death on January 5th, but the SSA reports it as January 6th), Veritas creates two separate records so that users have full transparency into the underlying death data.

Many users will reasonably want to consolidate such near-matching records, since they most likely represent the same person. To support this, Veritas appends a Likely Duplicate Indicator to each potential duplicate. The flag points the user back to the anchor record that Veritas considers the best one to use, which the system selects by choosing the record with the highest confidence score among the likely duplicates.
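The consolidation and flagging rules described above can be sketched in Python as follows. Source rows that share all four attributes collapse into one record with a traceability number; records that differ in exactly one attribute are grouped as likely duplicates, and the member with the highest confidence score becomes the anchor. The names (`DeathRecord`, `build_records`, `likely_duplicate_of`) are hypothetical, and `len` (the number of sources) stands in for the real confidence score.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional

# (first_name, last_name, date_of_birth, date_of_death)
Key = tuple[str, str, date, date]


@dataclass
class DeathRecord:
    trace_id: int
    key: Key
    sources: list[str] = field(default_factory=list)
    confidence: float = 0.0
    likely_duplicate_of: Optional[int] = None  # anchor's trace_id, if flagged


def build_records(source_rows, score: Callable[[list[str]], float] = len):
    """source_rows: iterable of (first, last, dob, dod, source_type) tuples."""
    by_key: dict[Key, DeathRecord] = {}
    for first, last, dob, dod, source_type in source_rows:
        key = (first, last, dob, dod)
        if key not in by_key:  # new attribute combination -> new traceability number
            by_key[key] = DeathRecord(trace_id=len(by_key) + 1, key=key)
        by_key[key].sources.append(source_type)
    records = list(by_key.values())
    for r in records:
        r.confidence = score(r.sources)

    # Union-find over records that differ in exactly one of the four attributes.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            diffs = sum(a != b for a, b in zip(records[i].key, records[j].key))
            if diffs == 1:
                parent[find(i)] = find(j)

    groups: dict[int, list[DeathRecord]] = {}
    for i, r in enumerate(records):
        groups.setdefault(find(i), []).append(r)
    for members in groups.values():
        if len(members) < 2:
            continue
        # Anchor = highest confidence; ties go to the lower traceability number.
        anchor = max(members, key=lambda m: (m.confidence, -m.trace_id))
        for m in members:
            if m is not anchor:
                m.likely_duplicate_of = anchor.trace_id
    return records


rows = [
    ("Jane", "Doe", date(1940, 3, 2), date(2023, 1, 6), "government"),
    ("Jane", "Doe", date(1940, 3, 2), date(2023, 1, 6), "funeral_home"),
    ("Jane", "Doe", date(1940, 3, 2), date(2023, 1, 5), "obituary"),
    ("John", "Roe", date(1950, 7, 9), date(2022, 5, 1), "government"),
]
records = build_records(rows)
```

In the example, the two Jane Doe records differ only in date of death, so record 2 is flagged as a likely duplicate of record 1, which has more supporting sources. Because union-find chains matches, two records that differ in two attributes can land in the same group through an intermediate record; a stricter variant would compare each candidate only against the anchor.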

Transparent Data You Can Trust, for Research You Can Stand Behind  

The Likely Duplicate Indicator works alongside the confidence score to highlight records that differ in a key underlying attribute but may represent the same individual. This dual approach provides a holistic view:

  • The confidence score tells you how reliable the record is 
  • The Likely Duplicate Indicator tells you how unique the record is 

This system puts control in the hands of the user, allowing them to make nuanced decisions about record matching and de-duplication without prematurely discarding a potentially critical piece of data.
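On the user side, one common pattern is to keep only anchor and unique records above a chosen confidence cutoff, while noting which variant records each anchor absorbed so that nothing is silently lost. The function name `consolidate` and the dictionary keys (`trace_id`, `confidence`, `likely_duplicate_of`) are assumptions for illustration; actual delivered field names may differ.

```python
from collections import defaultdict


def consolidate(records, min_confidence=0.0):
    """Collapse likely duplicates onto their anchors, then apply a confidence cutoff.

    records: list of dicts with the hypothetical keys 'trace_id', 'confidence',
    and 'likely_duplicate_of' (None for anchor and unique records).
    """
    absorbed = defaultdict(list)
    for r in records:
        if r["likely_duplicate_of"] is not None:
            absorbed[r["likely_duplicate_of"]].append(r["trace_id"])
    selected = []
    for r in records:
        if r["likely_duplicate_of"] is None and r["confidence"] >= min_confidence:
            # Keep the anchor, but remember which variant records it absorbed.
            selected.append({**r, "merged_trace_ids": sorted(absorbed[r["trace_id"]])})
    return selected


delivered = [
    {"trace_id": 1, "confidence": 0.9, "likely_duplicate_of": None},
    {"trace_id": 2, "confidence": 0.4, "likely_duplicate_of": 1},
    {"trace_id": 3, "confidence": 0.2, "likely_duplicate_of": None},
]
cohort = consolidate(delivered, min_confidence=0.5)
```

Raising `min_confidence` tightens the cohort, while `merged_trace_ids` preserves the link back to the variant records for auditing or sensitivity analysis.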

The Veritas mortality confidence score is more than just a number; it is our commitment to data integrity. By providing this transparent, weighted measure, we allow data scientists to focus their resources on analysis and workflow integration rather than on preliminary data scrubbing, leading to greater operational efficiency and more confident research outcomes.