In this insightful conversation from September 5, 2025, Jason LaBonte of Veritas Data Research sits down with Arnaub Chatterjee of Datavant to discuss their partnership and its transformative impact on the use of mortality data in life sciences.
This conversation has been lightly edited for conciseness and clarity.
Arnaub: Okay, hi everyone. I’m very excited. My name is Arnaub Chatterjee, and I’m the President and GM of our Life Sciences business here at Datavant. By way of background, Datavant is focused on facilitating the secure sharing of real-world data across a network of more than 350 data partners within our ecosystem, and we’re here to support the transformation of life sciences research and improve patient outcomes. This is a particularly exciting moment for us because we’re kicking off our first LinkedIn Live conversation as part of a new series in which I get the chance to chat with our real-world data partners across this network: to hear what’s top of mind for them right now in our industry, how they’re thinking about key strategies within life sciences, and how these collaborations are unlocking new and more efficient ways to answer all-important research questions. So I’m going to turn it over to our special guest today, Jason, to introduce himself.
Jason: Thanks, Arnaub. Hi everyone. I’m Jason LaBonte, CEO of Veritas Data Research. We focus on collecting, cleaning, and delivering contextual data sets that augment traditional transactional real-world data like claims, EHRs, and lab data, basically providing missing variables that we think are critical for expanding the analytics you can do with real-world data. We started with mortality data, and I think that’s what we’re going to talk about today.
Arnaub: Yeah, amazing. So let’s get into it. It’s a really interesting time we’re in: the sensitivity and the value of data are at an all-time high. We know this. Most of the data that we connect is particularly sensitive, but mortality data is in a category of its own. Earlier this year Veritas and Datavant expanded our partnership so that more organizations can better discover, evaluate, and link to mortality data in a privacy-compliant way. We think that’s ultimately going to empower the healthcare research world and life sciences organizations to answer really critical questions with much more confidence. Jason and the Veritas team have been building what I think is the most complete and accurate mortality data set available today, and we’re going to explore why this matters. So my first question for you, Jason: how has the role and the perception of mortality data changed in recent years, and where are you seeing the biggest unmet needs today?
Jason: Yeah, great question. Mortality is interesting. You’d think that if you had to pick one endpoint for healthcare analytics, mortality might be it. But traditionally it’s been pretty hard to access as an endpoint, in terms of third-party data. In the past, folks have used the Social Security Administration’s Death Master File. Up until about 2010, that was a pretty comprehensive file; it had about 80 to 85% of the deaths in the country. But after that, states were allowed to opt out, and so that file today probably covers only 15% of deaths in the country. CDC has a good, comprehensive, research-grade death data set called the National Death Index, but it’s not commercially available. You’re not allowed to use it for commercial analytics, so it’s really not that useful for a lot of things that we in industry want to do with this data. The same is true with the states: some states have death data available, some don’t, and there are a lot of varying restrictions on how you can use that data. So the landscape for traditional, government-sourced death data has been hard to access, with poor coverage. Folks have also used deaths that occur in a hospital, which you might see come through in an EHR or claims data set. In our analysis, those capture maybe 15 to 20% of deaths in the country. So historically folks have said, ‘Well, I can get a little bit of death data. I’ll hope it’s enough. I’ll boil my cohort down to where I have some death data and look at that.’ But for the more advanced analytics that folks want to do, and certainly for applications beyond basic analytics, those simply weren’t good enough. What we’ve seen is a consistent need for a much more comprehensive mortality data set that can be appended onto real-world data to unlock a lot more types of analyses. So that’s what we’ve done at Veritas. We’ve spent the last three years building out a comprehensive data collection and cleaning mechanism. We’re going out to 40 to 50,000 different sites every week: funeral homes, obituaries, crematoriums, you name it. We’re out there collecting all of this death data and bringing it together, and that’s allowed us to build a data set with about 90% coverage. It’s really hard to get to 100, but we think we’re at 90% plus. And we collect deaths quickly: most of our data, 90% of it, is captured within two weeks of the date of death. So it’s really timely and really comprehensive, and that now allows our clients to do a lot of new, interesting things with mortality that previously weren’t possible.
Arnaub: So you’re exposing some really important gaps within the industry at large: the longitudinality, the time of capture, the latency. And I think what you’re saying is, if you are not leveraging mortality data, you’re basically not creating an expansive, all-encompassing real-world data set.
Jason: I think that’s right.
Arnaub: Yeah, and that’s an interesting challenge, right? Because having the right mortality data is one thing, but getting a complete picture of what happened, end to end, is something else entirely. And this is where we’ve chatted a lot, as our two organizations, about how we think about bringing together Veritas’s depth with a network of data and tools around it.
Jason: Yep.
Arnaub: Just to piggyback on that and transition to another set of questions: you mentioned fact of death, and that’s one of the big differentiators for Veritas. How do you build a more complete and accurate record?
Jason: Yeah, great question. Real-world data is messy, and death data is no exception. We think about fact of death as who died, and when, and where. As I said, we’re sourcing that from 40,000 different places, and there is a lot that goes on behind the scenes to make that a usable record. The big differentiator is not just that we have a death record, but: is it any good? So we spend a lot of time building transparency into our system around how we consolidate multiple records. We might get one from an obituary, one from a cemetery, one from Social Security. Are those the same person? How do we match them together? How do we derive the best value for the date of death, or for the location of death? We’ve spent a lot of time adding extra data elements into our data set that give our clients an idea of the confidence in each record based on where we sourced it. Where did we source it from? Is it potentially a duplicate of another record? There might be a typo in the date of death, so it looks like the same person, but they died one year apart. Which one do we think is the right record? Are they potentially duplicates? Can we flag that for our customers to make the data more usable? So there’s providing data, and then there’s making that data usable and giving clients the flexibility to decide for themselves: I want records over this confidence threshold but not below it, or I want to de-duplicate further, or no, I want to keep those separate. There’s a lot of data science folks are doing with our data after it leaves our hands, and we want to facilitate that as much as possible by making sure they understand what our data is and where it comes from. On that second point, we’re really big believers in the FAIR principles and FDA’s emerging guidance on traceability and provenance of data, and we want to support that as much as possible. When we deliver data, we’re delivering not just the fact that a record came from an obituary or a government source or what have you; for clients’ identified data, you can actually click through and see the URL of the obituary that was the underlying source. All of our records are fully traceable back to where we got them. I think that’s really important as people start to use this in real-world evidence and combine it with other data: they’re able to trace it back to an originating source, and if FDA wants to audit it, we can support all that. So that’s another key piece: when we deliver data, it’s not just the data itself but all of that supporting infrastructure.
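Editor’s note: to make the consolidation and deduplication logic Jason describes concrete, here is a minimal Python sketch. The source weights, field names, and tie-breaking rule are illustrative assumptions, not Veritas’s actual methodology.

```python
from collections import Counter
from datetime import date

# Illustrative source weights; assumed values, not Veritas's actual scoring.
SOURCE_WEIGHTS = {"government": 0.95, "obituary": 0.85, "funeral_home": 0.80, "cemetery": 0.70}

def consolidate(records):
    """Merge candidate records for one decedent into a single best record.

    Each record is a dict like {"source": ..., "date_of_death": date}.
    Returns the consolidated record with a confidence score and a flag
    for conflicting dates (potential duplicates or typos).
    """
    date_counts = Counter(r["date_of_death"] for r in records)

    def support(d):
        # Strongest source weight backing a given candidate date.
        return max(SOURCE_WEIGHTS.get(r["source"], 0.5)
                   for r in records if r["date_of_death"] == d)

    # Prefer the date reported most often; break ties by source strength.
    best_date = max(date_counts, key=lambda d: (date_counts[d], support(d)))
    return {
        "date_of_death": best_date,
        "sources": sorted({r["source"] for r in records}),
        "confidence": round(support(best_date), 2),
        "possible_duplicate": len(date_counts) > 1,  # conflicting dates get flagged
    }

records = [
    {"source": "obituary",   "date_of_death": date(2024, 3, 2)},
    {"source": "government", "date_of_death": date(2024, 3, 2)},
    {"source": "cemetery",   "date_of_death": date(2023, 3, 2)},  # likely year typo
]
print(consolidate(records))
# date_of_death=2024-03-02, confidence=0.95, possible_duplicate=True
```

The point of the `possible_duplicate` flag is the flexibility Jason mentions: the data ships with the conflict surfaced, and the client decides the threshold and dedup policy downstream.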
Jason: The last piece we’re working on is not just fact of death but also cause of death. The fact that somebody died doesn’t mean that person is necessarily relevant to your study. If you’re doing a cancer survival curve and somebody dies for some other, unrelated reason, you may decide you don’t want to include them in your cohort. So where do you get the cause-of-death data to append? Cause of death is tricky; that’s a whole other problem area. Typically you can find it on death certificates, but those, again, are not accessible to most of us for commercial applications, or at all. And there was a study done in 2021, I think it was, finding that the cause of death listed on a death certificate is wrong almost 40% of the time. It’s usually just picked off the chart before the person is discharged, and the person choosing that cause of death doesn’t necessarily have to be a physician. Every state is different; it could be all sorts of different folks actually writing out that death certificate. So we take a different approach. We look backwards from a date of death using real-world data. We go back three years and basically do a chart review at scale: we look at all the ICD codes and procedure codes and things like that, assign weightings to all those clinical events, and say what we think is the most likely cause of death and the secondary causes of death based on that clinical history. That gives our clients the ability to look at, okay, all these people died, but which ones died for a reason I’m studying and want to include, versus not?
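Editor’s note: the “chart review at scale” Jason describes might look something like the sketch below, which scores ICD-10 codes from a three-year lookback window with a simple recency decay. The code-to-category map and the weights are hypothetical, for illustration only; Veritas’s actual weighting model is not public.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical weights mapping ICD-10 prefixes to cause-of-death categories.
# A real system would use far richer clinical logic; these values are illustrative.
CODE_WEIGHTS = {
    "C34": ("lung cancer", 3.0),   # malignant neoplasm of bronchus/lung
    "I21": ("cardiac", 2.5),       # acute myocardial infarction
    "I50": ("cardiac", 1.5),       # heart failure
    "J44": ("respiratory", 1.0),   # COPD
}

def rank_causes(events, date_of_death, lookback_years=3):
    """Rank likely causes of death from weighted clinical events.

    events: list of (icd10_code, event_date) tuples from claims/EHR history.
    Events closer to the date of death count more (linear recency decay).
    """
    window_start = date_of_death - timedelta(days=365 * lookback_years)
    scores = defaultdict(float)
    for code, event_date in events:
        if not (window_start <= event_date <= date_of_death):
            continue
        category, weight = CODE_WEIGHTS.get(code[:3], ("other", 0.1))
        # Recency factor: 1.0 at death, tapering toward 0 at the window edge.
        recency = 1.0 - (date_of_death - event_date).days / (365 * lookback_years)
        scores[category] += weight * recency
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

history = [
    ("C34.90", date(2024, 1, 10)),
    ("C34.90", date(2023, 6, 5)),
    ("I50.9",  date(2022, 11, 20)),
]
print(rank_causes(history, date_of_death=date(2024, 2, 1)))
# [('lung cancer', ...), ('cardiac', ...)]  -> primary and secondary candidates
```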
Arnaub: You’re picking up on some things that have been ever-present in our industry. One of them is: what is the bar for regulatory-grade evidence in our space? The other is traceability and auditability, coming back to a true source that you can validate and then take to a regulator or a payer or some other body. I do think that is really important; it’s a step change from how it’s been done before. And once you have that level of completeness and the longitudinal record, it’s about connecting it with all of these other data sources so that if you’re a researcher, or in clinical trial design, whatever it is, you see the full story and not just an isolated data point.
Jason: Right. And if there are ever any oddities in the data, it’s a lot easier to trace back where they might come from if you have all this other supporting information.
Arnaub: That’s super cool. One thing I’ve been thinking a lot about is how the work we’ve done together is really trying to change things so that mortality data is not just available but actionable and reliable, everything you and I have been saying around trust. How have we been driving that shared approach, and what else is on your mind?
Jason: So we’ve talked about the data being available, in the sense that the data set exists, and I just talked about the things we add to it to make it more usable from a data science point of view. But I think there’s a shared view between us and Datavant that there are other problems you also have to solve to make data available and usable. For instance, our data is very often being combined with clinical data, and clinical data has a lot of privacy implications and concerns that you need to address. I think Datavant’s done a great job of building out a whole set of services and solutions to understand the privacy risk of a data set, especially when you join data sets together: how do you de-identify them and ensure that they stay de-identified? Because our data is used in that modality a lot of the time, we really share your vision that that’s a key piece you have to address, and we’re happy to be part of your ecosystem to do it. So there’s that privacy piece you have to understand. But there’s also the fact that this is just sensitive data, and we’re big believers, as I think you are, that you have to appreciate that even de-identified data sometimes just can’t move from the environment it’s being worked on in. So it’s really important to help clients work with data where they are, where they’ve set up their own tools and processes. We’ve built out a very robust data delivery engine that can move our data into wherever our clients want to be, and I think that’s a shared ethos with you guys. It’s also why we’re excited to be an early adopter of methods to make mortality data more usable, such as the Datavant Connect solution powered by AWS Clean Rooms. Being able to support your customers in whichever environment they want to work in, and making sure their privacy and security procedures can be followed without a lot of friction: those are really important additional elements to making your data usable.
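Editor’s note: one common building block for joining de-identified data sets, referenced throughout this conversation as tokenization, is replacing direct identifiers with irreversible keyed tokens before data leaves a partner’s environment. The sketch below illustrates the general idea with an HMAC; it is a simplification for illustration, not Datavant’s actual tokenization scheme, and the key name is an assumption.

```python
import hashlib
import hmac

# Shared secret for token derivation; key management is the hard part in practice.
LINKAGE_KEY = b"example-linkage-secret"  # assumption for illustration only

def make_token(first_name: str, last_name: str, dob: str) -> str:
    """Derive an irreversible linkage token from normalized identifiers.

    Two data sets tokenized with compatible keys can be joined on the
    token without either side ever exchanging raw PII.
    """
    normalized = "|".join([first_name.strip().lower(), last_name.strip().lower(), dob])
    return hmac.new(LINKAGE_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# The same person in a mortality file and a clinical file yields the same token
# despite formatting differences, so the records can be linked while de-identified.
assert make_token("Ada", "Lovelace", "1815-12-10") == make_token(" ada", "LOVELACE ", "1815-12-10")
```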
Arnaub: Yeah, I definitely appreciate the call-out there. One of our main goals is to ensure that our data partner ecosystem can maintain control of their data while making it discoverable through our portal and through Datavant Connect. And from our end, in how we support you, we want to make sure the partners we’re working with, especially Veritas, have made mortality data available and accessible for researchers to evaluate and link, alongside the rest of the RWD community. So it’s a mutual goal.
Jason: Yeah, that’s a key piece: discoverability, and then usability.
Arnaub: Totally, which gets to usability, right? This is probably a pretty meta question, but: is mortality data underutilized today, and where do you see the upside, the opportunity, moving forward? How do you see your space changing?
Jason: Yeah, great question. I would say folks have gotten a pretty good appreciation for mortality data within health economics and outcomes research, because that’s where mortality has traditionally been an endpoint. With survival curves, mortality is pretty well understood; I think we just provide a better data input for a lot of that work. And then for risk modeling, we’re seeing a rise, especially with AI and people trying to build models: mortality is an important input into a lot of the risk modeling work folks are doing. So within those use cases, I’d say the industry as a whole has come up the curve, is pretty aware this data is out there, and is starting to really incorporate it. There’s more everybody can do, and we’ll see what’s possible as people get more creative on the modeling side. Where I think there are going to be some genuinely new applications for mortality data that people really haven’t considered, mostly because the data wasn’t there, is on the operational side. There are a couple of elements to that. One is clinical trials, where mortality may or may not be an official endpoint, but people do want to look at survival even after the trial closes, maybe long-term survival. Going out, reconnecting with all those participants, and gathering current data about them is very expensive and slow. With trial tokenization, the ability to append mortality data and run those survival analyses quickly and efficiently is something we’re going to see really start to happen over the next couple of years, so I’m excited about unlocking that. I also think there are a lot of operational applications in patient support programs and other settings with a lot of patient outreach, where our data can be used to continuously monitor the group of folks you’re trying to support. Those support programs are pretty expensive; you often have nurses calling patients and trying to assist them. If you can understand that a patient is non-responsive because they might be deceased, you not only save yourself a lot of time and energy doing outreach to them, you also prevent a lot of insensitive outreach that’s probably not appreciated by their family. So we see a lot of applications where mortality data, especially when it’s timely like ours, is built into those operational workflows to help people do things more efficiently and more sensitively. We see that in health systems and payers, too. UCLA did a study where 20% of their severely ill patients in the EHR were actually deceased and they didn’t know it. They were holding appointments open for them, filling prescriptions, ordering labs. That’s a strain on the health system, especially when folks are on big waiting lists trying to get into those appointments, but it also has this insensitivity element that people really want to avoid. So I think we’re going to see mortality move from the traditional survival-analysis world into clinical trials, patient support programs, and operational outreach efforts, led mostly by the analytics and then by the operations side.
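Editor’s note: once death dates are appended to trial participants via tokens, long-term overall survival reduces to deriving event times with censoring and estimating a survival curve. Here is a minimal Kaplan-Meier sketch; the cohort, dates, and field layout are assumptions for illustration.

```python
from datetime import date

def kaplan_meier(observations):
    """Kaplan-Meier survival estimate from (duration_days, died) pairs.

    Participants with no linked death record are right-censored at the
    mortality data cut. Deaths are processed before censorings at ties.
    """
    survival, curve = 1.0, []
    at_risk = len(observations)
    for t, died in sorted(observations, key=lambda o: (o[0], not o[1])):
        if died:
            survival *= (at_risk - 1) / at_risk
            curve.append((t, round(survival, 3)))
        at_risk -= 1
    return curve

# Assumed linked cohort: death dates appended via token linkage, or None
# for participants still alive as of the data cut (censored).
trial_start, data_cut = date(2021, 1, 1), date(2024, 1, 1)
linked = [
    (date(2022, 6, 1), True),    # death found via linkage
    (date(2023, 3, 15), True),
    (None, False),               # no death record -> censored at data cut
    (None, False),
]
observations = [(((d or data_cut) - trial_start).days, died) for d, died in linked]
print(kaplan_meier(observations))
# [(516, 0.75), (803, 0.5)]
```

The appeal of the approach Jason describes is that the expensive step, re-contacting participants, is replaced by refreshing the linked mortality feed and re-running this computation.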
Jason: And recently FDA announced a push to have overall survival be part of all cancer studies going forward. There’s draft guidance now where they’re basically saying: even if overall survival is not your primary endpoint, we want you to start thinking about how to measure it, because it’s important from a safety perspective even if not from an efficacy standpoint. That could be a big burden on folks, unless mortality data like ours can help them measure that survival endpoint in a much more efficient way. So we’ll see how the regulatory side starts to take advantage of real-world data like mortality and uses it to help the entire clinical research enterprise produce more usable, better results, hopefully without a large burden on the industry.
Arnaub: Yeah, you called out some interesting things, and as I look through our audience here, I’d love for you all to add feedback and comments as we go, so it’s not just a one-on-one between me and Jason. I know there are a bunch of folks with pretty particular opinions on some of the topics you laid out. One thing we’ve been thinking a lot about on the data side is that we’re now tokenizing close to 400 clinical trials, and these are going to need to connect to mortality data, because many of them are oncology trials and many are looking at long-term follow-up. We want to track specific mortality outcomes associated with trial success or trial failure. So I do think that’s a really important piece of the puzzle: as we try to build holy-grail, end-to-end data sets, having that linkage and understanding of that endpoint is super important. The other really interesting call-out you made is the regulatory component. The FDA basically just moved date-of-death data from a nice-to-have to a must-have, particularly for a randomized oncology trial, and that’s a bit of a game changer in the validation and necessity of that data, in terms of how you look at effectiveness, safety, and efficacy of a therapy. So what they’re saying, if I’m reading you right, is that this data point is now expected to be collected even if you’re not using it as the primary endpoint. And over time, if you’re on the pharmaceutical side of a life sciences company, you have to push toward collecting externally validated mortality data that is auditable, traceable, and can be connected to an individual patient, in order to satisfy certain regulatory requirements.
Jason: Right. I think that’s what we’re starting to see in some of this guidance, which I think is smart overall. It’s a way to improve these trials and make sure we’re keeping our eye on what should be primary endpoints, even if they’re difficult to measure. And it’s an exciting turn for the industry to be able to think about other ways to collect and append that data, rather than just ‘my trial got more expensive because now there are more data and more touchpoints I have to collect.’
Arnaub: Yeah, totally. And that stat you mentioned on the UCLA side is crazy, right? Not knowing whether 20% of the sickest patients in your EHR are dead or alive: that’s a really important finding for the payer and provider side of our industry too.
Jason: It is. And UCLA is a pretty advanced health system with really good systems in place. They had folks actively working on this problem, which is why they did the study and published it. So if that’s the good end of the spectrum, I can only imagine what busier or less well-resourced systems are dealing with.
Arnaub: Yeah, totally. Well, I think you’ve made a pretty strong case for the expansiveness of where mortality data is going, and it sounds like there will be a whole number of use cases we hadn’t considered before, which is really fascinating. It sounds like we’re moving in a direction where expertise, data discoverability, data infrastructure, data linkage, and the actual data itself, on the mortality side, are going to open up different doors. It’s that triumvirate of people, technology, and quality data that brings it all together. So I’ll open up the floor for questions as we start to see what people think about this topic. I’ll take one from Medhi, a wonderful researcher who works within the clinical trial space. He’s wondering if there’s any formal validation of Veritas data against a gold standard like NDI, the National Death Index.
Jason: Yeah, great question. So, we chart our progress using NDI’s total stats. CDC publishes the number of deaths by state, by age, by gender, so you can get a very nice, granular view of how many deaths CDC is reporting across the country, and those are continuously updated. We’re continuously measuring our volumes against their volumes to understand, first, how many deaths we’re finding overall, and second, whether we have any holes in our data collection methodology. Are there spots in the country where we’re light? Are there certain demographic groups we’re light on? Generally we find that we’re pretty well aligned; as you would imagine with 90% of deaths, we don’t have any glaring holes. So that’s one validation we do, against those total counts. We’ve also done a number of client evaluations, as you can imagine. Anytime folks are trying out data, they want to look at it, and they may have in-house repositories of data they’ve used in the past. Typically folks are looking at, first, coverage and, second, timeliness; those are pretty easy to measure. But the other thing they really try to validate is the congruity of our death dates versus the death dates they have internally, because we do see some variability there. So we’ve done a number of validations that basically say: here are all the people we think are deceased within your patient cohort, and you have your internal records of who you think is deceased. Do we find the same people? When we find the same people, do we have the same dates of death? Typically we find a little bit of a gap, often due more to a matching problem: is this the same person when we have different first names, one a nickname and one the full name? So we’ll find a little difference in who we identify as deceased, and sometimes we’re more comprehensive, so we’ll find additional folks they don’t have. But I would say our alignment on matching is well within the parameters of what you’d expect to see for false positives and false negatives, and when we’ve got the same folks, we have very high congruity on the date of death. That’s what we’ve found to be very commonly true. So those are the types of validations we’ve done. We’re also working on validation studies for cause of death. You can imagine that one’s a little more fraught, because, as I said, NDI cause-of-death data coming from death certificates is not actually a gold standard; we know there are a lot of errors, so what do you compare against? What we’re working on now is a comparison of our algorithmic approach to determining cause of death from medical records against what a human physician chart review would conclude. We think the best comparator to a gold standard is an actual doctor sitting down and doing a chart review. We don’t have results for that yet, but that’s how we’re thinking about approaching validation of cause of death.
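Editor’s note: the volume validation Jason describes, comparing collected death counts to CDC’s published totals stratum by stratum, might look like this short sketch. All strata and figures are invented for illustration.

```python
# Compare collected death counts against published reference totals by stratum
# (e.g., state x age band). Values here are invented for illustration.
reference = {("CA", "65+"): 10_000, ("CA", "<65"): 2_500, ("TX", "65+"): 8_000}
collected = {("CA", "65+"): 9_100, ("CA", "<65"): 2_200, ("TX", "65+"): 6_400}

for stratum, expected in sorted(reference.items()):
    coverage = collected.get(stratum, 0) / expected
    flag = "  <-- light, investigate" if coverage < 0.85 else ""
    print(f"{stratum}: {coverage:.0%} coverage{flag}")
# ('CA', '65+'): 91% coverage
# ('CA', '<65'): 88% coverage
# ('TX', '65+'): 80% coverage  <-- light, investigate
```

The same pattern extends to the client-side checks Jason mentions: joining matched cohorts on a person key and tabulating date-of-death agreement instead of raw counts.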
Arnaub: That’s super thorough and makes a ton of sense. Opening it up to anybody else: if we don’t have any other questions at this point, you can certainly reach out to me or Jason anytime on LinkedIn; we’re more than happy to keep this conversation going. Like I mentioned, this is the first of a number of LinkedIn Live conversations that Datavant is hosting. We’re so honored to have Jason here, a longtime friend and partner of Datavant, and the work he’s doing with Veritas is an amazing example of what happens when you bring together partners from across an entire data network: unique data assets, tech infrastructure, and people putting it all together and activating at scale. Really appreciate the conversation today, Jason. We should do it again soon.
Jason: Yeah absolutely. Thank you for having me.
Arnaub: All right. Thanks all.
Arnaub: So you're exposing some really important gaps within the industry at large. One is the longitudinality, the time of capture, the latency. And I think what you're saying is, if you are not leveraging mortality data, you're basically not creating an expansive or all-encompassing real-world data set.
Jason: I think that’s right.
Arnaub: Yeah, and that’s like an interesting challenge, right? Because having the right mortality data is one thing, but it sounds like getting a complete picture of what happened and sort of that end to end is something else entirely. And this is where I think, you know, we we’ve chatted a lot, you know, as both of our respective organizations and like how we think about bringing together Veritas’s depth with a network of data and tools around it.
Jason: Yep.
Arnaub: To piggyback on that and transition to another set of questions: you mentioned fact of death, and that's one of the big differentiators for Veritas. How do you build a more complete and accurate record?
Jason: Yeah, great question. Real-world data is messy, and death data is no exception. We think about fact of death as who died, and when, and where. As I said, we're sourcing that from 40,000 different places, and there's a lot that goes on behind the scenes to make that a usable record. The big differentiator is not just that we have a death record, but is it any good? So we spend a lot of time building transparency into our system around how we consolidate multiple records. We might get one from an obituary, one from a cemetery, one from Social Security. Are those the same person? How do we match them together? How do we derive the best value for the date of death, for the location of death? We've spent a lot of time adding extra data elements into our data set that give our clients an idea of the confidence in each record based on where we sourced it. Where did it come from? Is it potentially a duplicate of another record? There might be a typo in the date of death, so it looks like the same person but they died one year apart. Which one do we think is the right record? Are they duplicates? Can we flag that for our customers to make the data more usable? So there's providing data, and then there's making that data usable and giving clients the flexibility to decide for themselves: I want records over this confidence threshold but not below it, or I want to de-duplicate further, or no, I want to keep those separate. There's a lot of data science folks are doing with our data after it leaves our hands, and we want to facilitate that as much as possible by making sure they understand what our data is and where it comes from. And to the second point, we're really big believers in the FAIR principles and FDA's emerging guidance on traceability and provenance of data, and we want to support that as much as possible. So when we deliver data, we're delivering not just the source category, whether it came from an obituary, a government source, what have you; for clients' identified data, you can actually click through and see the actual URL of the obituary that was the underlying source. All of our records are fully traceable back to where we got them. I think that's really important as people start to use this in real-world evidence and combine it with other data: they can trace it back to an originating source, and if FDA wants to audit it, we can support all that. So that's another key piece of our deliveries; it's not just the data itself but all of that supporting infrastructure. The last piece we're working on is not just fact of death but also cause of death. The fact that somebody died doesn't mean that person is necessarily relevant to your study. If you're building a cancer survival curve and somebody dies for some other, unrelated reason, you may decide you don't want to include them in your cohort. So where do you get the cause-of-death data to append? Cause of death is tricky; it's a whole other problem area. Typically you can find it on death certificates, but again, those are not accessible to most of us for commercial applications, or at all. And there was also a study done in, I think, 2021, finding that the cause of death listed on a death certificate is wrong almost 40% of the time. It's usually just picked off the chart before the person is discharged.
And the person choosing that cause of death doesn't necessarily have to be a physician; every state is different, and it could be all sorts of different folks actually writing out that death certificate. So we take a different approach. We look backwards through real-world data from the date of death. We go back three years and basically do a chart review at scale: we look at all the ICD codes and procedure codes and so on, assign weightings to all those different clinical events, and ask what we think is the most likely cause of death, and the secondary causes, based on that clinical history. That gives our clients the ability to say, okay, all these people died, but which ones died for a reason I'm studying and want to include, versus not.
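To make the consolidation step Jason describes concrete, here is a minimal sketch of the general idea: weighing conflicting records by source reliability and flagging likely duplicates. The field names, source categories, and confidence weights are hypothetical illustrations, not Veritas's actual schema or algorithm.

```python
# Illustrative sketch only: consolidating multiple death records per person
# and flagging possible duplicates. Field names and confidence weights are
# hypothetical, not Veritas's actual pipeline.
import pandas as pd

# Hypothetical source-reliability weights.
SOURCE_CONFIDENCE = {"government": 0.95, "funeral_home": 0.85, "obituary": 0.75}

records = pd.DataFrame([
    {"person_id": "p1", "source": "obituary",     "death_date": "2024-03-02"},
    {"person_id": "p1", "source": "government",   "death_date": "2024-03-02"},
    {"person_id": "p1", "source": "funeral_home", "death_date": "2023-03-02"},  # likely year typo
])
records["death_date"] = pd.to_datetime(records["death_date"])
records["confidence"] = records["source"].map(SOURCE_CONFIDENCE)

def consolidate(group: pd.DataFrame) -> pd.Series:
    # Choose the date backed by the most total confidence across sources.
    by_date = group.groupby("death_date")["confidence"].sum()
    # Flag records whose dates disagree by roughly one year (a common typo
    # pattern), so clients can apply their own thresholds downstream.
    spread_days = (group["death_date"].max() - group["death_date"].min()).days
    return pd.Series({
        "death_date": by_date.idxmax(),
        "confidence": by_date.max() / group["confidence"].sum(),
        "possible_duplicate": 330 <= spread_days <= 400,
        "n_sources": len(group),
    })

consolidated = records.groupby("person_id").apply(consolidate)
print(consolidated)
```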
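The retrospective cause-of-death approach can likewise be sketched in a few lines. Again, this is only an illustration: the ICD-10 groupings, base weights, and recency decay below are invented stand-ins for the real algorithm.

```python
# Illustrative sketch: ranking likely causes of death from a three-year
# claims lookback. The ICD-10 prefixes, weights, and recency decay are
# hypothetical stand-ins, not Veritas's actual scoring.
from collections import defaultdict
from datetime import date

# Hypothetical mapping of ICD-10 prefixes to cause categories and base weights.
ICD_WEIGHTS = {"C34": ("lung cancer", 3.0),
               "I21": ("myocardial infarction", 2.5),
               "J44": ("COPD", 1.5)}

def rank_causes(events, death_date, lookback_years=3):
    """events: list of (service_date, icd10_code) tuples from the record."""
    scores = defaultdict(float)
    for service_date, code in events:
        days_before = (death_date - service_date).days
        if not 0 <= days_before <= lookback_years * 365:
            continue  # outside the lookback window
        for prefix, (cause, weight) in ICD_WEIGHTS.items():
            if code.startswith(prefix):
                # Events closer to death count more (linear recency decay).
                recency = 1.0 - days_before / (lookback_years * 365)
                scores[cause] += weight * (0.25 + 0.75 * recency)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

events = [(date(2024, 1, 10), "C34.90"), (date(2022, 6, 1), "J44.9"),
          (date(2024, 2, 20), "C34.11")]
print(rank_causes(events, death_date=date(2024, 3, 2)))
# First item = most likely primary cause; the rest = secondary candidates.
```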
Arnaub: You’re picking on some things that I think have been kind of ever-present in our industry. One of them being what is sort of the bar for regulatory grade evidence right in our space. And you know the other one being traceability, auditability coming back to like you know a true source that you can validate and then go back to you know a regulator or a payer or some other body with kind of a source to that. I do think that that that is really important. It’s like a step change for how it’s been done before. And I guess once you have that level of completeness and you know kind of the longitudinal record it’s you know connecting it with all of these other data sources so that if you’re a researcher, if you’re in clinical trial design, whatever it is, you see the full story and not just sort of an isolated data point.
Jason: Right. And if there are ever any oddities in the data, it's a lot easier to trace back where they might come from if you have all this other supporting information.
Arnaub: That’s super cool. One thing I’ve been thinking a lot about is like the work that we’ve done together is really trying to kind of make a change in terms of how mortality data is not just available but it’s actionable, it’s reliable, it’s everything I was just saying and you were saying around like trust and how have we been driving that shared approach and what are things that are on your mind?
Jason: I think so we’ve talked about kind of the data being available, right? Just kind of as a you know the data set exists. I just talked about kind of the things that we add to it to make it more usable from kind of a data science point of view. But there I think there’s a shared view with between us and Datavant on you know there are other problems you also have to solve to make available and usable, right? So I think for instance, our data is being combined with clinical data very often. Clinical data has a lot of privacy, implications and concerns that you need to address. I think Datavant’s done a great job of building out a whole bunch of services and solutions to, understand what the privacy risk is of a data set, especially when you join them together. How do you de-identify those data sets and ensure that they stay de-identified? And because our data is kind of being used in that modality a lot of times we’re really kind of I think sharing your vision that that’s a key piece you have to address and we’re happy to be part of your ecosystem to do that. So, I think there’s that privacy piece that you have to understand. But then there’s also the fact that this is just sensitive data and we’re big believers as I think you are that you have to appreciate that even being de-identified data sometimes just can’t move from the environment that’s being worked on. And so, it’s really important to kind of help clients work with data where they are and where they’ve set up their own tools and processes. So, we’ve built out a very robust kind of data delivery engine that can move our data into wherever our clients want to be. And I think that’s a shared ethos with you guys. And I think that’s also why we’re excited to be an early adopter of methods to make mortality data more usable. Such as using the Datavant Connect, kind of solution that’s powered by the AWS clean rooms. So, I think being able to support your customers, in whichever environment they want to work within, making sure that their privacy and security procedures are able to be followed without a lot of friction. Those are really important additional elements to making your data usable.
Arnaub: Yeah. No, I definitely appreciate the call out there. You know, kind of like one of our main goals is to ensure that you know, our data partner ecosystem here can maintain control of their data, but so that it’s discoverable through you know our portal through Datavant Connect. You know and from our end and how we support you guys like, you know, I just want to make sure that you know, the partners we’re working with, especially Veritas, has made mortality data available and accessible for researchers you know to evaluate and link alongside the rest of the RWD community so you know it’s this mutual goal there.
Jason: Yeah, yeah that’s a key piece of discoverability and then usability.
Arnaub: Totally, which gets to usability. This is probably a pretty meta question, but is mortality data underutilized today, and where do you see the upside and the opportunity moving forward? How do you see your space changing?
Jason: Yeah, great question. I would say folks have gotten a pretty good appreciation for mortality data within health economics and outcomes research, because that's where mortality has traditionally been an endpoint. With survival curves, mortality is pretty well understood; we just provide a better data input for a lot of that work. Then for risk modeling, we're seeing a rise, especially with AI and people trying to build models: mortality is an important input into a lot of the risk modeling work folks are doing. So within those use cases, the industry as a whole has come up the curve, is pretty aware this data is out there, and is starting to really incorporate it. There's more everybody can do, and we'll see what's possible as people get more creative on the modeling side. Where I think there are going to be some really new applications for mortality data that people haven't considered, mostly because the data wasn't there, is the operational side. There are a couple of elements of that we think about. One is clinical trials, where mortality may or may not be an official endpoint, but people do want to look at survival even after the trial closes; they want to look at long-term survival. Going out and reconnecting with all those participants and gathering current data about them is very expensive and slow. With trial tokenization, the ability to append mortality data and do those survival analyses quickly and efficiently is something I think we're going to see really start to happen over the next couple of years, so I'm excited about unlocking that. I also think there are a lot of operational applications in patient support programs and other settings with a lot of patient outreach, where our data can be used to continuously monitor the group of folks you're trying to support. Those support programs are pretty expensive; you often have nurses calling up patients and trying to assist them. If you can understand that a patient is non-responsive because they might be deceased, you can not only save a lot of time and energy on outreach, you can also prevent a lot of insensitive outreach that's probably not appreciated by their family. So we see a lot of applications where mortality data, especially when it's timely like ours, is built into those operational workflows to help people do things more efficiently and more sensitively. And we see that in health systems and payers too. UCLA did a study where 20% of the severely ill patients in their EHR were actually deceased and they didn't know it. They were holding appointments open for them, filling prescriptions, ordering labs. That's a strain on the health system, especially when folks are on long waiting lists trying to get into those appointments, but it also has this insensitivity element that people really want to avoid. So I think we're going to see mortality move from the traditional survival world into clinical trials, patient support programs, and operational outreach efforts, led mostly by analytics first and then the operations side.
But recently FDA announced a push to have overall survival be part of all cancer studies going forward. There's draft guidance now basically saying: even if overall survival is not your primary endpoint, we want you to start thinking about how to measure it, because it's important from a safety perspective even if not from an efficacy standpoint. That could be a big burden on folks, unless mortality data like ours can help them measure that survival endpoint in a much more efficient way. So we'll see how the regulatory side starts to take advantage of real-world data like mortality, and uses it to help clinical research produce better, more usable results, hopefully without a large burden on the industry.
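As a minimal sketch of the survival analysis this enables: once death dates from a linked mortality file are appended to a tokenized trial cohort, an overall survival curve falls out of a standard Kaplan-Meier fit. The data below is invented, and the example assumes the open-source lifelines library (pip install lifelines); patients with no linked death record are censored at the data cutoff.

```python
# Sketch: overall survival from a cohort with appended mortality data.
# All dates are invented; NaT in death_date means no death record was found,
# so the patient is censored at the data-cutoff date.
import pandas as pd
from lifelines import KaplanMeierFitter

cohort = pd.DataFrame({
    "enroll_date": pd.to_datetime(["2021-01-05", "2021-02-10", "2021-03-15"]),
    "death_date":  pd.to_datetime(["2023-06-01", None, "2022-09-20"]),
})
cutoff = pd.Timestamp("2025-01-01")  # data-cutoff date for censoring

observed = cohort["death_date"].notna()          # True = death event observed
end = cohort["death_date"].fillna(cutoff)        # censor the rest at cutoff
duration_days = (end - cohort["enroll_date"]).dt.days

kmf = KaplanMeierFitter()
kmf.fit(durations=duration_days, event_observed=observed,
        label="overall survival")
print(kmf.survival_function_)  # survival probability over follow-up time
```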
Arnaub: Yeah, you called out some interesting things, and as I look through our audience here, I'd love for you all to add feedback and comments as we go, so it's not just a one-on-one between me and Jason; I know there are folks here with pretty particular opinions on some of the topics you laid out. One thing we've been thinking a lot about on the data side is that we're now tokenizing close to 400 clinical trials, and these are going to need to connect to mortality data, because many of them are oncology trials and many involve long-term follow-up. We want to track the specific mortality outcomes associated with trial success or failure. So that's a really important piece of the puzzle: as we try to build holy-grail, end-to-end data sets, having that linkage and understanding of that endpoint is super important. The other really interesting call-out you made was the regulatory component. The FDA basically just moved date-of-death data from a nice-to-have to a must-have, particularly for a randomized oncology trial, and I think that's a bit of a game changer for the validation and necessity of that data in terms of how you look at the effectiveness, safety, and efficacy of a therapy. So what they're saying, if I'm reading you right, is that this data point is now expected to be collected even if you're not using it as the primary endpoint. And over time, if you're on the pharmaceutical side of a life sciences company, you have to push toward collecting externally validated mortality data that is auditable, traceable, and can be connected to an individual patient in order to satisfy certain regulatory requirements.
Jason: Right. I think that's what we're starting to see in some of this guidance, which I think is smart overall. It's a way to improve these trials and make sure we're keeping our eye on what should be primary endpoints, even when they're difficult to measure. And it's an exciting turn for the industry to be able to think about other ways to collect that data and append it, rather than just "my trial got more expensive because there are more touch points I have to conduct to collect these things."
Arnaub: Yeah, totally. And that stat you mentioned on the UCLA side is crazy, right? Not knowing whether 20% of the seriously ill patients in your EHR are dead or alive. It's a really important finding for the payer and provider side of our industry too.
Jason: It is. And UCLA is a pretty advanced system with really good infrastructure. They had folks actively working on this problem, which is why they did the study and published it. So if that's the good end of the spectrum, I can only imagine what busier or less well-resourced systems are dealing with.
Arnaub: Yeah, totally. Well, I think you made a pretty strong case for the expansiveness of where mortality data is going, and it sounds like there's going to be a whole number of use cases we hadn't considered before, which is really fascinating. It sounds like we're moving in a direction where expertise, data discoverability, data infrastructure, data linkage, and then the actual data itself on the mortality side are going to open up different doors. It's that triumvirate of people, technology, and quality data that brings it all together. So I'll open up the floor here for questions as we see what people think about this topic. I'll take one from Medhi, a wonderful researcher who works within the clinical trial space. He's wondering whether there's any formal validation of Veritas data against a gold standard like NDI, the National Death Index.
Jason: Yeah, great question. So we chart our progress using NDI's total stats. CDC publishes the number of deaths by state, by age, by gender, so you can get a very granular view of how many deaths CDC is reporting across the country, and those are continuously updated. We're continuously measuring our volumes against their volumes to understand, first, how many deaths we're finding overall, and second, whether we have any holes in our data collection methodology. Are there spots in the country where we're light? Are there certain demographic groups where we're light? Generally we find we're pretty well aligned; as you'd imagine with 90% of deaths, we don't have any glaring holes. So that's one validation we do, against those total counts. We've also done a number of client evaluations, as you can imagine. Anytime folks are trying out data, they want to look at it, and they may have in-house repositories of death data they've used in the past. Typically folks are looking at, first, coverage and, second, timeliness; those are pretty easy to measure. The other thing they're really trying to validate is the congruity of our death dates against the death dates they have internally, because we do see some variability there. So we've done a number of validations that basically say: here are all the people we think are deceased within your patient cohort, and you have your internal records of who you think is deceased. Do we find the same people? When we find the same people, do we have the same dates of death? Typically we'll find a little bit of a gap, often due more to a matching problem: is this the same person? We have different first names; one's a nickname, one's the full name. So we'll find a little difference in who we identify as deceased, and sometimes we're more comprehensive, so we'll find additional folks they don't have. But I would say our alignment on matching is well within the parameters of what you'd expect to see for false positives and false negatives. We typically find the same folks, and then we have very high congruity on the date of death; that's what we've found to be very commonly true. So those are the types of validations we've done. We're also working on getting validation studies done on cause of death. You can imagine that one's a little more fraught. It's tricky because, as I said, NDI cause-of-death data, which comes from death certificates, is not actually a gold standard; we know there are a lot of errors, so what do you compare against? What we're working on now is basically a comparison of our algorithmic approach to determining cause of death from medical records against a physician chart review done by a human. We think the best comparator to a gold standard is an actual doctor sitting down and doing a chart review. We don't have results for that yet, but that's how we're thinking about approaching the validation of cause of death.
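The stratified coverage check Jason describes is straightforward to express. Here is an illustrative sketch with invented numbers, comparing collected death counts against published CDC totals by stratum so that under-covered pockets stand out.

```python
# Illustrative sketch of a stratified coverage check against published CDC
# death totals. All counts are invented for the example.
import pandas as pd

cdc = pd.DataFrame({"state": ["CA", "CA", "TX"],
                    "age_band": ["65-74", "75+", "75+"],
                    "cdc_deaths": [41000, 78000, 52000]})
ours = pd.DataFrame({"state": ["CA", "CA", "TX"],
                     "age_band": ["65-74", "75+", "75+"],
                     "our_deaths": [37500, 71200, 46100]})

coverage = cdc.merge(ours, on=["state", "age_band"])
coverage["coverage_pct"] = 100 * coverage["our_deaths"] / coverage["cdc_deaths"]
# Strata well below the overall rate would point to a collection gap.
print(coverage.sort_values("coverage_pct"))
```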
Arnaub: That’s very super thorough and makes a ton of sense. Yeah. Opening up to anybody else. You know, but if we don’t have any other questions at this point, you know, like you can certainly reach out to me or Jason, you know, whenever on LinkedIn. More than happy to keep kind of continuing this conversation. Like I mentioned, you know, this is the first of a number of different LinkedIn Live series that, you know, Datavant is hosting. And we’re so honored to, you know, have Jason here, longtime friend, partner you know, at Datavant and the work that he’s doing with Veritas is an amazing example of I think what happens when you put together partners who can come across an entire data network. You bring together unique data assets and you have kind of tech infrastructure people you know putting it all together and kind of activating at scale. So really appreciate the conversation today, Jason. We should do it again soon.
Jason: Yeah absolutely. Thank you for having me.
Arnaub: All right. Thanks all.