How Atropos Health Accelerates Research with LLM Observability
Atropos Health aims to close the evidence gap to make it easier for physicians to have access to on-demand observational studies whenever needed. We caught up with Rebecca Hyde, Principal Data Scientist at Atropos Health, after her session at Arize:Observe in San Francisco.
Hyde has over ten years of experience in data science, public health, and product development. Her career spans significant roles in both academic and corporate environments, where she has led numerous initiatives, including assessing the impact of health interventions, developing new epidemiological methods, and implementing predictive analytics to improve health data use.
“We perform all of these observational studies, but at the end of them we need a physician to summarize the results in context,” Hyde notes.
Given the time demands and the sheer number of studies, that task can be challenging at scale.
“So we have been developing AutoSummary, which is an LLM-based prompting tool to produce those summaries. We’ve been running that in production and physicians can edit those summaries.”
Q+A with Rebecca Hyde, Principal Data Scientist at Atropos Health
Tell us about yourself and your role at Atropos Health
My name is Rebecca Hyde. I’m a Principal Data Scientist with Atropos Health. Atropos Health is a company that wants to close the evidence gap and make it easier for physicians to have access to on-demand observational studies whenever they need them.
What are your Gen AI use cases?
We perform all these observational studies, but at the end of them, we need a physician to summarize the results in context. That can take them 10 minutes per study, and we’re doing hundreds of studies per month. So we have been developing AutoSummary, which is an LLM-based prompting tool to produce those summaries.
We’ve been running that in production, and physicians can edit those summaries, but we didn’t really know how it was performing at scale. So we took on a project to start setting up a measurement framework to understand what the performance of AutoSummary was.
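As an illustration only, one simple starting point for a measurement framework like the one Hyde describes is to compare each generated draft with the version a physician saved after editing. The sketch below is not Atropos Health's method. The function names, the word-level comparison, and the choice of metrics are all assumptions made for this example, using only Python's standard library.

```python
from difflib import SequenceMatcher
from statistics import mean


def edit_ratio(draft: str, final: str) -> float:
    """Share of the text that changed between the draft and the edited version.

    0.0 means the physician left the draft untouched; values near 1.0 mean
    it was almost entirely rewritten. Comparison is done on whitespace-split
    words (an assumption; a real system might compare sentences or claims).
    """
    draft_words = draft.split()
    final_words = final.split()
    similarity = SequenceMatcher(None, draft_words, final_words, autojunk=False).ratio()
    return 1.0 - similarity


def summarize_edits(pairs):
    """Aggregate edit ratios over (draft, final) pairs into a few headline numbers."""
    ratios = [edit_ratio(draft, final) for draft, final in pairs]
    if not ratios:
        # Nothing to measure yet; avoid dividing by zero.
        return {"n": 0, "mean_edit_ratio": None, "share_unedited": None}
    return {
        "n": len(ratios),
        "mean_edit_ratio": mean(ratios),
        "share_unedited": sum(r == 0.0 for r in ratios) / len(ratios),
    }


if __name__ == "__main__":
    # Hypothetical example data, not real study summaries.
    examples = [
        ("a b c", "a b c"),
        ("a b c d", "a b x d"),
    ]
    print(summarize_edits(examples))
```

A metric like this says how much physicians changed a summary, not whether the original was wrong, so it would most likely be one signal among several rather than a complete evaluation.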
How did you approach LLM Observability?
I would say one of the challenges with monitoring was overcoming the initial inertia around monitoring in a startup environment. There are always more things to do and more things to create. And once people feel like something’s good enough, they often don’t want to dig back in to figure out the right metrics to monitor it.
But once we got over that hump, it was a relatively smooth process to build an iterative and scrappy approach to figuring out the questions we wanted to answer and a few metrics that answered those questions. I think the next steps that we’re going to take from here are refining that monitoring to narrow down to the one or two metrics that really matter to us, figuring out how to use LLM as a judge properly to get a very accurate read on what’s going on, and getting buy-in from our company.
What are your impressions of Arize:Observe?
I think overall it’s been a really fun event with a wide variety of speakers talking about things they’ve built and how they’ve evaluated them. I have definitely learned some tactical tips about what LLMs are, what they are good and bad at, and how structured and pointed we should be in how we use them.