What it does
Clinical trial patient recruitment is costly: it consumes 32% of trial budgets and accounts for $16B annually. Bond Health uses proprietary large language models (LLMs) to analyze structured and unstructured electronic health record (EHR) data and identify eligible patients more efficiently.
Your inspiration
I joined a biotech startup as the lead data scientist in 2021, when we started recruiting patients for acute respiratory distress syndrome trials. By 2025, despite our best efforts, we had recruited only 2 patients, a stark indicator of how flawed traditional recruitment methods are. Over those 4 years, the company lost more than $5M. This experience, coupled with my background in biomedical informatics, inspired me to develop an AI-powered solution that uses both unstructured and structured EHR data to match patients to clinical trials more accurately and efficiently, ultimately transforming patient recruitment in healthcare.
How it works
Our solution, Bond Health, is an AI-powered platform that uses unstructured and structured electronic health records (EHRs) to match patients to suitable clinical trials in real time. Existing approaches depend on physician referrals, flyers, or traditional natural language processing (NLP) models that require structured data and often yield inconsistent results. Bond Health instead uses proprietary large language models (LLMs) and transformer models to process unstructured EHR data with higher accuracy and efficiency. This enables accurate identification of relevant clinical trials across a broad range of therapeutic areas. Additionally, the flexibility and adaptability of our LLMs allow us to integrate diverse medical terminologies and languages, supporting effective communication and recruitment globally as we scale our technology.
Design process
My design process began in August 2024 when I started my master’s program at Harvard Medical School. Inspired by the challenges I observed at Gen1e Lifesciences, where we struggled to recruit patients for acute respiratory distress syndrome trials, I began refining our approach to patient recruitment. I developed an early prototype that integrated with electronic health records (EHRs) to ingest both structured data (like lab results) and unstructured data (such as clinical notes). At the heart of the system is a Retrieval Augmented Generation (RAG) framework. This component first indexes the unstructured notes and retrieves the relevant clinical information from them. Then, a proprietary large language model, fine-tuned on clinical datasets, processes the retrieved data to extract key patient eligibility factors. The output is subsequently matched against clinical trial criteria. Early pilot tests revealed issues such as data retrieval latency and false-positive matches. Improvements included optimizing the indexing mechanism and refining the language model through additional domain-specific training. Today, our modular platform efficiently ingests and processes complex EHR data, delivering highly accurate patient-trial matches that significantly reduce recruitment times and costs.
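The retrieve, extract, and match steps described above can be illustrated with a minimal sketch. This is not Bond Health's implementation: the token-overlap ranking stands in for a real index, the rule-based `extract_factors` stands in for the fine-tuned language model, and every name here (`retrieve`, `extract_factors`, `TrialCriteria`, `is_match`) and every sample note is a hypothetical chosen for illustration.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(notes: list[str], query: str, top_k: int = 2) -> list[str]:
    """Rank notes by token overlap with the query (stand-in for a real index)."""
    query_tokens = tokenize(query)
    scored = [(len(tokenize(note) & query_tokens), note) for note in notes]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only notes that share at least one token with the query.
    return [note for score, note in ranked[:top_k] if score > 0]


def extract_factors(passages: list[str]) -> dict:
    """Hypothetical stand-in for the LLM: extract eligibility factors with rules."""
    text = " ".join(passages).lower()
    factors = {"has_ards": "acute respiratory distress syndrome" in text}
    age = re.search(r"(\d+)[- ]year[- ]old", text)
    if age:
        factors["age"] = int(age.group(1))
    return factors


@dataclass
class TrialCriteria:
    """Assumed, simplified trial criteria: an age range and one diagnosis flag."""
    min_age: int
    max_age: int
    requires_ards: bool


def is_match(factors: dict, criteria: TrialCriteria) -> bool:
    """Check the extracted factors against the trial criteria."""
    age = factors.get("age")
    if age is None or not (criteria.min_age <= age <= criteria.max_age):
        return False
    return factors.get("has_ards", False) or not criteria.requires_ards


# Made-up clinical notes for a single patient.
notes = [
    "67-year-old male admitted with acute respiratory distress syndrome, "
    "on mechanical ventilation.",
    "Follow-up visit for seasonal allergies. No acute complaints.",
    "Routine dental cleaning, no findings.",
]
passages = retrieve(notes, "acute respiratory distress syndrome ventilation")
factors = extract_factors(passages)
criteria = TrialCriteria(min_age=18, max_age=75, requires_ards=True)
print(factors, is_match(factors, criteria))
```

A rule-based extractor like this one would also flag a note that says a diagnosis was ruled out, which is one simple way false-positive matches can arise and why the extraction step benefits from a model that reads context.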
How it is different
Bond Health distinguishes itself from competitors like Syneos Health, AutoCruitment, and Haystack Health by addressing critical gaps in recruitment accuracy and automation. While these alternatives depend on traditional natural language processing (NLP) models that require structured data and often yield inconsistent results, Bond Health leverages proprietary large language models (LLMs) and transformer models to process unstructured EHR data with higher accuracy and efficiency. This enables accurate identification of relevant clinical trials across a broad range of therapeutic areas. Additionally, the flexibility and adaptability of our LLMs allow for seamless integration of diverse medical terminologies and languages, ensuring effective communication and recruitment globally as we scale our technology.
Future plans
Our next steps involve launching a comprehensive pilot study using real-world EHR data from our partner hospitals. In this phase, we’ll integrate our Retrieval Augmented Generation (RAG) framework and proprietary language model into live clinical workflows, while rigorously stress-testing the system’s ability to handle large-scale data ingestion. We plan to measure patient match accuracy, model precision and recall, false-positive and false-negative rates, data processing latency, and overall system throughput. Our goal is to improve matching accuracy by 15-20% and reduce processing time by 30% compared to our initial prototype.
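As a rough illustration of how the matching metrics listed above relate to one another, the sketch below computes them from predicted and clinician-confirmed eligibility labels. The function name `match_metrics` and the sample labels are assumptions made for this example, not pilot data.

```python
def match_metrics(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Compute matching metrics from predicted vs. confirmed eligibility labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # correctly matched patients
    fp = sum(p and not a for p, a in pairs)      # matched but not eligible
    fn = sum(not p and a for p, a in pairs)      # eligible but missed
    tn = sum(not p and not a for p, a in pairs)  # correctly excluded

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Made-up labels for eight screened patients.
predicted = [True, True, False, False, True, False, False, False]
actual = [True, False, False, True, True, False, False, False]
metrics = match_metrics(predicted, actual)
print(metrics)
```

With these sample labels, accuracy is 0.75 (6 of 8 correct), precision and recall are both 2/3, the false-positive rate is 0.2, and the false-negative rate is 1/3.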
Awards
We won the MIT $100K Pitch Competition in Fall 2024, which awarded us $2,000 in non-dilutive funding. Additionally, we were finalists in MIT 100K Accelerate (Spring 2025), which selects 11 products across MIT and Harvard. We also recently received a $2,500 grant from the MIT Sandbox Program to continue customer discovery.