How Epitel Spent 5 Years Building an AI Dataset That Couldn’t Be Bought

How Epitel turned a missing EEG dataset into a five-year moat – showing founders what it really takes to build defensible AI when no data exists to buy.

Written By: Brett

Ask a cardiology AI company where it got its training data, and the answer may be simple: one such company told Epitel’s founder it bought 30,000 well-annotated EKG records. Ask Epitel the same question, and the answer is that it built the dataset itself over five years because no one had any data to sell.

In a recent episode of Category Visionaries, Mark Lehmkuhle, CEO and Founder of Epitel, a brain health technology platform that has raised over $20 million, described an infrastructure gap that reshaped the company’s entire development timeline and, unintentionally, created its most defensible competitive advantage. The lesson for AI founders isn’t about neurology. It’s about what happens when the basic infrastructure your technology requires simply doesn’t exist in your market.

The Dataset That Doesn’t Exist

When Epitel needed training data for their seizure detection AI, Mark looked to adjacent markets for guidance. Cardiology companies had figured this out decades ago. “I got to talk to one of the companies back then and ask them, hey, how did you do this? Because we need this clean data set that’s annotated to train our machine learning detection of these seizures, to train our AI,” Mark recalls.

Their answer was straightforward: “We just purchased 30,000 EKG records that were well annotated.”

Perfect. Epitel would do the same thing. Mark approached hospitals with epilepsy monitoring units that regularly recorded EEG data. “We go and we look and talking to many of these hospitals that have an epilepsy monitoring unit and things like that, and said, hey, do you have records that we could purchase?”

The response fundamentally changed Epitel’s trajectory: “No, we don’t have the storage for that. You know, after we record it, we enter the report into the electronic health record, and then we just delete the data because we don’t have the storage for it.”

This wasn’t a procurement problem or a price negotiation. The data simply didn’t exist. Hospitals were conducting EEG tests, extracting clinical insights, documenting findings in patient records, and then deleting the underlying brain wave data because they had no storage for it and, once the report was filed, no clinical or economic reason to keep it.

Why Infrastructure Gaps Persist

The gap between cardiology and neurology data infrastructure reveals how market maturity creates invisible advantages for some technologies while blocking others. Cardiology went digital earlier, established standards for data storage and annotation, and created enough commercial demand that selling historical records became viable.

EEG never made that transition at scale. The three-day hospital EEG remained the standard, reviewed by neurologists who manually scanned for seizure patterns and documented findings. There was no economic incentive to store raw EEG data long-term, no established marketplace for historical records, and no infrastructure for annotation at scale.

For Epitel, this meant their AI strategy couldn’t follow the playbook that worked in other medical domains. “So now we’re going to have to develop our own data set,” Mark realized. What should have been a procurement challenge became a multi-year data collection initiative.

Building Ground Truth

Epitel’s approach required dual data collection: running the gold-standard wired EEG alongside its wireless sensor system. “We used the wired EEG as kind of the ground truth and then trained our machine learning on our sensor data,” Mark explains.

This dual approach was necessary because you can’t train a seizure detection model without knowing what the “right answer” is. The wired EEG, reviewed by expert neurologists, provided that ground truth. Epitel’s wireless sensors recorded the same brain activity at the same time. The AI learned to identify seizure patterns in the wireless sensor data by matching it against expert-annotated wired EEG from the same patients at the same moments.
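
A minimal sketch of that label transfer, assuming both recordings share a common clock and that neurologists mark seizures on the wired EEG as (start, end) intervals in seconds. The function name, the fixed window length, and the sample interval are hypothetical illustrations, not Epitel’s actual pipeline.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); hypothetical annotation format


def label_windows(duration_s: float, window_s: float,
                  seizures: List[Interval]) -> List[int]:
    """Label each fixed-length window of the wireless recording.

    A window gets 1 if it overlaps any seizure interval annotated on the
    wired EEG, else 0.
    """
    n_windows = math.ceil(duration_s / window_s)
    labels = []
    for i in range(n_windows):
        start = i * window_s
        end = min(start + window_s, duration_s)
        # Two intervals overlap when each one starts before the other ends.
        overlaps = any(s < end and e > start for s, e in seizures)
        labels.append(1 if overlaps else 0)
    return labels


# Toy example (made-up values): 60-second recording, 10-second windows,
# one seizure annotated on the wired EEG from 12 s to 25 s.
labels = label_windows(60.0, 10.0, [(12.0, 25.0)])
print(labels)  # [0, 1, 1, 0, 0, 0]
```

The labeled windows, paired with the wireless sensor samples from the same time spans, would then form the training examples.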

This process couldn’t be rushed. Each data collection session required:

  • Recruiting patients who consented to wearing both systems
  • Coordinating with hospitals for controlled environment collection
  • Having neurologists review and annotate the wired EEG data
  • Ensuring wireless sensor data was good quality and time-aligned with the wired recordings
  • Building enough dataset volume to train robust algorithms
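
The timing requirement in the list above can be made concrete. One common way to check that two simultaneous recordings line up is to search for the sample lag that maximizes their overlap, a brute-force cross-correlation. The article does not say how Epitel synchronizes its recordings, so the function name and toy signals below are assumptions for illustration only.

```python
from typing import Sequence


def best_lag(reference: Sequence[float], other: Sequence[float],
             max_lag: int) -> int:
    """Return the lag (in samples) by which `other` trails `reference`.

    Tries every lag in [-max_lag, max_lag] and keeps the one with the
    largest dot product between the overlapping samples.
    """
    def score(lag: int) -> float:
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


# Toy signals (made up): the same pulse, arriving 2 samples later in `wireless`.
wired = [0, 0, 1, 3, 1, 0, 0, 0]
wireless = [0, 0, 0, 0, 1, 3, 1, 0]
lag = best_lag(wired, wireless, max_lag=3)
print(lag)  # 2
```

Once the lag is known, the wireless timestamps can be shifted so that annotations made on the wired EEG refer to the same moments in both recordings.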

Mark spent years on this data collection before Epitel could even begin training their AI algorithms in earnest. The timeline was dictated by how quickly they could collect high-quality, annotated data—not by how fast they could write code.

The Validation Gauntlet

Building the dataset was only the beginning. Epitel then had to prove their technology worked outside controlled environments. “We’ve got this dataset. We’ve trained our machine learning. We’ve gotten it through the FDA and as software, as a medical device, it’s good to go. But we’ve only ever recorded with these sensors in these very controlled environments in the hospital,” Mark explains.

The next phase required proving real-world viability: “What I’m telling you is that we want to get this outside the hospital for people so that they can live their lives and have this long term recording. So ultimately we had to run a number of pilots to prove to ourselves that, yeah, it’s good to go both in the hospital and outside the hospital as well.”

Each environment introduced new variables. Hospital data is clean—patients are stationary, technicians ensure good sensor contact, interference is minimal. Home data is messy—patients move, sensors shift, electrical interference from appliances creates noise. The AI had to work in both contexts, which meant collecting validation data across multiple settings.
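
One way to picture validation across settings is to score the detector separately for each environment rather than pooling everything. The sketch below assumes per-window 0/1 labels from neurologist review and from the detector. The record format, function name, and toy values are hypothetical and are not Epitel’s results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, int, int]  # (setting, truth, predicted); labels are 0 or 1


def per_setting_metrics(
    records: Iterable[Record],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute sensitivity and false positive rate for each setting."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for setting, truth, pred in records:
        if truth and pred:
            counts[setting]["tp"] += 1
        elif truth:
            counts[setting]["fn"] += 1
        elif pred:
            counts[setting]["fp"] += 1
        else:
            counts[setting]["tn"] += 1

    metrics = {}
    for setting, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        metrics[setting] = {
            # None when a setting has no windows of that class to score.
            "sensitivity": c["tp"] / positives if positives else None,
            "false_positive_rate": c["fp"] / negatives if negatives else None,
        }
    return metrics


# Made-up toy data, purely to show the shape of the comparison.
toy = [
    ("hospital", 1, 1), ("hospital", 1, 1), ("hospital", 0, 0), ("hospital", 0, 0),
    ("home", 1, 1), ("home", 1, 0), ("home", 0, 1), ("home", 0, 0),
]
results = per_setting_metrics(toy)
print(results["hospital"])
print(results["home"])
```

Reporting the numbers per setting makes it visible when a model that performs well on clean hospital data degrades on noisier home recordings.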

The Unintended Moat

What started as a frustrating obstacle became Epitel’s most defensible competitive advantage. Any company wanting to compete in seizure detection AI faces the same multi-year data collection challenge Epitel already completed.

You can’t buy your way around this. The data doesn’t exist in the market. You can’t scrape it from public sources. You can’t shortcut the collection process because FDA approval requires demonstrating your AI works on properly annotated, clinically validated data.

The five years Epitel spent building their dataset create a time-based moat that capital can’t compress. A well-funded competitor could hire brilliant engineers and build sensors quickly. But they can’t accelerate the process of recruiting patients, collecting dual EEG recordings, having neurologists annotate data, and accumulating enough volume for robust AI training.

This type of moat is particularly powerful because it’s invisible until competitors try to replicate it. From the outside, Epitel’s technology looks like wireless sensors plus AI algorithms. The underlying data infrastructure that enables those algorithms is hidden but essential—and took half a decade to build.

The AI That Enables the Business Model

The dataset’s value extends beyond competitive advantage—it enables Epitel’s entire business model. “Beyond the three days, no neurologist has the time to review that EEG data, because they’re literally reviewing it beat by beat, going from the very beginning to the very end, looking for these patterns that they’ve been trained to recognize as seizures,” Mark explains.

Without AI, long-term EEG monitoring isn’t economically viable. Neurologists can’t spend hours reviewing weeks of brain wave data where nothing interesting happens. The AI’s role is triage: “It’s meant to take these weeks and weeks of EEG, where in most cases, nothing’s going on that’s interesting and flag interesting bits of information for a neurologist to then go through.”

This transforms the economics of long-term monitoring. Instead of neurologists reviewing 168 hours of data for a week-long recording, they review the minutes the AI flagged as potentially containing seizures. The dataset that took five years to build doesn’t just power a feature—it enables the core value proposition.
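
The triage arithmetic can be sketched as follows, assuming the detector outputs flagged (start, end) intervals in seconds and that each flag is padded with some surrounding context before review. The flagged intervals, padding value, and function name are hypothetical. Only the 168-hour week comes from the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def merge_flagged(intervals: List[Interval], pad_s: float = 0.0) -> List[Interval]:
    """Pad each flagged interval, then merge any that overlap or touch."""
    padded = sorted((max(0.0, s - pad_s), e + pad_s) for s, e in intervals)
    merged: List[Interval] = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


WEEK_S = 7 * 24 * 3600  # 168 hours of recording, in seconds

# Made-up detector output: two overlapping flags and one isolated flag.
flagged = [(3600.0, 3660.0), (3650.0, 3720.0), (90000.0, 90120.0)]
segments = merge_flagged(flagged, pad_s=30.0)
review_s = sum(e - s for s, e in segments)

print(segments)         # [(3570.0, 3750.0), (89970.0, 90150.0)]
print(review_s / 60)    # 6.0 minutes to review
print(review_s / WEEK_S)  # fraction of the full week
```

In this toy case the neurologist reviews 6 minutes instead of 168 hours. Real ratios depend on how often the detector flags and on its false alarm rate.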

Platform Leverage from Dataset Investment

The dataset’s strategic value multiplies as Epitel expands beyond seizure detection. Mark’s vision includes Alzheimer’s early detection, stroke differentiation, and preventive brain health monitoring. Each new application leverages the same underlying infrastructure.

“I’ve got this, like, roadmap that is huge of all the different spaces that we want to go with this technology,” Mark notes. The AI training methodology proven for seizures—collect dual recordings, build ground truth, train algorithms, validate across environments—becomes reusable for each new condition.

This platform leverage means the marginal cost of entering adjacent markets is dramatically lower than the original infrastructure build. A competitor trying to enter Alzheimer’s detection would face the same multi-year data collection challenge. Epitel can retrain existing models on new patterns using collection infrastructure they’ve already proven works.

The Framework for Infrastructure-First Markets

Epitel’s experience reveals a pattern that applies beyond medical AI to any market where basic infrastructure is missing:

First, identify whether infrastructure gaps exist in your market. Can you buy the data, tools, or resources that more mature markets take for granted? If not, you’re facing an infrastructure-first challenge.

Second, recognize that building missing infrastructure takes longer than investors expect but creates defensible advantages. The timeline can’t be compressed with capital—it’s dictated by the inherent pace of data collection, relationship building, or capability development.

Third, find patient capital that matches infrastructure-building timelines. Epitel used grants specifically because “any seed stage investor is just not going to see the return on that investment for a long period of time.”

Fourth, leverage infrastructure investments across multiple use cases. The dataset and collection process that took five years to build for one application become the foundation for an entire platform.

The Takeaway

When Mark discovered that EEG training data didn’t exist to purchase, it seemed like a setback. Five years later, that “setback” is Epitel’s competitive moat—a time-based advantage that well-funded competitors can’t buy their way around.

For founders building AI in domains where training data doesn’t exist, Epitel’s story offers both warning and opportunity. The warning: you’re looking at multi-year timelines that most investors won’t fund. The opportunity: once you’ve built that infrastructure, you own a defensible advantage in your market.

Sometimes the best moat isn’t brilliant technology or network effects. Sometimes it’s just data that took five years to collect because no one else bothered to save it.