Following the success of OpenAI's GPT series of large language models, an increasing number of institutions are proposing "foundation" models for artificial intelligence that, like GPT, are "pre-trained" to have very broad capabilities in a domain of knowledge. We saw this last week with Nvidia CEO Jensen Huang proposing a "world foundation model" for autonomous vehicles and robots.
On Tuesday, at the annual JP Morgan Healthcare Conference in San Francisco, AI computer startup Cerebras Systems and medical research powerhouse Mayo Clinic presented findings of what they're calling a foundation model for genomics that can tease out the genetic root of inherited conditions. The goal is to "build the ChatGPT of healthcare," according to Cerebras and Mayo Clinic.
The first breakthrough of the year-long collaboration is the potential capability to predict drug response in patients with rheumatoid arthritis. That potential breakthrough could, the companies said, "significantly accelerate diagnostic time and improve accuracy."
"It's exciting the work that our teams have done together, something that we've always heard about, which is that you'll be able to predict response to therapy," said Dr. Matthew Callstrom, M.D., the Mayo Clinic's medical director for strategy and chair of radiology, in an interview we conducted preceding the presentation. Callstrom oversees teams at Mayo that are working with Cerebras.
"It's probably going to become real in the next few years as we take advantage of using these foundation models and using data that isn't text," said Callstrom.
Matthew Callstrom [left], Mayo Clinic medical director for strategy, and Cerebras CEO Andrew Feldman.
"There's been a foundation model for language, there have been foundation models for protein folding, and the work Mayo has done on our equipment is a foundation model for genomics," said Cerebras co-founder and CEO Andrew Feldman in the same interview.
Cerebras and Mayo Clinic first announced a partnership to work with Cerebras CS-3 AI computers a year ago. Cerebras spent several months obtaining HIPAA certification to work with private patient data. The experiments were run on units of the CS-3 operating in a Cerebras cloud computing facility reserved for the Mayo Clinic, and all data used was stored locally in order to be compliant with HIPAA requirements.
"This partnership has unfolded exactly as you hoped a partnership might where they brought domain expertise, and they had tremendous data assets and AI expertise," said Feldman. "And we brought AI expertise and world-class compute."
The life sciences have long used neural networks to test whether a change in a single DNA nucleotide (one of the four building blocks of DNA: guanine, adenine, cytosine, or thymine) can predict a heritable condition such as rheumatoid arthritis.
In the case of the Cerebras-Mayo model, the technology operates instead on groups of nucleotide changes, to use the intersection of DNA changes to gain greater predictive power.
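A toy example can show why groups of changes may carry signal that single changes miss. The sketch below uses entirely synthetic data and a made-up response rule; it is not the Cerebras-Mayo model, and names such as `patients` and `best_accuracy` are hypothetical. It simply compares how well the best possible rule does when it sees one variant site versus both sites together.

```python
from itertools import product

# Synthetic toy cohort (illustration only, not real genomic data):
# each patient has two variant sites, coded 1 if the variant is present.
# Made-up rule: a patient "responds" only when exactly one variant is present.
patients = [(a, b) for a, b in product([0, 1], repeat=2) for _ in range(25)]
responded = [a ^ b for a, b in patients]

def best_accuracy(feature_values, labels):
    """Accuracy of the best rule that maps each feature value to one label."""
    correct = 0
    for value in set(feature_values):
        group = [y for x, y in zip(feature_values, labels) if x == value]
        # The best rule predicts the majority label within each group.
        correct += max(group.count(0), group.count(1))
    return correct / len(labels)

site_a_only = best_accuracy([a for a, _ in patients], responded)
site_b_only = best_accuracy([b for _, b in patients], responded)
both_sites = best_accuracy(patients, responded)

print(site_a_only, site_b_only, both_sites)
```

In this contrived setup, either site on its own does no better than a coin flip, while the two sites taken together predict every patient. Real genetic effects are far messier, but the example captures the intuition behind looking at intersections of changes.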
The foundation model is made up of a billion parameters, or neural weights, to sift the data, which Feldman notes is 10 times larger than Google DeepMind's AlphaFold, which is regarded as a foundation model for protein folding problems.
The Cerebras-Mayo model was pre-trained on a trillion tokens drawn from a mix of open-source genomic data and Mayo's in-house patient data, called Tapestry, covering 100,000 patients' data in total.
According to Feldman and Callstrom, that specific, individual genomic data in Tapestry -- rather than the idealized, generic data from the public domain -- contributes to the increased accuracy of the model.
"Mayo has one of the greatest data sets on earth," said Feldman. "They've been leaders for decades in thinking carefully about data in the medical domain, and now, they're finding insight in it, and that's exactly what you would have predicted years ago."
Rheumatoid arthritis is a crippling condition affecting 1.3 million people. To date, the standard of care has been to head off the progression of inflammation through trial-and-error treatment with a chemotherapy drug called methotrexate.
Scientists have found the condition is 60% heritable, meaning that roughly 60% of the variation in who develops it across a population can be attributed to genetic differences. That is a population-level measure, not a 60% chance for any one person.
"Rheumatoid arthritis is a fairly common autoimmune disease that causes inflammation of the joints," explained Callstrom. "Cartilage gets eroded, and you get bone on bone, and, quite often, misalignment of the joints."
"The goal is to arrest the inflammation early," said Callstrom, because rheumatoid arthritis is a permanent condition. "And the problem with rheumatoid arthritis is, you don't know what patients are going to respond to." Only 40% of patients, on average, respond to methotrexate, he said. Those who don't respond have to go through another round of months of treatment with another therapy.
"It's not uncommon for patients to go through multiple drug efforts to see if they can arrest the course of the disease," said Callstrom.
The new foundation model is not only pre-trained on Tapestry's particular patient genomes but is then also "fine-tuned" using Mayo Clinic data from 500 patients known to have responded to treatment.
"The key is that our rheumatology team actually tracked patients and how they respond to therapy, with methotrexate and other targeted therapies, and kept an incredible database of 6,000 patients," explained Callstrom. "If you didn't have that, you would have a bunch of patient data, but you wouldn't know what to test it against."
The model is then back-tested to predict what happened to a held-out sample of patients who received methotrexate -- in other words, the model is tested to see if it can accurately anticipate what actually occurred in historical therapy trials.
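The idea of back-testing on a held-out sample can be sketched in a few lines. What follows is a minimal illustration under stated assumptions: the records are synthetic, the single `genomic_score` per patient is a hypothetical stand-in for whatever the foundation model outputs, and a simple score cutoff stands in for the fine-tuned model. None of it reflects Mayo's data or method.

```python
# Synthetic records: (genomic_score, responded_to_drug). Hypothetical values.
records = [
    (0.10, 0), (0.20, 0), (0.30, 0), (0.35, 1),
    (0.70, 1), (0.60, 1), (0.40, 0), (0.80, 1),
    (0.65, 0), (0.90, 1), (0.15, 0), (0.85, 1),
    (0.62, 1), (0.75, 1), (0.25, 0), (0.45, 0),
]

def split(data, held_out_every=4):
    """Hold out every Nth record; the rest are used for fitting."""
    fit = [r for i, r in enumerate(data) if i % held_out_every != 0]
    held_out = [r for i, r in enumerate(data) if i % held_out_every == 0]
    return fit, held_out

def accuracy(data, threshold):
    """Share of records where 'score >= threshold' matches the known outcome."""
    hits = sum((score >= threshold) == bool(label) for score, label in data)
    return hits / len(data)

def fit_threshold(fit_set):
    """Pick the cutoff that classifies the fitting set most accurately."""
    candidates = sorted(score for score, _ in fit_set)
    return max(candidates, key=lambda t: accuracy(fit_set, t))

fit_set, held_out = split(records)
threshold = fit_threshold(fit_set)            # learned only from fit_set
held_out_accuracy = accuracy(held_out, threshold)  # scored on unseen records

print(threshold, accuracy(fit_set, threshold), held_out_accuracy)
```

On this toy data, the cutoff gets 11 of the 12 fitting records right but only 3 of the 4 held-out records. The key point is that the held-out patients play no part in fitting, so their known outcomes serve as an honest check on what the model would have predicted.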
"You can imagine doing an A/B compare", said Callstrom, where one group gets the therapy and the other gets a placebo.
"Their genes are basically pushed up against the general model to look and see if you can predict for the new patient if they'll respond or not," said Callstrom, meaning, the fine-tuning cohort of rheumatoid arthritis patients.
"What we found is that it looks like it shows early promise in being able to do that for methotrexate," to predict response, he said.
Employing an AI model to predict a response to methotrexate is a first in medicine, said Callstrom. "There is not a model out there that predicts response for rheumatoid arthritis patients," he said. "You couldn't say, 'You will respond to methotrexate' -- you couldn't say those words."
The hypothesis, said Callstrom, is that the new foundation model is pointing to the underlying genetics of the disease.
"The hypothesis is that a patient's response to therapy is at least partially encoded in their DNA," he said. "Your DNA generates certain proteins that either do or do not respond to therapy. That's always been the hypothesis for mixed response, whether or not it's a particular enzyme or cellular response or whatever it might be."
The results are "preliminary," cautioned Callstrom, based on a small number of patients' historical data. Although the foundation model "demonstrate[s] high performance against benchmarks," it is too soon to declare the model has solved the problem, he said. A publication covering the results is "at the final stages" of being put together, he said.
The work "found pretty good signal," he said. "We're expanding that, we're going to do more." Even being able to say some patients won't respond to a drug can be an early benefit of the tool, he said. "If you can remove some people with some certainty from methotrexate, that's a win."
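One way to put a number on "some certainty" when ruling patients out is to ask, of the patients a model flags as unlikely to respond, how many truly did not respond. The sketch below is an assumption-laden illustration: the predicted probabilities, the `low_cutoff` value, and the function name `rule_out_summary` are all hypothetical, and the data is synthetic.

```python
def rule_out_summary(predictions, low_cutoff):
    """Summarize patients flagged as unlikely responders.

    predictions: list of (predicted_response_probability, actually_responded).
    Returns (number flagged, share of flagged who truly did not respond).
    """
    flagged = [responded for prob, responded in predictions if prob < low_cutoff]
    if not flagged:
        return 0, None
    true_non_responders = flagged.count(False)
    return len(flagged), true_non_responders / len(flagged)

# Synthetic example values, not clinical data.
toy = [
    (0.05, False), (0.08, False), (0.12, True), (0.15, False),
    (0.40, False), (0.55, True), (0.70, True), (0.90, True),
]

n_flagged, share_correct = rule_out_summary(toy, low_cutoff=0.2)
print(n_flagged, share_correct)
```

Here four toy patients fall below the cutoff and three of them truly did not respond, so the rule-out is right 75% of the time. In practice, where to set such a cutoff is a clinical judgment about how costly it is to withhold a drug from someone who would have benefited.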
For Cerebras, which has made a practice of tackling especially large neural network tasks, the speed from concept to results is a validation of its superior hardware, said Feldman.
"With blisteringly fast compute, we were able to get results, and, even though they're still early, this has been much faster than is historically the norm in medical research," he said.
The next step is to further improve the accuracy of the foundation model, he said. That can include feeding into the model not just the genomic data but also other data points, including radiology films of the hands and feet. Proteomics, the study of expressed proteins, may very well become part of the data.
"The expression of these genes is really important," meaning, how DNA converts into proteins, said Callstrom. "So, proteomics, and all of the gene expression level things, that will be another phase of what we'll do."
The true test will come with actual patients in treatment.
"What has to be done going forward is, take these early results, this use case, and actually do the work that we do in medicine, which is to prove it in patients going forward," said Callstrom.