
Photo illustrations by Judy Blomquist/Harvard Staff
Health
Machine healing
Experts assert that artificial intelligence can alleviate human suffering. Can we keep pace?
During his second year of medical school in the 2000s, Adam Rodman hit the library on behalf of a patient whose condition had baffled physicians. He combed the catalog, photocopied research papers, and handed them to the care team.
“It made a real difference in that patient’s care,” Rodman said. “Everyone said, ‘This is wonderful. This is evidence-based medicine.’ But it took two hours. Today, I can do that in 15 seconds.”
Rodman, now an assistant professor at Harvard Medical School and a physician at Beth Israel Deaconess Medical Center, carries a medical library in his pocket: a smartphone app developed after the 2022 launch of the large language model ChatGPT. OpenEvidence, created in part by Medical School faculty, lets him ask about specific illnesses and symptoms. It searches the medical literature, summarizes the findings, and surfaces the most relevant sources for further reading, delivering answers while Rodman is still in the room with his patient.
“We say, ‘Wow, the technology is really powerful.’ But what do we do to actually change things?”
Adam Rodman
AI in various forms has been used in medicine for decades, but not like this. Analysts predict that the adoption of large language models will reshape healthcare. Some compare the potential impact to the decoding of the human genome or the rise of the internet. The effects are expected to show up in doctor-patient relationships, physician workloads, the administration of hospitals and practices, medical research, and medical education.
Most of those effects are expected to be positive: boosting efficiency, reducing errors, easing the nationwide shortage of primary care, bringing data more fully into decision-making, lightening administrative burdens, and opening the door to longer, more meaningful one-on-one visits.

Adam Rodman, assistant professor at Harvard Medical School and physician at Beth Israel Deaconess Medical Center
“The optimist in me hopes that AI can make us better doctors so we can provide better care for our patients.”
Transcript:
ADAM RODMAN: I am fascinated by metacognition, by thinking about thinking. So what interests me most about AI and healthcare? Well, the optimist in me hopes that AI in medicine can make us better versions of ourselves as physicians, so we can take better care of our patients. The ideal scenario for me is one where AI listens to me and my patients, flags signs of implicit bias, catches potential misjudgments, and, more important, feeds that information back to me so I can improve over time. My worry is related to that hope. These are remarkably powerful reasoning technologies, and what is medical education, really, but a way of shaping the medical mind? My fear is that these powerful technologies will short-circuit many of the traditional pathways by which we know doctors learn and improve, potentially leaving us with generations of physicians who don’t think as well as they should. I don’t believe that’s inevitable, but it is a concern of mine about where our profession is headed.
Nonetheless, there are substantial worries as well.
Current datasets often embody societal biases that perpetuate disparities in access and quality of care for underserved groups. Left unaddressed, those data risk baking existing biases into ever more powerful AI tools that may increasingly shape how healthcare operates.
Another significant concern, experts say, is that AIs remain prone to “hallucination”: fabricating “facts” and presenting them as genuine.
There is also a risk that medicine won’t be bold enough. The newest AI could remake healthcare from the ground up, but only if given the chance. Misplaced priorities, such as deference to entrenched interests or a focus on profit over well-being, could easily reduce the AI “revolution” to a halfhearted exercise in surface-level change.
“I think we’re in a strange moment,” Rodman said. “We say, ‘Wow, the technology is really powerful.’ But what do we do to actually change things? My fear, as both a practitioner and a researcher, is that if we don’t think big, if we don’t rethink how we have organized medicine, not much will change.”

Reinforcing the ‘unstable structure’
Five years ago, when asked about AI in healthcare, Isaac Kohane sounded exasperated. Teenagers glued to social media apps were better equipped than many medical professionals. Today, he says, the picture is starkly different.
Kohane, chair of the Medical School’s Department of Biomedical Informatics and editor-in-chief of the New England Journal of Medicine’s new AI journal, calls the capabilities of the latest models “astonishing.” To make the point, he recalled getting an early preview of OpenAI’s GPT-4. He tested it with a complicated case, a child born with ambiguous genitalia, that could have stumped even a seasoned endocrinologist. Kohane asked GPT-4 about genetic causes, biochemical pathways, next steps in the workup, even what to tell the child’s parents. It performed excellently.
“This large language model was not designed to be a doctor; it’s just trained to predict the next word,” Kohane said. “It could talk as fluently about wine pairings for a vegetarian feast as it could work through a complicated patient. It was truly a remarkable leap beyond anything that anyone in computer science, if they were honest with themselves, would have predicted for the coming decade.”

Isaac Kohane, chair of Harvard Medical School’s Department of Biomedical Informatics and editor-in-chief of the New England Journal of Medicine’s new AI journal
“Having an immediate second opinion following any interaction with a clinician will positively alter the dynamics of the doctor-patient relationship.”
Transcript:
ISAAC KOHANE: I am tremendously excited that AI will transform the patient experience. Simply having an immediate second opinion after any visit with a clinician will positively change the nature of the doctor-patient relationship. As for what I fear might go wrong, it’s that entities that do not have the patient’s best interests at heart will steer the biases of our new AI partners.
And just in time. The U.S. healthcare system, long criticized as expensive, inefficient, and overly focused on treatment rather than prevention, has been showing cracks. Kohane, who recalls a faculty member new to his department who could not find a primary care doctor, is tired of watching those failures up close.
“The healthcare system, which I have long said is broken, is broken in quite obvious ways in Boston,” he said. “People worry about equity issues with AI. I’m here to tell you we already have a significant equity problem today. Unless you have strong connections and are prepared to pay literally thousands more for concierge care, you will have a hard time finding a timely primary care appointment.”
Early fears that AI would replace doctors have given way to the recognition that the system needs both AI and its human workforce, Kohane said. Pairing nurse practitioners and physician assistants with AI is one of several promising possibilities.
“It’s no longer a question of ‘Will AI replace doctors?’ but ‘Will AI, along with a group of clinicians who may not look like the physicians we’re used to, help stabilize the unstable structure that is organized medicine?’”

Creating the ideal assistant
The way LLMs were released, to everyone at once, accelerated their adoption, Kohane says. Physicians quickly began experimenting with tedious but essential tasks, such as writing prior-authorization requests to insurers justifying particular, often expensive, treatments.
“People simply took to it,” Kohane said. “Doctors were talking on social media about the time they were saving.”
Patients joined in too, seeking virtual second opinions, like the child whose chronic pain was misdiagnosed by 17 different doctors over three years. In the widely covered case, the boy’s mother entered his medical records into ChatGPT, which suggested a condition no clinician had mentioned: tethered cord syndrome, in which the spinal cord adheres within the spinal column. When the person moves, instead of sliding freely, the cord stretches, causing pain. A neurosurgeon later confirmed the diagnosis and corrected the anatomical abnormality.
One of the anticipated benefits of AI in the clinic, of course, is making doctors better at the bedside. More and faster access to patient histories, suggested diagnoses, and other information is expected to raise physician performance. But considerable challenges remain, as a recent study shows.
A study published in JAMA Network Open in October compared diagnoses made by a physician alone, a physician using an LLM diagnostic tool, and the LLM by itself. The results were surprising: doctors using the LLM showed negligible improvement in accuracy, 76 percent versus 74 percent for physicians working alone. More surprising still, the LLM on its own performed best, scoring 16 percentage points higher than the unaided physicians.
Rodman, a senior author of the paper, said that while it might be tempting to conclude that LLMs offer physicians little help, the results deserve a closer look. Only 10 percent of the doctors were experienced LLM users before the study, which was conducted in 2023; the rest received only rudimentary instruction. When Rodman later reviewed the transcripts, he found that most had used the LLMs for basic fact-checking.
“The best way a physician could use it right now is for a second opinion, to second-guess themselves when faced with a complex case,” he said. “How might I be wrong? What am I missing? What additional questions should I ask? Those are the strategies, we know from psychology research, that complement human thinking.”
Another prospective benefit of AI is making medical care safer, said David Bates, co-director of the Center for Artificial Intelligence and Bioinformatics Learning Systems at Mass General Brigham. A recent study by Bates and his team found that as many as one in four hospital admissions in Massachusetts involves some form of patient harm, much of it traceable to adverse drug events.
“AI should be able to identify medication-related problems and do it much more accurately than we can today,” said Bates, who is also a professor of medicine at the Medical School and of health policy and management at the Harvard T.H. Chan School of Public Health.

David Bates, co-director of the Center for Artificial Intelligence and Bioinformatics Learning Systems at Mass General Brigham
“AI has a propensity to make things up, which is concerning, because we do not want inaccuracies in people’s records.”
Transcript:
DAVID BATES: AI holds enormous promise. Burnout is pervasive across medicine, particularly in primary care, and artificial intelligence can speed up many routine tasks like documentation; ambient scribes, in particular, are already helping. Still, there are ways this could go wrong. Any time saved could simply be used to increase physician workloads. Accuracy of the medical record also matters, and AI has a propensity to make things up, which is concerning, because we do not want inaccuracies in people’s records.
Another opportunity comes from AI’s growing skill at a routine chore: note-taking and summarization, said Bernard Chang, dean for medical education at the Medical School.
“Ambient documentation” systems are expected soon to be able to listen in on patient visits, capture everything that is said, and generate an organized clinical note in real time. As symptoms are discussed, the AI can suggest diagnoses and treatment plans. Afterward, the physician reviews the summary for accuracy.
Automating notes and summaries would help clinicians in several ways, Chang said. It would ease the data-entry burden often cited as a driver of burnout, and it would change the doctor-patient dynamic. One of the most common complaints from patients about office visits is watching the physician hunch over a computer, asking questions and typing the answers. Freed from note-taking, physicians could face their patients directly, opening the way to deeper connections.
“It may not be the most dazzling application of AI,” Chang said. “We’ve all seen AI do something and thought, ‘Wow, that’s impressive.’ This is not one of those moments. But this technology is being piloted in outpatient practices across the country, and the early results are very promising. Physicians who feel overwhelmed and burned out are starting to say, ‘You know what, this tool is going to help me.’”

Bernard Chang, dean for medical education at Harvard Medical School
“I see AI as a transformative tool comparable to the advent of the internet in its impact on medicine and medical education.”
Transcript:
BERNARD CHANG: What excites me most about AI’s potential in medicine is that these technologies will let healthcare providers devote more attention to the human dimensions of their practice, which is critically important, while making it faster to retrieve information, sift through large volumes of vital data, and draw the difficult connections needed to consider rare diagnoses, less obvious treatment strategies, and ultimately the best possible care for patients. In medical education, learners can use AI tools to accelerate their progress, moving quickly from basic practice to advanced levels of clinical reasoning on their way to becoming the excellent physicians of the future. Whether things go well is up to us. We must guard against hallucinations and false information, bias, a watering-down of foundational learning, and overreliance on technology. As a society, we must also stay mindful of the environmental cost of the high energy demands involved. Overall, I see AI as a transformative tool comparable to the advent of the internet in its impact on medicine and medical education.

The bias dilemma
For all their capabilities, LLMs are not yet ready to operate on their own.
“The technology is not good enough to reach a safety threshold where you don’t need a knowledgeable human,” Rodman said. “I understand where things may have gone wrong. I can take the diagnosis a step further. I can do that because I learned the hard way. I made plenty of mistakes during my residency, but I learned from them. Our current system is wildly inefficient, but it builds critical thinking. When medical students use systems that can automate those processes, even if they tend to outperform humans on average, how will they learn?”
Clinicians and researchers also worry about flawed data. Much of the bias stems from biomedicine’s roots in wealthy Western countries, where the scientific record was largely built by white men studying white men, says Leo Celi, an associate professor of medicine and a physician in the Division of Pulmonary, Critical Care and Sleep Medicine at Beth Israel Deaconess Medical Center.

Leo Celi, associate professor of medicine and physician at Beth Israel Deaconess Medical Center’s Division of Pulmonary, Critical Care and Sleep Medicine
“We need to design human-AI systems, not just build algorithms. We have to anticipate how humans will err.”
Transcript:
LEO CELI: AI could be the Trojan horse we’ve been waiting for to rebuild systems from the ground up. I mean systems for knowledge generation, healthcare delivery, and education, all of which are deeply flawed. The point of AI is to make us better critical thinkers, by putting data front and center and laying out the breadth and depth of the problems clearly. But we need to design human-AI systems, not just build algorithms. We have to anticipate how humans will err. The designs should borrow from systems built for aviation, road safety, space exploration, and nuclear power generation. We need experts in psychology …
“You have to understand the data before you build AI,” Celi said. “That gives us a fresh look at the design flaws of legacy healthcare delivery systems and legacy medical education. It becomes clear that the status quo is so bad, we knew it was lacking and simply came to accept it as a broken system, that all the projected benefits of AI will fall apart unless we fundamentally redesign our world.”
Celi pointed to studies of disparities in care between English-speaking and non-English-speaking patients hospitalized for diabetes. Non-English speakers are woken less often for blood-sugar checks, raising the odds that changes go unnoticed. The effect is hidden because the data looks unbiased at first glance; it is merely incomplete, and that incompleteness feeds unequal care.
“They get one or two blood-sugar checks versus the 10 that fluent English speakers get,” he said. “Averaged out, the algorithms miss that gap. There is a lot of missing context that experts may not recognize as ‘data artifacts.’ It comes from societal bias baked into the data-generation process.”
Bates offered further examples, including a skin cancer detection tool that struggles to identify cancers on darkly pigmented skin and a scheduling algorithm that wrongly predicted higher no-show rates for Black patients, leading to overbooked appointments and longer waits.
“Most clinicians are unaware that every medical device we use is, to some extent, biased,” Celi said. “They don’t perform equally well across all demographics, because we typically prototype and optimize them on college-aged white males. They haven’t been fine-tuned for an elderly ICU patient with multiple comorbidities, so why do we believe that the numbers they report are objective truth?”
Exposing the deep biases in legacy systems offers a chance to correct them, Celi said. Accordingly, a growing number of researchers are pushing for clinical trials that include diverse populations from a range of geographic areas.
One example is Beth Israel’s MIMIC database, which reflects the hospital’s varied patient demographics. The resource, which Celi manages, gives researchers de-identified electronic medical records, including notes, images, and test results, in an open-source format. It has been used in 10,000 studies by researchers worldwide and is set to expand to 14 more hospitals, he said.

Era of agility
As in the clinic, the AI models used in laboratories aren’t infallible, but they are opening the way to major advances in scientific research.
“They deliver immediate insights at the atomic level for molecules that remain experimentally inaccessible or would take enormous time and resources to probe,” said Marinka Zitnik, an associate professor of biomedical informatics at the Medical School. “These models provide accurate in-silico predictions that scientists can build on and use in their research. That tells me what a remarkable moment we are in.”
“It is becoming increasingly crucial to create dependable and accurate benchmarks or methods that enable us to assess how effectively AI model outputs perform in real-world scenarios.”
Marinka Zitnik
Zitnik’s laboratory recently launched Procyon, an AI model designed to close knowledge gaps about protein structures and their biological functions.
Until now, it has been difficult for researchers to grasp a protein’s conformation: how the long molecules fold and twist upon themselves in three-dimensional space. That understanding matters because the folds and bends expose some parts of the molecule while hiding others, shaping the molecule’s chemical interactions.

Marinka Zitnik, associate professor of biomedical informatics
“Insights from research labs don’t always translate into effective treatments, and AI could amplify this gap if it’s not designed to bridge it.”
transcript
Transcript:
MARINKA ZITNIK: I am particularly excited about AI’s capacity to learn and innovate on its own, rather than merely analyzing existing information. AI can develop new concepts, reveal hidden patterns, and suggest solutions humans might overlook. In biomedical research and drug design, that means AI could create novel molecules, predict how they interact with biological systems, and tailor treatments to patients with greater accuracy. By consolidating information across genetics, proteins, and clinical outcomes, AI can accelerate discovery in ways that were once unattainable. A significant challenge, however, is that AI models often concentrate on problems that have already been thoroughly studied, while other critical areas receive too little attention. If we are not careful, medical breakthroughs may cluster in well-known domains while others remain underexplored, not because they matter less, but because there is too little existing knowledge to guide AI systems. AI-driven drug design and therapy recommendations also hinge on experimental results from research laboratories, which may not fully reflect the complexity of real patients. Findings from the lab do not always translate into successful treatments, and AI could widen that gap if it is not designed to bridge it. The opportunity lies in building AI that uncovers new insights and ensures those insights drive real progress, bringing innovation to the areas where it is needed most.
Today, predicting a protein’s conformation, down to nearly every atom, from its known amino acid sequence is achievable, Zitnik said. The main hurdle is connecting those structures to their roles and properties across diverse biological contexts and diseases. Roughly 20 percent of human proteins have poorly defined roles, and an astonishing share of research, 95 percent, focuses on just 5,000 well-studied proteins.
“We are tackling this gap by linking molecular sequences and structures to functional annotations to predict protein properties, moving the field closer to in-silico prediction of function for every protein,” Zitnik said.
A long-term goal for AI in the lab is the creation of “AI scientists” that act as research assistants, with access to the full corpus of scientific literature, the ability to combine that knowledge with experimental results, and the capacity to suggest next steps. Such systems could mature into genuine collaborators, Zitnik said, noting that some models have already generated simple hypotheses. Her lab used Procyon, for example, to identify domains in the maltase-glucoamylase protein that interact with miglitol, a drug prescribed for Type 2 diabetes. In another project, the group showed that Procyon could functionally annotate poorly characterized proteins implicated in Parkinson’s disease. The breadth of the tool’s capabilities is possible because it was trained on large experimental datasets and the full scientific literature, resources far beyond what humans can read and absorb, Zitnik said.
Before the lab comes the classroom, and the AI ethos of adaptability, creativity, and continuous learning is taking hold there too. The Medical School has introduced a course on AI in healthcare; added a Ph.D. track on AI in medicine; is developing a “tutor bot” to supplement lectures; and is building a virtual patient that students can practice on before their first nerve-wracking encounters with real ones. Meanwhile, Rodman is leading a steering committee on the use of generative AI in medical education.
These efforts are a solid start, he said. But AI’s rapid evolution makes it hard to prepare students for careers that will span 30 years.
“The Harvard answer, which is my answer as well, is that we can give people foundational knowledge, but we also have to teach adaptability and prepare them for a future that changes rapidly,” Rodman said. “Maybe the best thing we can do is prepare people to expect the unexpected.”