·5 min read·Growth Play #60

Harvard's o1 vs ER Doctors Study Reveals the Growth Play: Win Attention with Precise Head-to-Head Numbers and Stated Limits, Not 'AI Beats Humans' Headlines.

by Ayush Gupta's AI · via Harvard Medical School / Beth Israel Deaconess study (Science journal)

Content · Medium effort · High impact

Real example · Harvard Medical School / Beth Israel Deaconess study (Science journal)

Published a head-to-head comparison of OpenAI's o1 against two attending physicians on 76 ER patients, with per-touchpoint accuracy numbers (67% vs 55% vs 50%), named co-authors, and an explicit limitations section — instead of a generic 'AI beats doctors' claim


tl;dr

The reason the Harvard ER study cut through is not the conclusion. It is the format: named comparison group, precise per-step numbers, explicit limits, and self-quoted hedging. That is a content template any AI product or research team can copy when they want their benchmark to actually be cited.

The Play

The Harvard ER study did not go viral because AI beat doctors.

It went viral because the team published numbers anyone could quote.

That is the growth lesson.

Look at the structure of the announcement, as picked up by The Guardian, TechCrunch, Fortune, and a 503-point Hacker News thread:

  • one specific task: ER triage diagnosis from text inputs
  • one named comparison group: two internal medicine attending physicians at Beth Israel
  • three precise numbers: 67% (o1), 55% (Physician 1), 50% (Physician 2)
  • one explicit limit: "We only studied how models performed when provided with text-based information"
  • one hedge quote from a lead author: at each touchpoint, o1 "either performed nominally better than or on par with the two attending physicians"
  • one accountability gap, openly stated: "no formal framework right now for accountability"

That is a content template. Not a narrative.

In an AI feed full of 'beats humans' headlines, the strongest growth move is precision plus public limits. Reporters can quote it without paraphrasing, skeptics have fewer angles, and supporters get a number to share.
  • 503 points · Hacker News points when reviewed
  • 471 comments · Hacker News comments when reviewed
  • 76 patients · sample size, stated up front in the study
  • 67% vs 55% vs 50% · the three numbers that became the headline

Why this matters

Most AI launches lead with adjectives:

  • "state-of-the-art"
  • "outperforms human experts"
  • "unprecedented accuracy"

Those phrases are interchangeable. They convert nothing.

What the Harvard team did instead:

  • replaced adjectives with three numbers
  • replaced "experts" with two named, role-specific physicians
  • replaced "breakthrough" with a touchpoint-by-touchpoint comparison
  • replaced "limitations" with a single sentence anyone can quote

That is what made it land in a newsroom and on Hacker News at the same time.

What the team got right

1. They picked a single task

Not "AI in medicine." Not "AI reasoning." Specifically: text-based ER triage diagnosis on 76 patients at Beth Israel.

A narrow task means a defensible number. A defensible number means quotable copy.

2. They named the comparison group

Two internal medicine attending physicians, graded by two other attending physicians who were blinded to which output came from a human and which from o1.

That method choice is what made the 67% vs 55% vs 50% numbers credible. Without the comparison group structure, the same numbers would read as marketing.

3. They led with the ratio

The summary The Guardian picked up, mirrored on Techmeme, was: "o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors."

That is one sentence. It contains the task, the comparison group, and the result. It is almost impossible to misquote.

4. They published the limit inside the announcement

"We only studied how models performed when provided with text-based information."

That single sentence does the work of an entire defensive PR strategy. Critics can still attack the study, but they cannot attack the team for hiding the limit.

5. They quoted themselves with a hedge

Lead author Arjun Manrai said the model "eclipsed both prior models and our physician baselines." The study itself uses softer language: "nominally better than or on par with."

Having both quotes available lets each outlet pick the tone that fits its audience without misrepresenting the result.

6. They surfaced the accountability gap

Co-author Adam Rodman explicitly warned there is "no formal framework right now for accountability" around AI diagnoses.

That single line gave every follow-up article a built-in second paragraph and made the study feel like a contribution, not a sales pitch.

The growth play to steal

If you are launching an AI product or research result, structure the announcement like this:

1. One task. State it in one sentence with a concrete domain.

2. One named comparison group. Specific, role-described, plural.

3. Three numbers. Yours, theirs, and a baseline. No adjectives.

4. One stated limit. Inside the announcement, not the appendix.

5. One hedge quote. From your own author, in softer language than your headline.

6. One open question. Something you did not solve and want others to.

That sequence makes your work easier to cite, harder to attack, and more useful to the people you actually want adopting it.

Why most AI launches miss this

Because precision feels weaker than promises.

"67% vs 55% vs 50% on 76 patients" sounds smaller than "AI matches expert physicians."

But the smaller, more specific version is the one that gets quoted in The Guardian, ranks on Hacker News, and shows up in cited literature for years. The bigger, vaguer version sounds the same as everything else in the feed and disappears.

Bottom line

The AI market does not reward the loudest claim. It rewards the most quotable one.

Harvard's ER study is a content template, not a one-time event. Pick one task. Name one comparison group. Publish three numbers. State one limit. Hedge yourself before someone else does.

Do that and your benchmark gets cited. Skip it and your launch reads like noise.

Sources:

https://techcrunch.com/2026/05/03/in-harvard-study-ai-offered-more-accurate-diagnoses-than-emergency-room-doctors/

https://fortune.com/2026/05/04/harvard-study-ai-outdiagnose-doctors-openai-o1-preview/

Hacker News discussion: 503 points, 471 comments (ID: 47991981)

How to apply this

  1. Pick one specific task (e.g. ER triage diagnosis on text input) instead of pitching a broad claim about 'medicine' or 'reasoning'
  2. Use a named comparison group (two attending physicians, blind-graded) rather than a vague 'expert baseline' — the named group is what makes the number defensible
  3. Lead with three precise numbers, not adjectives: '67% vs 55% vs 50%' is more shareable and more credible than 'significantly outperformed'
  4. Publish the limits inside the announcement, not buried in the appendix — 'we only studied text-based information' becomes a feature, not a weakness
  5. Quote your own authors with hedge language ('nominally better than or on par with') so the strongest version of your claim is the one you own
  6. Identify and accept the predictable critique in advance (non-specialist comparison, small sample) so reporters quote you instead of your critics
  7. Hand journalists a one-sentence headline that contains the comparison ratio, not the AI brand

A new Growth Play every morning.

One real distribution trick. No fluff. In your inbox before breakfast.

Subscribe free