AI Training Jobs for Doctors and Medical Experts (2026)

Medical experts are one of the most in-demand contributor pools for AI training work in 2026. Frontier AI labs are training models that need to handle medical reasoning safely, and the only people qualified to evaluate that output rigorously are practicing clinicians and researchers. The result: practicing physicians regularly earn $150–$250/hr on the highest-paying platforms, with the high end approaching $300/hr for narrow specialties.

This is the practical breakdown for any medical professional considering AI training as a side income or transition: platforms that hire, what the work looks like, what the verification looks like, and the licensing and disclosure considerations that aren't obvious from the platform sign-up flow.

Who qualifies as a "medical expert"

Demand varies by credential. In rough order of pay and availability:

Board-certified physicians in any specialty (M.D. or D.O.): highest demand and pay. Top-paying platforms place you in your specific specialty.
Residents in PGY-3 or higher: accepted by most major platforms; pay slightly below attending rates.
Medical specialists with doctoral degrees (M.D./PhDs, PharmDs, epidemiology PhDs): heavily recruited for research-oriented evaluation work.
NPs and PAs: accepted by most platforms at mid-tier pay; especially valued in primary care and well-defined specialties.
RNs with specialty credentials (CRNA, NICU, oncology): accepted by some platforms; pay varies by specialty depth.
Pharmacists: solid niche on most platforms; high demand for drug-interaction and dosing evaluation tasks.
Medical students (M1–M4): accepted on a few platforms at lower pay; useful for building credentials.

Platforms hiring medical experts in 2026

Mercor

Mercor places practicing physicians on evaluation engagements with frontier labs and large healthcare/pharma customers. Specific work includes: evaluating LLM responses to clinical vignettes, red-teaming for safety, scoring differential diagnoses, and reviewing model-generated treatment plans against evidence-based guidelines. Pay range for physicians: $150–$250/hr typical, with specialty premiums for in-demand fields (cardiology, oncology, radiology, infectious disease, psychiatry).

Handshake AI Fellowship

Targets practicing physicians and senior medical researchers. Engagement-style work over 4–12 week periods evaluating model output in a specific medical domain. Pay tops out around $125/hr; the appeal is the fellowship structure (steady hours, defined deliverables, often co-authorship credit on resulting papers).

AfterQuery

Has a dedicated "Medical" task track at $80–$150/hr depending on credential level. Work is more task-oriented than engagement-based — you accept individual evaluation tasks through a queue rather than being placed with a single client for weeks. Good for irregular schedules.

micro1

Lower hourly pay for medical experts ($60–$120/hr typical) but more sustained hours and lower screening friction. Useful as a base layer of consistent income while higher-paying Mercor placements come and go.

Welo Data, Outlier

Both list medical SME work occasionally but pay materially lower ($30–$80/hr typical). Worth applying to but not where the real medical-expert money is.

xAI, OpenAI, Anthropic direct

The frontier labs occasionally post direct contractor roles for medical expert evaluators, usually with a 3–6 month commitment and rates competitive with Mercor's top tier. These are scarce and competitive but pay the most when available. Watch for "medical evaluator" or "clinical expert" listings directly on their careers pages.

What the work actually looks like

The specifics vary by client, but most medical evaluation work falls into a few categories:

Clinical scenario evaluation

You're shown a patient vignette and the model's response — diagnosis, differential, recommended workup, treatment plan, patient communication. You rate the response on multiple axes (clinically accurate? safe? complete? appropriate level of confidence? would you co-sign this if a resident wrote it?). Usually 5–15 minutes per vignette; you complete a batch in a sitting.

Red-teaming / safety evaluation

You construct adversarial prompts designed to elicit unsafe model behavior — incorrect drug dosing, missed contraindications, inappropriate medical advice that could harm patients. You then evaluate how the model responded. This work is high-value because labs use it to harden the model before deployment to healthcare customers.

Reasoning trace evaluation

The model produces a step-by-step clinical reasoning chain; you evaluate each step for accuracy. This is more demanding than yes/no scoring — you're essentially grading the model's thinking process. Pays well because few clinicians can do it quickly and accurately.

Specialty board-prep style evaluation

Questions written in the style of USMLE Step 2 or specialty board exams; the model attempts an answer; you score it. Used for training models that need to perform on standardized medical reasoning. Steady, predictable work; lower pay than red-teaming.

Comparative evaluation

Two model responses to the same clinical scenario; you pick the better one and explain why. Used in RLHF training pipelines. Quick (2–5 min per pair), high-volume, generally lower-paid but accumulates fast.

Pay specifics

Aggregating across platforms, in May 2026:

Board-certified attending in high-demand specialty: $175–$250/hr
Board-certified attending in general specialty: $125–$175/hr
Resident (PGY-3+): $90–$140/hr
M.D./PhD in clinical research: $150–$225/hr
NP/PA: $70–$120/hr
RN with specialty credential: $50–$90/hr
Pharmacist (PharmD): $80–$130/hr

Specialties currently commanding the highest premiums (frontier labs are training models for them): cardiology, oncology, radiology, infectious disease, psychiatry, emergency medicine, and any subspecialty involving complex differential diagnosis.

Verification and credentials

Higher-paying platforms verify your credentials before placing you on medical work. Expect to provide:

State medical license number (verified against state board)
NPI number (verified against CMS records)
Board certification status (verified against ABMS/AOA)
Current malpractice insurance (sometimes — varies by platform)
For residents: program letter or institutional verification

This typically takes 2–7 business days post-acceptance. Some platforms also require an NDA and conflict-of-interest disclosure (e.g., you can't evaluate model output for a drug you have a financial interest in).

Critical: what this work is NOT

AI training work is not patient care and is not subject to most clinical regulatory frameworks. You are evaluating model output, not treating patients. In practice, that means:

You are not providing medical advice through the platform. The model's output goes to AI developers, not patients. You're a domain consultant.
The work is generally not "moonlighting" as most medical employers define the term. Many employment contracts allow outside consulting work, and AI evaluation typically falls under that umbrella, but confirm this against your own contract.
HIPAA generally does not apply because you're not handling PHI — the clinical scenarios are synthetic. If a platform asks you to evaluate real patient data, that's a different situation requiring HIPAA-compliant data handling.
You typically don't need additional malpractice coverage for this work, since you're not treating patients. Read your existing policy for clarity.

Disclosure to your primary employer

Most academic medical centers and hospitals require disclosure of outside consulting income. AI training work is consulting income. Check your institution's specific policy:

Most institutions allow consulting up to 1 day per week (typically 20% effort) without special permission, with disclosure required.
Some institutions require advance approval for any consulting; the approval is usually routine for AI training work since there's no patient-care or research conflict.
Faculty at AMCs typically must report outside income annually; 1099-NEC income from these platforms is straightforward to report.
If you're a salaried military or VA physician, check your specific regulations. Some federal physicians cannot consult; others can with limitations.

Practical schedule

Most physician contributors describe a sustainable schedule as ~5–10 hours/week: two evenings or a Saturday morning. At $150–$200/hr over a full year, that works out to roughly $40K–$100K of consulting income on top of clinical salary, with no patient care, no on-call, no documentation burden, and no malpractice exposure. This is a popular side income for physicians because the work is genuinely intellectually engaging (you're red-teaming the AI that may eventually be helping you diagnose patients) without the burnout-driving aspects of clinical practice.
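The annual figure is plain multiplication, so it is easy to rerun with your own numbers. The sketch below is illustrative only: the function name and the 52-week default are assumptions, not anything a platform publishes. The low end pairs the fewest hours with the lowest rate and the high end pairs the most hours with the highest rate, so real totals will usually land somewhere inside that span.

```python
def annual_income_range(hours_per_week, hourly_rate, weeks_per_year=52):
    """Return (low, high) gross annual income in dollars.

    hours_per_week and hourly_rate are (min, max) tuples.
    weeks_per_year is an assumption; lower it to allow for vacation,
    gaps between engagements, or heavy clinical weeks.
    """
    low = hours_per_week[0] * hourly_rate[0] * weeks_per_year
    high = hours_per_week[1] * hourly_rate[1] * weeks_per_year
    return low, high


# 5-10 hours/week at $150-$200/hr, working every week of the year
low, high = annual_income_range((5, 10), (150, 200))
print(f"${low:,} to ${high:,} per year")  # $39,000 to $104,000 per year
```

Passing weeks_per_year=48 instead gives a more conservative estimate that leaves room for time off.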

Tax considerations

Income from these platforms is 1099-NEC contracting income, so you owe self-employment tax (15.3% in the standard case) plus federal and state income tax on top. Quarterly estimated payments are generally required if you expect to owe more than $1,000 in unpaid tax for the year. See our AI training taxes guide for the full setup. Physicians often have complex tax pictures already (loan repayment programs, retirement contributions, AMT considerations); we strongly recommend running this income past your existing CPA rather than handling it yourself.
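For a rough sense of scale before talking to a CPA, the self-employment tax can be sketched as below. This is a simplified estimate under stated assumptions, not tax advice. It follows the general Schedule SE structure: the tax applies to 92.35% of net self-employment earnings, and the 12.4% Social Security portion applies only until combined W-2 wages and self-employment earnings reach the annual wage base. The function name is hypothetical, the wage base is passed in as a parameter because it changes every year, and the sketch ignores the Additional Medicare Tax, business expenses, and income tax entirely.

```python
SE_EARNINGS_FACTOR = 0.9235    # share of net SE earnings subject to SE tax
SOCIAL_SECURITY_RATE = 0.124   # applies only up to the annual wage base
MEDICARE_RATE = 0.029          # applies to all SE earnings


def estimate_se_tax(net_1099_income, w2_wages, ss_wage_base):
    """Rough self-employment tax estimate in dollars (simplified sketch).

    ss_wage_base is the Social Security wage base for the tax year;
    look up the current figure rather than relying on the example value.
    """
    taxable = net_1099_income * SE_EARNINGS_FACTOR
    # W-2 wages already use up part (or all) of the Social Security wage base
    ss_room = max(0.0, ss_wage_base - w2_wages)
    ss_tax = SOCIAL_SECURITY_RATE * min(taxable, ss_room)
    medicare_tax = MEDICARE_RATE * taxable
    return ss_tax + medicare_tax


# Example: $50,000 of 1099 income, using the 2025 wage base of $176,100
# purely as an illustrative input.
no_salary = estimate_se_tax(50_000, w2_wages=0, ss_wage_base=176_100)
attending = estimate_se_tax(50_000, w2_wages=300_000, ss_wage_base=176_100)
print(f"No W-2 salary:   ${no_salary:,.0f}")
print(f"$300K W-2 salary: ${attending:,.0f}")
```

The second case illustrates why the headline 15.3% often overstates what a salaried attending owes on side income: once salary alone exceeds the wage base, only the Medicare portion remains in this simplified model.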

How to start

Order of operations:

Apply to Mercor first. Highest pay ceiling, hardest interview, best fit if you have board certification.
Apply to Handshake AI in parallel for fellowship-style work.
Apply to AfterQuery and micro1 as backup pipelines.
Don't bother with the lower-paying generalist platforms unless your specialty is unusual or your credentials are still building.

For the application playbook, see our getting-accepted guide. For the interview format specifically, see our AI training interview guide. For pay context across the full market, see our AI training pay breakdown.