Most people think you need to be a full-time OpenAI / Anthropic / Google DeepMind researcher to work on frontier ML problems. That's been less true every year, and in 2026 it's pretty thoroughly false. The labs themselves outsource a meaningful slice of their evaluation, dataset-curation, and specialized research work to PhD-level contractors via platforms like AfterQuery, Mercor, and micro1.
The catalog of "research-level" gigs in 2026 looks nothing like the generic "data annotation" work of 2023. These are tightly scoped contracts where the labs need someone who can read a paper, propose an experiment, and produce a write-up that holds up to internal review. Pay reflects that: $100–$250/hr is normal for the right credential mix, and $300+ isn't rare.
Here's what's actually open right now, organized by research area.
Mechanistic interpretability (LLMs)
The fastest-growing research category. Labs are racing to understand what's happening inside their models — circuits, induction heads, feature visualization, sparse autoencoders. Contracted research roles typically involve replicating recent papers on smaller models, probing specific circuits, and writing detailed walkthroughs of model internals.
Background that lands the work: PhD in ML / cognitive science / related, with mechanistic-interpretability publications or demonstrable code reproductions (e.g. on the Apollo Research, Anthropic interpretability, or EleutherAI workstreams).
World models & model-based RL
Latent-dynamics learning, video-prediction architectures, model-based planning. The applied side is training data + evaluation for agents that need to simulate consequences of their actions in physical or game environments. Hot area for robotics + game AI applications.
Background: PhD in ML / robotics / control theory, RL publications. Familiarity with Dreamer, MuZero, EfficientZero, or recent world-model architectures.
Robot transfer learning (PhD-required)
Sim-to-real transfer, cross-embodiment learning, domain randomization. Labs working on general-purpose robot policies need contributors who've done the actual sim-to-real work, not just read about it.
Background: PhD in robotics / ML, hands-on experience with physics simulators (MuJoCo, Isaac Sim, Brax), robot-deployment publications.
Chip design ML
ML for chip placement, routing, RTL synthesis, and verification — the Google "Chip Design with Deep RL" line of work and its follow-ons. EDA + ML hybrid backgrounds get unusual leverage here because the supply is genuinely tiny.
Background: chip-design industry experience (Synopsys, Cadence, an EDA team at a hyperscaler) OR an ML PhD with hardware-adjacent publications. Cash rates are some of the highest in the contract-ML market.
Image generation & diffusion models
Two related but distinct workstreams. Image-generation work focuses on prompt-following evaluation, compositionality, and long-form scene coherence. Diffusion-model work goes deeper: scheduling, conditioning, distillation, flow-matching variants.
Background: ML PhD or industry researcher with diffusion / score-matching / DDPM publications or production deployments.
Atmospheric, climate, and geologic modeling
Earth-system science meets ML. The labs are training agents for weather prediction, climate downscaling, attribution studies, and subsurface (oil/gas/geothermal) modeling.
Background: PhD in atmospheric science / climate / geology / Earth-system modeling with computational publications. WRF, CESM, MPAS, or basin-modeling experience plus ML fluency.
Neuromorphic computing
Spiking neural networks, event-driven computation, hardware-software co-design for Intel Loihi / IBM TrueNorth / academic chips. Tiny field, very high rates for the right credential.
Background: PhD in neuromorphic computing, computational neuroscience with hardware experience, or chip-architecture research with SNN publications.
Network science & graph ML
Graph neural networks, dynamical-network models, community detection at scale, spreading dynamics. Labs working on recommendation systems, fraud detection, biological networks, and social-graph reasoning use this expertise.
Background: PhD in network science, applied math, or ML with graph-related publications.
Domain-bridging ML roles
A growing class of contracts that combine an academic domain with ML — labs need contributors who can speak both languages fluently. Currently open:
- Computational materials science: DFT, MD simulations, property prediction at scale.
- Finance NLP: earnings calls, filings, news. ML engineering background plus finance fluency.
- Development economics + ML: RCTs, impact evaluation, policy modeling.
- Computational behavioral modeling: agent models, cognitive simulations.
What pay actually looks like
Frontier-ML research contracts cluster in three tiers:
- Tier 1 — ultra-niche (chip design ML, neuromorphic, PhD-required robotics): $150–$300/hr+
- Tier 2 — top-of-field PhD work (mechanistic interpretability, world models, advanced diffusion): $100–$200/hr
- Tier 3 — established ML research with strong publications: $80–$150/hr
Volume is steady but not huge — frontier projects have small cohorts (typically 5–30 contributors per workstream) and labs rotate cohorts every 3–6 months. Expect to be in active rotation across 1–2 labs at a time if you land in the upper tiers.
How to land it
For all of the above, the application path is similar — submit on the platform, get a 30–45 minute technical interview with an AI-led screener, then a deeper interview with a human researcher from the lab. The detailed playbook for the AI-screen is in the AI training interview guide.
Lead with publications (Google Scholar URL helps), code artifacts (GitHub repos, paper-replication notebooks), and specific tools you've used in deployment (not just coursework). The bar for these tiers is the same bar industry research labs use for their own hires — frame everything accordingly.
Adjacent reading: the PhD-track AI training guide covers the broader market context, and the four-way featured platforms roundup compares Mercor, micro1, Turing, and Handshake AI side-by-side for high-credential work.
