Mirrai Careers

Senior Software Engineer - AI Interaction Evaluator (Codex / Claude Code, up to $200/hr)

g2i

Miami | Remote | Contract | Posted 2w ago
Apply on company site
SENIOR AI INTERACTION EVALUATOR (CODEX / CLAUDE CODE)

Contract | $50-200/hr | 10+ hrs/week | Project-based

Roles open on a rolling basis. These roles are currently filled, but we hire as new projects open up. Apply now to join our talent bench, and we’ll contact qualified candidates directly when a matching role becomes available. Expect 40+ hrs once a project starts; timing depends on availability, but we move people in at the earliest genuine opportunity.

Check out this Loom video for more details!

We’re looking for a highly experienced software engineer (senior or above) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code.

This is not a traditional engineering role. You won’t be writing production code. You’ll be evaluating something harder: whether the model thinks like a great engineer.

WHAT THIS ROLE ACTUALLY IS

You will assess how AI coding agents behave in real-world scenarios, focusing on:

* Whether the response makes sense
* Whether the preamble and reasoning are useful
* Whether the output reflects strong engineering judgment
* Whether the interaction feels right to an experienced developer

This role is about engineering taste, not syntax correctness.

WHAT YOU’LL BE DOING

* Evaluate AI-generated coding interactions end-to-end
* Judge whether outputs are:
  * Useful
  * Correct (at a high level)
  * Aligned with how a strong engineer would think
* Assess the quality of explanations and reasoning, not just code
* Distinguish between different levels of response quality (e.g. what makes something a 2 vs 4)
* Provide clear, opinionated feedback on:
  * What worked
  * What didn’t
  * What felt “off” or misleading
* Help define what great looks like when interacting with tools like Cursor

WHAT WE MEAN BY “TASTE”

We’re specifically looking for engineers who can answer questions like:

* Does this feel like something a strong engineer would actually say?
* Is this explanation helpful, or just technically correct?
* Is the model guiding the user well, or just dumping output?
* Would this interaction build or erode trust?

You should be comfortable making subjective but rigorous judgments.

WHO YOU ARE

* Staff / Principal-level engineer (or equivalent experience)
* Strong background in one of the following:
  * TypeScript / JavaScript
  * Python
* Hands-on experience using:
  * OpenAI Codex
  * Claude Code
  * Cursor
* Deep familiarity with modern AI-assisted dev workflows
* Able to evaluate code without needing to fully execute or deeply review every line
* Comfortable giving direct, opinionated feedback
* High bar for what “good engineering” looks like

NICE TO HAVE

* Experience with tools like Cursor or similar AI-first IDEs
* Prior exposure to prompt design or evaluation workflows
* Experience mentoring senior engineers or defining engineering standards

ENGAGEMENT DETAILS

* US and Canada: up to $200/hr
* EU and Latam: up to $150/hr
* Other locations: up to $100/hr
* Hours: ~10–20 hours/week
* Duration: Ongoing, project-based
* Process:
  * Take-home evaluation exercise
  * One behavioral interview

Similar jobs

  • Senior/Staff AI Engineer — Agentic Development (Next.js/Node.js)

    g2i

    Remote
  • Data Annotation Specialist, Software Engineering

    Cohere

    Remote
  • Software Engineer - AI Trainer - Remote - 8/20 hrs week - Freelance

    10xteam

    Remote | €125k–€229k
  • Principal Software Engineer - AI Trainer - Freelance - 8-20hrs/week - Remote

    10xteam

    Remote | €193k–€331k
  • Applied AI Engineer, Codex Core Agent

    OpenAI

    San Francisco | $230k–$385k
  • Full Stack Software Engineer, Codex

    OpenAI

    Remote | $255k–$405k
Company

MIRRAI CHAT LTD (Company No. 16403306)

71-75 Shelton Street, Covent Garden

London, WC2H 9JQ, UNITED KINGDOM

© 2026 Mirrai Careers. All rights reserved.