
Senior Backend Engineer — Customer Support Platform | Remote LATAM Only

agenticdream

Posted 2d ago
The Senior Backend Engineer owns the services that keep Strata's customers unblocked: the support, exception-handling, and remediation backend that sits behind every customer-facing interaction with the Strata platform. When a tenant's purchase order fails validation, a document lands in the Expert-in-the-Loop queue, or a customer asks "what happened to my order?", the answer comes from the systems you build.

This is not a generic CRUD role. Strata is an agentic operating layer that ingests business documents (POs, ACKs, Invoices, Quotes), extracts and validates them, and delivers clean data downstream to OrderBahn and ERP systems. You will build the support and operations backend around that pipeline: the exception/HITL queue services, the customer-facing status and audit APIs, the reprocessing and replay tooling support engineers use to remediate stuck documents, the ticketing/CRM integrations, and the per-tenant configuration services.

Your work is measured against hard reliability and data-quality bars (≥99.9% availability, ≤0.5 P1 incidents/week, MTTR P1 ≤30 min, and ≥99.5% field-level data accuracy), so you build for resilience, observability, and graceful failure from day one.

You will work in AvantoDev's standard backend stack (NestJS/TypeScript and FastAPI/Python, PostgreSQL, AWS, SQS), integrate with the agent layer through MCP servers, and collaborate closely with the SRE team, the Context Engineering team, Customer Success, and the Head PM.

What You'll Build

* Backend for the Expert-in-the-Loop (HITL) queue: APIs that surface low-confidence documents, capture support/expert decisions, and resume the paused agent workflow via the SQS-backed control plane.
* Reprocessing & replay tooling: services that let support safely re-run a document through the pipeline (full or targeted re-extraction), with idempotency and audit guarantees.
* Exception triage APIs: classification, assignment, SLA tracking, and auto-resolution hooks (target: ≥70% auto-resolution, ≤2% exception rate).

Customer-Facing Status & Audit APIs

* Document lifecycle / status APIs backed by the OpenSearch state machine (FORMAT_DETECTED → PRIMARY_EXTRACTED → DOC_CLASSIFIED → SCHEMA_MATCHED → RECOVERY_EVALUATED → routing), exposing where any document is and why.
* Audit-trail APIs: full, per-tenant history of every decision, confidence score, and routing action for support investigation and customer transparency.

Integrations & Tenant Configuration

* Ticketing / CRM integrations (e.g., support desk, customer comms) wired to pipeline events so issues are created, updated, and resolved automatically.
* Per-tenant configuration services: schema/alias overrides, tolerance rules, routing thresholds, and notification preferences, exposed through governed APIs (not ad-hoc DB edits).
* Delivery/bridge services between Strata and downstream systems (OrderBahn, ERP) with reconciliation and retry semantics.

MCP & Agent Integration

* Build and consume MCP servers (FastAPI-based) so support tooling and agents invoke the same governed capabilities (validation, lookup, reprocessing) rather than duplicating logic.

What You'll Do Day-to-Day

* Design and implement scalable APIs in NestJS/TypeScript and/or FastAPI/Python using Domain-Driven Design (DDD), with robust validation, auth, error handling, and OpenAPI docs.
* Implement event-driven workflows over SQS (Standard + FIFO) with DLQ patterns, exponential backoff, and idempotent processing.
* Model and optimize PostgreSQL schemas (Aurora) with migrations, indexing, and strict tenant isolation / row-level security.

Reliability & Operability

* Build every service to be observable by default: structured logs, metrics, and traces with X-Correlation-ID / X-Trace-ID propagation (100% coverage is an org KPI).
* Implement health checks, circuit breakers, timeouts, retries, and graceful degradation so a downstream agent or OCR engine failure never takes down support tooling.
* Write runbooks for the services you own and participate in the on-call rotation alongside SRE.

Quality & Security

* Maintain strong test coverage (pytest / Jest, integration tests, moto/localstack, SuperTest, e2e tests) and contribute to CI/CD via CodePipeline.
* Enforce security bars: 0 critical/high vulns, per-tenant rate limiting, OAuth2/equivalent auth on 100% of endpoints, and ≥95% audit-log completeness toward SOC2 readiness.

Collaboration

* Partner with SRE on SLOs, dashboards, and incident response; with Context Engineering on MCP/agent contracts; and with Customer Success on what support actually needs.

Minimum Qualifications

* 6+ years backend engineering in production, shipping and operating real services (not just prototypes).
* Strong in at least one, comfortable in both: Node.js/TypeScript (NestJS or equivalent) and Python (FastAPI). REST API design, validation, auth, and clean error handling.
* Deep PostgreSQL: schema design, migrations, query optimization, indexing, and multi-tenant isolation / row-level security.
* Event-driven & async patterns: message queues (SQS, Kafka or equivalent), DLQs, retries, idempotency, and designing for partial failure.
* AWS proficiency: Lambda, ECS/Fargate, S3, SQS, API Gateway, RDS/Aurora. You can deploy and operate what you build.
* Reliability mindset: you design for SLOs, instrument for observability (structured logs/metrics/traces, correlation IDs), and have carried a pager.
* Testing discipline: unit + integration + e2e testing (pytest/Jest, moto/localstack, SuperTest), and CI/CD experience.
* Security awareness: authn/authz, rate limiting, input validation, secrets management, and audit logging.
* English proficiency: B2+ required (C1 preferred). You'll write docs/runbooks, join architecture reviews, and coordinate during incidents.
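
To give a feel for the event-driven requirements above (DLQs, retries with exponential backoff, idempotent processing), here is a minimal in-memory sketch. It is illustrative only and not Strata's implementation: `IdempotentProcessor`, `backoff_delay`, the in-memory `processed` set, and the `dead_letters` list are all hypothetical names standing in for a real SQS consumer, a durable deduplication store, and a real dead-letter queue.

```python
import random
from typing import Any, Callable, List, Set, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class IdempotentProcessor:
    """Handles each message ID at most once and dead-letters it after max_attempts failures."""

    def __init__(self, handler: Callable[[Any], None], max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        # Assumption: a real service would keep these in a durable store and a real DLQ.
        self.processed: Set[str] = set()
        self.dead_letters: List[Tuple[str, Any]] = []

    def process(
        self,
        message_id: str,
        body: Any,
        sleep: Callable[[float], None] = lambda seconds: None,  # no-op so the sketch runs instantly
    ) -> str:
        # Redelivered messages are acknowledged without re-running the handler.
        if message_id in self.processed:
            return "duplicate"
        for attempt in range(self.max_attempts):
            try:
                self.handler(body)
            except Exception:
                # Back off before the next attempt; skip the wait after the final failure.
                if attempt + 1 < self.max_attempts:
                    sleep(backoff_delay(attempt))
                continue
            self.processed.add(message_id)
            return "processed"
        # Retries exhausted: park the message for manual remediation.
        self.dead_letters.append((message_id, body))
        return "dead-lettered"
```

In a deployed consumer, `sleep` would typically be `time.sleep` or be replaced by the queue's own redelivery delay, and the deduplication check and handler side effects would need to commit atomically.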

Nice to Have

* Experience building support / operations tooling: ticketing integrations, exception queues, reprocessing/replay, admin consoles.
* Familiarity with the Model Context Protocol (MCP) and exposing services as agent-callable tools.
* Exposure to agentic / LLM pipelines and HITL (Human-in-the-Loop) patterns (SQS-backed pause/resume).
* OpenSearch / Elasticsearch for state tracking and operational queries.
* Experience with ERP / order-management integrations (OrderBahn, NetSuite, or similar) and reconciliation.
* Familiarity with DORA metrics and a high-deployment-frequency, low-change-failure delivery culture.
* Background in commercial furniture, logistics, distribution, or manufacturing operations.
* Terraform / IaC familiarity for owning your service infrastructure.
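
For readers unfamiliar with state tracking of the kind mentioned in this posting, the document lifecycle (FORMAT_DETECTED → PRIMARY_EXTRACTED → DOC_CLASSIFIED → SCHEMA_MATCHED → RECOVERY_EVALUATED → routing) can be sketched as a strictly linear state machine. This is a hedged illustration, not the platform's actual model: the posting does not say whether transitions are strictly linear, `ROUTING` is an assumed identifier for the final "routing" step, and `DocState`, `next_state`, and `advance` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional


class DocState(Enum):
    """Lifecycle stages in the order the posting lists them."""
    FORMAT_DETECTED = "FORMAT_DETECTED"
    PRIMARY_EXTRACTED = "PRIMARY_EXTRACTED"
    DOC_CLASSIFIED = "DOC_CLASSIFIED"
    SCHEMA_MATCHED = "SCHEMA_MATCHED"
    RECOVERY_EVALUATED = "RECOVERY_EVALUATED"
    ROUTING = "ROUTING"  # assumed name for the final "routing" step


# Enum members iterate in definition order, which gives the pipeline order.
ORDER: List[DocState] = list(DocState)


def next_state(current: DocState) -> Optional[DocState]:
    """Return the stage that follows `current`, or None at the end of the pipeline."""
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def advance(current: DocState, target: DocState) -> DocState:
    """Allow only single-step forward transitions (an assumption of this sketch)."""
    if next_state(current) is not target:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

A status API built on a model like this can answer "where is my document?" with the current stage and "why?" with the rejected or pending transition.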
