Data QA Lead
Quality Assurance
India · Remote
Firmable is the market-leading B2B sales intelligence platform in Asia Pacific, and we're scaling that success globally at pace. Backed by leading investors and now 2,000+ customers strong, we exist to give sales teams an unfair advantage: the deepest company and people data of any platform, enriched with real-time signals, served at the right moment by intelligent agents.
We're not building another search box. We're building the engine that tells a salesperson exactly who to call, why, and what to say — before they even ask.
The Role
This isn't a traditional QA role where you write test plans and chase tickets. As Data QA Lead at Firmable, you'll own data quality across our platform end-to-end — designing the systems that decide whether the data flowing in is trustworthy enough to ship.
You'll spend your time deep in Snowflake, Python, and LLMs — architecting LLM-based quality checks, building the eval harnesses that hold them accountable, designing the skills library that other agents and teammates depend on, and shipping agentic remediation pipelines where rules, LLMs, and humans each do what they're best at.
This role has deep technical ownership. You'll make the hard rule-vs-LLM calls, build the production systems behind them, and translate findings into recommendations that change what we build.
What You'll Own
Data Analysis & Product Intelligence
Investigate complex datasets using SQL, Snowflake, Python, and LLMs — directing agents to do the heavy lifting while you frame the question and judge the answer
Unlock insights in large-scale B2B datasets that shape product direction and commercial strategy
Build scalable, reusable datasets — and the skills that make them queryable by any agent or teammate
Translate findings into recommendations that move metrics, not slides that summarise what happened
Data Quality & Source Management
Own data quality end-to-end across rule-based and LLM-based checks, side by side — design the checks, run the evals, monitor drift, fix root causes
Make the rule-vs-LLM judgement call on every check: deterministic logic where rules win, LLMs where semantic, contextual, or entity-resolution nuance is needed — and justify the split
Assess and onboard new data sources: coverage, freshness, accuracy, and where LLM judges add lift over deterministic profiling
Track down the hardest data bugs and fix them at the root, partnering with engineering and product
Scrape or source supplementary data when it sharpens insights or enriches the product
AI-Powered Analysis & Automation
Architect LLM-based quality checks with explicit rubrics, structured outputs, and labelled eval sets — precision/recall measured, not vibed
Build and maintain the skills library (SKILL.md specs) that powers recurring workflows — quality checks, remediation proposals, dataset onboarding — versioned, documented, and invocable by any agent or teammate
Ship agentic remediation pipelines: triage with rules, escalate to LLMs, propose fixes, log every call (prompt version, model, cost, latency, decision), surface human-review queues
Own the eval and observability scaffolding — prompt versioning, traces, drift detection when a vendor silently updates claude-sonnet-latest, cost ceilings with token math behind them
Set the model-choice playbook — Haiku for cheap classification, Sonnet for nuanced judgement, frontier models for hard edge cases — and revise it as model economics shift
Cross-Functional Support
Be the technical point of escalation for data quality across product, engineering, and go-to-market — the person trusted when a number is questioned
Partner with product and engineering on architecture decisions where data quality is in the loop
Build the monitoring surface stakeholders operate against: DQ trend by check type (rule vs. LLM), top failing reasons surfaced by LLM judges, cost and latency over time, human-review backlog
What We're Looking For
Must Haves
6+ years in data quality, data engineering, or analytics, with a strong focus on data quality systems
Expert-level SQL and Snowflake — complex queries, performance tuning, warehouse design, daily comfort across very large datasets
Strong Python skills — pandas, numpy, scripting, automation, and production-grade code. You write systems, not notebooks.
Shipped real work with agentic IDEs — Claude Code, Cursor, or equivalent. Not "tried it" — built and merged real systems with it.
Deep, demonstrable expertise building agents, skills, and tool-calling pipelines — you've architected agent workflows, written SKILL.md specs others depend on, and built tool-calling systems running in production. You can show us the repos.
You operate LLMs as production systems — you've designed eval harnesses, run labelled eval sets, versioned prompts, logged traces, debugged judges on precision/recall, and detected drift on vendor model updates
Sharp judgement on rules vs. LLMs — you reach for deterministic logic when it's the right tool and don't default to an LLM because it feels modern
Deep knowledge of data quality principles — validation, monitoring, observability, lineage, and the instinct to chase issues from symptom to root cause across pipelines
Proven skill with BI and data visualisation — Tableau, Looker, Power BI, or equivalent
Comfort working cross-functionally with product, engineering, sales, and marketing
A product mindset — you care about how data drives customer value and revenue, not just whether the pipeline ran
Highly Valued
Experience with data warehousing and relational modelling; NoSQL familiarity a plus
Experience with web scraping frameworks and best practices
Familiarity with cloud platforms — AWS, GCP, or Azure
Experience assessing and onboarding third-party data sources at scale
Background in B2B data, entity resolution, or structured/semi-structured datasets
Understanding of data privacy and compliance considerations
How We Build
AI-Native, Not AI-Assisted
Firmable is built on an AI-native engineering philosophy — and we mean it literally. AI is not a productivity tool bolted onto traditional analyst work. AI is the workflow. Every analyst at Firmable operates with fully agentic development, evals, traces, and AI-powered review pipelines as their default mode of working.
This means:
Agentic development: checks, datasets, and pipelines are designed, scaffolded, and iterated with AI agents doing the heavy lifting — you direct, review, and elevate
Skills over scripts: recurring workflows are packaged as versioned SKILL.md specs that any teammate or agent can load and run
Evals as a default, not an afterthought: every LLM check ships with a labelled eval set, measured precision/recall, and a prompt version you can roll back
Traces and observability from day one: every LLM call is logged with prompt version, model, cost, latency, and decision — retrofitting this later is not the plan
Continuous AI feedback loops: model drift, prompt regression, and cost ceilings are monitored the same way pipeline health is
If you're not already working this way, this role will require a rapid and genuine mindset shift. We're not looking for people who are open to AI-native work — we're looking for people who already live it.
The Operating Environment
Firmable runs lean and ships fast — intentionally small teams, no layers, minimal process, and a weekly release cadence moving toward daily. Teams own their stack end to end: you design it, you build it, you ship it, you run it.
This is a startup-to-scaleup environment and it comes with real expectations. There are no fixed hours. The pace is high, the team is always building, and when something matters it gets done. In return, you get genuine ownership, a seat at the table on every major architecture decision, and the opportunity to build something that doesn't exist anywhere else in the market.
Why This Role
Own data quality at the heart of one of the fastest-growing B2B intelligence platforms in APAC: every check you build, every skill you ship, and every dataset you onboard feeds directly into the product
Greenfield AI-native scaffolding — the eval harnesses, skills library, and agentic quality pipelines are largely unbuilt; you'll shape them
Work at the frontier — LLM-as-judge at production scale, agentic remediation, drift detection on vendor models, and rule-vs-LLM orchestration are genuinely hard, genuinely novel problems
Small team, massive leverage — your work reaches every Firmable customer, every day
Competitive base + meaningful equity — we balance strong compensation with a share in the upside we're building toward