AI Service Build · Computer Vision · ML Systems

When no model exists for the problem, you build one.

How on-screen human representation went from invisible to measurable across tens of thousands of video ads.

Built from scratch — no off-the-shelf model

10K+ videos per trigger

Zero manual intervention

Live in client dashboards

Engagement Snapshot

Sector	Ad Tech / Media Intelligence
Service	Video processing pipeline for facial & body analysis
Scale	Hundreds to tens of thousands of videos per client per year
Outcome	Novel CV pipeline → production-grade diversity & inclusion (D&I) intelligence product

Key Highlights

→ Built from scratch — custom AI models, with no off-the-shelf substitute

→ End-to-end ownership — data collection, annotation, model training, productisation

→ Scalable GPU pipeline — capable of processing tens of thousands of videos per trigger

→ Time-series insight — brands track representation across their entire ad portfolio over time

→ Zero manual intervention — fully automated once deployed

Context & Challenge

The ask was deceptively simple: help brands understand who is represented in their ads across skin tone, body type, age, gender and wider demographic diversity. The reality was technically brutal. No off-the-shelf model existed for this. Academic computer vision research didn't map cleanly to broadcast ad footage. And the system needed to scale, processing video volumes that ranged from hundreds to tens of thousands per client per year without manual intervention.

Before this work, representation monitoring was either manual, incomplete, anecdotal, or simply not happening. The data didn't exist. The tooling didn't exist. The definitions didn't fully exist either — which turned out to be the first real problem to solve.

Approach & Solution

Define it. Prove it. Build it. Run it.

Stage 01 — Hypothesis

Define the problem before the model.

The first challenge wasn't technical; it was conceptual. What does "skin tone" mean consistently enough to train a detector on? Running this stage as an Architecture R&D Sprint, we established precise, reproducible definitions for each attribute (skin tone, body type, age, gender), then scoped the data strategy, model approach, and pipeline architecture needed to deliver them at broadcast scale. The output was a plan we could pressure-test against real footage before committing to a full build.
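The case study does not say how skin tone was ultimately defined. As one illustration of what a precise, reproducible definition can look like, the sketch below bins a skin-colour sample by its Individual Typology Angle (ITA), a dermatology measure computed from CIELAB lightness (L*) and yellow-blue (b*) values. Using ITA, these particular thresholds, and the function names are all assumptions for illustration, not this project's method.

```python
import math

# Illustrative assumption: ITA-based skin tone bins, NOT the project's definition.
# ITA thresholds in degrees, checked from lightest to darkest; each bin's
# lower bound is exclusive, and anything at or below -30 degrees is "dark".
ITA_BINS = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]


def ita_degrees(l_star: float, b_star: float) -> float:
    """ITA = arctan((L* - 50) / b*) in degrees.

    atan2 matches arctan for the positive b* values typical of skin and
    avoids a division by zero when b* == 0.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def skin_tone_category(l_star: float, b_star: float) -> str:
    """Map a CIELAB sample (e.g. the mean of sampled skin pixels) to a label."""
    ita = ita_degrees(l_star, b_star)
    for threshold, label in ITA_BINS:
        if ita > threshold:
            return label
    return "dark"
```

Because the bins are fixed numeric thresholds, an annotator and a model measuring the same pixels land in the same category, which is the property a trainable definition needs.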

Stage 02 — Experiment

Prove it against real broadcast footage.

Before committing to production, we validated the approach under real conditions with real numbers. We assessed the available data, tried candidate model approaches against actual broadcast ad footage, and built prototype detectors, testing them across the lighting, video quality, and camera angles typical of ad production. This proved feasibility, surfaced edge cases early, and confirmed the definitions could survive annotation at scale.
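One way to make testing across lighting, video quality, and camera angles concrete is to score a prototype separately for each capture condition, so a weak slice cannot hide inside a good overall average. This is a minimal sketch: the `EvalRecord` fields, the condition labels, and the 0.8 acceptance bar are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalRecord:
    """Hypothetical: one annotated frame from a validation set."""
    condition: str   # e.g. "low_light", "heavy_compression", "profile_angle"
    predicted: str   # label from the prototype detector
    annotated: str   # label from a human annotator


def accuracy_by_condition(records: List[EvalRecord]) -> Dict[str, Tuple[float, int]]:
    """Return {condition: (accuracy, sample_count)} for every condition seen."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for r in records:
        total[r.condition] += 1
        correct[r.condition] += int(r.predicted == r.annotated)
    return {c: (correct[c] / total[c], total[c]) for c in total}


def weak_slices(records: List[EvalRecord], min_accuracy: float = 0.8) -> List[str]:
    """Conditions whose accuracy falls below a (hypothetical) acceptance bar."""
    return sorted(
        c for c, (acc, _) in accuracy_by_condition(records).items() if acc < min_accuracy
    )
```

Reporting the sample count next to each accuracy matters: a slice with only a handful of frames is a prompt to collect more footage, not a verdict on the model.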

Stage 03 — Formulation

Custom models, built to production grade.

With the approach validated, we built the production models entirely in-house, skin tone and body type detection among them, owning data collection, annotation pipeline design, model training, and iterative refinement. Around the models we built the production service layer and deployment pipeline, with observability built in and the GPU cost envelope fixed up front so inference stayed economical at scale.
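Fixing a GPU cost envelope comes down to arithmetic over a few inputs: how many frames a batch produces, how fast one GPU processes them, and what a GPU-hour costs. The sketch below assumes frames are sampled at a fixed rate rather than decoded in full; every parameter name and value is a hypothetical placeholder, not a figure from this engagement.

```python
from dataclasses import dataclass


@dataclass
class BatchEstimate:
    frames: float
    gpu_hours: float
    wall_clock_hours: float
    cost_usd: float


def estimate_batch(
    num_videos: int,
    avg_duration_s: float,
    sampled_fps: float,            # assumption: frames sampled per second of video
    frames_per_gpu_second: float,  # assumption: measured inference throughput per GPU
    gpu_hourly_usd: float,
    num_gpus: int,
) -> BatchEstimate:
    """Estimate GPU time and spend for one batch trigger (all inputs hypothetical)."""
    if frames_per_gpu_second <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    frames = num_videos * avg_duration_s * sampled_fps     # frames sent to the model
    gpu_hours = frames / frames_per_gpu_second / 3600.0    # GPU time summed over devices
    return BatchEstimate(
        frames=frames,
        gpu_hours=gpu_hours,
        wall_clock_hours=gpu_hours / num_gpus,             # assumes perfectly even sharding
        cost_usd=gpu_hours * gpu_hourly_usd,
    )


def within_envelope(estimate: BatchEstimate, budget_usd: float) -> bool:
    """True if the batch fits the agreed spend ceiling."""
    return estimate.cost_usd <= budget_usd
```

In this model the sampling rate is the main lever: halving `sampled_fps` halves frames, GPU-hours, and cost together, so the envelope can be set by choosing how densely each ad is sampled.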

Stage 04 — Execution

Live, at broadcast scale, with zero manual workflow.

We deployed a GPU-optimised pipeline directly into ExtremeReach's existing infrastructure. It processes tens of thousands of videos per batch trigger and feeds structured results into XR IQ, their client-facing insights product. The service runs with zero manual intervention, and ongoing model and application maintenance keeps its accuracy in check as footage and conditions evolve.
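A batch trigger that processes tens of thousands of videos with no manual workflow needs to be safe to re-fire. A common pattern, sketched here as an assumption rather than a description of the ExtremeReach integration, is to split the incoming list into fixed-size shards for GPU workers and skip anything already processed, so a retried trigger resumes instead of duplicating work. The function name and shard size are hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def pending_shards(
    video_ids: Iterable[str], processed: Set[str], shard_size: int
) -> Iterator[List[str]]:
    """Yield fixed-size shards of videos that still need processing.

    Hypothetical idempotency scheme: IDs already in `processed` (and repeats
    within this trigger) are skipped, so re-firing a trigger only picks up
    unfinished work.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    shard: List[str] = []
    seen: Set[str] = set()
    for vid in video_ids:
        if vid in processed or vid in seen:
            continue
        seen.add(vid)
        shard.append(vid)
        if len(shard) == shard_size:
            yield shard
            shard = []  # start a fresh list; the yielded one is left untouched
    if shard:
        yield shard      # final partial shard
```

Each shard can then be handed to a GPU worker as an independent unit of work.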

✓ Automated ingestion and processing pipeline

✓ GPU resource management for cost-efficient inference at scale

✓ Structured output feeding client dashboards with time-series tracking

✓ Full integration with XR IQ — zero additional manual workflow for clients
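Time-series tracking implies rolling per-person detections up into a share per period. A minimal sketch, assuming each detection arrives as an (air date, attribute label) pair and that periods are calendar months; the input shape and function name are hypothetical, and the case study does not describe the actual XR IQ output schema.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_share(
    detections: Iterable[Tuple[date, str]],
) -> Dict[str, Dict[str, float]]:
    """Return {"YYYY-MM": {label: share_of_detections}} in chronological order.

    Hypothetical input: one (air_date, label) pair per detected person,
    e.g. a skin tone or body type label from the per-video output.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for air_date, label in detections:
        counts[air_date.strftime("%Y-%m")][label] += 1
    shares: Dict[str, Dict[str, float]] = {}
    for month in sorted(counts):  # "YYYY-MM" strings sort chronologically
        total = sum(counts[month].values())
        shares[month] = {label: n / total for label, n in counts[month].items()}
    return shares
```

Normalising to shares rather than raw counts keeps months comparable even when a brand's ad volume changes from one period to the next.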

Results & Impact

What changed.

Before Ferrous Labs	After Ferrous Labs
No model existed for skin tone or body type detection at broadcast fidelity	Novel CV capabilities — not available from any third-party provider
Representation monitoring was manual, anecdotal, or absent	Production-ready D&I intelligence pipeline deployed at broadcast scale
No scalable way to process client video volumes	Automated, scalable — zero manual intervention per video batch
Brands had no data on who appeared in their own advertising	Brands gain quantified, time-series insight into representation across their ad portfolios

The hardest part wasn't the model — it was defining what 'skin tone' means consistently enough to train one. We built the definition and the detector simultaneously.

Ferrous Labs engineering note

Technology

Stack

PyTorch · TensorFlow · AWS SageMaker · Databricks · AWS ECS · MLflow · Docker

Got a problem no model exists for?

Talk to a co-founder.

If you're at the edge of what production AI can do, we've delivered novel CV capabilities from scratch. Book a discovery call.

