It took 2.5 years and hundreds of people to build a quality system. Today I'd need a small team and three agents.
A story about vetting, bell curves, and the moment you realize the system you spent years building is now a starting point.
The Before
In 2018, we launched a marketplace for experiences. Quality was the whole game.
So we vetted every single host with a phone call. Someone on the team would work through about ten attributes, trying to figure out whether this person could deliver. Things like depth of expertise, ability to engage a group, and connection to the place.
Before going live, hosts even had to attend an in-person event to learn how to create magic for guests.
Phone call with every single host. ~10 attributes assessed manually.
In-person event required before going live. Learn the hero's journey.
Constant internal debates. What counts as a quality experience? Ten people, ten answers.
Thoughtful? Yes. Scalable? Absolutely not. Expensive, inconsistent, and impossible to standardize.
The Rubric
I listened to leadership, studied the patterns, and designed what became our quality rubric. Three core standards. Five-point scale for each. Simple. Structured. Repeatable.
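A minimal sketch of how a rubric like that might be encoded. The three standard names here are borrowed from the attributes mentioned earlier (expertise, engagement, connection to place); the real rubric's standards and scoring details aren't specified, so treat every name as illustrative:

```python
from dataclasses import dataclass

# Hypothetical standard names -- stand-ins for the actual three core standards.
STANDARDS = ("expertise", "engagement", "sense_of_place")

@dataclass
class RubricScore:
    """One host's assessment: each standard scored on a five-point scale."""
    scores: dict  # standard name -> int in 1..5

    def __post_init__(self):
        assert set(self.scores) == set(STANDARDS), "score every standard"
        assert all(1 <= s <= 5 for s in self.scores.values()), "five-point scale"

    @property
    def total(self) -> int:
        return sum(self.scores.values())  # ranges from 3 to 15

host = RubricScore({"expertise": 5, "engagement": 4, "sense_of_place": 5})
print(host.total)  # 14
```

The point of the structure is the repeatability: every assessor answers the same three questions on the same scale, so scores from different people become comparable.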
We kept the phone call at first but made it structured. Market managers asked the three questions and documented responses.
Results started clustering into a real bell curve. The framework was working.
Product resources were scarce. So before building anything into the platform, we validated with an MVP: a Google Form. Hosts self-reported. A human QA'd the responses.
The bell curve held. The P95 experiences were absolutely incredible.
The Thresholds
Once we had the distribution, we could make real decisions. The bell curve gave us the data to set clear thresholds.
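Threshold-setting from a distribution can be sketched in a few lines. The percentiles chosen here (bottom decile for intervention, P95 for the standout tier) are illustrative, and the scores are simulated rather than real data:

```python
import random

random.seed(7)
# Simulated rubric totals (3..15), roughly bell-shaped, standing in for real data.
totals = [min(15, max(3, round(random.gauss(10, 2)))) for _ in range(1000)]

def percentile(data, p):
    """Nearest-rank percentile of a list of scores."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

remediate_below = percentile(totals, 10)  # bottom decile: intervene
feature_above = percentile(totals, 95)    # P95: the standout tier
print(remediate_below, feature_above)
```

Once the cutoffs exist, "quality" stops being a debate and becomes a policy: below one line you remediate, above the other you celebrate.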
Then we built the feedback loop. Post-experience reviews validated the upfront vetting. Reviews drove customized education back to hosts. A circular quality engine.
Start to finish: roughly 2.5 years. Hundreds of people operating the machine.
How I'd Rebuild It Today
Three agents. A focused human team where it matters most.
Vetting Agent
Reviews host applications AND runs digital searches to validate claims. When there's a variance it can't resolve, it flags a human. Most of the time the human just pushes it forward.
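The flag-a-human flow could be sketched like this. The function name, inputs, and the agreement threshold are all assumptions, not a real implementation:

```python
def vet_host(application: dict, search_findings: dict, tolerance: float = 0.8) -> str:
    """Compare a host's claims against what a digital search turned up.
    Returns 'approve' or 'needs_human'."""
    matches, checked = 0, 0
    for claim, value in application.items():
        if claim not in search_findings:
            continue  # no external signal: neither confirms nor contradicts
        checked += 1
        if search_findings[claim] == value:
            matches += 1
    if checked == 0:
        return "needs_human"  # nothing verifiable: escalate
    if matches / checked >= tolerance:
        return "approve"      # claims line up with what we found
    return "needs_human"      # unresolved variance: flag a person

print(vet_host(
    {"years_teaching": 10, "city": "Tokyo"},
    {"years_teaching": 10, "city": "Tokyo"},
))  # approve
```

The design choice that matters: the agent never rejects on its own. It either approves or hands the variance to a human with the evidence attached.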
Quality Agent
Analyzes incoming reviews. Identifies what guests actually value. Feeds signal back to the vetting agent so "excellent" evolves in real time.
Remediation + Removal Agent
Identifies underperforming experiences, surfaces the data, and recommends action. But this is where the human team lives. Removing someone from a marketplace is a real decision with real impact on someone's livelihood. You do it with dignity. You do it well. You make sure you're making the right call. This agent arms the team with everything they need. The team makes the final call.
The real unlock: market-level orchestration. Our 2018 system couldn't account for nuance. A cooking class in Tokyo has different quality markers than one in Austin. International travelers want different things than locals. We knew that intuitively but couldn't operationalize it. With these three agents and an orchestration layer, each market gets its own evolving quality standard. Not static. Alive.
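A sketch of what "each market gets its own evolving standard" could mean mechanically: a per-market quality bar that drifts toward what local guests actually reward. The class, update rule, and numbers are all illustrative assumptions:

```python
from collections import defaultdict

class MarketStandard:
    """A per-market quality bar that follows real guest signal over time."""
    def __init__(self, start: float = 4.0, rate: float = 0.1):
        self.bar = start
        self.rate = rate

    def observe(self, review_score: float):
        # Exponential moving average: the standard is never static.
        self.bar += self.rate * (review_score - self.bar)

markets = defaultdict(MarketStandard)

# Hypothetical: one market's guests rate generously, another's rate high and hard.
for score in (4.8, 4.9, 4.7):
    markets["tokyo"].observe(score)
for score in (4.2, 4.0, 4.3):
    markets["austin"].observe(score)

print(markets["tokyo"].bar > markets["austin"].bar)  # True: different bars per market
```

The orchestration layer's job is just to route signal: the quality agent's findings update each market's bar, and the vetting and remediation agents read from that bar instead of a single global threshold.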
The system we spent 2.5 years discovering through iteration would now be the starting architecture. Not the end state.
What system did you spend years building by hand that could now be three agents and an orchestration layer?