There is a difference between an AI that demos well and an AI that works in production.
The first one is easy. Show a clean demo on a clean problem, with a clean dataset, in a clean meeting. Cherry-pick the prompt. Cherry-pick the timing. The customer leaves convinced.
The second one is hard. Run the same agent for ninety days on real customer data, real edge cases, real exceptions, real angry people. Most demos do not survive that. Most do not even survive thirty days.
We learned this because we ran the experiment on ourselves.
We operate a portfolio of thirty brands across firearms, real estate, services, content, and products. We did this on purpose. We needed real businesses, with real customers, with real consequences, to test the agents we wanted to sell.
You can see them at jarvis.thnk.biz/factory. Live dashboard. Updated every five minutes. Real revenue numbers and real status indicators behind a login.
Every agent in our catalogue has been running in that portfolio for at least six months before we sell it to anyone else. Some of them for two years.
Three things, mostly.
First, the agents that survive are the ones with narrow, well-defined jobs. Read email and draft replies in my voice survives. Be my chief of staff usually does not — at least not at the start. The scope has to be small enough that the agent can be excellent at it. Breadth is the enemy of reliability.
Second, training data wins. An agent trained on twelve months of your actual email outperforms a generic agent by an order of magnitude. Not 10% better. 10x better. The operators who try to skip the training data step always end up doing it later, after the agent has embarrassed them.
Third, humans stay in the loop for longer than vendors will tell you. Most agents reach genuine autonomy at week six or eight, not week one. The first six weeks, the operator approves every action. After that, the agent acts and flags only the ambiguous. Skipping the approval phase is how clients end up firing us by month two.
We get pressure to add agents to our catalogue faster. Every operator we meet has a job they want automated.
We add about one agent per quarter to the standard catalogue. Not because we cannot build faster — we can — but because we will not ship something we have not survived ourselves. The Receipt & Expense Capturer was in beta in our own bookkeeping for four months before we shipped it. The Voicemail Triage agent ran on our phones for six months before any client got it.
That is what eating our own cooking actually looks like. Slow.
If you want fast and broken, the AI industry has plenty of options. If you want narrow and reliable and shipped after we have personally survived it, talk to us.
— Randall Gorham · Founder, THNK
90 seconds. Seven questions. We tell you exactly which three agents we would deploy first and what the math looks like.
Take the Audit → More essays →