The race to AI-ify products has never been louder. Every incumbent has a generative roadmap. Every competitor has shipped something. Every customer wants to know what yours does, and some are considering DIY.
In that moment, the temptation is to mistake polish for product.
Here’s what I’m learning, building three AI products this year — Pathfinder for small businesses navigating government contracting, an AI strategy framework for product leaders, and one more in development:
When polish is cheap, attention to foundations matters more, not less.
The outcome the customer actually values. The jobs your product simplifies while earning trust. The papercuts they accept every morning. The undifferentiated, acutely essential work they do without thinking. The workarounds they invent that you never see.
Building is faster than ever. Earning trust is not.
Teresa Torres calls the disciplined version of curiosity continuous discovery — at least one customer conversation a week, every week. The discipline is in the listening that comes after.
Continuous discovery means shipping fast enough to learn from real users. In a regulated domain, that speed bumps into a hard ceiling — the trust you’ve earned with the real people on the other side. Something has to give. It is not the customer’s trust.
That tension forces an architectural choice most consumer products never face: what do you let AI do, and what do you refuse?
I’m building Pathfinder for small businesses pursuing government contracts. The output goes to certification officers who decide whether the business is ready to apply. Vendors hand the screen to the officer. The officer asks where a number came from.
The default AI product pattern in 2026 is to throw an LLM at the whole problem. Let it figure out who qualifies, interpret the policy, generate the recommendation. Fix the rest with prompt engineering, RAG, and iteration.
When I asked AI to architect Pathfinder, it gave me exactly that.
I said no.
Eligibility logic is deterministic. Pass/fail/borderline rules written as code, each tagged to a specific clause in the source policy. The LLM has two narrow jobs: retrieve the right policy passage, and translate legalese into plain English. It never decides who qualifies. It never guesses which rule applies.
The trust signal is not “the AI is usually right.” It is “every answer traces back to a source you can audit.” That is a different product.
What the architecture bought me
A few weeks into testing, a vendor flagged that Pathfinder had sent them to the wrong bidding portal. They followed the link and hit a dead end.
The fix was a one-line constant change. Not a prompt rewrite. Not a fine-tune. Not RAG re-indexing and hoping. One line, deployed in minutes, with a test case that lives in the eval set forever.
That is what determinism in the right places buys you. Every failure becomes a finite, fixable thing. You stop hoping the model learns and start knowing the system is correct.
It also unlocked the eval framework. I am working with a certification officer to validate a 50-question golden set against the deterministic outputs. Because each answer cites the policy chunk that produced it, his corrections are not fuzzy — they point at exactly the rule that needs updating. The discipline that started as a defensive architectural choice became the engine of the product’s accuracy.
That is the speed-to-trust loop. It only works because I left certain decisions on the human side of the line.
What I am learning more broadly
The senior AI product calls I admire most are not the heavy ones or the light ones. They are the boundary calls — what the model is allowed to do, what it is guarded against.
I see it most clearly in builders who have shipped for years in regulated industries. They have seen enough hallucinations cost real people real things that they reach for determinism without flinching, and use AI selectively, almost protectively.
That stance is the one I am trying to develop.
A practice I’ve started: use AI for the synthesis (customer feedback, meeting notes, draft one of anything). Then journal three things at the end of the day —
- Where I agreed with what it gave me
- Where I countered it
- Where I actually listened: to a customer, a stakeholder, a quiet signal in the data
That third one is the work.
What’s an instance where you disagreed with — or took a different approach than — an AI recommendation? I’d love to hear yours.