Field Notes

The Expert's Dilemma: Truth-Seeking, Not Truth-Bearing

One of my former Data Science managers described the role of a Data Scientist as "truth-seeking." He didn't say "truth-bearing," and I appreciate the nuance now more than ever. The phrase used to conjure in me an image of the mythical Hamsa (swan), which can separate milk from water, a symbol of separating truth from falsehood. But what if there's no absolute truth, or we are not omniscient enough to know it? What if there are multiple truths?

This dilemma isn't just philosophical; we face it in every meeting, news report, and scientific paper. Whom do we trust? The expert validated by social proof, the one who confirms our own biases, or simply the loudest voice in the room? The mythical Hamsa had it easy with just milk and water. Our challenge is to find signal in a sea of convincing and conflicting "truths." We call it the expert's dilemma.

The Expert's Dilemma

The situation is defined by a few key problems:

No "Ground Truth": You have multiple expert opinions but no definitive answer key to check them against.
Variable Reliability: Experts have different levels of accuracy, and some are only reliable on specific topics.
Human Bias: We naturally gravitate towards sources that validate our own views (confirmation bias).

Fortunately, Dawid and Skene proposed a solution to this problem in 1979.

The Dawid-Skene Solution

Instead of just counting votes, the Dawid-Skene model treats the true answer and each expert's reliability as unknown variables and solves for both simultaneously.

The expert report card & the feedback loop: The algorithm builds a "report card" (an error matrix) for each expert, recording how often that expert gives each response for each possible true answer. An Expectation-Maximization algorithm then guesses the truth, grades the experts against that guess, and uses those grades to refine the guess, repeating until the answers stabilize.
The punchline: The final output isn't a majority vote, but a probabilistic estimate of the truth, weighting each opinion by the expert's estimated reliability.
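To make the feedback loop concrete, here is a minimal sketch of the idea in plain Python. It is not the original paper's code: the function name dawid_skene, the smoothing constant, the fixed iteration count, and the toy votes are all my own assumptions for illustration.

```python
def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Sketch of Dawid-Skene EM.

    labels: dict mapping (item, expert) -> class index in range(n_classes).
    Returns (posterior over each item's true class, per-expert error matrix).
    """
    items = sorted({i for i, _ in labels})
    experts = sorted({e for _, e in labels})

    # Initial truth-guess: each item's vote shares (a soft majority vote).
    post = {i: [0.0] * n_classes for i in items}
    for (i, _), lab in labels.items():
        post[i][lab] += 1.0
    for i in items:
        total = sum(post[i])
        post[i] = [c / total for c in post[i]]

    for _ in range(n_iter):
        # M-step: grade the experts against the current truth-guess.
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        # conf[e][k][l] ~ P(expert e answers l | true class is k).
        # The smoothing term (an assumption) avoids zero rows.
        conf = {e: [[smoothing] * n_classes for _ in range(n_classes)]
                for e in experts}
        for (i, e), lab in labels.items():
            for k in range(n_classes):
                conf[e][k][lab] += post[i][k]
        for e in experts:
            for k in range(n_classes):
                row_sum = sum(conf[e][k])
                conf[e][k] = [v / row_sum for v in conf[e][k]]

        # E-step: refine the truth-guess using the new grades.
        new_post = {i: list(prior) for i in items}
        for (i, e), lab in labels.items():
            for k in range(n_classes):
                new_post[i][k] *= conf[e][k][lab]
        for i in items:
            z = sum(new_post[i])
            new_post[i] = [v / z for v in new_post[i]]
        post = new_post

    return post, conf


# Toy data (hypothetical): experts A and B always agree; C dissents twice.
votes = {
    "A": [0, 1, 0, 1, 0, 1],
    "B": [0, 1, 0, 1, 0, 1],
    "C": [1, 1, 0, 0, 0, 1],
}
labels = {(i, e): v[i] for e, v in votes.items() for i in range(6)}

post, conf = dawid_skene(labels, n_classes=2)
estimated = [max(range(2), key=post[i].__getitem__) for i in sorted(post)]
print(estimated)
print({e: round(conf[e][0][0], 3) for e in conf})
```

On data this tidy the estimate coincides with the majority vote; what the model adds is the report card itself, where C's estimated accuracy comes out visibly lower than A's and B's. For long lists of experts a real implementation would likely work in log space to avoid underflow.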

So where does this leave the "Truth Seeker"? While models like Dawid-Skene offer a powerful lens, and a family of inter-rater reliability metrics exists — from Cohen's Kappa to Gwet's AC1 — the journey doesn't always require statistical sophistication. As someone said, "beauty lies in simplicity." Recently, while working on estimating how well experts agree on evaluating AI conversation quality, a colleague I admire suggested I simply start with a confusion matrix showing the % of agreement between our raters. The result was illuminating. It was a simple chart that stakeholders instantly understood — a tool that, like the mythical Hamsa, finally let us separate the signal from the noise. It's a powerful reminder that our job is not just to wield complex tools, but to find the simplest path to the truth.
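For anyone who wants to try the simple version first, a rough sketch of that starting point might look like the following. The ratings and the "good"/"bad" categories are made up for illustration, and the helper names are mine, not from any library. It builds the rater-vs-rater matrix as percentages, then computes raw percent agreement and Cohen's Kappa, which corrects that agreement for chance.

```python
from collections import Counter


def agreement_matrix(rater_a, rater_b, categories):
    """Cell [a][b] = % of items rater A labelled a and rater B labelled b."""
    counts = Counter(zip(rater_a, rater_b))
    n = len(rater_a)
    return [[100.0 * counts[(a, b)] / n for b in categories]
            for a in categories]


def percent_agreement(rater_a, rater_b):
    """Share of items (in %) on which the two raters gave the same label."""
    same = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * same / len(rater_a)


def cohens_kappa(rater_a, rater_b, categories):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of eight AI conversations by two raters.
categories = ["good", "bad"]
rater_a = ["good", "good", "bad", "bad", "good", "bad", "good", "bad"]
rater_b = ["good", "bad", "bad", "bad", "good", "bad", "good", "good"]

matrix = agreement_matrix(rater_a, rater_b, categories)
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b, categories)

for label, row in zip(categories, matrix):
    print(label, row)
print(agreement, kappa)
```

The diagonal of the matrix is where the raters agree; the off-diagonal cells show which kinds of disagreement dominate, which is usually the part stakeholders ask about first.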

Originally posted on LinkedIn.

Stepping Out of the Deep Work Chamber

When my architect asked for the vision behind the house I'm building in India, I sketched a lighthouse. Five floors. One central column. The bottom floor was a gallery — open, social, collaborative. The top floor was a sealed chamber for uninterrupted deep work. I was low-key proud of

We'd Better Build Some Damn Good Brakes

Waymo is objectively better than an average human driver. Data shows it reduces injury-causing crashes by over 80%. Yet, if an autonomous vehicle makes a single mistake, it's front-page news. We demand near-perfection from machines while accepting massive error rates from humans. And rightfully so. We should apply

My Poster at the Google Data Science Summit

Grateful to have had my poster featured at the 2026 Google Data Science Summit in Sunnyvale today. Autorater Context Enrichment tackles something I've been thinking about for a while: static prompts don't age well. Autoraters built on fixed grounding miss nuance as products evolve, and the

No Plan Survives Contact with the Data: Why AI Blueprints Need a Commander's Intent

A few months ago, during a coffee chat, a very smart AI mentor at Google told me something: "The best way to think about these agents," she said, "is like a very smart intern. Give them enough context, define the workflow, and they will execute the code
