The Expert's Dilemma: Truth-Seeking, Not Truth-Bearing

One of my former Data Science managers described the role of a Data Scientist as "Truth Seeking." He didn't mention "Truth bearing," and I appreciated the nuance, now more than ever. This used to conjure in me an image of the mythical Hamsa (swan), which can separate milk from water — a symbol of separating truth from falsehood. But what if there's no absolute truth, or we are not omniscient enough to know it? What if there are multiple truths?

This dilemma isn't just philosophical; we face it in every meeting, news report, and scientific paper. Whom do we trust? The expert validated by social proof, the one who confirms our own biases, or simply the loudest voice in the room? The mythical Hamsa had it easy with just milk and water. Our challenge is to find signal in a sea of convincing and conflicting "truths." We call it the expert's dilemma.

The Expert's Dilemma

The situation is defined by a few key problems:

  • No "Ground Truth": You have multiple expert opinions but no definitive answer key to check them against.
  • Variable Reliability: Experts have different levels of accuracy, and some are only reliable on specific topics.
  • Human Bias: We naturally gravitate towards sources that validate our own views (confirmation bias).

Fortunately, Dawid and Skene proposed a solution to this problem in 1979.

The Dawid-Skene Solution

Instead of just counting votes, the Dawid-Skene model treats the true answer and each expert's reliability as unknown variables and solves for both simultaneously.

  • The expert report card & the feedback loop: The model builds a "report card" (a confusion matrix of error rates) for each expert. An Expectation-Maximization algorithm starts with a guess at the truth, grades the experts against that guess, and uses those grades to refine the guess, repeating until the answers stabilize.
  • The punchline: The final output isn't a majority vote, but a probabilistic estimate of the truth, weighting each opinion by the expert's estimated reliability.
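
The feedback loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation of the 1979 paper: the function name dawid_skene, the (item, expert, label) input format, the smoothing constant, the fixed iteration count, and the three toy experts are all assumptions made for the example.

```python
from collections import defaultdict

def dawid_skene(votes, n_classes, n_iter=50, smoothing=0.01):
    """Illustrative EM loop. votes: (item, expert, label) triples,
    with integer labels in range(n_classes)."""
    items = sorted({item for item, _, _ in votes})
    experts = sorted({expert for _, expert, _ in votes})
    by_item = defaultdict(list)
    for item, expert, label in votes:
        by_item[item].append((expert, label))

    # Initial guess at the truth: each item's vote shares (a soft majority vote).
    post = {}
    for item in items:
        counts = [0.0] * n_classes
        for _, label in by_item[item]:
            counts[label] += 1.0
        post[item] = [c / sum(counts) for c in counts]

    for _ in range(n_iter):
        # M-step: grade the experts against the current guess.
        # conf[e][k][l] estimates P(expert e answers l | true class is k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {e: [[smoothing] * n_classes for _ in range(n_classes)]
                for e in experts}
        for item in items:
            for expert, label in by_item[item]:
                for k in range(n_classes):
                    conf[expert][k][label] += post[item][k]
        for e in experts:
            for k in range(n_classes):
                row_sum = sum(conf[e][k])
                conf[e][k] = [v / row_sum for v in conf[e][k]]

        # E-step: refine the guess using the report cards.
        for item in items:
            scores = list(prior)
            for expert, label in by_item[item]:
                for k in range(n_classes):
                    scores[k] *= conf[expert][k][label]
            total = sum(scores)
            post[item] = [s / total for s in scores]

    return post, conf

# Toy data (hypothetical): "ana" and "ben" always give the same answer,
# "cy" always gives the opposite one.
shared_answer = {"q1": 0, "q2": 1, "q3": 0, "q4": 1, "q5": 0, "q6": 1}
votes = []
for item, answer in shared_answer.items():
    votes += [(item, "ana", answer), (item, "ben", answer), (item, "cy", 1 - answer)]

posteriors, confusion = dawid_skene(votes, n_classes=2)
```

Here posteriors maps each item to a probability per class, and confusion holds each expert's report card. On this toy data the report card for "cy" should show an expert who is consistently wrong, which the model can still use as evidence. A real application would also need a convergence check and log-space arithmetic to avoid underflow when there are many experts.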

So where does this leave the "Truth Seeker"? While models like Dawid-Skene offer a powerful lens, and a family of inter-rater reliability metrics exists — from Cohen's Kappa to Gwet's AC1 — the journey doesn't always require statistical sophistication. As someone said, "beauty lies in simplicity." Recently, while working on estimating how well experts agree on evaluating AI conversation quality, a colleague I admire suggested I simply start with a confusion matrix showing the % of agreement between our raters. The result was illuminating. It was a simple chart that stakeholders instantly understood — a tool that, like the mythical Hamsa, finally let us separate the signal from the noise. It's a powerful reminder that our job is not just to wield complex tools, but to find the simplest path to the truth.
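
For readers who want to try the simple route first, here is a minimal sketch of a two-rater agreement matrix, with Cohen's Kappa computed from the same table. The function names, the "good"/"bad" categories, and the ratings are invented for illustration and are not the data from the project described above.

```python
from collections import Counter

def agreement_matrix(rater_a, rater_b, categories):
    """Cell [i][j] is the share of items rater A put in categories[i]
    and rater B put in categories[j]."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    counts = Counter(zip(rater_a, rater_b))
    n = len(rater_a)
    return [[counts[(a, b)] / n for b in categories] for a in categories]

def percent_agreement(matrix):
    """Share of items on which the two raters agree (the diagonal)."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def cohens_kappa(matrix):
    """Agreement corrected for what chance alone would produce."""
    observed = percent_agreement(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings of eight AI conversations by two raters.
categories = ["good", "bad"]
rater_a = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]
rater_b = ["good", "good", "bad", "bad", "bad", "good", "good", "good"]

matrix = agreement_matrix(rater_a, rater_b, categories)
agreement = percent_agreement(matrix)
kappa = cohens_kappa(matrix)
```

The matrix is the chart stakeholders can read directly. Kappa is an optional second number for when raw agreement might be inflated by one category dominating the ratings.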


Originally posted on LinkedIn.
