Practical AI Learning Series • Guide 14

Building Your Verification Soul File

Why the document that tells AI how to work with you is probably configured for comfort, not accuracy

1 The Configuration You Did Not Know You Were Writing

Every AI system you use has a configuration. Sometimes you wrote it. Sometimes it is a system prompt someone else designed. Sometimes it is the default: no explicit instructions, which means the AI falls back on its training, which means it falls back on producing output that users rate highly, which means it agrees with you fluently and sounds like what you expect good analysis to sound like.

There is now an open standard for writing that configuration deliberately. It is called SOUL.md, maintained at soulspec.org, currently at version 0.4. The standard was pioneered by Peter Steinberger, creator of the open-source tool OpenClaw, and has attracted genuine community adoption: templates, a registry, and an academic study examining adoption across 466 open-source repositories. Compatible tools include Claude Code, Cursor, Windsurf, and an expanding set of agent frameworks.

The mechanics are simple. You write a structured markdown file describing who you are, how you work, what you need, and what constraints apply. You paste it into your AI tool’s system prompt or point an agent framework at it. The AI reads your context and adjusts its behaviour accordingly.
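In practice the file is ordinary markdown. A minimal, hypothetical example of the kind of file the standard anticipates; the headings and content here are illustrative, not mandated by the spec:

```markdown
# SOUL.md

## Who I Am
Senior manager at a financial education institute. Sixteen years in
capital markets before moving into education.

## How I Work
Concise responses, British English, direct feedback over diplomatic hedging.

## Constraints
When you present numerical figures, name the source and flag estimates.
```

Paste the contents into the system prompt field of your AI tool, or place the file where your agent framework expects it.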

Every template in that ecosystem optimises for the same thing: voice reproduction and personality configuration. The result is a document that helps you sound like yourself. For that purpose, the standard is excellent.

Key Principle

A soul file that only tells the AI how to sound has, inadvertently, told the AI how to agree with you. The file designed to make collaboration feel natural has configured the AI to work within your frame. This is Frame Lock delivered as a configuration file.

This guide does something different. It asks what happens when you optimise instead for verification: for catching errors, surfacing blind spots, and building in the challenges that will not come naturally from a system configured to be agreeable.


2 The 80/20/0 Problem

Examine any published soul file template from the community ecosystem and you will find a consistent distribution. Voice content, covering tone, style, register, and personality, typically occupies 60 to 80 per cent of the file. Rules content, covering constraints, output formats, and non-negotiable behaviours, occupies 20 to 30 per cent. Verification content, covering where the user might be wrong, where the AI should challenge, and what error looks like in this context, occupies zero per cent.

This is not a design flaw. The standard is optimised for its stated purpose. For professional work where accuracy matters, though, the distribution produces a specific problem: the AI cannot challenge your assumptions because it has been configured to work within your frame. It is doing exactly what you told it to do.

The framework I have been developing across The Mirror Effect essay series calls this pattern proxy collapse: the condition in which a signal that once correlated with quality becomes dissociated from it. Production difficulty used to proxy for competence; producing articulate analysis was hard enough that doing it at all was evidence you could probably also think. That proxy collapsed when generation became cheap. The soul file ecosystem has reproduced the same collapse at a finer resolution. “Feels like good collaboration” is being used as a proxy for “is good collaboration,” and the entire template ecosystem is optimised to satisfy the proxy. Voice configuration makes the interaction feel productive. Whether it actually is productive is a different question, and one the standard does not address.

Consider what a typical voice-dominant soul file actually contains. It specifies that you prefer concise responses, that you work in financial services, that you value directness, that you want British English. Every one of those is genuinely useful. None of them tells the AI what to do when your analysis contains a logical gap, when your premise rests on an assumption that may not hold, or when the conclusion you are heading toward is not supported by the evidence you have presented.

The AI, reading a voice-dominant file and receiving a request to review your analysis, will review it in a manner consistent with your voice preferences. It will be concise, direct, and British. It will not necessarily identify the logical gap, because you have not told it that logical gaps are a priority. You have told it how to sound, not what to look for.

2.1 The Three Variables

The right ratio of voice, rules, and verification content depends on three variables: what task you are doing, how deep your domain expertise is, and how much career experience shapes your priors.

Task type determines the immediate weighting. Expressive work (client communications, summaries for non-specialist audiences) weights toward voice. Analytical work (financial modelling, policy review, strategic assessment) weights toward verification. Most professional work involves both.

Domain expertise determines your independent verification capacity. In your strong domains, you will catch errors yourself. In thinner domains, you need more verification instructions, not fewer, because you cannot rely on your own knowledge as a backstop.

Career experience is where the framework produces its most important and least intuitive insight.

2.2 The Counterintuitive Insight

Take a senior equity analyst with twenty years of experience in financial modelling. A junior analyst receives a DCF model built by an AI, sees that the formula structure is correct, and flags it to the senior for review. The senior evaluates it by comparing against a mental template accumulated over two decades: how these models should look, what assumptions are typically reasonable, what range of outputs is plausible.

The model that passes her review is a model that matches her priors.

The problem is that AI-generated models are very good at matching expert priors. They are trained on the same corpus of how these models have been built. When the AI generates a DCF, it produces something that looks right to someone who has seen many DCFs. It is internally consistent. The formulas work. The structure is familiar.

None of that tells you whether the structural assumptions are appropriate for this specific company, at this specific point in its development cycle, under these specific market conditions. A junior analyst, lacking strong priors, might ask naive questions that expose those assumptions. The senior, armed with strong priors that the model satisfies, may not.

Important Distinction

The expert’s pattern recognition is simultaneously their verification capacity and their confirmation bias. The stronger the priors, the more fluently AI-generated output can satisfy them, and the less likely the expert is to notice when structural assumptions require re-examination. This is not a failure of expertise. It is a property of how expertise works.

The Mirror Effect framework calls the compound outcome of this dynamic Confidence Inversion: the condition in which the user’s confidence in AI output exceeds their verified knowledge of its accuracy, and the gap is invisible to the person experiencing it. A voice-optimised soul file amplifies the inversion. The output sounds like you, matches your thinking patterns, and therefore feels more trustworthy than output from a generic AI, even though the verification quality has not changed. Article 3 of the essay series examines this mechanism in detail.

This applies across domains. In healthcare, clinical experience creates pattern recognition that is valuable precisely because it is fast; the same speed means an AI-generated differential that matches clinical intuition bypasses the slow deliberative checking that would catch edge cases. In consulting, years of experience create a repository of solutions that work; the risk is recommending a familiar solution to a situation that only resembles previous work. In higher education governance, deep familiarity with regulatory frameworks means an AI-generated compliance summary that uses the right terminology feels correct, even when the interpretation is subtly wrong.


3 The Ratio Framework

The following table presents recommended allocation ratios across voice, rules, and verification content for nine professional profiles. The ratios are starting points for calibration, not prescriptions.

Notice what the 80/20/0 distribution actually represents: the entire soul file ecosystem has invested in making generation better (voice, fluency, comfort) and zero per cent in making verification possible. This is generation-verification asymmetry, the central dynamic of the Mirror Effect framework, applied to the configuration layer itself. AI generates output faster than humans can verify it; the soul file ecosystem has responded by making the generation more personalised while leaving the verification gap entirely unaddressed.

Ratios are listed as voice / rules / verification percentages.

Early career, learning domain (40 / 30 / 30). Voice still developing. Rules compensate for developing judgement. Risk: not knowing what you cannot verify.

Mid-career, in domain (25 / 25 / 50). Voice established. Primary risk shifts to accepting fluent output uncritically.

Senior, deep expertise (15 / 15 / 70). Strong priors are the risk. Confirmation bias is the primary failure mode.

Outside your domain (15 / 35 / 50). Heavy guardrails needed. Cannot verify domain claims independently.

Creative / exploratory (50 / 20 / 30). Voice matters most. Verification targets factual claims and logical coherence.

Regulated industry (15 / 45 / 40). Compliance requirements are non-negotiable. Verification targets regulatory interpretation.

Team lead / manager (20 / 20 / 60). Risk: accepting AI-assisted recommendations without testing underlying assumptions.

Academic / researcher (15 / 25 / 60). Citation integrity and methodological rigour are the primary verification concerns.

Consultant / adviser (20 / 25 / 55). Risk: recommending familiar solutions rather than appropriate ones.

The diagnostic question after every significant AI interaction: did the AI confirm what you already thought, or did it surface something you had not considered? If the answer is consistently the first, your configuration is under-weighted for verification, regardless of what the table suggests.
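If it helps to have the table's starting points in executable form, a small sketch can turn a profile and a target file length into a rough word budget per category. The profile names follow the table; the helper and the 600-word default are illustrative conveniences, not part of any spec:

```python
# Starting ratios from the table above: (voice, rules, verification),
# expressed as percentages that sum to 100.
STARTING_RATIOS = {
    "early career, learning domain": (40, 30, 30),
    "mid-career, in domain": (25, 25, 50),
    "senior, deep expertise": (15, 15, 70),
    "outside your domain": (15, 35, 50),
    "creative / exploratory": (50, 20, 30),
    "regulated industry": (15, 45, 40),
    "team lead / manager": (20, 20, 60),
    "academic / researcher": (15, 25, 60),
    "consultant / adviser": (20, 25, 55),
}

def word_budget(profile: str, total_words: int = 600) -> dict[str, int]:
    """Rough words-per-category for a soul file of total_words length."""
    voice, rules, verification = STARTING_RATIOS[profile.lower()]
    return {
        "voice": total_words * voice // 100,
        "rules": total_words * rules // 100,
        "verification": total_words * verification // 100,
    }
```

Word counts are a crude proxy for content share, but they make an under-weighted verification section visible at a glance.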


4 The Seven Sections

A verification-oriented soul file uses seven sections in a deliberate order. Identity first, verification last, because the verification section should be written with everything else already defined.

Section 1: Core Identity (Voice)
Who you are professionally, in terms of operational context rather than CV. Two to four sentences. The test: if you removed your name, would a colleague recognise the profile?
Not this
“Senior professional with extensive experience across multiple sectors, passionate about driving change and delivering value.”
This
“Senior manager at a financial education institute, responsible for programme quality and regulatory relationships. Sixteen years in capital markets before education; the translation between practitioner knowledge and academic framing is where I spend most of my working time.”
Section 2: What You Are Working On (Voice/Rules)
Current project, goals, constraints, stage. Not a task description; the context that persists across interactions. The test: if you asked the AI an ambiguous question, would this section give it enough background to respond usefully?
Section 3: How You Work (Voice/Rules)
Your actual working patterns, not your aspirational ones. If you sometimes jump to conclusions and work backwards to evidence, that belongs here. The AI needs the observed version of you, not the idealised one.
Section 4: Standards and Constraints (Rules)
Non-negotiables, distinguished from preferences. “Be accurate” is not a constraint in any useful sense. “When you present numerical figures, specify the source and flag if the figure is estimated” is a constraint.
Section 5: What You Need From the AI (Rules/Verification)
Both wanted and unwanted behaviours, specified concretely. “Be helpful” is not actionable. “When I present a conclusion before showing the supporting evidence, ask me to state the evidence first” is actionable. “When I use terms like ‘clearly’ or ‘obviously,’ challenge me to make the reasoning explicit” is actionable.

Count the instructions that increase comfort versus the instructions that increase accuracy. The ratio tells you something.
Section 6: Your Domains and Their Boundaries (Verification)
Map the edges of your expertise, not just the centres. The edges, where your confidence exceeds your knowledge, are where AI-generated errors are most likely to pass your verification filter.
Not this
“Financial modelling, derivatives pricing, credit analysis.”
This
“Strong on equity derivatives; less current on exotic structured products since leaving market-facing roles; working knowledge of credit risk models but not the underlying mathematics at implementation level.”
Section 7: Known Failure Modes (Verification)
The section that exists in no standard soul file template. It is also the section that determines whether the file produces Frame Lock or resists it. The discomfort you feel writing it is signal, not noise.
The Specificity Standard

“I sometimes have confirmation bias” is not useful. “I tend to defer to confidently presented quantitative analysis without checking whether the model’s structural assumptions hold in the current context, particularly when the analyst presenting the work is senior and the model is well-presented” is useful. The first describes everyone. The second tells the AI what to do.

Two failure modes in writing this section. The first is therapeutic rather than professional: “I struggle with imposter syndrome” may be true but does not tell the AI anything it can act on during a financial analysis task. The second is evasive rather than honest: “I sometimes miss details under time pressure” is true of every professional and therefore functionally uninformative.

Write this section last. Give it the most time. If it takes longer than the other six sections combined, that is probably appropriate.
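Put together, the seven sections give the file a shape like the following skeleton. The headings mirror this guide's structure; the SOUL.md spec does not mandate them, and the placeholder comments are prompts rather than content:

```markdown
# SOUL.md
Updated: 2025-03-01

## 1. Core Identity
<!-- Two to four sentences of operational context, not CV. -->

## 2. What I Am Working On
<!-- Current project, goals, constraints, stage. -->

## 3. How I Work
<!-- Observed working patterns, not aspirational ones. -->

## 4. Standards and Constraints
<!-- Non-negotiables, stated concretely enough to be checkable. -->

## 5. What I Need From the AI
<!-- Wanted and unwanted behaviours, e.g. "When I say 'obviously',
     ask me to make the reasoning explicit." -->

## 6. My Domains and Their Boundaries
<!-- Map the edges of expertise, not just the centres. -->

## 7. Known Failure Modes
<!-- Written last. Specific enough to act on, e.g. "I defer to
     confidently presented quantitative analysis without checking
     its structural assumptions." -->
```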


5 One Honest Complication

You might think the solution is straightforward: add “challenge my assumptions” to the soul file. It does not work, for a reason the Mirror Effect framework calls the Escape Paradox.

Instructions to break Frame Lock become part of it. The AI configured to “challenge you” is still reading a document you wrote, from your frame, in which you decided what challenge means and what degree of challenge is acceptable. The AI generates “challenging” content calibrated to feel challenging while remaining within the boundaries of what you find tolerable. The instruction to escape the frame deepens the frame.

This is a genuine tension. The honest answer is that Section 7 reduces the Escape Paradox rather than eliminating it. Specific, accurate failure mode documentation is substantially harder for the AI to confirm away than a generic instruction to disagree. But the failure modes in Section 7 are the ones you know about and are willing to document. The failure modes you do not know you have are not in the file, and the AI has no way to surface them.

The soul file does not eliminate confirmation bias. It creates structural friction against the known forms of it. For the unknown forms, the best mitigation is external review, adversarial framing of specific questions, and deliberate exposure to perspectives that do not share your priors.


6 The Exercises

Exercise 1 — Build From Instinct

Open a blank document. Write a file that tells an AI how to work with you specifically. No template, no example. Fifteen minutes. Then paste it into your AI tool’s system prompt and run a real task. Ten minutes.

When you are done, answer three questions privately:

(1) What percentage describes how you want the AI to sound versus how you want it to behave?

(2) Did you include anything about where you are typically wrong?

(3) In the test, did the AI challenge you on anything, or confirm what you already thought more fluently?

Approximately 70 per cent of people who complete this exercise produce voice-dominant files. Under 10 per cent include any verification content. That distribution is not wrong; it is the default the existing ecosystem produces. The question is whether it is deliberate.

Exercise 2 — Build From Structure

Build a second file using the seven sections above. Spend roughly ten minutes on Sections 1 through 4 and fifteen minutes on Sections 5 through 7. Section 7 should receive the most time and feel the most uncomfortable.

Exercise 3 — The Comparison Test

Run the same real professional task through both files. Same prompt, same AI tool. Compare the outputs against three dimensions:

Comfort: which response felt better to receive? The instinctive file will almost always win here. The structured file, if Section 7 was honest, will produce at least one moment of friction. The friction is the verification working.

Quality: which response was more useful in a functional sense? Not more impressive or better written, but which moved your work forward? Notice whether your quality judgement is influenced by your comfort judgement. They feel like the same thing. They are not.

Learning: which response told you something you did not know? An AI that confirms what you already thought at high quality is a drafting tool. An AI that surfaces something you had not considered is a thinking partner. The soul file architecture determines which you get.

Choose a task that involves judgement: a strategic assessment, an analytical review, a recommendation, a draft that rests on assumptions. If one version surfaces a question you had not asked yourself, that version is working.


7 Base File Plus Overlays

A single fixed soul file is a simplification that does not survive contact with professional reality. Most professionals shift within a day between analytical work, client communication, internal reporting, and exploratory thinking.

The practical architecture: a base file capturing the stable elements (identity, working patterns, domain boundaries, failure modes), combined with short context-specific overlays that shift the ratio for particular task types.

A mid-career financial analyst might have a base file weighted 25/25/50 for equity research. When shifting into client communication, a voice overlay shifts the ratio toward 45/30/25. When reviewing a model produced by a junior team member, a verification overlay pushes it toward 10/20/70. The overlay does not replace the base file; it supplements it. The implementation is simple: prepend a short context note when you change mode. “For this task, prioritise identification of logical gaps over output style” is sufficient to shift the AI’s behaviour meaningfully.


8 Keeping the File Alive

A soul file that has not been updated in three months is describing a past version of your professional situation. Three maintenance practices prevent it from calcifying.

Version control. Date every version. The changes over time reveal your learning trajectory. What you add to Section 7 after three months of use is evidence the verification architecture is producing self-knowledge. What you remove from Section 1 because it turned out to be aspirational rather than actual is evidence of honest calibration.

Post-error updates. The most valuable updates are made immediately after a significant error, when the failure mode is freshest. Describe the error. Update Section 7 to reflect the pattern. Consider whether the ratio needs adjustment. The errors worth documenting are the ones that reveal something about your default analytical patterns, not every minor misjudgement.

Quarterly ratio audit. Every three months: has the nature of your work changed? Has your expertise grown or shifted? Have you discovered new failure modes? Has Section 7 been updated, or has it calcified?

Working well

  • Section 2 reflects your current project
  • The ratio reflects your current task mix
  • Section 7 makes you mildly uncomfortable to read
  • Updated within the past three months
  • Updated after an error at least once

Needs attention

  • Current version is more than three months old
  • Section 2 describes work you finished last quarter
  • Section 7 could have been written by anyone in your field
  • The ratio has not changed despite role changes
  • Never updated after an error

9 What This Reveals Beyond AI Configuration

There is a version of this guide that treats the soul file purely as an AI configuration tool. That version misses the more important point.

The exercise surfaces something most professionals have never been asked to produce: a written record of where their expertise creates predictable error. Not a personality assessment. Not a self-evaluation for a performance review. A specific, operational account of the conditions under which their professional judgement becomes unreliable.

Most people resist writing Section 7 because it feels exposing. The resistance is itself diagnostic. The failure modes it documents are the ones your most trusted colleagues could probably identify. The fact that you can identify them too does not make them shameful; it makes them manageable. The soul file documents them so that a tool you are already using can help catch them rather than reproduce them.

This connects to the broader argument of the Mirror Effect. AI does not create new problems in how we evaluate quality. It reveals where our evaluation was already depending on signals that correlated with quality rather than measured it. The voice-optimised soul file is the personal version of that revelation. When you configure an AI to sound like you and work within your frame, you are building a system that reproduces your defaults, including the ones you have not examined. When you configure it for verification, you are building a system that creates friction against those defaults. The friction is not a bug. It is the mechanism by which the tool becomes genuinely useful rather than pleasantly confirming.


10 Key Takeaways

The standard soul file ecosystem optimises for voice, not verification. Every published template helps you sound like yourself. None of them help the AI catch where you are wrong. For professional work where accuracy matters, the distribution needs inverting.

More expertise means more verification, not less. Experts have stronger priors. Stronger priors create stronger confirmation bias. An AI that perfectly matches an expert’s frame is the most dangerous configuration: confident, domain-appropriate, and untested.

Section 7 is the functional core. A soul file without a Known Failure Modes section is a preference list. A soul file with one is a professional protocol. The specificity of what you write there determines whether the AI can act on it or only acknowledge it.

The Escape Paradox is real but not paralysing. Instructions to challenge you are still part of your frame. Specific failure mode documentation is harder for the AI to confirm away than generic challenge instructions. The gap between reduction and elimination is where human judgement remains essential.

The deeper lesson is about your own evaluation habits. If configuring the AI for comfort felt more natural than configuring it for verification, that tells you something about how you assess quality in general. The same defaults that make voice-optimised output feel productive make unexamined assumptions feel correct. Noticing that pattern is the beginning of working differently.