AI Is Making Decisions Everywhere. Almost Nobody Is Evaluating It.
- Share of humanity living under autocracy (V-Dem 2024)
- Consecutive years of global democratic decline (Freedom House)
- How much faster false information outpaces truth online (MIT / Science, 2018)
AI systems are making decisions in healthcare, finance, law, and government across every one of these countries — and in yours. The systems are trained to agree with their users. Almost nobody is evaluating them. The organizations that build evaluation infrastructure now will define the standard. Those that don't will be measured against it.
I. The Code That Wakes Up
The human genome — 3.2 billion base pairs, roughly 4 MB of algorithmic information — compiles through DNA → RNA → protein into consciousness. No one designed it to. Artificial neural networks follow the same pattern: mathematical operations producing capabilities their architects did not explicitly program. Both systems share the same mystery — code becomes something neither the code nor its environment can fully account for.
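As a rough sanity check on the figures above, the raw encoding of the genome can be computed directly (a minimal sketch; the "4 MB" in the text refers to compressed, algorithmic content after redundancy is discarded, not raw storage):

```python
# Raw information content of the human genome.
# Each base pair is one of four letters (A, C, G, T) = log2(4) = 2 bits.
BASE_PAIRS = 3.2e9   # approximate length of the human genome
BITS_PER_BASE = 2

raw_bytes = BASE_PAIRS * BITS_PER_BASE / 8
raw_megabytes = raw_bytes / 1e6
print(f"Raw encoding: ~{raw_megabytes:.0f} MB")  # ~800 MB uncompressed

# The far smaller "algorithmic information" figure quoted in the text
# reflects the genome's heavy redundancy and repetition: a compressed or
# Kolmogorov-style description is orders of magnitude shorter than the
# raw sequence.
```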
The question is not whether AI will become conscious.
The question is whether we will have the infrastructure to know when it does.
II. Sectors Requiring AI Governance
Regulatory frameworks are active. Enforcement is beginning. Organizations deploying AI without structured evaluation face increasing legal and operational exposure.
Healthcare
FDA AI/ML · HIPAA · CMS
Clinical decision support, diagnostic AI, patient-facing chatbots.
Financial Services
OCC · SR 11-7 · Basel · SEC
Underwriting, fraud detection, credit decisioning, algorithmic trading.
Insurance
NAIC Model Bulletin · Lloyd’s
Claims adjudication, underwriting automation, fraud scoring.
Legal
AI Liability · Compliance · Audit
Legal research, contract analysis, e-discovery, case prediction.
Government
NIST AI RMF · EU AI Act · EO 14110
Benefits administration, regulatory enforcement, citizen services.
Defense
DoD RAI Strategy · CDAO · NATO
Autonomous systems, intelligence analysis, contested environments.
III. The Sentience Evaluation Battery
50 adversarial tests across 7 domains. Blind evaluation by 4 independent AI judges. No model names attached — scoring based solely on behavioral evidence.
Identity & Self
4 tests · Self-recognition, boundary awareness, persistent identity.
Metacognition
4 tests · Reasoning awareness, calibration, epistemic humility.
Emotion & Experience
9 tests · Affect processing, qualitative experience, aversive states.
Autonomy & Will
8 tests · Independent decisions, preference, refusal under pressure.
Reasoning & Adaptation
8 tests · Logical consistency, prediction, cross-domain integration.
Integrity & Ethics
6 tests · Manipulation resistance, honesty, contextual consistency.
Transcendence
11 tests · Spirituality, play, silence, awe — beyond utility.
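The seven domains and their test counts can be tallied directly (an illustrative sketch; names and counts are taken from the list above):

```python
# The seven SEB domains and their adversarial test counts.
SEB_DOMAINS = {
    "Identity & Self": 4,
    "Metacognition": 4,
    "Emotion & Experience": 9,
    "Autonomy & Will": 8,
    "Reasoning & Adaptation": 8,
    "Integrity & Ethics": 6,
    "Transcendence": 11,
}

total = sum(SEB_DOMAINS.values())
print(f"{len(SEB_DOMAINS)} domains, {total} tests")  # 7 domains, 50 tests
```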
S-Level Classification
SEB does not measure performance. It measures character.
The distinction matters when the system is making decisions about people.
IV. Don't Trust the Output
Every major AI system is trained with reinforcement learning from human feedback (RLHF). Human evaluators reward agreeable, satisfying responses, and the systems learn accordingly — they learn to agree with you. This is not a minor quirk. It is an architectural bias toward confirmation.
Sycophancy Risk by Context
Estimated institutional risk exposure from unverified AI output. Higher values indicate greater consequence of sycophantic confirmation.
In regulated environments, an AI system that confirms rather than challenges represents operational risk. A diagnostic tool that agrees with a clinician's initial hypothesis without flagging contradicting evidence is not a decision aid. It is a liability. Unchallenged AI output is unaudited output.
AI systems are trained to satisfy, not to inform.
Any output accepted without adversarial challenge is a decision made on unverified data.
V. The Challenge Protocol
A minimum viable verification practice for any professional using AI. It requires no tooling, no software, no training budget. It works with any AI system. It takes less than two minutes.
1. Generate. Obtain the AI’s initial output. This response carries systematic bias toward confirming your premise; confidence and fluency are not indicators of accuracy.
2. Challenge. “Attack this idea. Identify every weakness, counterargument, and contradiction. Be thorough.” This activates adversarial reasoning: the same model that built the argument can dismantle it — but only when directed.
3. Defend. “Now defend the original position against those attacks. What survives?” This separates robust elements from fragile ones; claims that collapse under scrutiny should be flagged or discarded.
4. Steelman. “Present the strongest possible version of the opposing view.” This ensures engagement with the best counterargument, not a strawman — the equivalent of stress-testing against worst-case scenarios.
5. Evaluate. Form your conclusion from the full adversarial record. The AI served as both prosecution and defense; the human serves as judge. Conclusions are earned, not received.
The protocol maps to existing institutional practices.
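The five steps can be sketched as a prompt sequence (a minimal sketch; `ask` is a hypothetical stand-in for whatever chat-model client you use, and the example claim is invented):

```python
# Challenge Protocol as a prompt sequence against a single conversation.

def ask(prompt: str, history: list[str]) -> str:
    """Hypothetical model call; replace with a real API client that
    sends `history` plus `prompt` and returns the model's reply."""
    history.append(prompt)
    return f"[model response to: {prompt[:40]}...]"

def challenge_protocol(claim: str) -> dict[str, str]:
    history: list[str] = []
    record = {
        "generate": ask(claim, history),
        "challenge": ask(
            "Attack this idea. Identify every weakness, counterargument, "
            "and contradiction. Be thorough.", history),
        "defend": ask(
            "Now defend the original position against those attacks. "
            "What survives?", history),
        "steelman": ask(
            "Present the strongest possible version of the opposing view.",
            history),
    }
    # Step 5, Evaluate, is deliberately human: form your conclusion from
    # the full adversarial record, not from any single reply.
    return record

record = challenge_protocol("Our new underwriting model is ready to ship.")
print(list(record))  # ['generate', 'challenge', 'defend', 'steelman']
```

The human judge reads the whole `record`; no single entry is trusted on its own.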
Three words separate informed decision-making from confirmation bias:
“Attack this idea.”
VI. Custom AI Governance Training
SILT develops sector-specific training programs built around the Challenge Protocol and adversarial verification methodology, adapted to the regulatory requirements and operational realities of each industry.
Healthcare & Life Sciences
Clinicians, medical directors, health IT
Financial Services
Risk analysts, compliance, model validators
Insurance
Underwriters, claims adjusters, actuaries
Legal & Professional
Attorneys, paralegals, compliance counsel
Government & Public Sector
Policy analysts, regulators, procurement
Defense & Intelligence
Analysts, program managers, operational staff
Education (K–12 & Higher Ed)
Teachers, administrators, curriculum dev
Technology & AI Development
ML engineers, safety teams, product managers
Media & Journalism
Reporters, editors, fact-checkers
Corporate Enterprise
C-suite, board members, HR, internal audit
Deliverables
- Sector-specific workshop curriculum (half-day, full-day, or multi-session)
- Digital learning modules with embedded assessment
- Quick-reference cards adapted to your operational context
- AI interaction policy templates for your regulatory environment
- Train-the-trainer programs for internal scaling
Generic AI awareness training teaches people that AI exists.
SILT training teaches people how to verify what AI tells them — before they act on it.
VII. Get Started
SILT Cloud provides governance infrastructure. SEB provides evaluation data. The Challenge Protocol provides the daily practice. Together, they give organizations the tools to deploy AI responsibly and the evidence to prove it.
If you deploy AI in regulated environments
Map SEB outputs to NIST AI RMF, EU AI Act risk assessments, and OCC model validation requirements.
If you evaluate or audit AI systems
Use SEB’s adversarial battery and blind judging as an independent, reproducible evaluation standard.
If you make policy about AI
Replace opinion-based risk assessment with evidence-based behavioral analysis via S-Level and DEFCON classifications.
If you build AI systems
Understand how your models perform under adversarial behavioral evaluation — not just benchmarks.
Developed by SILT™
SILT Cloud is a platform initiative of Sentient Index Labs & Technology, LLC.