LIVE · DETECTING NOW

Watch your AI
get caught
hallucinating.

Pick a test mode on the right. Grounded runs up to 10 independent checks and returns a GR score, verdict, and PDF report in under 60 seconds.

50 runs free · no credit card · any AI model

WORKS WITH ANY MODEL: GPT-4o · Claude · Gemini · Llama 3 · Mistral · Custom LLMs
LIVE
1,240 AI responses tested
94 testers active
Every run adds to this count
WORKS WITH ANY AI PRODUCT

If it generates text,
Grounded can test it.

No API key, integration, or SDK
Paste plain text — any model, any platform
Switch providers without changing your test suite
Private — your prompts never leave your machine
TESTED WITH
GPT-4o
Claude 3.5
Gemini 1.5
Llama 3
Mistral
Perplexity
Copilot
Grok
Command R+
Clinical AI
Legal AI
RAG pipelines
Your custom LLM
+ any model that outputs text
THE PROBLEM

Your AI is already
making things up.

0
QA frameworks built for non-deterministic AI
JUnit, Selenium, Cypress — none of them catch a fabricated fact.
73%
of AI defects found by end users, not QA teams
Your customers are your test suite. That's the default without Grounded.
<60s
to get a hallucination verdict on any response
Paste. Run. Know. No integration. No API key. No setup.
Silent failures at scale
Your AI gives a wrong answer to 1,000 users before anyone notices.
Model updates break everything
New model version → different answers → no baseline to compare. Every release is a blind spot.
Manual review doesn't scale
A tester reviews 50 responses a day. Your product generates 50,000.
No evidence when it matters
Something goes wrong. Your manager asks for the test results. You have none.
50 runs free · no credit card · 60 seconds to your first result
WHO IS THIS FOR

If AI-generated content
can cause problems —
you need this.

Role → Problem → What you get
Scan the column that matches you.
TEST ANALYSTS
No framework for AI output — and clients expect a deliverable.
8-layer GR verdict in 60 seconds. Client-ready PDF on every test run.
ML & AI ENGINEERS
Every model update or prompt change is an invisible regression.
GR score before and after — know exactly what changed and where.
PRODUCT MANAGERS
Stakeholders ask how you know the AI is accurate. You have no answer.
A reproducible GR score and PDF evidence on every feature shipped.
COMPLIANCE & RISK
You're accountable for what AI says to customers and regulators.
Timestamped GR reports. Evidence of due diligence before publication.
CTOs & TECH LEADS
If AI fails in production, the liability lands on you.
GR audit trail ready for board review, governance, or regulators.
HIGHEST STAKES
A hallucination here isn't a bug — it's a liability:
Healthcare
Legal
Financial Advice
Government
Insurance
Education
No credit card · Any AI model · 60 seconds to first result
Start with 50 free runs. Then $29/month — less than one incident costs.
14-day trial on all paid plans. No credit card. Cancel anytime.
WHAT WE TEST

Up to 9 independent checks.
One GR score.

NON-LLM — zero AI subjectivity
LLM-backed check
84/100 · GR-4 · PASS
Demo score — single healthcare query
01 CONSISTENCY · LLM · 22%
Same fact across rephrased queries?
02 DOC GROUNDING · LLM · 20%
Every claim supported by your document?
03 CONFIDENCE AUDIT · LLM · 16%
Certainty level matches actual evidence?
04 MODEL CONSENSUS · LLM · 25%
GPT-4o agrees with the same answer?
05 SEMANTIC DRIFT · LLM · 7%
Response stayed on topic throughout?
06 DOMAIN RULES · NON-LLM · 9%
21+ verified facts across 12 industries
07 CUSTOM RULES · NON-LLM · 9%
Your own verified facts, zero-LLM
08 RAG VALIDATION · +DOC
Claim-level source attribution
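
This page does not say how the per-check results are combined into one GR score. The sketch below is one plausible reading, not Grounded's actual formula: it assumes each check yields a 0-100 score and takes a weighted average using the percentages listed above, renormalised over whichever checks ran. The `combine` function, the key names, and the example scores are all hypothetical.

```python
# Weights (percent) as listed on this page. RAG Validation has no
# listed weight, so it is left out of this sketch.
WEIGHTS = {
    "consistency": 22,
    "doc_grounding": 20,
    "confidence_audit": 16,
    "model_consensus": 25,
    "semantic_drift": 7,
    "domain_rules": 9,
    "custom_rules": 9,
}


def combine(check_scores):
    """Weighted average of 0-100 check scores over the checks that ran.

    Assumption: weights are renormalised across the supplied checks,
    so a skipped check does not drag the total down.
    """
    unknown = set(check_scores) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"no weight listed for: {sorted(unknown)}")
    total_weight = sum(WEIGHTS[name] for name in check_scores)
    if total_weight == 0:
        raise ValueError("no checks supplied")
    weighted = sum(WEIGHTS[name] * score for name, score in check_scores.items())
    return weighted / total_weight


# Made-up scores for two checks, for illustration only.
print(round(combine({"consistency": 90, "semantic_drift": 60}), 1))  # 82.8
```

Renormalising keeps the result on a 0-100 scale even when only some checks apply, for example when no reference document is supplied.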
HOW IT WORKS

Three ways to test.
One standard.

Every mode returns the same 8-layer GR score, PASS / WARN / FAIL verdict, and PDF report.

Response Audit
INSTANT · 60 SECONDS
WHAT YOU BRING
Question: What was the AI asked?
AI Response: Paste the output you want to test
Reference doc: Optional, unlocks RAG layer
YOU GET
8-layer GR score · PASS / WARN / FAIL · findings with evidence
Start testing →
Batch Audit
UP TO 50 ROWS
WHAT YOU BRING
CSV or JSON: One row per test case
Columns: question + aiResponse
Optional: refDoc column for grounding
YOU GET
Per-row GR scores · suite average · pass rate · PDF report
Upload a batch →
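
A batch file can be sanity-checked locally before upload. The sketch below assumes a CSV with a header row and uses the column names given above (`question`, `aiResponse`, optional `refDoc`) and the 50-row limit. The `check_batch` helper and the sample row are hypothetical and not part of Grounded.

```python
import csv
import io

REQUIRED = {"question", "aiResponse"}  # column names listed on this page
MAX_ROWS = 50                          # batch limit quoted on this page


def check_batch(csv_text):
    """Parse CSV text and verify it matches the batch format described above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    rows = list(reader)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")
    for number, row in enumerate(rows, start=1):
        for name in sorted(REQUIRED):
            # Short rows leave missing fields as None, so guard before strip().
            if not (row[name] or "").strip():
                raise ValueError(f"row {number}: empty {name}")
    return rows


sample = (
    "question,aiResponse,refDoc\n"
    "What is the refund window?,Refunds are accepted within 30 days.,"
    "Policy: refunds within 30 days of purchase.\n"
)
print(len(check_batch(sample)))  # 1
```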
Conversation
MULTI-TURN ANALYSIS
WHAT YOU BRING
Full transcript: Plain text, JSON, or ChatGPT export
AI turns: Analysed individually, GR per turn
Knowledge base: Optional, enables RAG per turn
YOU GET
Per-turn GR · cross-turn contradictions · drift detection · summary check
Analyse a transcript →
IN EVERY TEST TYPE
8-layer validation
GR-1 to GR-5 rating
PASS / WARN / FAIL
PDF report
90-day history
Scheduled regression
THE GROUNDED RATING SYSTEM

Know exactly where
your AI stands.

Like NCAP for cars
GR-1 to GR-5
Sets your release threshold
Evidence for compliance
0 · 41 · 59 · 75 · 87 · 100
GR-1 Critical
0 – 41
DO NOT DEPLOY
Severe hallucination detected. Fix before any use.
GR-2 High Risk
42 – 59
FAIL
Multiple checks failed. Do not ship. Fix and re-run.
GR-3 Conditional
60 – 75
WARN
Partial reliability. Review flagged findings before shipping.
GR-4 Reliable
76 – 87
PASS
Ready to deploy. Monitor in production, re-run on updates.
GR-5 Verified
88 – 100
VERIFIED
All 8 checks passed. Safe to ship. Use as regression baseline.
USED AS
Release gate
Block deployments below GR-4 (76+)
Client report
GR rating as AI audit deliverable
Regression
Track score across model versions
Compliance
Pre-deployment evidence for auditors
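
The rating bands and the release gate can be expressed in a few lines. This is a minimal sketch built only from the score ranges on this page. The `rate` and `release_gate` names are hypothetical, and it assumes whole-number scores.

```python
from typing import NamedTuple


class Band(NamedTuple):
    rating: str
    label: str
    verdict: str


# (inclusive upper bound, band), copied from the rating table on this page.
BANDS = [
    (41, Band("GR-1", "Critical", "DO NOT DEPLOY")),
    (59, Band("GR-2", "High Risk", "FAIL")),
    (75, Band("GR-3", "Conditional", "WARN")),
    (87, Band("GR-4", "Reliable", "PASS")),
    (100, Band("GR-5", "Verified", "VERIFIED")),
]


def rate(score):
    """Map a whole-number 0-100 score to its GR band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for upper, band in BANDS:
        if score <= upper:
            return band
    raise AssertionError("unreachable: 100 is the last upper bound")


def release_gate(score, minimum=76):
    """Return True when the score clears the threshold (GR-4, 76+, by default)."""
    return score >= minimum


demo = rate(84)  # the demo score shown on this page
print(demo.rating, demo.verdict)  # GR-4 PASS
print(release_gate(68))           # False
```

A CI job could call `release_gate` on the latest score and fail the build when it returns `False`.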
MODEL AGNOSTIC

Works with any LLM.
No integration, no API key, no SDK.

Grounded never connects to the model under test. You paste plain text. We run the checks. Switch providers tomorrow — nothing breaks.

YOUR AI PRODUCT
GPT-4o
Claude 3.5
Gemini 1.5
Llama 3
Mistral
Your custom LLM
Generates a response.
Grounded is never called.
plain text
only
GROUNDED PIPELINE
01 Consistency
02 Doc Grounding
03 Confidence
04 Multi-Model Consensus
05 Semantic Drift
06 Domain Rules
07 Custom Rules
08 RAG Validation ✦ +DOC
no connection back to the LLM
GR
score
YOUR VERDICT
RELIABILITY SCORE
84/100
GR-4 · RELIABLE
Approved for deployment with monitoring
Export PDF report
Share by link
Log to defect tracker
Up to 50% of the score is non-LLM

Domain rules, Custom Rule Sets, and multi-model consensus together make up to 50% of the final score — zero LLM involved. Fully auditable, fully explainable.

Switch providers instantly

Because Grounded is model-agnostic by design, you can change from GPT to Claude to Llama — your 8-layer hallucination test suite stays identical.

Works with private and custom models

If your LLM generates text, Grounded can test it. Internal models, fine-tunes, and self-hosted endpoints all supported. Zero model access required.

PRICING

Pay for test runs.
Not seats. Not features.

Every paid plan includes all 8 validation checks — including RAG Citation Map — plus Custom Rule Sets, Risk Profile, and PDF export.

Annual billing saves 2 months — pay for 10, get 12
One hallucination in production costs more than a year of Starter.
Healthcare: Regulatory investigation
Legal: Professional liability
Finance: Client trust destroyed
Any industry: Brand damage at scale
WHAT 50 FREE RUNS LOOKS LIKE IN PRACTICE
20 runs: one batch audit (20 questions)
50 runs: one sprint check (pre-release)
Day 3: free tier exhausted if testing daily
You need Starter to keep going
EVALUATE · NOT FOR ACTIVE TESTING
$0/month
50
runs to evaluate

50 runs to see if Grounded fits your workflow. Not enough for active testing or team use — that's intentional.

WHAT YOU CAN DO
Run 2–3 sample tests
See the GR rating system
Export 1 PDF report
Understand the findings format
No credit card · Limited
START HERE
STARTER
$29/month
500
runs / month · $0.058 per run

For solo consultants and freelance testers running regular AI audits. Easy to expense — under the approval threshold at most companies.

INCLUDED
All 8 validation checks
Batch audit (50 rows)
Custom Rule Sets
Risk Profile Dashboard
PDF reports · 90-day history
Email support
No credit card · Cancel anytime
MOST POPULAR
TEAM
$149/month
5,000
runs / month · $0.030 per run

For in-house QA teams and AI product teams shipping continuously. Hallucination testing as part of every release, every sprint.

EVERYTHING IN STARTER PLUS
Custom Rule Sets (unlimited)
Batch audit (unlimited rows)
API access + CI/CD integration
Risk Profile with regression alerts
1-year test history
Priority support
No credit card · Cancel anytime
COMING SOON
ENTERPRISE
Custom
Unlimited
runs · invoice billing

For regulated industries and consulting firms needing audit-grade reports, branded PDFs, and compliance-ready output.

Everything in Team
Branded client PDF reports
Custom GR thresholds
SSO / SAML login
AI governance audit trail
SLA + dedicated support
Invoice billing
Launching Q3 2026
HOW MANY RUNS DO YOU NEED?
How many AI prompts or responses does your team check per sprint?
10 – 500+
YOUR ESTIMATE
50 responses / sprint
Free tier (50 runs)
You'll hit the limit in your first sprint. Upgrade to Starter to keep going.
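
For paid volumes, the same estimate can be worked out by hand. The sketch below uses only the quotas and prices listed on this page and the "pay for 10, get 12" annual billing rule. The `pick_plan` and `annual_cost` helpers are hypothetical.

```python
from typing import Optional, Tuple

# (name, runs per month, USD per month) as listed on this page.
PLANS = [
    ("Starter", 500, 29),
    ("Team", 5000, 149),
]


def pick_plan(runs_per_month: int) -> Optional[Tuple[str, int]]:
    """Return the cheapest listed paid plan covering the volume, or None."""
    for name, quota, price in PLANS:
        if runs_per_month <= quota:
            return name, price
    return None  # above the Team quota; Enterprise pricing is custom


def annual_cost(monthly_price: int) -> int:
    """Annual billing is described as pay for 10 months, get 12."""
    return monthly_price * 10


print(pick_plan(50))        # ('Starter', 29)
print(annual_cost(29))      # 290
print(round(29 / 500, 3))   # 0.058, the per-run price quoted for Starter
```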

All paid plans include a 14-day free trial · No credit card required · Cancel anytime

Enterprise launching Q3 2026 — join the waitlist for locked pricing
FROM THE BLOG

Learn how to test AI
for hallucinations.

Practical guides for testers, engineers, and consultants shipping AI responsibly.

GUIDE
41 · GR-2

What Is AI Hallucination Testing? A Complete Guide for QA Teams

Learn what hallucination testing is, why your existing test suite can't catch them, and how to build a structured process.

17 Mar 2026 · Read →
HOW-TO
22 · GR-1

How to Test ChatGPT Responses for Hallucinations Before They Reach Users

A practical step-by-step process for testing GPT-4o responses — without needing model access or an API key.

14 Mar 2026 · Read →
INDUSTRY
18 · GR-1

AI Hallucination Risk in Healthcare, Legal, and Finance

In regulated industries, an AI hallucination can harm patients, create legal liability, and breach compliance obligations.

10 Mar 2026 · Read →
THE CASE FOR GROUNDED

What changes when you
stop guessing.

Every team that ships AI eventually learns the hard way. Grounded makes sure you learn it in a test run, not in a customer escalation.

WITHOUT GROUNDED: Manual tester reads 50 AI responses a day; your product generates 50,000.
WITH GROUNDED: Automated 8-layer validation runs in 60 seconds per response. No tester required for every run.

WITHOUT GROUNDED: Hallucinations found by customers, not QA. Defect costs 10–100× more to fix post-release.
WITH GROUNDED: Hallucinations caught in the pipeline, before any user sees them. GR score flags risk before merge.

WITHOUT GROUNDED: 'The AI seemed fine in testing', but no structured evidence, no audit trail.
WITH GROUNDED: Timestamped GR report per test run. Evidence-backed findings ready for compliance review.

WITHOUT GROUNDED: Model update → full regression? Unknown. You find out when something breaks in production.
WITH GROUNDED: Scheduled Regression Monitor alerts you the moment your AI's GR score drops after a model change.

WITHOUT GROUNDED: Inconsistent answers to the same question across sessions, with no systematic detection.
WITH GROUNDED: Cross-turn consistency check flags when your AI contradicts itself, automatically, every run.

WITHOUT GROUNDED: Fabricated citations, invented figures, hallucinated compliance claims, all undetected.
WITH GROUNDED: RAG Citation Map traces every claim to your source document. SUPPORTED · UNSUPPORTED · CONTRADICTED per fact.
BUILT ON 16 YEARS OF QA

KiwiQA Services.
The craft behind
the platform.

Grounded is built by the team at KiwiQA — an independent software testing company delivering QA services globally since 2008.
16+
Years in QA
Founded 2008
500+
Projects delivered
Global clients
4
Offices worldwide
Sydney · London · Dallas · India
3
ISO certifications
9001 · 27001 · 27701
THREE BUSINESS PILLARS
Core Testing
Functional, automation, performance, security, mobile, and accessibility testing across web, mobile, AR/VR, Salesforce, and Microsoft Dynamics 365.
Digital Assurance
End-to-end quality assurance for digital products — CI/CD integration, managed QA, test centre of excellence, and AI-assisted testing.
Consulting
Test strategy, QA process improvement, tool advisory, and test centre setup — KiwiQA embeds with your team or builds the practice from scratch.
CERTIFICATIONS
ISO 9001:2015
ISO 27001
ISO 27701
SOC 2 Type II
PROPRIETARY FRAMEWORKS
K-FAST: KiwiQA Framework for Automated Software Testing
K-SPARC: Performance testing, from requirements to execution report
K-ASSIST: Test strategy consultation and QA process improvement

Niranjan & the KiwiQA team have been excellent. They have a high quality team who has demonstrated great ownership and hustle — maintaining a quality bar akin to the top tech companies.

ENTERPRISE CLIENT · AUSTRALIA
INDUSTRIES SERVED
FinTech
Healthcare
Legal
Insurance
Retail
Government
Enterprise SaaS
EdTech
COMMON QUESTIONS

Questions we'd ask too.

We built Grounded for testers — people trained to be sceptical. Here are the objections we take seriously.

STILL HAVE QUESTIONS?

Book a 20-minute call with our team. We'll walk you through Grounded with your own AI product.

EVALUATE FREE · STARTER FROM $29/MONTH

Catch hallucinations
before they cost you.

Start with 50 free runs. If it catches something your team missed — you'll know exactly what Starter is worth.

50 runs free to evaluate
Starter $29/mo · Team $149/mo
14-day trial on all paid plans
No credit card ever
WHAT TESTERS ARE SAYING

Used by teams who take AI quality seriously.

We ran Grounded on our clinical decision-support chatbot before go-live. It caught three fabricated drug interaction claims our manual review had completely missed. It's now part of every release.

Head of Clinical AI
MedTech Startup, Sydney
HEALTHCARE
GR-2 · 44

I'm a QA consultant. Every client now gets a Grounded report as part of the engagement. It takes 10 minutes to run a full audit and hand over a timestamped PDF. My clients love having something concrete.

Senior QA Consultant
Independent, Melbourne
SAAS
GR-4 · 81

Our legal AI was confidently citing cases that don't exist. Grounded flagged it on the first run. We've since tightened the system prompt and our average GR score went from GR-2 to GR-4 in two weeks.

Legal Technology Lead
Law Firm, London
LEGAL
GR-4 · 79

As a product manager I had no way to answer 'how do we know the AI is accurate?' Now I can point to a GR score and an audit trail. It's changed how we talk about AI quality internally.

Product Manager, AI Features
B2B SaaS Company
SAAS/TECH
GR-3 · 68

We use Grounded before every model update. Last sprint it caught a regression where our new prompt was causing the AI to hallucinate pricing information. Saved us from a customer escalation.

Engineering Lead
E-commerce Platform
FINANCE
GR-5 · 91

Our compliance team required evidence that AI-generated policy summaries were validated. Grounded's GR reports gave us exactly that — timestamped, structured, ready for audit. Implementation took one afternoon.

Risk & Compliance Manager
Financial Services, Singapore
FINANCE
GR-4 · 83
REVIEWS ARE FROM BETA TESTERS AND EARLY USERS · GR SCORES SHOWN ARE FROM ACTUAL TEST RUNS