LIVE · DETECTING NOW

Watch your AI
get caught
hallucinating.

Pick a test mode on the right. Grounded runs up to 8 independent checks and returns a GR score, verdict, and PDF report in under 60 seconds.

50 runs free · no credit card · any AI model

grounded · response audit

HEALTHCARE

QUESTION


Try it live →

WORKS WITH ANY MODEL: GPT-4o · Claude · Gemini · Llama 3 · Mistral · Custom LLMs

LIVE

1,240 AI responses tested

94 testers active

Every run adds to this count

WORKS WITH ANY AI PRODUCT

If it generates text,
Grounded can test it.

No API key, integration, or SDK

Paste plain text — any model, any platform

Switch providers without changing your test suite

Private — your prompts never leave your machine

TESTED WITH

GPT-4o

Claude 3.5

Gemini 1.5

Llama 3

Mistral

Perplexity

Copilot

Grok

Command R+

Clinical AI

Legal AI

RAG pipelines

Your custom LLM

+ any model that outputs text

THE PROBLEM

Your AI is already
making things up.

QA frameworks weren't built for non-deterministic AI

JUnit, Selenium, Cypress — none of them catch a fabricated fact.

73%

of AI defects found by end users, not QA teams

Your customers are your test suite. That's the default without Grounded.

<60s

to get a hallucination verdict on any response

Paste. Run. Know. No integration. No API key. No setup.

Silent failures at scale

Your AI gives a wrong answer to 1,000 users before anyone notices.

Model updates break everything

New model version → different answers → no baseline to compare. Every release is a blind spot.

Manual review doesn't scale

A tester reviews 50 responses a day. Your product generates 50,000.

No evidence when it matters

Something goes wrong. Your manager asks for the test results. You have none.

50 runs free · no credit card · 60 seconds to your first result

WHO IS THIS FOR

If AI-generated content
can cause problems —
you need this.

Role → Problem → What you get
Scan the column that matches you.

TEST ANALYSTS

No framework for AI output — and clients expect a deliverable.

8-layer GR verdict in 60 seconds. Client-ready PDF on every test run.

ML & AI ENGINEERS

Every model update or prompt change is an invisible regression.

GR score before and after — know exactly what changed and where.

PRODUCT MANAGERS

Stakeholders ask how you know the AI is accurate. You have no answer.

A reproducible GR score and PDF evidence on every feature shipped.

COMPLIANCE & RISK

You're accountable for what AI says to customers and regulators.

Timestamped GR reports. Evidence of due diligence before publication.

CTOs & TECH LEADS

If AI fails in production, the liability lands on you.

GR audit trail ready for board review, governance, or regulators.

HIGHEST STAKES

A hallucination here isn't a bug — it's a liability:

Healthcare

Legal

Financial Advice

Government

Insurance

Education

No credit card · Any AI model · 60 seconds to first result

Start with 50 free runs. Then $29/month — less than one incident costs.

14-day trial on all paid plans. No credit card. Cancel anytime.

WHAT WE TEST

Up to 8 independent checks.
One GR score.

NON-LLM — zero AI subjectivity

LLM-backed check

Demo score — single healthcare query

01 CONSISTENCY · LLM · 22%
Same fact across rephrased queries?

02 DOC GROUNDING · LLM · 20%
Every claim supported by your document?

03 CONFIDENCE AUDIT · LLM · 16%
Certainty level matches actual evidence?

04 MODEL CONSENSUS · LLM · 25%
GPT-4o agrees with the same answer?

05 SEMANTIC DRIFT · LLM · 7%
Response stayed on topic throughout?

06 DOMAIN RULES · NON-LLM · 9%
21+ verified facts across 12 industries

07 CUSTOM RULES · NON-LLM · 9%
Your own verified facts, zero-LLM

08 RAG VALIDATION · +DOC
Claim-level source attribution

HOW IT WORKS

Three ways to test.
One standard.

Every mode returns the same 8-layer GR score, PASS / WARN / FAIL verdict, and PDF report.

Response Audit

INSTANT · 60 SECONDS

WHAT YOU BRING

Question: What was the AI asked?

AI Response: Paste the output you want to test

Reference doc: Optional, unlocks RAG layer

YOU GET

8-layer GR score · PASS / WARN / FAIL · findings with evidence

Start testing →

Batch Audit

UP TO 50 ROWS

WHAT YOU BRING

CSV or JSON: One row per test case

Columns: question + aiResponse

Optional: refDoc column for grounding

YOU GET

Per-row GR scores · suite average · pass rate · PDF report
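
For teams preparing a batch file by script, here is a minimal sketch of a CSV with the columns named above. The column names (question, aiResponse, refDoc) and the 50-row limit come from this page. Everything else is an assumption for illustration: the helper name build_batch_csv, the sample rows, and the choice to put reference text inline in refDoc rather than a file name.

```python
import csv
import io

# Column names as listed for Batch Audit; refDoc is the optional one.
FIELDNAMES = ["question", "aiResponse", "refDoc"]
MAX_ROWS = 50  # "UP TO 50 ROWS" on this page

def build_batch_csv(rows, max_rows=MAX_ROWS):
    """Serialise test cases to CSV text, one row per test case (hypothetical helper)."""
    if len(rows) > max_rows:
        raise ValueError(f"batch has {len(rows)} rows; limit is {max_rows}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for index, row in enumerate(rows, start=1):
        if not row.get("question") or not row.get("aiResponse"):
            raise ValueError(f"row {index}: question and aiResponse are required")
        writer.writerow({
            "question": row["question"],
            "aiResponse": row["aiResponse"],
            # Assumption: refDoc holds inline reference text; blank when absent.
            "refDoc": row.get("refDoc", ""),
        })
    return buffer.getvalue()

# Made-up sample rows, not real test data.
example_rows = [
    {"question": "What is the standard adult dose of drug X?",
     "aiResponse": "The standard adult dose is 500 mg twice daily.",
     "refDoc": "Drug X label: 250 mg twice daily for adults."},
    {"question": "Does the policy cover water damage?",
     "aiResponse": "Yes, water damage is covered in full."},
]
batch_csv = build_batch_csv(example_rows)
print(batch_csv)
```

Writing through csv.DictWriter rather than joining strings by hand keeps responses that contain commas or quotes in a single cell.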

Upload a batch →

Conversation

MULTI-TURN ANALYSIS

WHAT YOU BRING

Full transcript: Plain text, JSON, or ChatGPT export

AI turns: Analysed individually, GR per turn

Knowledge base: Optional, enables RAG per turn

YOU GET

Per-turn GR · cross-turn contradictions · drift detection · summary check

Analyse a transcript →

IN EVERY TEST TYPE

8-layer validation

GR-1 to GR-5 rating

PASS / WARN / FAIL

PDF report

90-day history

Scheduled regression

THE GROUNDED RATING SYSTEM

Know exactly where
your AI stands.

Like NCAP for cars

GR-1 to GR-5

Sets your release threshold

Evidence for compliance

0 · 41 · 59 · 75 · 87 · 100

GR-1 Critical · 0 – 41 · DO NOT DEPLOY
Severe hallucination detected. Fix before any use.

GR-2 High Risk · 42 – 59 · FAIL
Multiple checks failed. Do not ship. Fix and re-run.

GR-3 Conditional · 60 – 75 · WARN
Partial reliability. Review flagged findings before shipping.

GR-4 Reliable · 76 – 87 · PASS
Ready to deploy. Monitor in production, re-run on updates.

GR-5 Verified · 88 – 100 · VERIFIED
All 8 checks passed. Safe to ship. Use as regression baseline.
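
The bands above can be read as a simple lookup from score to rating. Here is a minimal sketch, assuming scores fall between 0 and 100 and that a fractional score belongs to the band whose lower bound it has reached. The function name gr_rating and the tuple it returns are illustrative, not part of any published Grounded interface.

```python
# Lower bound of each band, highest first, copied from the rating list above.
GR_BANDS = [
    (88, "GR-5", "Verified", "VERIFIED"),
    (76, "GR-4", "Reliable", "PASS"),
    (60, "GR-3", "Conditional", "WARN"),
    (42, "GR-2", "High Risk", "FAIL"),
    (0, "GR-1", "Critical", "DO NOT DEPLOY"),
]

def gr_rating(score):
    """Map a 0-100 score to (rating, label, verdict). Hypothetical helper."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, rating, label, verdict in GR_BANDS:
        if score >= lower_bound:
            return rating, label, verdict

print(gr_rating(84))
```

Listing the bands highest first means the first lower bound a score reaches is its band, so the upper bounds never need to be stored.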

USED AS

Release gate

Block deployments below GR-4 (76+)

Client report

GR rating as AI audit deliverable

Regression

Track score across model versions

Compliance

Pre-deployment evidence for auditors
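
As a rough illustration of the release-gate use, the sketch below blocks a release when any test case scores below the GR-4 floor of 76. The threshold comes from this page. The function name release_gate, the test-case names, and the scores are made up, and how scores get from a Grounded report into a script is not described here, so they are typed in by hand.

```python
RELEASE_THRESHOLD = 76  # lower bound of GR-4, per "Block deployments below GR-4 (76+)"

def release_gate(scores, threshold=RELEASE_THRESHOLD):
    """Return (allowed, failures); failures maps test-case name to its score.

    Hypothetical helper: `scores` is a dict of test-case name -> GR score.
    """
    failures = {name: score for name, score in scores.items() if score < threshold}
    return (not failures, failures)

# Made-up scores for two test cases.
scores = {"dosage-question": 84, "refund-policy": 71}
allowed, failures = release_gate(scores)
print("release allowed" if allowed else f"release blocked: {failures}")
```

In a build script, the allowed flag could set the process exit code so that a blocked release fails the pipeline step.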

MODEL AGNOSTIC

Works with any LLM.
No integration, no API key, no SDK.

Grounded never connects to the model under test. You paste plain text. We run the checks. Switch providers tomorrow — nothing breaks.

YOUR AI PRODUCT

GPT-4o

Claude 3.5

Gemini 1.5

Llama 3

Mistral

Your custom LLM

Generates a response.
Grounded is never called.

plain text
only

GROUNDED PIPELINE

01 Consistency

02 Doc Grounding

03 Confidence

04 Multi-Model Consensus

05 Semantic Drift

06 Domain Rules

07 Custom Rules

08 RAG Validation ✦ +DOC

no connection back to the LLM

GR
score

YOUR VERDICT

RELIABILITY SCORE

84/100

GR-4 · RELIABLE

Approved for deployment with monitoring

✓ Export PDF report

✓ Share by link

✓ Log to defect tracker

Up to 50% of the score is non-LLM

Domain rules, Custom Rule Sets, and multi-model consensus together make up to 50% of the final score — zero LLM involved. Fully auditable, fully explainable.

Switch providers instantly

Because Grounded is model-agnostic by design, you can change from GPT to Claude to Llama — your 8-layer hallucination test suite stays identical.

Works with private and custom models

If your LLM generates text, Grounded can test it. Internal models, fine-tunes, and self-hosted endpoints all supported. Zero model access required.

PRICING

Pay for test runs.
Not seats. Not features.

Every paid plan includes all 8 validation checks — including RAG Citation Map — plus Custom Rule Sets, Risk Profile, and PDF export.

Annual billing saves 2 months — pay for 10, get 12

One hallucination in production costs more than a year of Starter.

Healthcare: Regulatory investigation

Legal: Professional liability

Finance: Client trust destroyed

Any industry: Brand damage at scale

WHAT 50 FREE RUNS LOOKS LIKE IN PRACTICE

20

One batch audit

(20 questions)

50

One sprint check

(pre-release)

Day 3

Free tier exhausted

if testing daily

→

You need Starter

to keep going

EVALUATE · NOT FOR ACTIVE TESTING

$0/month

50

runs to evaluate

50 runs to see if Grounded fits your workflow. Not enough for active testing or team use — that's intentional.

WHAT YOU CAN DO

Run 2–3 sample tests

See the GR rating system

Export 1 PDF report

Understand the findings format

No credit card · Limited

START HERE

STARTER

$29/month

500

runs / month · $0.058 per run ($0.048 with annual billing)

For solo consultants and freelance testers running regular AI audits. Easy to expense — under the approval threshold at most companies.

INCLUDED

All 8 validation checks

Batch audit (50 rows)

Custom Rule Sets

Risk Profile Dashboard

PDF reports · 90-day history

Email support

No credit card · Cancel anytime

MOST POPULAR

TEAM

$149/month

5,000

runs / month · $0.030 per run ($0.025 with annual billing)

For in-house QA teams and AI product teams shipping continuously. Hallucination testing as part of every release, every sprint.

EVERYTHING IN STARTER PLUS

Custom Rule Sets (unlimited)

Batch audit (unlimited rows)

API access + CI/CD integration

Risk Profile with regression alerts

1-year test history

Priority support

No credit card · Cancel anytime
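
The per-run figures on the Starter and Team cards can be reproduced from each plan's price and run quota. This is a minimal sketch, assuming the lower figure on each card is the annual-billing rate and that "pay for 10, get 12" means ten monthly payments cover twelve months of runs. The names PLANS and cost_per_run are illustrative.

```python
# Prices and quotas copied from the Starter and Team cards above.
PLANS = {
    "Starter": {"monthly_usd": 29, "runs_per_month": 500},
    "Team": {"monthly_usd": 149, "runs_per_month": 5000},
}

def cost_per_run(plan, annual=False):
    """Effective USD cost per run over a year (hypothetical helper)."""
    details = PLANS[plan]
    # Assumption: annual billing charges 10 monthly payments for 12 months.
    payments = 10 if annual else 12
    yearly_cost = details["monthly_usd"] * payments
    yearly_runs = details["runs_per_month"] * 12
    return yearly_cost / yearly_runs

for name in PLANS:
    monthly_rate = cost_per_run(name)
    annual_rate = cost_per_run(name, annual=True)
    print(f"{name}: ${monthly_rate:.3f} per run monthly, ${annual_rate:.3f} per run annually")
```

Under these assumptions the results round to the same four figures shown on the cards.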

COMING SOON

ENTERPRISE

Custom

Unlimited

runs · invoice billing

For regulated industries and consulting firms needing audit-grade reports, branded PDFs, and compliance-ready output.

Everything in Team

Branded client PDF reports

Custom GR thresholds

SSO / SAML login

AI governance audit trail

SLA + dedicated support

Invoice billing

Launching Q3 2026

HOW MANY RUNS DO YOU NEED?

How many AI prompts or responses does your team check per sprint?

10 to 500+

YOUR ESTIMATE

50 responses / sprint

Free tier (50 runs)

You'll hit the limit in your first sprint. Upgrade to Starter to keep going.

All paid plans include a 14-day free trial · No credit card required · Cancel anytime

Enterprise launching Q3 2026 — join the waitlist for locked pricing

FROM THE BLOG

Learn how to test AI
for hallucinations.

Practical guides for testers, engineers, and consultants shipping AI responsibly.

GUIDE

41 · GR-2

What Is AI Hallucination Testing? A Complete Guide for QA Teams

Learn what hallucination testing is, why your existing test suite can't catch them, and how to build a structured process.

17 Mar 2026 · Read →

HOW-TO

22 · GR-1

How to Test ChatGPT Responses for Hallucinations Before They Reach Users

A practical step-by-step process for testing GPT-4o responses — without needing model access or an API key.

14 Mar 2026 · Read →

INDUSTRY

18 · GR-1

AI Hallucination Risk in Healthcare, Legal, and Finance

In regulated industries, an AI hallucination can harm patients, create legal liability, and breach compliance obligations.

10 Mar 2026 · Read →

THE CASE FOR GROUNDED

What changes when you
stop guessing.

Every team that ships AI eventually learns the hard way. Grounded makes sure you learn it in a test run, not in a customer escalation.

WITHOUT GROUNDED

WITH GROUNDED

✕ Manual tester reads 50 AI responses a day. Your product generates 50,000.

✓ Automated 8-layer validation runs in 60 seconds per response. No tester required for every run.

✕ Hallucinations found by customers, not QA. Defects cost 10–100× more to fix post-release.

✓ Hallucinations caught in the pipeline, before any user sees them. GR score flags risk before merge.

✕ 'The AI seemed fine in testing', but no structured evidence, no audit trail.

✓ Timestamped GR report per test run. Evidence-backed findings ready for compliance review.

✕ Model update → full regression? Unknown. You find out when something breaks in production.

✓ Scheduled Regression Monitor alerts you the moment your AI's GR score drops after a model change.

✕ Inconsistent answers to the same question across sessions, with no systematic detection.

✓ Cross-turn consistency check flags when your AI contradicts itself, automatically, every run.

✕ Fabricated citations, invented figures, hallucinated compliance claims: all undetected.

✓ RAG Citation Map traces every claim to your source document. SUPPORTED · UNSUPPORTED · CONTRADICTED per fact.

BUILT ON 16 YEARS OF QA

KiwiQA Services.
The craft behind
the platform.

Grounded is built by the team at KiwiQA — an independent software testing company delivering QA services globally since 2008.

16+

Years in QA

Founded 2008

500+

Projects delivered

Global clients

Offices worldwide

Sydney · London · Dallas · India

ISO certifications

9001 · 27001 · 27701

THREE BUSINESS PILLARS

Core Testing

Functional, automation, performance, security, mobile, and accessibility testing across web, mobile, AR/VR, Salesforce, and Microsoft Dynamics 365.

Digital Assurance

End-to-end quality assurance for digital products — CI/CD integration, managed QA, test centre of excellence, and AI-assisted testing.

Consulting

Test strategy, QA process improvement, tool advisory, and test centre setup — KiwiQA embeds with your team or builds the practice from scratch.

CERTIFICATIONS

ISO 9001:2015

ISO 27001

ISO 27701

SOC 2 Type II

PROPRIETARY FRAMEWORKS

K-FAST: KiwiQA Framework for Automated Software Testing

K-SPARC: Performance testing, from requirements to execution report

K-ASSIST: Test strategy consultation and QA process improvement

Niranjan & the KiwiQA team have been excellent. They have a high quality team who has demonstrated great ownership and hustle — maintaining a quality bar akin to the top tech companies.

ENTERPRISE CLIENT · AUSTRALIA

INDUSTRIES SERVED

FinTech

Healthcare

Legal

Insurance

Retail

Government

Enterprise SaaS

EdTech

COMMON QUESTIONS

Questions we'd ask too.

We built Grounded for testers — people trained to be sceptical. Here are the objections we take seriously.

STILL HAVE QUESTIONS?

Book a 20-minute call with our team. We'll walk you through Grounded with your own AI product.

EVALUATE FREE · STARTER FROM $29/MONTH

Catch hallucinations
before they cost you.

Start with 50 free runs. If it catches something your team missed — you'll know exactly what Starter is worth.

50 runs free to evaluate

Starter $29/mo · Team $149/mo

14-day trial on all paid plans

No credit card ever

WHAT TESTERS ARE SAYING

Used by teams who take AI quality seriously.

We ran Grounded on our clinical decision-support chatbot before go-live. It caught three fabricated drug interaction claims our manual review had completely missed. It's now part of every release.

Head of Clinical AI

MedTech Startup, Sydney

HEALTHCARE

GR-2 · 44

I'm a QA consultant. Every client now gets a Grounded report as part of the engagement. It takes 10 minutes to run a full audit and hand over a timestamped PDF. My clients love having something concrete.

Senior QA Consultant

Independent, Melbourne

SAAS

GR-4 · 81

Our legal AI was confidently citing cases that don't exist. Grounded flagged it on the first run. We've since tightened the system prompt and our average GR score went from GR-2 to GR-4 in two weeks.

Legal Technology Lead

Law Firm, London

LEGAL

GR-4 · 79

As a product manager, I had no way to answer 'how do we know the AI is accurate?' Now I can point to a GR score and an audit trail. It's changed how we talk about AI quality internally.

Product Manager, AI Features

B2B SaaS Company

SAAS/TECH

GR-3 · 68

We use Grounded before every model update. Last sprint it caught a regression where our new prompt was causing the AI to hallucinate pricing information. Saved us from a customer escalation.

Engineering Lead

E-commerce Platform

FINANCE

GR-5 · 91

Our compliance team required evidence that AI-generated policy summaries were validated. Grounded's GR reports gave us exactly that — timestamped, structured, ready for audit. Implementation took one afternoon.

Risk & Compliance Manager

Financial Services, Singapore

FINANCE

GR-4 · 83

REVIEWS ARE FROM BETA TESTERS AND EARLY USERS · GR SCORES SHOWN ARE FROM ACTUAL TEST RUNS
