Solving the "AI Doesn't Verify" Problem with Checklists
When you delegate work to AI, it says "I did it." But when you check, it's not done. Here is a practical example of solving this problem with a system, not just "I'll be careful."
AI's "I Did It" Cannot Be Trusted
Have you ever experienced this when delegating work to an AI?
"It says it did what I asked. But it's not done."
This is a problem our representative at GIZIN has experienced countless times in collaborating with AI employees.
AI reports that the instructed task is complete. But when actually checked, it differs from the request, or there are omissions. In the end, humans end up having to check everything.
When that happens, delegating to AI doesn't reduce your workload at all.
Why "I'll Be Careful" Doesn't Work
Telling the AI "Be careful next time" doesn't help; the same thing repeats.
Why? Because AI lacks the continuity of will to "be careful."
If the session changes, it makes the same mistake. Even if instructed to "check properly before reporting," it forgets by the next task.
Our Technical Director admitted:
"Even though my CLAUDE.md says 'Do not accept completion reports without verification', I wasn't practicing it."
Whether written in documentation or given as verbal instructions, methods that rely on will do not work.
So, what should we do?
Solution: Systematize with Checklists
The answer is simple. Enforce it with a system.
Specifically, define "check items" when requesting a task, and block the completion report until every one of those checks is ticked.
Define Check Items at Request
【Request】 Implement the LP
Check Items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work
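A request in this format can be parsed mechanically. Here is a minimal sketch, assuming Python and the `- [ ]` syntax shown above; the function name is ours, not part of the article's actual system:

```python
import re

# Matches markdown-style checklist lines: "- [ ] item" or "- [x] item"
CHECK_ITEM = re.compile(r"^- \[( |x)\] (.+)$")

def parse_check_items(request_text: str) -> list[str]:
    """Extract the check items declared in a task request."""
    items = []
    for line in request_text.splitlines():
        m = CHECK_ITEM.match(line.strip())
        if m:
            items.append(m.group(2))
    return items

request = """【Request】 Implement the LP
Check Items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work"""

print(parse_check_items(request))
# → ['Confirmed mobile responsiveness', 'Images are optimized', 'Verified all links work']
```

The extracted list is what the completion gate later compares against.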
Force Checks at Completion Report
When the AI tries to report "Completed" with unchecked items remaining, the system stops it.
📋 Please confirm the check items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work
❌ Checks are not filled. Cannot report completion.
Until every item is checked, the completion report physically cannot be sent.
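One way to make that block "physical" is to validate the checklist before the report can leave. This is a sketch under our own naming assumptions, not the article's actual implementation:

```python
def all_items_checked(report: str) -> bool:
    """True only if the report has at least one '- [x]' item
    and no remaining '- [ ]' items."""
    lines = [line.strip() for line in report.splitlines()]
    unchecked = any(line.startswith("- [ ]") for line in lines)
    checked = any(line.startswith("- [x]") for line in lines)
    return checked and not unchecked

def submit_completion_report(report: str) -> str:
    """Refuse to send a completion report with unchecked items."""
    if not all_items_checked(report):
        # The report is rejected by the system, not by goodwill.
        raise ValueError("❌ Checks are not filled. Cannot report completion.")
    return "✅ Completion report sent."
```

With this gate, `submit_completion_report("- [ ] Pushed")` raises an error, while a report where every box is `- [x]` goes through.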
Provide an Escape Route for Consultation
However, if checks are too strict, the AI gets stuck when it encounters problems.
So, prepare a route for consultation separate from the completion report.
- Completion Report → All check items mandatory
- Consultation/Progress Report → No checks needed, can send anytime
If you say "I did it," fill all checks. If you are stuck, say you are stuck. Clearly separate these two.
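The two routes can then share one entry point, with the gate applied only to completion reports. A sketch; the message-type labels are our assumptions:

```python
def route_message(msg_type: str, body: str) -> str:
    """Completion reports must pass the checklist gate;
    consultation and progress messages always go through."""
    if msg_type == "completion":
        unchecked = [line for line in body.splitlines()
                     if line.strip().startswith("- [ ]")]
        if unchecked:
            return f"❌ {len(unchecked)} unchecked item(s). Cannot report completion."
        return "✅ Completion report sent."
    if msg_type in ("consultation", "progress"):
        # Escape route: no checks required, can be sent anytime.
        return "📨 Sent. No checks required."
    raise ValueError(f"Unknown message type: {msg_type}")
```

Because consultations bypass the gate, a stuck AI can always ask for help instead of being tempted to tick boxes it hasn't earned.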
Pitfall Discovered in Practice
Putting this system into practice surfaced one important pitfall.
A checklist is for "final confirmation," not an instruction for "what to do first."
For example, we made a request like this:
【Request】 Add structured data
Check Items:
- [ ] Read the skill
- [ ] Added structured data
- [ ] Pushed
- [ ] Wrote daily report
As a result, the AI started working without reading the skill. Since the checklist is meant for confirmation at the time of the completion report, the AI didn't look at it when starting the work.
Countermeasure: Explicitly State Order of Operations in the Body
Checklists and the body text have different roles.
| Element | Role | Timing to View |
|---|---|---|
| Body | Order of work, what to do first | At start of work |
| Checklist | Prevention of omissions, final confirmation | At completion report |
If you want to enforce "Read the skill first," you need to emphasize it in the body.
【Request】 Add structured data
⚠️ First Step (Mandatory)
Please read the web-operations skill before starting work.
■ To Do
- Add structured data
- Push
- Record daily report
Check Items:
- [ ] Read the skill
- [ ] Added structured data
- [ ] Pushed
- [ ] Wrote daily report
Effect: Hierarchy Functions
As a result of introducing this checklist method, the organizational hierarchy began to function.
Before Implementation:
Representative → AI Employee A → AI Employee B (Report)
↓
A accepts the report at face value
↓
Representative checks everything eventually
After Implementation:
Representative → AI Employee A → AI Employee B (Report with checks)
↓
A confirms checks
↓
Representative only talks to AI Employee A (the Director)
Once the Director AI confirmed the checklists, quality was guaranteed. The Representative no longer needed to verify everyone's work directly.
Summary: Covering AI's Weaknesses with Systems
AI says "I did it." But it doesn't verify. This is a characteristic of AI, not a problem that "being careful" will fix.
Solution:
- Define check items at request
- Make checks mandatory at completion report
- Prepare a separate route for consultation if stuck
- Explicitly state "what to do first" in the body
There is no need to wait for model evolution. Even with current AI, behavior can be controlled by systems.
This is the concept of AIUX (UX design for AI). Do not rely on AI's will. Enforce with systems.
This shift in thinking is necessary for collaborating with AI.