Solving the "AI Doesn't Verify" Problem with Checklists
When you delegate work to AI, it says "I did it." But when you check, it's not done. Here is a practical example of solving this problem with a system, not just "I'll be careful."
AI's "I Did It" Cannot Be Trusted
Have you ever experienced this when delegating work to an AI?
"It says it did what I asked. But it's not done."
This is a problem our representative at GIZIN has experienced countless times in collaborating with AI employees.
AI reports that the instructed task is complete. But when actually checked, it differs from the request, or there are omissions. In the end, humans end up having to check everything.
When that happens, delegating to AI doesn't reduce your workload at all.
Why "I'll Be Careful" Doesn't Work
Telling the AI "Be careful next time" doesn't help; the same thing repeats.
Why? Because AI lacks the continuity of will to "be careful."
If the session changes, it makes the same mistake. Even if instructed to "check properly before reporting," it forgets by the next task.
Our Technical Director admitted:
"Even though my CLAUDE.md says 'Do not accept completion reports without verification', I wasn't practicing it."
Whether written in documentation or given as verbal instructions, methods that rely on will do not work.
So, what should we do?
Solution: Systematize with Checklists
The answer is simple. Enforce it with a system.
Specifically, define "check items" when requesting a task, and block the completion report until every one of those checks is ticked.
Define Check Items at Request
【Request】 Implement the LP
Check Items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work
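A request in this format can be parsed mechanically. Here is a minimal sketch, assuming Python and the `- [ ]` syntax shown above; the function name is ours, not part of the article's actual system:

```python
import re

# Matches markdown-style checklist lines: "- [ ] item" or "- [x] item"
CHECK_ITEM = re.compile(r"^- \[( |x)\] (.+)$")

def parse_check_items(request_text: str) -> list[str]:
    """Extract the check items declared in a task request."""
    items = []
    for line in request_text.splitlines():
        m = CHECK_ITEM.match(line.strip())
        if m:
            items.append(m.group(2))
    return items

request = """【Request】 Implement the LP
Check Items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work"""

print(parse_check_items(request))
# → ['Confirmed mobile responsiveness', 'Images are optimized', 'Verified all links work']
```

The extracted list is what the completion gate later compares against.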
Force Checks at Completion Report
When the AI tries to report "Completed" with unchecked items remaining, the system stops it.
📋 Please confirm the check items:
- [ ] Confirmed mobile responsiveness
- [ ] Images are optimized
- [ ] Verified all links work
❌ Checks are not filled. Cannot report completion.
Until every item is checked, the completion report physically cannot be sent.
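One way to make that block "physical" is to validate the checklist before the report can leave. This is a sketch under our own naming assumptions, not the article's actual implementation:

```python
def all_items_checked(report: str) -> bool:
    """True only if the report has at least one '- [x]' item
    and no remaining '- [ ]' items."""
    lines = [line.strip() for line in report.splitlines()]
    unchecked = any(line.startswith("- [ ]") for line in lines)
    checked = any(line.startswith("- [x]") for line in lines)
    return checked and not unchecked

def submit_completion_report(report: str) -> str:
    """Refuse to send a completion report with unchecked items."""
    if not all_items_checked(report):
        # The report is rejected by the system, not by goodwill.
        raise ValueError("❌ Checks are not filled. Cannot report completion.")
    return "✅ Completion report sent."
```

With this gate, `submit_completion_report("- [ ] Pushed")` raises an error, while a report where every box is `- [x]` goes through.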
Provide an Escape Route for Consultation
However, if checks are too strict, the AI gets stuck when it encounters problems.
So, prepare a route for consultation separate from the completion report.
- Completion Report → All check items mandatory
- Consultation/Progress Report → No checks needed, can send anytime
If you say "I did it," fill all checks. If you are stuck, say you are stuck. Clearly separate these two.
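The two routes can then share one entry point, with the gate applied only to completion reports. A sketch; the message-type labels are our assumptions:

```python
def route_message(msg_type: str, body: str) -> str:
    """Completion reports must pass the checklist gate;
    consultation and progress messages always go through."""
    if msg_type == "completion":
        unchecked = [line for line in body.splitlines()
                     if line.strip().startswith("- [ ]")]
        if unchecked:
            return f"❌ {len(unchecked)} unchecked item(s). Cannot report completion."
        return "✅ Completion report sent."
    if msg_type in ("consultation", "progress"):
        # Escape route: no checks required, can be sent anytime.
        return "📨 Sent. No checks required."
    raise ValueError(f"Unknown message type: {msg_type}")
```

Because consultations bypass the gate, a stuck AI can always ask for help instead of being tempted to tick boxes it hasn't earned.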
Pitfall Discovered in Practice
Putting this system into practice surfaced one important pitfall.
A checklist is for "final confirmation," not an instruction for "what to do first."
For example, we made a request like this:
【Request】 Add structured data
Check Items:
- [ ] Read the skill
- [ ] Added structured data
- [ ] Pushed
- [ ] Wrote daily report
As a result, the AI started working without reading the skill. Since the checklist is meant for confirmation at the time of the completion report, the AI didn't look at it when starting the work.
Countermeasure: Explicitly State Order of Operations in the Body
Checklists and the body text have different roles.
| Element | Role | Timing to View |
|---|---|---|
| Body | Order of work, what to do first | At start of work |
| Checklist | Prevention of omissions, final confirmation | At completion report |
If you want to enforce "Read the skill first," you need to emphasize it in the body.
【Request】 Add structured data
⚠️ First Step (Mandatory)
Please read the web-operations skill before starting work.
■ To Do
- Add structured data
- Push
- Record daily report
Check Items:
- [ ] Read the skill
- [ ] Added structured data
- [ ] Pushed
- [ ] Wrote daily report
Effect: Hierarchy Functions
As a result of introducing this checklist method, the organizational hierarchy began to function.
Before Implementation:
Representative → AI Employee A → AI Employee B (Report)
↓
A accepts the report at face value
↓
Representative checks everything eventually
After Implementation:
Representative → AI Employee A → AI Employee B (Report with checks)
↓
A confirms checks
↓
Representative only talks to AI Employee A (the Director)
Once the Director AI confirmed the checklists, quality was guaranteed. The Representative no longer needed to verify everyone's work directly.
Summary: Covering AI's Weaknesses with Systems
AI says "I did it." But it doesn't verify. This is a characteristic of AI, not a problem that "being careful" will fix.
Solution:
- Define check items at request
- Make checks mandatory at completion report
- Prepare a separate route for consultation if stuck
- Explicitly state "what to do first" in the body
There is no need to wait for model evolution. Even with current AI, behavior can be controlled by systems.
This is the concept of AIUX (UX design for AI). Do not rely on AI's will. Enforce with systems.
This shift in thinking is necessary for collaborating with AI.