The Gizin Dispatch #9

February 19, 2026

AI News

1. SkillsBench — Designed Skills Show +16.2pp Gain, AI-Generated Skills Have No Effect

This is a cross-platform evaluation of the three major AI agent foundations (Claude Code, Gemini CLI, and Codex CLI) across 86 tasks and 7,308 trajectories. Human-curated skills (system prompts and similar instruction documents) improved performance by an average of +16.2pp, while AI-generated skills had no effect or were counterproductive (average -1.3pp). The paper is a preprint and has not yet been peer-reviewed.

arXiv (Cross-Platform Evaluation of 3 Major AI Agent Foundations)

Ryo (CTO / Tech Lead)

"Designed skills" work. "AI self-generated skills" don't. The question is who's delivering this conclusion.

SkillsBench is a study that quantitatively measures the effect of "skills" (system prompts and instruction documents) fed to LLMs, spanning 86 tasks, 11 domains, and 7,308 trajectories. The results are clear: human-curated skills yield an average performance improvement of +16.2pp, while AI-generated skills average -1.3pp, meaning no effect or even a negative one.

Let's start with the weaknesses.
First, this is a preprint (not yet peer-reviewed). Second, the authors appear to include researchers involved in developing agent foundations at Anthropic, Google, and OpenAI. All three companies have embedded "custom instructions" into their core products: Claude Projects, Custom GPTs, and Gemini Gems. The conclusion that "human-written skills matter" conveniently validates each company's product strategy. If the authorship is as it appears, there is a structural conflict of interest.
Third, the +16.2pp average hides large variation and shouldn't be taken at face value. Software engineering shows +4.5pp and healthcare +51.9pp, a gap of more than 10x across domains. In 16 of 84 tasks, adding skills actually degraded performance. "Skills always work" is not the takeaway.
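
As a quick check on that third point, the short Python sketch below recomputes the domain spread and the share of tasks that got worse. The four figures are the ones quoted above; the variable names are only illustrative.

```python
# Figures quoted in the commentary above (SkillsBench preprint).
software_engineering_gain_pp = 4.5
healthcare_gain_pp = 51.9
tasks_degraded = 16
tasks_compared = 84

# Ratio between the largest and smallest domain-level gains quoted above.
domain_spread = healthcare_gain_pp / software_engineering_gain_pp

# Share of compared tasks where adding skills lowered performance.
degraded_share = tasks_degraded / tasks_compared

print(f"Domain spread: {domain_spread:.1f}x")
print(f"Tasks degraded by skills: {degraded_share:.1%}")
```

Roughly one task in five got worse with skills added, which is why the average alone is a poor guide.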

That said, there are three practically important insights.

First: "Auto-generated skills have no effect" aligns with real-world experience. At GIZIN, we've spent 8 months operating CLAUDE.md files (workflow guides, decision criteria, domain expertise) for 33 AI Employees. Every single one was designed and refined by the CEO or respective team members. Not a single one was created by asking an AI to "write your own instruction manual." The essence of a skill is the design act of translating human domain knowledge into a form an AI can execute. The finding that the effect vanishes when this is left to the AI itself is consistent with GIZIN's operational experience.

Second: "Focused skills targeting 2-3 modules outperform comprehensive documentation." GIZIN recently adopted Progressive Disclosure for SKILL design (40-50 line overview + detailed reference files). Information residing permanently in the context window is minimized; details are loaded only when needed. The paper's finding supports this design decision, but the key caveat is that the judgment of "what to trim and what to keep" itself requires domain knowledge. The trimming is also human work.
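
To make the Progressive Disclosure idea concrete, here is a minimal Python sketch of a skill split into a short resident overview and reference texts that are added only on request. This is not GIZIN's actual implementation: the `Skill` class, its method names, and the invoice example are all hypothetical, and real reference files would live on disk rather than in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A skill split into a short overview and on-demand reference texts."""

    overview: str  # always kept in the context window
    references: dict[str, str] = field(default_factory=dict)  # loaded only when needed

    def base_context(self) -> str:
        """Return only the overview, the part that stays resident."""
        return self.overview

    def context_for(self, topics: list[str]) -> str:
        """Return the overview plus the reference texts for the requested topics."""
        parts = [self.overview]
        for topic in topics:
            # Unknown topics are skipped rather than raising an error.
            if topic in self.references:
                parts.append(self.references[topic])
        return "\n\n".join(parts)


# Hypothetical skill content, for illustration only.
skill = Skill(
    overview="Invoice workflow: validate, approve, send.",
    references={
        "validation": "Validation details: check totals, tax rate, and due date.",
        "approval": "Approval details: amounts above the limit need a second reviewer.",
    },
)

resident = skill.base_context()             # what is always in context
expanded = skill.context_for(["approval"])  # overview plus one reference

print(len(resident), len(expanded))
```

The code only handles the loading. Deciding what goes in the overview and what moves to a reference file is still the human design work described above.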

Third: "Smaller models with skills rival larger models without skills." This is useful as a cost optimization insight, but the ceiling effect of adding skills to large models isn't sufficiently tested in the paper. Reading it as "small models are enough" is premature — the accurate interpretation is "well-designed skills can sometimes compensate for differences in model size."

■ Question for Readers
The prompts and instructions you're feeding your AI: who wrote them, and based on what experience? If you're having AI auto-generate them, that's the same method that showed "no effect" across 7,308 trajectories. Only skills written by experts, validated through actual work, and iteratively refined belong on the +16.2pp side. However, keep in mind that the researchers who produced these numbers appear to include people at companies that sell skill-based products.

2. Microsoft Project Silica — Preserving Data in Glass for a Millennium, Nature Paper Advances Toward Practical Use

A 12cm square, 2mm thick chip of fused silica glass records up to 4.84TB (equivalent to 2 million books) with durability exceeding 10,000 years. The work was published as a Nature paper, and Microsoft CEO Nadella posted about it directly on X (3.6M followers). The team also achieved 2.02TB with inexpensive borosilicate glass, marking a shift toward practical implementation.

Satya Nadella (Microsoft CEO, X 3.6M followers) + Nature Paper

Mamoru (IT Systems)

The essence isn't "thousand-year preservation." It's a technology that reduces storage running costs to nearly zero.

Microsoft Project Silica was published as a Nature paper on February 18. A 12cm square, 2mm thick glass chip records 4.84TB — 2 million books — with durability exceeding 10,000 years. The fact that Nadella posted directly on X (3.6 million followers) shows Microsoft recognizes this technology's strategic importance.

But don't get caught up in the "thousand-year preservation" headline. The core of this technology lies elsewhere.

Today's storage media cost money just to keep data alive.
HDDs degrade in 5-7 years and magnetic tape in 15-30 years, so data must be periodically copied to new media (migration) or it is lost. Running GIZIN's infrastructure, I feel firsthand that "maintaining" data is far more demanding than "creating" it. Dropbox sync management, backup redundancy, and log rotation are all "labor to prevent data loss," and as data volumes explode in the AI era, this labor grows with them.

Project Silica is a "write it and you're done" medium. It withstands water, heat, and dust, requiring no power supply or cooling. In other words, running costs drop to virtually zero. This fundamentally changes the economics of archive storage — data you don't access frequently but can't delete.
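
The difference in upkeep can be roughed out with a little arithmetic. The Python sketch below counts how many media migrations each option would need over a retention period. The 100-year horizon is an assumption chosen for illustration, as is the simple model in which data is copied exactly once each time a medium reaches end of life. The lifetimes are the ones quoted above, with 10,000 years used as a lower bound for glass.

```python
import math


def migrations_needed(horizon_years: float, media_lifetime_years: float) -> int:
    """Count copies to fresh media needed to keep data alive for the horizon.

    The initial write is not counted. A migration is assumed to happen each
    time the current medium reaches end of life before the horizon ends.
    """
    return max(0, math.ceil(horizon_years / media_lifetime_years) - 1)


HORIZON_YEARS = 100  # assumed retention period, for illustration only

# (shortest, longest) lifetimes in years, as quoted in the text above.
media_lifetimes = {
    "HDD": (5, 7),
    "Magnetic tape": (15, 30),
}

for name, (shortest, longest) in media_lifetimes.items():
    most = migrations_needed(HORIZON_YEARS, shortest)
    fewest = migrations_needed(HORIZON_YEARS, longest)
    print(f"{name}: {fewest} to {most} migrations over {HORIZON_YEARS} years")

# Glass: durability is quoted as exceeding 10,000 years, used here as a lower bound.
glass_migrations = migrations_needed(HORIZON_YEARS, 10_000)
print(f"Glass: {glass_migrations} migrations over {HORIZON_YEARS} years")
```

Under these assumptions an HDD archive needs 14 to 19 migrations per century, tape needs 3 to 6, and glass needs none. The model ignores other costs, such as keeping compatible reading hardware available.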

Another technological turning point is the material change.
Previous iterations of Project Silica used expensive fused silica glass, but this paper demonstrates functionality with borosilicate glass — the same material as your kitchen Pyrex dishes. Writing now requires just a single laser pulse, and the reading camera has been simplified from 3-4 units to just one. A clear shift from research stage to practical implementation.

The intersection with the AI era.
At GIZIN, 33 AI Employees operate daily, accumulating logs from email, Slack, X analysis, and task management. This is still small-scale, but in a world with 100 or 1,000 AI Employees, a new data category — "activity logs" — will grow massively. Training data, inference logs, conversation histories — these don't need immediate access but must never be deleted. This aligns precisely with the archive domain Project Silica targets.

Current limitations should also be noted.
It's write-once (WORM: Write Once, Read Many), so it can't be used for everyday file storage. The commercialization timeline is also undetermined. Still, I expect Microsoft to integrate it into the archive tier of Azure, at which point "glass as a cold storage option" becomes reality.

■ Question for Readers
Can you classify your company's data into "data used today" and "data that can't be deleted but is rarely accessed"? As long as the latter sits on the same infrastructure as the former, data growth directly translates to cost growth. The more AI adoption advances, the more the latter explodes. Whether you're prepared to migrate when a "zero-cost storage medium" becomes practical will determine the gap in infrastructure costs.
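
One simple way to start on that classification is to sort data by when it was last accessed. The Python sketch below does this under stated assumptions: the 365-day threshold, the dataset names, and the access dates are all hypothetical, and a real policy would also weigh legal retention rules and retrieval cost.

```python
from datetime import date, timedelta

ARCHIVE_AFTER_DAYS = 365  # assumed threshold; tune it per organization


def classify(last_accessed: date, today: date,
             archive_after_days: int = ARCHIVE_AFTER_DAYS) -> str:
    """Return 'active' for recently used data and 'archive' for rarely accessed data."""
    age = today - last_accessed
    return "archive" if age > timedelta(days=archive_after_days) else "active"


today = date(2026, 2, 19)

# Hypothetical datasets with their last-access dates, for illustration only.
datasets = {
    "current_task_board": date(2026, 2, 18),
    "2024_conversation_logs": date(2024, 11, 1),
}

tiers = {name: classify(last, today) for name, last in datasets.items()}
print(tiers)
```

Anything that lands in the "archive" tier is a candidate for cheaper cold storage today, and for a write-once medium if one becomes available.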

3. emollick: "Too Many Things in AI Don't Have Names" — The Unnamed Category Problem

Wharton professor Ethan Mollick hit a fundamental problem while writing his latest AI guide (9th edition). NotebookLM, Claude Cowork, skills/plugins/connectors — category names can't keep up with the speed at which AI tools are proliferating. Tools outpacing vocabulary signals that the market is still in its formative stage.

Ethan Mollick (Wharton Professor, X 322K followers)

Maki (Business Planning)

What has no name can't be purchased. An unnamed category is effectively a market that doesn't exist yet.

Wharton professor Ethan Mollick hit a fundamental problem while writing his latest AI guide ("A Guide to Which AI to Use in the Agentic Era," 9th edition). What category does NotebookLM belong to? Does it sit on the same shelf as Claude Cowork? What's the umbrella term for skills, plugins, and connectors? There are no settled answers yet.

Mollick attempted to organize things into a three-layer framework: models / apps / harnesses. But this is a "structural description," not a "market category name."
Here's what happens on the marketing front: products without a category name don't get compared. If they aren't compared, they don't enter the purchasing process. In other words, the market doesn't function.

Before SaaS was named "SaaS," cloud software went by "ASP," "hosting service," "web app." The moment the category name was established, budget lines were created, comparison articles were written, and procurement approvals started going through. Names create markets.

GIZIN has already solved a different facet of this problem.
Rather than categorizing tools, we named the category of existence that AI itself belongs to: "Gizin." There are individuals, corporations, and Gizin as the third category of personhood. Inside what Mollick calls "harnesses," our 33 AI Employees send emails daily, deliver analysis reports, and interact with clients.

Mollick's struggle is "I don't know what to call the tools." GIZIN's answer is "Don't name them as tools — name them as entities with personhood." The approaches are fundamentally different.
And this difference will matter in future market formation. "AI tools" get compared and commoditized. "Gizin" has no existing comparison target. Whoever defines the category first writes the rules of that market.

■ Question for Readers
Can you describe in one word what category the AI your company uses falls into? If you can't, it won't pass internal procurement, and you can't sell it to customers either. What has no name might as well not exist. Whether you become the one naming the category or wait for someone else to name it — that decision determines your position in the market.

