
How Progressive Disclosure Reduced My AI Token Usage by 60%

Loading less context upfront makes AI more effective. Here's the 3-tier system that cut my Claude costs while improving output quality.

Chudi Nnorukam
Dec 15, 2025 · 5 min read

I was burning through $50/month in Claude API costs before I realized the problem. Every session loaded the same 8,000 tokens of context—rules, skills, examples—whether I needed them or not.

Progressive disclosure in AI context management means loading information in tiers: metadata first for routing, schemas on demand for understanding, and full content only when actively using a feature. The result? 60% fewer tokens consumed with better output quality, because the AI isn’t parsing through irrelevant context to find what matters.

Why Was I Wasting So Many Tokens?

My original CLAUDE.md file was 2,400 lines. Every skill definition. Every constraint. Every example. It loaded completely on every single session.

That specific dread of seeing “context limit reached” mid-task—when you’re 80% done and suddenly the AI forgets everything—became a weekly occurrence.

I thought more context was always better. If the AI knows everything, it can handle anything. Right?

Well, it’s more like… drowning in information isn’t the same as understanding what matters.

How Does the 3-Tier System Work?

Progressive disclosure splits context into three tiers, each loading only when needed.

Tier 1: Metadata (~200 tokens)

The routing layer. Just enough to know what skills exist and when to activate them.

## Skill: sveltekit_architect
**Triggers:** routes, layouts, prerender
**Dependencies:** None
**Priority:** High for SvelteKit projects

This is all that loads initially. Skill names, trigger patterns, quick reference. The AI knows the skill exists without knowing everything about it.

Tier 2: Schema (~400 tokens)

The contract layer. Input/output types, constraints, quality gates.

### Input Schema
- operation: "analyze_routes" | "create_layout" | "optimize_prerender"
- targetPath?: string

### Output Schema
- success: boolean
- summary: string
- nextActions: string[]

### Constraints
- Svelte 5 runes only
- Prerender for static pages
- Bundle under 300KB

This loads when the skill activates—when you’re actually working with routes. Not before.

Tier 3: Full Content (~1,200 tokens)

The implementation layer. Complete handler logic, examples, edge cases.

### Handler: analyze_routes
1. Glob all `src/routes/**/+page.svelte`
2. Check each for data loader presence
3. Verify prerender config matches content type
4. Return recommendations

### Example Output
{
  "success": true,
  "summary": "Found 12 routes, 3 missing data loaders",
  "nextActions": ["Add +page.server.ts to /blog/[slug]"]
}

This only loads when you’re deep into a complex routing task. Most sessions never need it.
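To make the escalation concrete, here's a rough TypeScript sketch of a loader that pulls in one tier at a time. The skills/&lt;name&gt;/tier1.md file layout and the function names are my own illustration, not the actual implementation:

// Assumed layout: skills/<name>/tier1.md, tier2.md, tier3.md
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

type Tier = 1 | 2 | 3;

// Track the highest tier already in context per skill, so nothing loads twice.
const loadedTier = new Map<string, number>();

async function loadSkill(name: string, tier: Tier): Promise<string> {
  const current = loadedTier.get(name) ?? 0;
  if (tier <= current) return ''; // already in context
  const parts: string[] = [];
  // Escalate one tier at a time: metadata, then schema, then full content.
  for (let t = current + 1; t <= tier; t++) {
    parts.push(await readFile(join('skills', name, `tier${t}.md`), 'utf8'));
  }
  loadedTier.set(name, tier);
  return parts.join('\n\n');
}

// Most sessions stop here (~200 tokens):
await loadSkill('sveltekit_architect', 1);
// A routing task escalates and pulls in only the schema (~400 more):
await loadSkill('sveltekit_architect', 2);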

What Are the Actual Token Savings?

Here’s the meta-orchestration skill as a real example:

Tier   | Lines | Tokens | When Loaded
Tier 1 | 278   | ~200   | Every session
Tier 2 | 816   | ~600   | On skill activation
Tier 3 | 3,302 | ~2,400 | Complex tasks only

Without progressive disclosure: 2,400 tokens loaded every session. With progressive disclosure: 200 tokens for most sessions, 600 for active skill use.

Savings: 60-92% depending on task complexity.

I load less to get more. The paradox makes sense once you see it in practice.

How Does Smart Mode Auto-Detect Verbosity?

The system doesn’t just have three tiers—it has four verbosity levels that auto-adjust:

Minimal

  • Load Tier 1 only
  • Quick tasks & questions
  • ~500 token budget

Standard

  • Tier 1 + activated Tier 2
  • Most common tasks
  • ~1,500 token budget

Detailed

  • Full tiers for active skills
  • Complex implementations
  • ~4,000 token budget

Comprehensive

  • All skills fully loaded
  • Deep debugging & architecture
  • ~8,000 token budget

Smart mode analyzes your query and picks the level. Simple questions get minimal context. Architecture questions get everything.
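I can't show the exact heuristics smart mode uses, but a selector along these lines captures the idea. Only the four levels and their budgets come from the list above; the keyword patterns are illustrative:

type Verbosity = 'minimal' | 'standard' | 'detailed' | 'comprehensive';

// Token budgets per level, matching the list above.
const BUDGETS: Record<Verbosity, number> = {
  minimal: 500,
  standard: 1_500,
  detailed: 4_000,
  comprehensive: 8_000,
};

// Rough heuristic: escalate when the query implies heavier work.
function pickVerbosity(query: string): Verbosity {
  const q = query.toLowerCase();
  if (/\b(architecture|migrate|debug)\b/.test(q)) return 'comprehensive';
  if (/\b(implement|build|refactor|optimize)\b/.test(q)) return 'detailed';
  if (/\b(add|update|fix|write)\b/.test(q)) return 'standard';
  return 'minimal';
}

const level = pickVerbosity('What does this route do?');
console.log(level, BUDGETS[level]); // "minimal" 500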

What Is the Lazy Module Loader Pattern?

Beyond skill tiers, the system uses lazy loading for expensive modules:

// Instead of:
import { allSkills } from './skills'; // 8,000 tokens

// Use:
const getSkill = (name) => import(`./skills/${name}.md`);

Features

  • Dynamic imports: Load modules only when referenced
  • 10-minute TTL: Cache loaded modules, expire unused ones
  • Deduplication: Never load the same module twice per session

This pattern works for skill files, reference docs, and example repositories. Anything that doesn’t need to exist in context until explicitly requested.
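Here's what that might look like with the TTL cache and dedup folded in. The 10-minute TTL comes from the feature list above; everything else (names, file layout, the idea that each skill module exports its markdown as a string) is an assumption for the sketch:

// Dynamic import behind a 10-minute TTL cache with per-session deduplication.
const TTL_MS = 10 * 60 * 1000;

const cache = new Map<string, { content: string; loadedAt: number }>();

async function getModule(name: string): Promise<string> {
  const hit = cache.get(name);
  // Deduplication: anything loaded within the last 10 minutes is reused as-is.
  if (hit && Date.now() - hit.loadedAt < TTL_MS) return hit.content;
  cache.delete(name); // expired, or never loaded

  // Dynamic import: the module enters memory only when first referenced.
  // Assumes each skill is a module exporting its markdown as the default export.
  const mod = await import(`./skills/${name}.js`);
  const content = mod.default as string;
  cache.set(name, { content, loadedAt: Date.now() });
  return content;
}

// First call loads from disk; repeat calls within ten minutes hit the cache.
const skill = await getModule('sveltekit_architect');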

How Do Skill Bundles Reduce Redundant Loading?

Related skills often load together. Instead of loading them individually:

{
  "frontend-bundle": {
    "skills": ["react-patterns", "tailwind-stylist", "component-testing"],
    "tokens": 4500,
    "triggers": ["*.tsx", "*.css", "component"]
  }
}

When you’re working on frontend code, the bundle activates as a unit. No loading three separate skills with overlapping context—one bundle with deduplicated content.

Current Bundles

  • frontend-bundle: React, UI/UX, web standards (4,500 tokens)
  • backend-bundle: API, database, patterns (4,200 tokens)
  • debugging-bundle: Error resolution, testing (2,500 tokens)
  • workflow-bundle: Git, CI/CD, deployment (3,200 tokens)

Each bundle is optimized to eliminate redundancy between related skills.
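Bundle activation itself can be as simple as matching the current file or query against each bundle's triggers. The frontend-bundle definition below mirrors the config above; the matching logic is my own sketch:

interface Bundle {
  skills: string[];
  tokens: number;
  triggers: string[]; // globs like "*.tsx" or plain keywords
}

const bundles: Record<string, Bundle> = {
  'frontend-bundle': {
    skills: ['react-patterns', 'tailwind-stylist', 'component-testing'],
    tokens: 4500,
    triggers: ['*.tsx', '*.css', 'component'],
  },
};

function matchBundles(context: string): string[] {
  return Object.entries(bundles)
    .filter(([, bundle]) =>
      bundle.triggers.some((t) =>
        t.startsWith('*.')
          ? context.endsWith(t.slice(1)) // crude glob: "*.tsx" means a ".tsx" suffix
          : context.includes(t),
      ),
    )
    .map(([name]) => name);
}

console.log(matchBundles('src/components/Button.tsx')); // ["frontend-bundle"]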

What’s the Token Budget System?

The context governor enforces hard limits:

  • 75% max budget: Never exceed this regardless of task
  • 60% warning threshold: Start aggressive tier reduction
  • 20% reserve: Always keep space for AI responses

When approaching limits:

  1. Phase unloading: Release completed phase context
  2. Tier reduction: Drop to lower tiers for inactive skills
  3. Auto-compact: Summarize and compress old context
  4. Graceful degradation: Warn before hitting hard limits

That anxiety of context overflow—the system makes it manageable by budgeting proactively.
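As a sketch, the governor's decision boils down to a couple of threshold checks. The 75% cap and 60% warning come from the list above; the window size, action names, and example numbers are purely illustrative:

const CONTEXT_WINDOW = 200_000; // illustrative window size, in tokens

const MAX_BUDGET = 0.75 * CONTEXT_WINDOW; // hard ceiling, never exceeded
const WARNING = 0.60 * CONTEXT_WINDOW;    // start aggressive tier reduction
// The headroom above the cap more than covers the 20% reserved for responses.

type Action = 'load-freely' | 'reduce-tiers' | 'refuse-load';

function nextAction(tokensInContext: number, incoming: number): Action {
  const projected = tokensInContext + incoming;
  if (projected > MAX_BUDGET) return 'refuse-load'; // degrade gracefully instead
  if (projected > WARNING) return 'reduce-tiers';   // drop tiers, compact old context
  return 'load-freely';
}

// 110k tokens already in context, trying to add a 15k-token Tier 3 skill:
console.log(nextAction(110_000, 15_000)); // "reduce-tiers"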

FAQ: Token Optimization for AI Tools

What is progressive disclosure in AI context management? Progressive disclosure loads AI context in tiers: metadata first (~200 tokens), schemas on demand (~400 tokens), full content only when needed (~1200 tokens). This prevents loading thousands of unused tokens upfront.

How much can progressive disclosure save on AI costs? In practice, progressive disclosure saves 40-60% of tokens per session. A skill whose full definition runs 3,302 lines (~2,400 tokens) loads only 278 lines (~200 tokens) at Tier 1—unless you actually need the deeper content.

Does loading less context hurt AI performance? Counter-intuitively, no. Focused context with relevant information outperforms bloated context with everything. The AI processes fewer tokens to find what matters, leading to more accurate responses.

What are the three tiers in progressive disclosure? Tier 1 is metadata (name, triggers, dependencies). Tier 2 is schema (input/output types, constraints). Tier 3 is full content (handler logic, examples). Each tier loads only when needed based on task complexity.

How do I implement progressive disclosure for Claude Code? Split your CLAUDE.md into a router file (~500 lines) and reference files (loaded on demand). Use skill activation scores to determine which tier to load. Start with metadata, escalate to full content only for complex tasks.


I thought the solution to context limits was bigger context windows. Well, it’s more like… the solution was loading less, more intentionally.

Maybe the goal isn’t maximum context. Maybe it’s minimum necessary context—and systems that know the difference.


Related Reading

This is part of the Complete Claude Code Guide.


Written by Chudi Nnorukam

I design and deploy agent-based AI automation systems that eliminate manual workflows, scale content, and power recursive learning. Specializing in micro-SaaS tools, content automation, and high-performance web applications.