
How Progressive Disclosure Reduced My AI Token Usage by 60%

Loading less context upfront makes AI more effective. Here's the 3-tier system that cut my Claude costs while improving output quality.

Chudi Nnorukam
Dec 15, 2025 · 5 min read

I was burning through $50/month in Claude API costs before I realized the problem. Every session loaded the same 8,000 tokens of context—rules, skills, examples—whether I needed them or not.

Progressive disclosure in AI context management means loading information in tiers: metadata first for routing, schemas on demand for understanding, and full content only when actively using a feature. The result? 60% fewer tokens consumed with better output quality, because the AI isn’t parsing through irrelevant context to find what matters.

Why Was I Wasting So Many Tokens?

My original CLAUDE.md file was 2,400 lines. Every skill definition. Every constraint. Every example. It loaded completely on every single session.

That specific dread of seeing “context limit reached” mid-task—when you’re 80% done and suddenly the AI forgets everything—became a weekly occurrence.

I thought more context was always better. If the AI knows everything, it can handle anything. Right?

Well, it’s more like… drowning in information isn’t the same as understanding what matters.

How Does the 3-Tier System Work?

Progressive disclosure splits context into three tiers, each loading only when needed.

Tier 1: Metadata (~200 tokens)

The routing layer. Just enough to know what skills exist and when to activate them.

## Skill: sveltekit_architect
**Triggers:** routes, layouts, prerender
**Dependencies:** None
**Priority:** High for SvelteKit projects

This is all that loads initially. Skill names, trigger patterns, quick reference. The AI knows the skill exists without knowing everything about it.

Tier 2: Schema (~400 tokens)

The contract layer. Input/output types, constraints, quality gates.

### Input Schema
- operation: "analyze_routes" | "create_layout" | "optimize_prerender"
- targetPath?: string

### Output Schema
- success: boolean
- summary: string
- nextActions: string[]

### Constraints
- Svelte 5 runes only
- Prerender for static pages
- Bundle under 300KB

This loads when the skill activates—when you’re actually working with routes. Not before.

Tier 3: Full Content (~1,200 tokens)

The implementation layer. Complete handler logic, examples, edge cases.

### Handler: analyze_routes
1. Glob all `src/routes/**/+page.svelte`
2. Check each for data loader presence
3. Verify prerender config matches content type
4. Return recommendations

### Example Output
{
  "success": true,
  "summary": "Found 12 routes, 3 missing data loaders",
  "nextActions": ["Add +page.server.ts to /blog/[slug]"]
}

This only loads when you’re deep into a complex routing task. Most sessions never need it.
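To make the escalation concrete, here's a rough TypeScript sketch of a loader that pulls in one tier at a time. The skills/&lt;name&gt;/tier1.md file layout and the function names are my own illustration, not the actual implementation:

// Assumed layout: skills/<name>/tier1.md, tier2.md, tier3.md
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

type Tier = 1 | 2 | 3;

// Track the highest tier already in context per skill, so nothing loads twice.
const loadedTier = new Map<string, number>();

async function loadSkill(name: string, tier: Tier): Promise<string> {
  const current = loadedTier.get(name) ?? 0;
  if (tier <= current) return ''; // already in context
  const parts: string[] = [];
  // Escalate one tier at a time: metadata, then schema, then full content.
  for (let t = current + 1; t <= tier; t++) {
    parts.push(await readFile(join('skills', name, `tier${t}.md`), 'utf8'));
  }
  loadedTier.set(name, tier);
  return parts.join('\n\n');
}

// Most sessions stop here (~200 tokens):
await loadSkill('sveltekit_architect', 1);
// A routing task escalates and pulls in only the schema (~400 more):
await loadSkill('sveltekit_architect', 2);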

What Are the Actual Token Savings?

Here’s the meta-orchestration skill as a real example:

Tier   | Lines | Tokens | When Loaded
Tier 1 | 278   | ~200   | Every session
Tier 2 | 816   | ~600   | On skill activation
Tier 3 | 3,302 | ~2,400 | Complex tasks only

Without progressive disclosure: 2,400 tokens loaded every session. With progressive disclosure: 200 tokens for most sessions, 600 for active skill use.

Savings: 60-92% depending on task complexity.

I load less to get more. The paradox makes sense once you see it in practice.

How Does Smart Mode Auto-Detect Verbosity?

The system doesn’t just have three tiers—it has four verbosity levels that auto-adjust:

Minimal

  • Load Tier 1 only
  • Quick tasks & questions
  • ~500 token budget

Standard

  • Tier 1 + activated Tier 2
  • Most common tasks
  • ~1,500 token budget

Detailed

  • Full tiers for active skills
  • Complex implementations
  • ~4,000 token budget

Comprehensive

  • All skills fully loaded
  • Deep debugging & architecture
  • ~8,000 token budget

Smart mode analyzes your query and picks the level. Simple questions get minimal context. Architecture questions get everything.
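I can't show the exact heuristics smart mode uses, but a selector along these lines captures the idea. Only the four levels and their budgets come from the list above; the keyword patterns are illustrative:

type Verbosity = 'minimal' | 'standard' | 'detailed' | 'comprehensive';

// Token budgets per level, matching the list above.
const BUDGETS: Record<Verbosity, number> = {
  minimal: 500,
  standard: 1_500,
  detailed: 4_000,
  comprehensive: 8_000,
};

// Rough heuristic: escalate when the query implies heavier work.
function pickVerbosity(query: string): Verbosity {
  const q = query.toLowerCase();
  if (/\b(architecture|migrate|debug)\b/.test(q)) return 'comprehensive';
  if (/\b(implement|build|refactor|optimize)\b/.test(q)) return 'detailed';
  if (/\b(add|update|fix|write)\b/.test(q)) return 'standard';
  return 'minimal';
}

const level = pickVerbosity('What does this route do?');
console.log(level, BUDGETS[level]); // "minimal" 500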

What Is the Lazy Module Loader Pattern?

Beyond skill tiers, the system uses lazy loading for expensive modules:

// Instead of:
import { allSkills } from './skills'; // 8,000 tokens

// Use:
const getSkill = (name) => import(`./skills/${name}.md`);

Features

  • Dynamic imports: Load modules only when referenced
  • 10-minute TTL: Cache loaded modules, expire unused ones
  • Deduplication: Never load the same module twice per session

This pattern works for skill files, reference docs, and example repositories. Anything that doesn’t need to exist in context until explicitly requested.
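Here's what that might look like with the TTL cache and dedup folded in. The 10-minute TTL comes from the feature list above; everything else (names, file layout, the idea that each skill module exports its markdown as a string) is an assumption for the sketch:

// Dynamic import behind a 10-minute TTL cache with per-session deduplication.
const TTL_MS = 10 * 60 * 1000;

const cache = new Map<string, { content: string; loadedAt: number }>();

async function getModule(name: string): Promise<string> {
  const hit = cache.get(name);
  // Deduplication: anything loaded within the last 10 minutes is reused as-is.
  if (hit && Date.now() - hit.loadedAt < TTL_MS) return hit.content;
  cache.delete(name); // expired, or never loaded

  // Dynamic import: the module enters memory only when first referenced.
  // Assumes each skill is a module exporting its markdown as the default export.
  const mod = await import(`./skills/${name}.js`);
  const content = mod.default as string;
  cache.set(name, { content, loadedAt: Date.now() });
  return content;
}

// First call loads from disk; repeat calls within ten minutes hit the cache.
const skill = await getModule('sveltekit_architect');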

How Do Skill Bundles Reduce Redundant Loading?

Related skills often load together. Instead of loading them individually:

{
  "frontend-bundle": {
    "skills": ["react-patterns", "tailwind-stylist", "component-testing"],
    "tokens": 4500,
    "triggers": ["*.tsx", "*.css", "component"]
  }
}

When you’re working on frontend code, the bundle activates as a unit. No loading three separate skills with overlapping context—one bundle with deduplicated content.

Current Bundles

  • frontend-bundle: React, UI/UX, web standards (4,500 tokens)
  • backend-bundle: API, database, patterns (4,200 tokens)
  • debugging-bundle: Error resolution, testing (2,500 tokens)
  • workflow-bundle: Git, CI/CD, deployment (3,200 tokens)

Each bundle is optimized to eliminate redundancy between related skills.
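Bundle activation itself can be as simple as matching the current file or query against each bundle's triggers. The frontend-bundle definition below mirrors the config above; the matching logic is my own sketch:

interface Bundle {
  skills: string[];
  tokens: number;
  triggers: string[]; // globs like "*.tsx" or plain keywords
}

const bundles: Record<string, Bundle> = {
  'frontend-bundle': {
    skills: ['react-patterns', 'tailwind-stylist', 'component-testing'],
    tokens: 4500,
    triggers: ['*.tsx', '*.css', 'component'],
  },
};

function matchBundles(context: string): string[] {
  return Object.entries(bundles)
    .filter(([, bundle]) =>
      bundle.triggers.some((t) =>
        t.startsWith('*.')
          ? context.endsWith(t.slice(1)) // crude glob: "*.tsx" means a ".tsx" suffix
          : context.includes(t),
      ),
    )
    .map(([name]) => name);
}

console.log(matchBundles('src/components/Button.tsx')); // ["frontend-bundle"]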

What’s the Token Budget System?

The context governor enforces hard limits:

  • 75% max budget: Never exceed this regardless of task
  • 60% warning threshold: Start aggressive tier reduction
  • 20% reserve: Always keep space for AI responses

When approaching limits:

  1. Phase unloading: Release completed phase context
  2. Tier reduction: Drop to lower tiers for inactive skills
  3. Auto-compact: Summarize and compress old context
  4. Graceful degradation: Warn before hitting hard limits

That anxiety of context overflow—the system makes it manageable by budgeting proactively.
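As a sketch, the governor's decision boils down to a couple of threshold checks. The 75% cap and 60% warning come from the list above; the window size, action names, and example numbers are purely illustrative:

const CONTEXT_WINDOW = 200_000; // illustrative window size, in tokens

const MAX_BUDGET = 0.75 * CONTEXT_WINDOW; // hard ceiling, never exceeded
const WARNING = 0.60 * CONTEXT_WINDOW;    // start aggressive tier reduction
// The headroom above the cap more than covers the 20% reserved for responses.

type Action = 'load-freely' | 'reduce-tiers' | 'refuse-load';

function nextAction(tokensInContext: number, incoming: number): Action {
  const projected = tokensInContext + incoming;
  if (projected > MAX_BUDGET) return 'refuse-load'; // degrade gracefully instead
  if (projected > WARNING) return 'reduce-tiers';   // drop tiers, compact old context
  return 'load-freely';
}

// 110k tokens already in context, trying to add a 15k-token Tier 3 skill:
console.log(nextAction(110_000, 15_000)); // "reduce-tiers"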

FAQ: Token Optimization for AI Tools

What is progressive disclosure in AI context management? Progressive disclosure loads AI context in tiers: metadata first (~200 tokens), schemas on demand (~400 tokens), full content only when needed (~1200 tokens). This prevents loading thousands of unused tokens upfront.

How much can progressive disclosure save on AI costs? In practice, progressive disclosure saves 40-60% of tokens per session. A skill whose full definition runs 3,302 lines (~2,400 tokens) loads only 278 lines (~200 tokens) at Tier 1—unless you actually need the deeper content.

Does loading less context hurt AI performance? Counter-intuitively, no. Focused context with relevant information outperforms bloated context with everything. The AI processes fewer tokens to find what matters, leading to more accurate responses.

What are the three tiers in progressive disclosure? Tier 1 is metadata (name, triggers, dependencies). Tier 2 is schema (input/output types, constraints). Tier 3 is full content (handler logic, examples). Each tier loads only when needed based on task complexity.

How do I implement progressive disclosure for Claude Code? Split your CLAUDE.md into a router file (~500 lines) and reference files (loaded on demand). Use skill activation scores to determine which tier to load. Start with metadata, escalate to full content only for complex tasks.


I thought the solution to context limits was bigger context windows. Well, it’s more like… the solution was loading less, more intentionally.

Maybe the goal isn’t maximum context. Maybe it’s minimum necessary context—and systems that know the difference.


Related Reading

This is part of the Complete Claude Code Guide.


Written by Chudi Nnorukam

I design and deploy agent-based AI automation systems that eliminate manual workflows, scale content, and power recursive learning. Specializing in micro-SaaS tools, content automation, and high-performance web applications.