
The Skill That Replaced Prompt Engineering
In February 2025, Andrej Karpathy coined "vibe coding" and the world embraced it. By the end of 2025, MIT Technology Review traced a clear arc: the industry had pivoted from "just accept all AI suggestions" to a discipline called context engineering.
The shift makes sense. Prompt engineering focuses on what you say to an AI model. Context engineering focuses on everything the model sees: project rules, session history, tool outputs, memory systems, and multi-agent coordination. As Anthropic's engineering team puts it: "Building effective AI agents is less about finding the right words and more about answering a critical question: What configuration of context is most likely to generate our model's desired behavior?"
This post goes beyond the CLAUDE.md deep-dive in Part 3 of our AI Technical Debt series. Where Part 3 covered CLAUDE.md optimization as a single lever, this guide covers the full context engineering discipline: the hierarchy, the memory patterns, the failure modes, and the strategies that compound across every line of AI-generated code.
What Context Engineering Actually Means
Birgitta Böckeler, Distinguished Engineer at Thoughtworks, defines it simply: "Context engineering is curating what the model sees so that you get a better result." Her analysis on martinfowler.com uses Claude Code as the primary example of how context configuration options have exploded.
Andrej Karpathy frames it with a systems analogy: the LLM is a CPU and the context window is RAM. "The engineer's job is akin to an operating system: loading that working memory with just the right code and data for the task."
This includes:
- Task descriptions and explanations (your prompt)
- Few-shot examples (patterns the model should follow)
- RAG and retrieved data (project-specific knowledge)
- Tool definitions and outputs (what the agent can do and has done)
- State and history (conversation context)
- Compaction (summarizing to stay within limits)
Tobi Lütke, CEO of Shopify, endorsed the shift: "I really like the term 'context engineering' over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."
Why the Distinction Matters for Coding
Simon Willison captured why the naming matters: "Context engineering captures the fact that previous responses from the model are a key part of the process, while 'prompt engineering' suggests only user prompts matter."
For coding agents specifically, this distinction is critical. When Claude Code generates your authentication module, the quality depends on:
- What CLAUDE.md says about your security requirements
- What the agent explored in your codebase before writing
- What context survived compaction from earlier in the session
- What subagents found when analyzing related files
- What rules in .claude/rules/ constrain behavior for auth-related files
A perfectly crafted prompt cannot compensate for poor context architecture. That is the core insight driving the industry shift.
The Context Hierarchy: Four Layers of Agent Memory
Claude Code's official documentation describes a 4-level memory architecture with clear priority ordering:
[Figure: The context hierarchy. Four layers of context, each with a different persistence and token budget]
- Model identity, safety rules, base capabilities (permanent)
- Codebase standards, architecture, file paths (per project)
- Auto-memory, compaction summaries, task state (per session)
- Current files, tool results, user messages (ephemeral)
Key insight: Higher layers are more persistent but consume fewer tokens. Optimize from the top down: small changes to CLAUDE.md can yield larger improvements than refining individual prompts.
| Priority | Layer | Scope | Persistence | Example |
|---|---|---|---|---|
| 1 (Highest) | Enterprise Policy | Organization-wide | Permanent | Security constraints, compliance requirements |
| 2 | Project Memory | Repository | Permanent | CLAUDE.md coding standards, architecture rules |
| 3 | Project Rules | File-pattern | Permanent | .claude/rules/*.md with glob targeting |
| 4 | Conversation | Session | Temporary | Current task context, tool outputs, decisions |
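The priority ordering behaves like a layered merge: every layer contributes rules, and higher-priority layers win on conflict. A minimal sketch of that behavior follows; the keys and values are invented for illustration, and this is not Claude Code's actual implementation:

```python
# Conceptual sketch: context layers merged in priority order.
# Later (higher-priority) layers override earlier ones on conflict.

LAYERS = [  # ordered lowest priority to highest
    ("conversation",      {"style": "terse", "task": "fix auth bug"}),
    ("project_rules",     {"style": "verbose-comments"}),
    ("project_memory",    {"test_framework": "vitest"}),
    ("enterprise_policy", {"pii_logging": "forbidden"}),
]

def resolve_context(layers):
    """Merge layers so higher-priority entries override lower ones."""
    merged = {}
    for _name, rules in layers:
        merged.update(rules)
    return merged

context = resolve_context(LAYERS)
print(context["style"])  # prints "verbose-comments": project rules outrank the session
```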
Layer 1: Enterprise Policy
The highest-priority context. Organization-wide constraints that cannot be overridden by project or session context. This is where compliance requirements (HIPAA, SOC 2, GDPR) live as non-negotiable rules.
Layer 2: Project Memory (CLAUDE.md)
This is the layer our Part 3 guide covered in depth. CLAUDE.md files act as persistent project rules that Claude Code reads automatically. Arize AI's research proved this layer's power: optimizing CLAUDE.md alone achieved +10% improvement on SWE Bench Lite without changing architecture, tools, or fine-tuning.
The methodology (Prompt Learning) uses RL-inspired optimization:
- Run Claude Code on training tasks
- Evaluate with unit tests
- Get LLM feedback on failures
- Meta-prompt suggests CLAUDE.md modifications
- Iterate until accuracy stabilizes
The full methodology and code are open-source.
Layer 3: Project Rules (.claude/rules/)
More granular than CLAUDE.md. These are Markdown files in .claude/rules/ that target specific file patterns using globs. For example:
- auth-rules.md applies to src/auth/**/*.ts
- migration-rules.md applies to supabase/migrations/*.sql
- api-rules.md applies to src/app/api/**/*.ts
This prevents your authentication security rules from cluttering the context when Claude is editing a blog component. Context relevance matters.
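A rule file, say .claude/rules/auth-rules.md, might look like the sketch below. The globs frontmatter key and the verifySession helper are assumptions for illustration; check your tooling's documentation for the exact targeting syntax:

```markdown
---
globs: src/auth/**/*.ts
---

# Authentication rules

- Never log tokens, session IDs, or password hashes.
- Every new endpoint gets rate limiting and input validation.
- Reuse the existing `verifySession` helper; do not hand-roll JWT checks.
```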
Layer 4: Conversation History
The most volatile layer. Conversation history includes your prompts, Claude's responses, tool outputs, and everything generated during the session. This is where context rot becomes a real problem.
[Figure: Context window before vs. after. Optimized context engineering doubles usable working memory]
- 2x usable context
- +10% code quality gain
- 3x longer sessions
Context Rot: The Silent Quality Killer
Chroma Research tested 18 state-of-the-art models (GPT-4.1, Claude 4, Gemini 2.5, Qwen3) and found a counterintuitive result: adding more context often makes AI output worse.
Key findings:
- Adding full conversation history (~113k tokens) can drop accuracy by 30% compared to a focused 300-token version
- Models performed better on shuffled haystacks than logically structured ones
- Performance degradation is highly task-dependent
- Model reliability decreases significantly with longer inputs, even on simple tasks
A separate study by Norman Paulsen (published January 2026) found that the Maximum Effective Context Window (MECW) differs drastically from advertised limits. Some top-performing models failed with as few as 100 tokens, and most showed severe accuracy degradation by 1,000 tokens.
What This Means for Coding Sessions
If you have been working with Claude Code for two hours on a complex feature, the accumulated context (file reads, edits, test outputs, error messages) may be actively degrading output quality. The model is not getting tired; it is getting buried in irrelevant information.
This is why Claude Code implements auto-compaction at 98% of the effective context window. The process:
- Clears older tool outputs first (least valuable)
- Summarizes remaining conversation
- Reinitiates with the compressed context
You can also trigger this manually with the /compact command at any time. If you notice quality dropping mid-session, compacting is often the fastest fix.
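The three-step process above can be sketched in a few lines. This is purely illustrative; Claude Code's real compaction is internal and more sophisticated, and the summarizer here is a toy stand-in for a model call:

```python
# Illustrative compaction sketch: clear old tool outputs, summarize
# the rest, and reinitiate with the compressed context.

def compact(messages, summarize, max_recent=6):
    """Return a message list compressed around the most recent turns."""
    if len(messages) <= max_recent:
        return messages
    older, recent = messages[:-max_recent], messages[-max_recent:]
    # 1. Clear older tool outputs first (least valuable).
    older = [m for m in older if m["role"] != "tool"]
    # 2. Summarize the remaining conversation.
    summary = summarize(older)
    # 3. Reinitiate with the compressed context.
    return [{"role": "system", "content": summary}] + recent

def summarize(msgs):  # toy stand-in for a model-generated summary
    return "Summary of %d earlier messages." % len(msgs)

history = (
    [{"role": "user", "content": "start"}]
    + [{"role": "tool", "content": "log %d" % i} for i in range(10)]
    + [{"role": "assistant", "content": "turn %d" % i} for i in range(6)]
)
compacted = compact(history, summarize)
print(len(compacted))  # prints 7: one summary plus the six most recent turns
```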
[Figure: Context rot, the silent killer. Long sessions degrade from fresh context to context drift, context rot, and finally hallucination risk. Three patterns counter it: auto-summarizing older messages, storing decisions across sessions, and offloading research to fresh contexts]
Prevention: Keep sessions focused. When context drifts beyond the current task, start a new thread with a fresh context window.
Memory Patterns That Actually Work
Anthropic's context engineering guide describes three core memory strategies for coding agents:
1. Compaction (Short-Term Memory Management)
Compaction is "taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary." It serves as the first lever in context engineering for better long-term coherence.
Claude Code now maintains continuous session memory in the background, making compaction "instant" rather than requiring a pause while the model summarizes.
2. Structured Note-Taking (Agentic Memory)
"A technique where the agent regularly writes notes persisted to memory outside of the context window. These notes get pulled back into the context window at later times."
In practice, this looks like Claude Code creating a to-do list, or your custom agent maintaining a NOTES.md file. The key insight: memory stored in files outlasts any single context window.
Letta's benchmarking research validated this approach with hard numbers: agents using simple filesystem storage achieved 74.0% accuracy on the LoCoMo benchmark, outperforming Mem0's graph-based approach at 68.5%. Their conclusion: "Memory is more about how agents manage context than the exact retrieval mechanism used."
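A minimal version of the pattern looks like this; the file name and JSON format are illustrative choices, not a prescribed convention:

```python
# Minimal structured note-taking: decisions persisted to a file
# outlive any single context window and can be reloaded later.
import json
import pathlib

NOTES = pathlib.Path("NOTES.json")  # a NOTES.md works just as well
NOTES.unlink(missing_ok=True)       # start fresh for this demo

def record_decision(topic, decision):
    notes = json.loads(NOTES.read_text()) if NOTES.exists() else []
    notes.append({"topic": topic, "decision": decision})
    NOTES.write_text(json.dumps(notes, indent=2))

def load_notes():
    """Pulled back into the context window at the start of a session."""
    return json.loads(NOTES.read_text()) if NOTES.exists() else []

record_decision("auth", "Use httpOnly cookies, not localStorage")
record_decision("db", "All new tables require RLS policies")
print(len(load_notes()))  # prints 2: both decisions survive the session
```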
3. Just-in-Time Context (Dynamic Loading)
Agents maintain lightweight identifiers (file paths, stored queries, web links) and use these references to dynamically load data into context at runtime. Instead of keeping everything in memory, the agent knows where to look and loads data on demand.
This is why Claude Code's Explore subagent exists. Rather than loading your entire codebase into context, Claude spawns a read-only Explore agent to search and analyze files, returning only the relevant findings to the main conversation.
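Stripped to its core, the pattern is holding references instead of contents. A sketch, with an invented registry shape and a throwaway file standing in for a real source file:

```python
# Just-in-time context: keep lightweight references (paths), load
# the underlying data only when the agent actually needs it.
import pathlib
import tempfile

class JustInTimeContext:
    def __init__(self):
        self.refs = {}    # name -> path: cheap to keep in context
        self.loaded = {}  # name -> contents: loaded on demand

    def register(self, name, path):
        self.refs[name] = pathlib.Path(path)

    def load(self, name):
        if name not in self.loaded:
            self.loaded[name] = self.refs[name].read_text()
        return self.loaded[name]

# Demo with a throwaway file standing in for a source file.
tmp = pathlib.Path(tempfile.gettempdir()) / "jit_demo.txt"
tmp.write_text("module exports: foo, bar")

ctx = JustInTimeContext()
ctx.register("utils", tmp)   # registering costs almost no context
print(len(ctx.loaded))       # prints 0: nothing read yet
print(ctx.load("utils"))     # prints "module exports: foo, bar"
```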
| Memory Pattern | Mechanism | Best For | Context Cost |
|---|---|---|---|
| Compaction | Summarize and reinitiate | Long sessions, accumulated context | Low (compressed) |
| Structured Notes | Write to disk, reload later | Cross-session persistence | Zero (until loaded) |
| Just-in-Time | References + on-demand loading | Large codebases, exploration | Variable (loaded on need) |
| Subagent Isolation | Separate context windows | Parallel tasks, deep analysis | Zero (isolated from main) |
Multi-Agent Context: Isolation as a Strategy
The LangChain State of Agent Engineering survey (1,340 respondents, late 2025) found that 57% of teams have agents in production, with one-third citing quality as their primary blocker. Context management is a significant part of that quality challenge.
Claude Code addresses this through subagents that operate in isolated context windows:
- Explore agent: Read-only codebase analysis with adjustable thoroughness (quick, medium, very thorough)
- Plan agent: Architectural planning without cluttering the main context
- Custom subagents: User-defined agents with specific tool constraints and system prompts
The VentureBeat coverage of Claude Code's Tasks feature describes how these subagents coordinate through directed acyclic graphs (DAGs): Task 3 (Run Tests) cannot start until Task 1 (Build API) and Task 2 (Configure Auth) complete.
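The DAG scheduling idea reduces to a topological ordering over task dependencies. A sketch using the standard library (this is not Anthropic's implementation; the task names echo the example above):

```python
# Sketch of DAG task coordination: a task becomes runnable only
# after all of its dependencies complete.
from graphlib import TopologicalSorter

tasks = {  # task -> set of prerequisite tasks
    "run_tests": {"build_api", "configure_auth"},
    "build_api": set(),
    "configure_auth": set(),
}

order = list(TopologicalSorter(tasks).static_order())
print(order[-1])  # prints "run_tests": it cannot start until both finish
```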
Why Isolation Beats Accumulation
When Claude explores 15 files to understand your authentication flow, those file contents do not need to persist in the main conversation. The Explore agent reads them in its own context window, synthesizes the findings, and returns a concise summary. Your main context stays clean.
This directly combats the context rot problem. Instead of loading everything into one increasingly degraded context, you distribute work across focused contexts that each maintain high accuracy.
The Research-Backed Case: Bain, Arize, and Beyond
Bain & Company: The Lifecycle Gap
Bain's 2025 Technology Report found that writing and testing code accounts for only 25-35% of time from idea to launch. Teams seeing just 10-15% productivity boosts are optimizing only the coding step. Organizations achieving 25-30% gains address the entire lifecycle.
Context engineering is how you address the full lifecycle. Your CLAUDE.md encodes not just coding standards but deployment procedures, testing expectations, documentation requirements, and review processes. The developer collaboration framework covers the human side of this lifecycle: which roles own which gates, and why those roles still need juniors who would have caught this before AI could ship it.
Arize AI: The Prompt Learning Breakthrough
The Arize research bears repeating because of its implications. Optimizing a single file (the system prompt) achieved:
- +5.19% improvement (by-repo test split)
- +10.87% improvement (in-repo test split)
- Previous work on Cline showed 15% boosts, bringing GPT-4.1 up to Sonnet 4.5-level accuracy
As Arize CEO Aparna Dhinakaran noted: "We optimized Claude Code's system prompt, just its prompt, and achieved +10% boost on SWE Bench."
The implication: context engineering may be the highest-leverage investment in AI code quality, outperforming tool upgrades, model switching, or architectural changes.
Technology.org: 220K Lines of Clean Code
A 15-week research program with 2 part-time developers produced roughly 220k lines of clean TypeScript and 78 features using an AI-native architecture built on structured context. Their system used two interconnected context layers: a declarative rulebook encoding repository structure and security patterns, plus a native Repo MCP server exposing live project knowledge as tools and resources.
Practical Context Engineering Playbook
Step 1: Audit Your Context Architecture
Map your current context layers:
- Do you have a CLAUDE.md? If not, start with the production guide in Part 3.
- Are you using .claude/rules/? File-pattern rules keep context relevant. An auth rule should not load when editing a blog post.
- How often do you compact? If sessions run over an hour, you should be compacting proactively.
- Are you using subagents? Exploration tasks should not accumulate in your main context.
Step 2: Optimize CLAUDE.md for Your Codebase
The Arize Prompt Learning approach is reproducible:
- Extract 20-30 representative tasks from your actual backlog
- Run Claude Code with your current CLAUDE.md on all tasks
- Track failures: wrong API usage, missed edge cases, security flaws
- Use LLM analysis to generate CLAUDE.md improvements
- Test improvements on a held-out set of 10 tasks
- Iterate until accuracy stabilizes
The open-source implementation is available for replication.
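In skeleton form, the iteration loop looks something like this. It is a sketch under the assumption that you supply run_agent, evaluate, and propose_edit against your own task harness; none of these names come from the Arize implementation:

```python
# Skeleton of a Prompt Learning-style optimization loop. Every
# helper passed in is a hypothetical stand-in you implement
# against your own harness; this is not Arize's actual code.

def optimize_claude_md(claude_md, tasks, run_agent, evaluate,
                       propose_edit, max_rounds=5, tol=0.01):
    best_score = evaluate(run_agent(claude_md, tasks))
    for _ in range(max_rounds):
        results = run_agent(claude_md, tasks)
        failures = [r for r in results if not r["passed"]]
        if not failures:
            break  # nothing left to learn from
        # LLM feedback step: a meta-prompt suggests a CLAUDE.md edit.
        candidate = propose_edit(claude_md, failures)
        score = evaluate(run_agent(candidate, tasks))
        if score - best_score < tol:
            break  # accuracy has stabilized
        claude_md, best_score = candidate, score
    return claude_md, best_score
```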
Step 3: Implement File-Pattern Rules
Create .claude/rules/ files for your critical domains:
- Security rules targeting auth, payment, and data export files
- Database rules targeting migration files with RLS requirements
- API rules targeting route handlers with rate limiting and validation requirements
- Test rules targeting test files with coverage and assertion requirements
Step 4: Build Session Hygiene Habits
- Start complex tasks with a clear task description (not "fix the bug")
- Use /compact before pivoting to a different feature
- Delegate exploration to subagents instead of reading files in the main context
- After long sessions, consider starting a fresh session with a summary of decisions made
Step 5: Connect to Governance
Context engineering provides the input quality for AI-generated code. Thread-based engineering provides the output verification. Together, they form the complete governance loop:
- Context engineering ensures the AI has the right information, constraints, and patterns
- Thread-based engineering ensures humans verify the output at critical checkpoints
- CI/CD quality gates (covered in Part 3) automate what can be automated
Without context engineering, your governance catches problems too late. Without governance, great context engineering still produces unchecked output. You need both.
What Comes Next: The Context Engineering Frontier
The academic research is accelerating. A paper titled "Everything is Context" (UNSW, December 2025) proposes file-system abstractions for context engineering inspired by Unix's "everything is a file" philosophy. Whether a resource is a knowledge graph, memory store, or human-curated note, it can be represented through a standardized file interface.
"Memory in the Age of AI Agents" proposes three evolutionary stages of agent memory: Storage, Reflection, and Experience. Systems can prevent context drift by "summarizing or rewriting old entries when new evidence appears," maintaining bounded memory with lower hallucination rates.
The trajectory is clear: context engineering is becoming a proper engineering discipline with research foundations, benchmarks, and best practices. Teams investing in it now are building institutional knowledge that compounds with every project.
Conclusion: Context Is the New Code
The vibe coding technical debt crisis is fundamentally a context engineering failure. Teams that accept AI output without structuring what the AI sees get unpredictable, unmaintainable code. Teams that invest in context architecture get compounding quality improvements across every line.
The research backs this up: +10% from CLAUDE.md optimization alone, 30% accuracy drop from unmanaged context accumulation, 74% accuracy from simple filesystem memory patterns. Context engineering is not theoretical. It is measurable, reproducible, and the highest-leverage investment you can make in AI-assisted development.
Ready to implement context engineering in your development workflow?
- Full-Stack AI Development - We build with context-first architecture by default
- Contact Us - Let us help you structure your AI development workflow
