Agentic Architecture & Orchestration
27%An agent is a loop, not a script. You control it by shaping what the model sees and by gating what it can do — never by parsing its prose. When a step MUST happen (verify identity before refunding), enforce it in code; when judgment is needed, give the model goals and context, not procedures.
Design and implement agentic loops for autonomous task execution
An agentic loop is the engine of every agent. You send a request; the model responds with a stop_reason. If it is 'tool_use', the model is asking you to run one or more tools — you execute them, append the results to the conversation, and call again. If it is 'end_turn', the model is done and you return its answer. The model drives WHICH tool to call based on accumulated context; your code only drives the loop's continuation. Crucially, there are two layers here. With the lower-level Client SDK (anthropic.messages.create) you write the while-loop yourself. With the higher-level Agent SDK (query()) Claude runs the loop for you and you consume a stream of messages. The exam tests the conceptual loop, which both share.
Skills to demonstrate
- Continue the loop while stop_reason == 'tool_use'; terminate on 'end_turn'.
- Append the assistant's tool_use message AND the corresponding tool_result(s) before the next call, preserving order and tool_use_id matching.
- Handle multiple tool_use blocks in one response by executing all and returning all results in a single user message.
- Recognize and route other stop reasons: max_tokens (retry/continue), refusal, pause_turn (server tools — append and continue).
Distractor traps
- Parsing natural-language signals ('I'm done now') to decide termination — brittle and wrong.
- Using an arbitrary iteration cap as the PRIMARY stop mechanism instead of stop_reason (a cap is a safety net, not the control).
- Treating presence of assistant text as 'finished' — the model can emit text alongside a tool_use.
- Forgetting to append the assistant turn before tool_results, which corrupts the transcript.
Build the loop by hand with the Client SDK (Python). Give Claude two tools: get_yeschef_booking(booking_id) and lookup_chef_availability(chef_id, date). Hardcode fake data. Write the while-loop yourself, log every stop_reason, and confirm it terminates on end_turn — never on a text check or an iteration cap. Then rebuild the same thing with the Agent SDK query() and note how much loop code disappears.
Check yourself
Your YesChef support agent loop occasionally never returns to the user — it keeps calling tools in circles when a chef's availability tool returns ambiguous data. A teammate proposes capping the loop at 5 iterations and returning whatever text the model last produced. Why is this the wrong primary fix?
In a single response, Claude returns two tool_use blocks: lookup_order and get_customer. What must your loop do before the next API call?
Orchestrate multi-agent systems with coordinator-subagent patterns
The dominant multi-agent shape is hub-and-spoke: a coordinator owns all inter-agent communication, error handling, and routing, while specialized subagents do focused work. Subagents have ISOLATED context — they do not inherit the coordinator's history automatically. The coordinator's real job is decomposition, delegation, aggregation, and deciding which subagents to invoke for a given query. The classic failure is decomposition that is too narrow: the coordinator slices a broad topic into a few sub-slices that quietly omit whole regions of the problem, and every subagent then succeeds at the wrong task.
Skills to demonstrate
- Design coordinators that analyze a query and dynamically pick subagents, rather than always running the full pipeline.
- Partition scope to minimize duplication — assign distinct subtopics or source types to each subagent.
- Build iterative refinement loops: evaluate synthesis for gaps, re-delegate targeted queries, re-synthesize until coverage holds.
- Route ALL subagent communication through the coordinator for observability and consistent error handling.
Distractor traps
- Blaming downstream agents (search, synthesis) when the coordinator's decomposition is the actual root cause.
- Assuming subagents share memory or inherit parent context — they don't.
- Always routing through every subagent regardless of query needs (wasteful, dilutes results).
Model a 'trichocereus care research' coordinator with two subagents: a web-search agent and a document-analysis agent. Deliberately give the coordinator a NARROW decomposition (only 'watering schedule') and watch it miss light, soil, and cold-hardiness. Then fix the decomposition prompt to enumerate sub-domains and confirm coverage improves. This makes the Question-7-style root-cause lesson stick.
Check yourself
You run your research system on 'caring for columnar cacti in cold climates.' Every subagent succeeds, yet the final report covers only watering and ignores cold-hardiness, soil, and light. The coordinator log shows it decomposed the topic into 'watering frequency,' 'watering amount,' and 'overwatering signs.' What is the most likely root cause?
Configure subagent invocation, context passing, and spawning
Subagents are spawned via the Task tool, and a coordinator can only invoke them if its allowedTools includes 'Task'. Because subagents start with isolated context, you must pass everything they need explicitly in their prompt — prior findings, source metadata, goals. Use structured formats to keep content separate from metadata (URLs, doc names, page numbers) so attribution survives the handoff. To parallelize, emit multiple Task calls in a SINGLE coordinator response rather than across turns. And prefer goal-and-criteria prompts over step-by-step procedures so subagents can adapt.
Skills to demonstrate
- Include complete prior-agent findings directly in a subagent's prompt (e.g., feed search results + doc analysis into the synthesis subagent).
- Use structured data to preserve source attribution across handoffs.
- Spawn parallel subagents with multiple Task calls in one response.
- Write coordinator prompts specifying goals/quality bars, not procedures.
Distractor traps
- Expecting a subagent to 'remember' what the coordinator knows.
- Spawning sequentially when the work is independent (lost latency).
- Over-specifying steps, which prevents subagents from adapting to what they find.
Take your Braves analytics idea: a coordinator with allowedTools including 'Task' spawns a 'pull Statcast for a player' subagent and a 'pull game logs' subagent IN PARALLEL (two Task calls, one response). Confirm each subagent receives the player ID and date range explicitly in its prompt, and that the synthesis step still knows which numbers came from which source.
Check yourself
Your coordinator delegates to a synthesis subagent, but the synthesis output keeps inventing which source a statistic came from. The search subagent definitely found correct, attributed data. What is the most direct fix?
Implement multi-step workflows with enforcement and handoff patterns
When a step MUST precede another for correctness (verify customer identity before issuing a refund), a prompt instruction is not enough — prompts have a non-zero failure rate. Use programmatic enforcement: a prerequisite gate or hook that blocks the downstream tool until the required step has completed. This is the single most-tested idea in the support scenario. For mid-process escalation, design structured handoffs that carry everything a human needs (customer ID, root cause, recommended action) because that human can't see the conversation.
Skills to demonstrate
- Block downstream tool calls until prerequisites complete (e.g., block process_refund until get_customer returns a verified ID).
- Decompose multi-concern requests into items, investigate in parallel with shared context, then synthesize one resolution.
- Compile structured handoff summaries for human agents who lack the transcript.
Distractor traps
- Choosing 'make the system prompt stronger' for a requirement that must be deterministic.
- Choosing a routing classifier (tool availability) when the problem is tool ORDERING.
- Relying on few-shot examples to enforce a hard business rule.
On a YesChef refund flow, write a PreToolUse hook that blocks process_refund until get_customer has returned a verified customer ID in this session. Prove it: prompt the agent to refund using only a name and confirm the hook stops it. Then compare against a version that only puts 'always verify first' in the system prompt and observe the non-zero failure rate.
Check yourself
Production data shows your YesChef agent skips get_customer in 12% of cases and issues refunds against a customer's stated name, sometimes refunding the wrong account. Which change most effectively fixes this?
Apply Agent SDK hooks for tool call interception and data normalization
Hooks intercept the agent lifecycle without changing the model. PostToolUse fires after a tool returns and is ideal for normalizing heterogeneous data (Unix timestamps vs ISO 8601 vs numeric status codes) before the model ever sees it. PreToolUse fires before a tool runs and is ideal for compliance gates (block a refund over $500, redirect to escalation). The exam's core distinction: hooks give DETERMINISTIC guarantees; prompts give PROBABILISTIC compliance. Choose hooks whenever a business rule must always hold.
Skills to demonstrate
- Normalize mixed data formats in PostToolUse so the agent reasons over clean inputs.
- Block policy-violating tool calls in PreToolUse and redirect (e.g., to human escalation).
- Pick hooks over prompts when compliance must be guaranteed.
Distractor traps
- Using PostToolUse when you needed to BLOCK an action (that's PreToolUse).
- Enforcing a hard limit via prompt wording.
Two hooks on the YesChef agent: (1) a PostToolUse hook that converts every booking timestamp (some Unix, some ISO) into one canonical ISO format before the model sees it; (2) a PreToolUse hook that blocks process_refund over $500 and routes to escalate_to_human. Test both with inputs that trip them.
Check yourself
Your booking tools return dates inconsistently: get_yeschef_booking gives a Unix epoch, lookup_chef_availability gives ISO 8601, and a legacy tool returns a numeric status code. The agent keeps miscomparing dates. Which hook and why?
Design task decomposition strategies for complex workflows
Two decomposition modes. Fixed sequential pipelines (prompt chaining) suit predictable, multi-aspect work — e.g., review each file individually, then a cross-file integration pass. Dynamic adaptive decomposition suits open-ended investigation where each step's findings generate the next subtasks. Knowing which to reach for is the skill: chaining for predictable reviews, dynamic for exploration like 'add comprehensive tests to a legacy codebase' (map structure, find high-impact areas, build a plan that adapts).
Skills to demonstrate
- Match the pattern to the task: chaining for predictable multi-aspect reviews, dynamic for open-ended investigation.
- Split large reviews into per-file local passes plus a cross-file integration pass to avoid attention dilution.
- Decompose open-ended tasks by first mapping structure, then prioritizing, then adapting as dependencies surface.
Distractor traps
- Using one big single pass for a many-file review (attention dilution, contradictory findings).
- Forcing a rigid pipeline onto genuinely exploratory work.
On your Music League dashboard repo, run a deliberately bad single-pass review across many files and note the inconsistency. Then restructure: a per-file local pass plus one cross-file integration pass. Compare the quality and consistency of findings.
Check yourself
A PR touches 14 files in your Music League data pipeline. Your single-pass review gives deep feedback on some files, shallow on others, misses obvious bugs, and even flags a pattern in one file while approving identical code in another. Best restructuring?
Manage session state, resumption, and forking
Long-lived agent work needs session control. Resume a specific prior conversation by name with --resume <session-name>. Use fork_session to branch from a shared baseline and explore divergent approaches independently (e.g., compare two refactors from one codebase analysis). Two judgment calls: when prior context is still mostly valid, resume; when prior tool results are STALE (files changed since), it's more reliable to start fresh with an injected structured summary than to resume on stale results. And when you do resume after edits, tell the agent exactly which files changed so it re-analyzes targets rather than everything.
Skills to demonstrate
- Use --resume with named sessions to continue investigations across work sessions.
- Use fork_session to create parallel exploration branches from a shared baseline.
- Choose resumption (context mostly valid) vs fresh-start-with-summary (tool results stale).
- Inform a resumed session about specific file changes for targeted re-analysis.
Distractor traps
- Resuming on stale tool results and trusting them.
- Re-exploring an entire codebase when only a few files changed.
On the YesChef repo: start a named analysis session, make a small code change, then practice both paths — (a) resume and explicitly tell the agent which file changed, and (b) start fresh with a written summary. Note which felt more reliable and why.
Check yourself
You analyzed the YesChef codebase in a session yesterday, then refactored three files. Today you want the agent to continue, but its prior tool results now describe code that no longer exists. Best approach?
Teach it back
Without looking at notes, explain to an imaginary junior engineer: why is checking stop_reason the correct way to terminate an agent loop, and what specifically goes wrong if you instead (a) cap iterations at N, or (b) stop when the assistant returns text? Then explain when you'd reach for a hook instead of a stronger system prompt.