Tests the full agent -> tool execution -> model feedback loop: - Shell tool execution with mock model - Multi-tool chaining (file.write -> file.read) across iterations - Verification that tool results are correctly passed back to model