feat: add swarm-common obsidian vault
Add Obsidian vault to the swarm-common virtiofs share for access from zap VM and other VMs. Contains agent memory, notes, and infrastructure documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,72 @@
|
||||
# Anthropic — Prompt Caching Best Practices
|
||||
|
||||
**Source**: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
|
||||
**Fetched**: 2026-03-05
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||
1. On first request: system processes full prompt and caches the prefix once the response begins.
|
||||
2. On subsequent requests with same prefix: uses cached version (much cheaper + faster).
|
||||
3. Cache is checked against a cryptographic hash of the prefix content.
|
||||
|
||||
## Two Caching Modes
|
||||
|
||||
### Automatic Caching (recommended for multi-turn)
|
||||
Add `cache_control: {"type": "ephemeral"}` at the **top level** of the request body.
|
||||
- System automatically caches all content up to the last cacheable block.
|
||||
- Moves cache breakpoint forward as conversation grows.
|
||||
- Best for multi-turn conversations.
|
||||
|
||||
### Explicit Cache Breakpoints
|
||||
Place `cache_control` directly on individual content blocks.
|
||||
- Finer control over exactly what gets cached.
|
||||
- Use when you want to cache specific blocks (e.g., a large document) but not others.
|
||||
|
||||
## Cache Lifetimes
|
||||
|
||||
| Duration | Cost | Availability |
|
||||
|----------|------|-------------|
|
||||
| 5 minutes (default) | 1.25x base input price for write | All models |
|
||||
| 1 hour | 2x base input price for write | Available at additional cost |
|
||||
|
||||
- Cache **reads** cost 0.1x (10%) of base input price.
|
||||
- Cache is refreshed for **no additional cost** each time cached content is used.
|
||||
- Default TTL: **5 minutes** (refreshed on each use within TTL).
|
||||
|
||||
## Pricing Per Million Tokens (relevant models)
|
||||
|
||||
| Model | Base Input | 5m Write | 1h Write | Cache Read | Output |
|
||||
|-------|-----------|----------|----------|------------|--------|
|
||||
| Claude Opus 4.6 | $5 | $6.25 | $10 | $0.50 | $25 |
|
||||
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 |
|
||||
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 |
|
||||
|
||||
> Note: We use Copilot subscription (flat rate), so per-token cost doesn't apply directly. But quota burn follows similar relative proportions — caching still saves quota by reducing re-processing of identical prefixes.
|
||||
|
||||
## Supported Models
|
||||
- Claude Opus 4.6, 4.5, 4.1, 4
|
||||
- Claude Sonnet 4.6, 4.5, 4
|
||||
- Claude Haiku 4.5, 3.5
|
||||
|
||||
**Not supported**: Non-Claude models (GPT, GLM, Gemini) — caching is Anthropic-only.
|
||||
|
||||
## What Gets Cached
|
||||
Prefix order: `tools` → `system` → `messages` (up to the cache breakpoint).
|
||||
|
||||
The full prefix is cached — all of tools, system, and messages up to and including the marked block.
|
||||
|
||||
## Key Best Practices
|
||||
|
||||
1. **Put static content first**: Instructions, system prompts, and background context should come before dynamic/user content.
|
||||
2. **Use 1-hour cache for long sessions**: Default 5-minute TTL means cache expires between turns if idle > 5 min. Use 1h for agents with longer gaps.
|
||||
3. **Automatic caching for multi-turn**: Simplest approach, handles the growing message history automatically.
|
||||
4. **Minimum size**: Cache only activates for content > a certain token threshold (details not specified, but system prompts qualify easily).
|
||||
5. **Privacy**: Cache stores KV representations and cryptographic hashes, NOT raw text. ZDR-compatible.
|
||||
|
||||
## For Our Setup (OpenClaw)
|
||||
- Main session system prompt is large (~15-20k tokens) and mostly static → ideal caching candidate.
|
||||
- Heartbeat turns are the same every 25-30min → if using 1h cache, heartbeats keep cache warm for free.
|
||||
- OpenClaw's `cacheRetention` config likely maps to this `cache_control` setting.
|
||||
- Applies to: `litellm/copilot-claude-*` models only. Does NOT apply to GLM, GPT-4o, Gemini.
|
||||
Reference in New Issue
Block a user