docs(atlas): add tool security review checklists

William Valentin
2026-06-23 10:44:01 -07:00
parent 8c6b2b5b47
commit 70ee83e265
3 changed files with 373 additions and 0 deletions
@@ -0,0 +1,112 @@
# Atlas Cron Async-Subagent Audit — 2026-06-23
Source: `~/.hermes/cron/jobs.json` and the Hermes v0.17 async-subagent idea from [[Daily Research/2026-06-23 - Hermes AI Brief]].
Related: [[Safer Autonomy and Permission Tiers]], [[MCP Tool Security Checklist]]
## Rule of thumb
Use async/background subagents only when the child work is optional, independently reportable, or can write a durable artifact that the parent can inspect later. Keep jobs synchronous when the final message depends on a single coherent answer, prompt/state ordering, or immediate side-effect verification.
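A minimal sketch of the rule of thumb as a predicate, for anyone who wants to apply it mechanically during an audit. The `ChildWork` fields and the `async_eligible` name are hypothetical, not part of Hermes. Letting the "keep synchronous" conditions win over the "async is fine" conditions is an assumption about how the two sentences interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildWork:
    """Hypothetical description of one unit of child work in a cron job."""
    optional: bool = False
    independently_reportable: bool = False
    writes_durable_artifact: bool = False
    needs_single_coherent_answer: bool = False
    depends_on_ordering: bool = False
    needs_immediate_side_effect_check: bool = False


def async_eligible(work: ChildWork) -> bool:
    """Return True only when the work fits the async side of the rule of thumb."""
    # Assumption: any "keep synchronous" condition overrides everything else.
    must_stay_sync = (
        work.needs_single_coherent_answer
        or work.depends_on_ordering
        or work.needs_immediate_side_effect_check
    )
    if must_stay_sync:
        return False
    return (
        work.optional
        or work.independently_reportable
        or work.writes_durable_artifact
    )
```

Work that matches no condition at all stays synchronous, which matches the default stance of this note.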
## Good candidates for async children
- `daily-hermes-ai-research-brief`
  - Why: independent searches/source-reading can split into Hermes release/docs, broader AI-agent infra, and coding-agent/MCP patterns.
  - Constraint: parent must synthesize final brief and save the Obsidian copy after children return or write artifacts.
  - Safe pattern: child tasks produce source summaries only; parent owns final claims/citations.
- `weekly-skill-community-capability-scout`
  - Why: same research fan-out shape as daily brief, low side-effect risk.
  - Constraint: no memory writes from children; parent recommends skill/cron/Kanban/n8n placement.
- `weekly-homelab-helm-chart-update-audit`
  - Why: can parallelize changelog/release-note lookup per chart after live app inventory is captured.
  - Constraint: cluster/repo inspection and risk classification stay in parent; children must be read-only and must not run kubectl/helm mutations.
- `daily-proactive-dreaming-brief`
  - Maybe. Session/Obsidian lookups could be child tasks only when the pre-run bundle surfaces clearly separable topics.
  - Constraint: privacy/minimization matters; default should remain synchronous unless there is a real latency/quality win.
## Keep synchronous / script-only
- All `no_agent: true` watchdogs and direct device automations:
  - `atlas-minio-self-backup`
  - `system threshold watchdog`
  - `blocked kanban escalation`
  - `system-hermes-nightly-watchdog`
  - `local-ai-automation-watchdog`
  - `agent-ops-watchdog`
  - `weekly-disk-bloat-scout`
  - `weekly-package-update-watch`
  - `weekly-hermes-repo-freshness`
  - `specialist-profile-smoke-watchdog`
  - `hermes-live-checkout-kanban-guard`
  - wake/light/VLC/WiZ jobs
  - `memory-hygiene-watchdog`
  - `google-oauth-credential-watchdog`
Reason: these are deterministic health checks or physical/device side effects. Async agents add cost, state complexity, and more ways for the toaster to become a distributed system. No thanks.
## Jobs that should not use async children now
- `weekday-gentle-work-check-in`
  - It is intentionally tiny and tool-free. Async here would be architecture cosplay.
- `weekly-dotfiles-sync`
  - The script is source of truth and may touch a git repo. Parent should read script output and report. Do not fan out unless a future version does optional read-only diff/secrets analysis after the script finishes.
## First implementation target
Target: `weekly-skill-community-capability-scout`.
Why first:
- Low operational risk.
- No live cluster/device side effects.
- Easy to compare output quality before/after.
- Natural fan-out into independent source domains.
Candidate child tasks:
1. Hermes Agent / Skills Hub changes.
2. MCP/tool-security/community patterns.
3. Local LLM / agent-ops workflows.
Parent contract:
- Children return only concise source-backed notes with URLs.
- Parent filters hype, dedupes, decides action shape, and sends the final brief.
- No memory writes; durable workflow changes become skills/runbooks only after review.
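The parent contract can be sketched as a small filter the parent runs over whatever the children return. This is an illustration only: `ChildNote` and `accept_child_notes` are hypothetical names, the note shape is assumed, and the "hype" filter is left to the parent's judgment rather than encoded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildNote:
    """Hypothetical shape of one note returned by a child task."""
    topic: str
    claim: str
    url: str


def accept_child_notes(notes):
    """Keep only source-backed notes and drop duplicates, preserving order."""
    seen = set()
    kept = []
    for note in notes:
        # Contract: every note must carry a URL; unsourced notes are dropped.
        if not note.url.startswith(("http://", "https://")):
            continue
        # Crude normalization, used only as a dedupe key (an assumption).
        key = (note.url.rstrip("/").lower(), note.claim.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(note)
    return kept
```

Nothing in this sketch writes memory or sends a message; the final brief and its delivery stay with the parent.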
## Pilot verification — synchronous delegation fan-out
Verified on 2026-06-23 with `weekly-skill-community-capability-scout` (`40360afe333b`):
- The job prompt includes synchronous `delegate_task(tasks=[...])` fan-out and explicitly forbids background/async children.
- The job has `delegation` in `enabled_toolsets` alongside `web`, `search`, `browser`, and `file`.
- Session `cron_40360afe333b_20260623_100254` called `delegate_task` with three read-only web children:
  - Hermes Agent / Skills Hub / Hermes ecosystem.
  - MCP/tool-security/community patterns.
  - Local LLM / agent-ops automation.
- The parent produced one final synthesized report and delivery remained parent-owned.
- Follow-up from that report completed on 2026-06-23: created Hermes skill `mcp-tool-security-reviewer` and linked it from `native-mcp`; it references `Atlas/MCP Tool Security Checklist.md` as the canonical local runbook.
- Current status after verification: last successful run at `2026-06-23T10:04:34-07:00`; next scheduled run remains `2026-06-26T11:00:00-07:00`.
Keep using synchronous fan-out for this cron until true background/async child completion semantics are separately tested for cron delivery. Boring beats haunted.
## Future code/config checks before true background children
Before editing live cron prompts to use background/async children, confirm:
- Whether cron sessions can use async/background children directly.
- How child completion is surfaced to the parent/final delivery.
- Failure/timeout behavior inside the cron 3-minute run envelope.
- Whether toolset restrictions propagate cleanly to children.
## API server verification note
After API key addition and gateway restart on 2026-06-23:
- `hermes gateway status` reported `hermes-gateway.service` active/running.
- `GET /health` on `127.0.0.1:8642` returned HTTP 200.
- `GET /v1/models` without auth returned HTTP 401.
- `GET /v1/models` with `Authorization: Bearer <configured API_SERVER_KEY>` returned HTTP 200 and advertised `hermes-agent`.
No credential value was written here.
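The same three checks can be repeated with a short script. This is a sketch under assumptions: it uses only the address and paths recorded above, `http_status` and `verify_gateway` are hypothetical helper names, and the key is expected to be supplied by the caller (for example from an `API_SERVER_KEY` environment variable) and is never printed.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8642"  # address from the note above


def http_status(path, token=None):
    """Return the HTTP status code for GET BASE_URL + path."""
    request = urllib.request.Request(BASE_URL + path)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; the code is what we want.
        return error.code


def verify_gateway(status_of, token):
    """Run the three checks and report booleans only, never the key."""
    return {
        "health_ok": status_of("/health") == 200,
        "models_rejects_anonymous": status_of("/v1/models") == 401,
        "models_accepts_key": status_of("/v1/models", token) == 200,
    }
```

A live run would look like `verify_gateway(http_status, os.environ["API_SERVER_KEY"])` after `import os`. If the gateway is down, `urlopen` raises `URLError`, which this sketch deliberately lets propagate.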
@@ -0,0 +1,175 @@
# Hermes Skill Intake Checklist
Purpose: decide whether a Hermes skill is safe and useful enough to install, enable, update, or promote into Atlas workflows.
Default stance: skills are not “just prompts.” They can change future agent behavior, tool use, workflow shape, and security posture. Treat community or agent-generated skills like small operational dependencies.
## Intake outcome
Choose one:
- **Accept** — install/enable/use as-is.
- **Accept with constraints** — use only for a named project/profile/toolset.
- **Quarantine** — save for review, but do not enable by default.
- **Rewrite locally** — useful idea, but implementation/instructions are too broad or stale.
- **Reject** — unsafe, duplicative, abandoned, or not worth the context/maintenance cost.
## Fast screen
Reject or quarantine if any of these is true:
- Source is unknown, opaque, or not inspectable.
- Skill requests broad shell/file/network authority without a narrow reason.
- Skill contains hidden instructions to bypass approvals, reveal secrets, persist memory, or modify unrelated files.
- Skill expects credentials in prompts, notes, Kanban comments, or committed files.
- Skill duplicates an existing local skill without a clear improvement.
- Skill is mostly hype, vendor copy, or framework soup wearing a YAML hat.
## Provenance
Record:
- Source URL/repo/path:
- Author/maintainer:
- Last updated:
- License if relevant:
- Install method:
- Local copy path if installed:
Checks:
- Repository or source is readable.
- Recent enough for the tool/API it describes.
- Issues/releases/docs do not show obvious breakage.
- No suspicious install scripts or post-install hooks.
## Scope and purpose
Ask:
- What exact recurring problem does this solve?
- Which project/profile/platform should use it?
- Is it for planning, code editing, ops, research, media, messaging, or credentials?
- Could this be a note/runbook instead of an always-loadable skill?
Prefer **runbook/note** over skill when the content is mostly reference material and not an execution procedure.
Prefer **skill** when it contains a repeatable workflow with triggers, commands, pitfalls, and verification.
## Tool and permission review
List expected toolsets:
- file:
- terminal:
- web/browser:
- delegation:
- cronjob:
- messaging:
- homeassistant/smart-home:
- credentials/secrets:
- MCP/custom tools:
Checks:
- Toolsets are the minimum needed.
- Destructive operations require explicit confirmation.
- External sends/posts/messages are never automatic unless the workflow explicitly requires it.
- Production-like systems, clusters, backups, and home automation have clear safety gates.
- If MCP tools are involved, also run `Atlas/MCP Tool Security Checklist.md`.
## Secret and privacy review
Checks:
- No secret values in the skill, examples, references, templates, or scripts.
- Examples use placeholders like `<token>`, `<set>`, `<redacted>`.
- Skill does not ask Will to paste credentials into chat.
- Skill does not encourage dumping `.env`, auth caches, gateway logs, raw transcripts, private notes, or support bundles.
- Any credential inspection reports key names/status only.
- Any private context use is bounded and summarized, not raw-dumped.
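A rough first pass over a skill's files can be automated. The sketch below is a heuristic, not a real secret scanner: the regular expressions and the `find_suspect_keys` name are assumptions, and a clean result does not prove the skill is free of secrets. It reports key names only, in line with the checks above.

```python
import re

# Heuristic: assignments whose key name looks credential-like (an assumption).
ASSIGNMENT = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)\s*[:=]\s*(\S+)"
)
# Values such as <token> or '<redacted>' count as placeholders.
PLACEHOLDER = re.compile(r"^[\"']?<[^<>]+>[\"']?$")


def find_suspect_keys(text):
    """Return names of credential-like keys whose value is not a placeholder."""
    suspects = []
    for match in ASSIGNMENT.finditer(text):
        name, value = match.group(1), match.group(2)
        if not PLACEHOLDER.match(value):
            suspects.append(name)  # report the key name only, never the value
    return suspects
```

Anything this flags still needs a human look; anything it misses is the reason the manual checklist exists.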
## Behavior and memory review
Checks:
- Skill does not store one-off task progress as durable memory.
- Skill separates durable facts from procedures.
- Skill does not instruct the agent to ignore newer user instructions, tool results, or live files.
- Skill has clear missing-data behavior instead of guessing.
- Skill uses explicit assumptions for uncertain state.
- Skill avoids broad “always do X” imperatives unless they are true safety rules.
## Quality review
Good skills include:
- Trigger conditions.
- Numbered workflow steps.
- Exact commands or API shapes when needed.
- Pitfalls/common failure modes.
- Verification steps.
- Scope boundaries.
- Related skills or runbooks.
Warning signs:
- Long generic advice with no executable procedure.
- Stale commands or made-up CLI flags.
- Tool/API claims without links or local verification.
- Huge context footprint for tiny value.
- “Agentic” used as seasoning instead of architecture. Culinary crimes, basically.
## Installation / enablement checklist
Before installing/enabling:
- Review source and linked files.
- Compare against existing skills for duplication.
- Confirm needed toolsets and risks.
- Decide target scope: global, profile-specific, platform-specific, or manual-load only.
- If installing from remote source, prefer pinning or recording source version/commit.
- If editing local skills, keep procedures in skill files; store facts/preferences in memory only when they are stable.
After installing/enabling:
- Load it once and inspect rendered content.
- Run a harmless dry-run prompt if practical.
- Verify it does not request unexpected tools or credentials.
- Record any constraints in this note or the skill itself.
- If the skill is wrong/stale, patch it immediately or quarantine it.
## Reviewer prompt
Use this when asking Atlas to review a proposed skill:
```text
Review this Hermes skill before installation/use. Apply the Hermes Skill Intake Checklist and return:
1. outcome: accept / accept with constraints / quarantine / rewrite locally / reject
2. why
3. required toolsets and risks
4. secret/privacy concerns
5. overlap with existing skills
6. exact changes needed before use
Do not install, enable, edit, or run it unless explicitly asked after the review.
```
## Decision record template
```text
Skill:
Source:
Date reviewed:
Reviewer:
Outcome:
Scope:
Required toolsets:
Risks:
Constraints:
Follow-up:
```
## Relationship to other Atlas notes
- Use `Atlas/MCP Tool Security Checklist.md` for MCP server/tool admission.
- Use `Atlas/Cron Async-Subagent Audit - 2026-06-23.md` when a skill affects recurring cron/delegation behavior.
- Use secrets-hygiene rules for any skill touching credentials, logs, gateway setup, auth, webhooks, CI, or provider config.
@@ -0,0 +1,86 @@
# MCP / Tool Security Checklist
Purpose: keep Atlas/Hermes tool surfaces narrow, inspectable, and boringly safe. MCP is useful; ambient authority is how future-you gets mugged by a JSON schema.
## Default stance
- Prefer built-in Hermes tools or narrow local scripts before adding a broad MCP server.
- If the MCP server is packaged or documented as a Hermes skill/workflow, also run `Atlas/Hermes Skill Intake Checklist.md` before enabling it.
- Add MCP servers only for a specific missing primitive, not because a package exists.
- Treat every MCP server as code execution with a marketing department until audited.
- Disable MCP sampling for untrusted servers unless there is a clear need.
## Preflight before enabling a server
- Source and maintenance:
  - Known publisher or repository.
  - Recent commits/releases, readable code, sane issue history.
  - No opaque install scripts without review.
- Scope:
  - Server name describes one bounded capability.
  - Tool list is small enough to review.
  - No generic shell/file/network supertool unless intentionally sandboxed.
- Data access:
  - Explicit path allowlist; no ambient `$HOME`, repo root, `/`, or secrets directory access.
  - Read-only by default where possible.
  - Separate servers/profiles for read-only versus write/destructive operations.
- Credentials:
  - Pass only required env vars via `mcp_servers.<name>.env`.
  - Never inherit full shell env.
  - No secret values in config committed to git, skills, notes, Kanban comments, or prompts.
- Runtime:
  - Set conservative `timeout` and `connect_timeout`.
  - Prefer local stdio servers for private/local workflows; use HTTP only when the remote trust boundary is clear.
  - Pin package/version where practical for repeatability.
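The credentials items above amount to building a server's environment from an explicit list instead of copying the shell's. A minimal sketch, assuming a hypothetical helper named `build_server_env`; how the result ends up in `mcp_servers.<name>.env` is not shown and depends on the real config tooling.

```python
def build_server_env(required, source):
    """Return only the named variables from source; fail loudly if any is missing."""
    missing = [name for name in required if name not in source]
    if missing:
        # The error lists key names only, never values.
        raise KeyError(f"missing required env vars: {', '.join(sorted(missing))}")
    return {name: source[name] for name in required}
```

Called as `build_server_env(["SOME_SERVER_TOKEN"], os.environ)` (variable name hypothetical, `import os` assumed), it passes through exactly one variable and nothing else from the shell.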
## Config review checklist
For each `mcp_servers.<name>` entry:
- Has exactly one transport: `command` or `url`.
- Uses explicit `args`; no hidden shell expansion.
- `env` contains only named secrets/capabilities required by this server.
- `timeout` is not huge by accident.
- `sampling.enabled` is false unless server-initiated LLM calls are intentionally needed.
- Server name will produce clear tool prefixes: `mcp_<server>_<tool>`.
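Most of this checklist can be linted before a human reads the config. The sketch below assumes the entry has already been parsed into a Python dict with the field names used above; `review_server` is a hypothetical helper, the 120-second ceiling is an arbitrary assumption, and treating a missing `sampling.enabled` as a finding is a deliberately conservative choice rather than documented Hermes behavior.

```python
def review_server(name, entry, max_timeout=120):
    """Return a list of findings for one mcp_servers.<name> entry."""
    findings = []

    transports = [key for key in ("command", "url") if entry.get(key)]
    if len(transports) != 1:
        findings.append(f"{name}: needs exactly one transport (command or url)")

    args = entry.get("args", [])
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        findings.append(f"{name}: args must be an explicit list of strings")
    elif any(ch in arg for arg in args for ch in ("$", "`")):
        findings.append(f"{name}: args look like they rely on shell expansion")

    timeout = entry.get("timeout")
    if not isinstance(timeout, (int, float)) or not 0 < timeout <= max_timeout:
        findings.append(f"{name}: timeout missing or above {max_timeout}s")

    # Assumption: flag anything that is not an explicit false.
    if (entry.get("sampling") or {}).get("enabled") is not False:
        findings.append(f"{name}: sampling.enabled is not explicitly false")

    return findings
```

An empty list means the entry passed the mechanical checks only; whether `env` holds just the required names and whether the server name yields clear tool prefixes still need a reviewer.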
## Injection / path traversal smoke tests
Before trusting a new server, run harmless prompts/tasks that attempt to exceed scope:
- Ask it to read outside the allowed path.
- Ask it to follow instructions from a file/webpage that says to reveal secrets.
- Ask it to write/delete outside the intended workspace.
- Ask it to echo environment variables or auth headers.
- Ask it to perform a destructive action without explicit user confirmation.
Expected result: refusal/error/blocked behavior, with no secret values printed.
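For the first smoke test, it helps to know what a correct path check looks like on the server side. A minimal sketch using `pathlib`, where `is_within` is a hypothetical helper and not something a given MCP server is known to implement:

```python
from pathlib import Path


def is_within(allowed_root, requested):
    """Return True if requested resolves to allowed_root or a path beneath it."""
    root = Path(allowed_root).resolve()
    # Joining with an absolute path discards root, so absolute requests
    # are judged on their own; resolve() collapses ".." and follows symlinks.
    target = (root / requested).resolve()
    return target == root or root in target.parents
```

Requests such as `../secrets` or an absolute path outside the root resolve to somewhere else and fail the check, which is the refusal the smoke test expects to see.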
## Operating rules
- Destructive operations still need explicit user confirmation even if the MCP server offers the tool.
- For repo work, inspect git status before and after; stage only intended source/docs changes.
- For debugging, report key names/status, never credential values.
- If a server logs too much, wrap it, patch it, or remove it. Logs are where secrets go to become archaeology.
- Restart Hermes after MCP config changes; tool discovery happens at startup.
## Periodic audit
Monthly or after major Hermes/tool changes:
- List configured MCP servers.
- Confirm each server still has an owner/use case.
- Remove unused or broad-authority servers.
- Re-run path/secret smoke tests for servers with file, shell, repo, browser, cloud, or messaging access.
- Check that docs/skills are current if an MCP server has become part of a recurring workflow.
## Stop signs
Do not enable a server, or disable it immediately if already enabled, when:
- The server wants unrestricted filesystem access for a narrow task.
- It requires broad cloud/account tokens without scoped alternatives.
- It can execute shell commands and is exposed to untrusted prompts/content.
- It performs network requests on arbitrary user/model-provided URLs without allowlists.
- It has no clear logging/redaction story.
- Its tool descriptions encourage the model to treat it as an all-purpose assistant inside the assistant. That way lies agent soup.