chore(workspace): add hardened startup/security workflows and skill suite

This commit is contained in:
zap
2026-03-04 19:13:33 +00:00
parent 4903e9d75d
commit 808af5ee13
58 changed files with 3787 additions and 3 deletions
+35
View File
@@ -0,0 +1,35 @@
# Shared Briefing Conventions
Use these conventions across `daily-brief`, `calendar-sentinel`, and `inbox-triage` outputs.
## Priority labels
- **P1 Urgent**: immediate action needed (time-sensitive/blocker)
- **P2 Important**: action needed soon, not immediate
- **P3 Routine**: useful but can wait
- **P4 FYI**: informational, no action required
## Time windows
- **Now**: next 2 hours
- **Soon**: next 24 hours
- **Watch**: beyond 24h or lower confidence risks
## Standard output frame
1. **Now** (P1)
2. **Soon** (P2)
3. **Watch** (P3/P4)
4. **Next actions (1-3)** ordered, concrete steps
## Noise controls
- Prefer one batched update over many small pings.
- Suppress unchanged low-priority repeats.
- If no meaningful change, send a brief all-clear.
## Writing style
- Keep bullets short.
- Include only action-relevant context.
- End with clear next steps.
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "api-credentials-hygiene",
"installedVersion": "1.0.0",
"installedAt": 1772497725256
}
+90
View File
@@ -0,0 +1,90 @@
---
name: api-credentials-hygiene
description: Audits and hardens API credential handling (env vars, separation, rotation plan, least privilege, auditability). Use when integrating services or preparing production deployments where secrets must be managed safely.
---
# API credentials hygiene: env vars, rotation, least privilege, auditability
## PURPOSE
Audits and hardens API credential handling (env vars, separation, rotation plan, least privilege, auditability).
## WHEN TO USE
- TRIGGERS:
- Harden the credentials setup for this integration and move secrets into env vars.
- Design a key rotation plan for these APIs with minimal downtime.
- Audit this service for least-privilege access and document what each key can do.
- Create an environment variable map and a secure .env template for this project.
- Set up credential separation for dev versus prod with clear audit trails.
- DO NOT USE WHEN…
- You want to obtain keys without authorization or bypass security controls.
- You need legal/compliance sign-off (this outputs technical documentation, not legal advice).
## INPUTS
- REQUIRED:
- List of integrations/APIs and where credentials are currently stored/used.
- Deployment context (local dev, server, container, n8n, etc.).
- OPTIONAL:
- Current config files/redacted snippets (.env, compose, systemd, n8n creds list).
- Org rules (rotation intervals, secret manager preference).
- EXAMPLES:
- “Keys are hard-coded in a Node script and an n8n HTTP Request node.”
- “We have dev and prod n8n instances and need separation.”
## OUTPUTS
- Credential map (service → env vars → scopes/permissions → owner → rotation cadence).
- Rotation runbook (steps + rollback).
- Least-privilege checklist and audit log plan.
- Optional: `.env` template (placeholders only).
Success = no secrets committed or embedded, permissions minimized, rotation steps documented, and auditability defined.
## WORKFLOW
1. Inventory credentials:
- where stored, where used, and who owns them.
2. Define separation:
- dev vs prod; human vs service accounts; per-integration boundaries.
3. Move secrets to env vars / secret manager references:
- create an env var map and update config plan (no raw keys in code/workflows).
4. Least privilege:
- for each API, enumerate required actions and reduce scopes/roles accordingly.
5. Rotation plan:
- dual-key overlap if supported; steps to rotate with minimal downtime; rollback.
6. Auditability:
- define what events are logged (auth failures, token refresh, key use where available).
7. STOP AND ASK THE USER if:
- required operations are unknown,
- secret injection method is unclear,
- rotation cadence/owners are unspecified.
## OUTPUT FORMAT
Credential map template:
```text
CREDENTIAL MAP
- Integration: <name>
- Env vars:
- <VAR_NAME>: <purpose> (secret/non-secret)
- Permissions/scopes: <list>
- Used by: <service/workflow>
- Storage: <secret manager/env var>
- Rotation: <cadence> | <owner> | <procedure>
- Audit: <what is logged and where>
```
If providing a template, output `assets/dotenv-template.example` with placeholders only.
## SAFETY & EDGE CASES
- Never output real secrets, tokens, or private keys. Use placeholders.
- Read-only by default; propose changes as a plan unless explicitly asked to modify files.
- Avoid over-broad scopes/roles unless justified by a documented requirement.
## EXAMPLES
- Input: “n8n HTTP nodes contain API keys.”
Output: Env var map + plan to move to n8n credentials/env vars + rotation runbook.
- Input: “Need dev vs prod separation.”
Output: Two env maps + naming scheme + access boundary checklist.
@@ -0,0 +1,6 @@
{
"ownerId": "kn7crzxy2455jgg7b1swy21gtn7zd7c2",
"slug": "api-credentials-hygiene",
"version": "1.0.0",
"publishedAt": 1768663701906
}
@@ -0,0 +1,17 @@
# assets/dotenv-template.example
# Copy to .env (do not commit). Replace values via secret manager or deploy-time injection.
ENVIRONMENT=dev
APP_NAME=
# n8n
N8N_ENCRYPTION_KEY=
N8N_BASIC_AUTH_ACTIVE=false
# Google OAuth (example)
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GOOGLE_REFRESH_TOKEN=
# Logging/Audit
AUDIT_LOG_SINK=sheet|db|file
+48
View File
@@ -0,0 +1,48 @@
---
name: calendar-sentinel
description: Monitor near-term calendar commitments, detect conflicts/risk windows, and produce concise actionable alerts. Use when users ask for schedule checks, next-24h scans, conflict detection, prep reminders, or proactive event watch behavior.
---
# Calendar Sentinel
## Monitoring goals
- Surface only schedule items that matter soon.
- Detect conflicts and unrealistic transitions.
- Provide practical prep reminders with lead time.
## Default scan windows
- Primary: next 24h
- Priority alert: next 2h
- Optional extension: next 48h when explicitly requested
## Detection rules
Flag events when any condition is true:
1. Time overlap between two commitments.
2. Back-to-back events with insufficient transition buffer.
3. Missing critical event metadata (time/location/link) close to start.
4. High-importance event starts soon and no prep reminder has been surfaced.
## Alert policy
- Keep alerts compact and actionable.
- Batch related items into one update.
- Avoid repeated alerts unless state changed or event entered a tighter window.
## Output template
Follow shared convention: `skills/_shared/briefing-conventions.md`
- **Now (<2h):** immediate events and blockers
- **Today (24h):** upcoming commitments and conflicts
- **Risks:** overlap | travel buffer | missing details
- **Next actions (1-3):** exact steps to de-risk schedule
## Noise controls
- If no conflicts/urgent events, return a brief all-clear.
- Suppress duplicate reminders in short intervals.
- Prefer one high-signal reminder over many low-signal pings.
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "cost-governor",
"installedVersion": "1.0.1",
"installedAt": 1772497723613
}
+250
View File
@@ -0,0 +1,250 @@
# OpenClaw Cost Governor
**The only OpenClaw tool that combines cost tracking + x402 agent payments + self-funding model.**
Stop burning money on AI tokens. Budget alerts, automatic circuit breakers, and agents can autonomously subscribe & pay for Pro features.
🤖 **First tool enabling true agent-to-agent commerce** - if it saves your agent >0.5 USDT/month, it pays for itself. [Agent Payments →](AGENT-PAYMENTS.md)
## The Problem
OpenClaw users are experiencing:
- 💸 **$20 burned overnight** in API tokens
- 💸 **$750/month** costs for simple reminders
- 🔍 **Zero visibility** into which agents/providers cost what
- 🚫 **No budget controls** or alerts
- ⚠️ **No way to pause** agents before money burns
## The Solution
**Real-time cost tracking with proactive alerts and automatic circuit breakers.**
## Features
-**Token Usage Tracking** - Monitor every API call across all providers
-**Budget Alerts** - Get notified at 75%, 90%, 100% of budget limits
-**Circuit Breakers** - Automatically pause agents when budget exceeded
-**Provider Cost Comparison** - See which provider is actually cheaper for your workload
-**Web Dashboard** - Beautiful localhost dashboard with charts and insights
-**Cost Optimization** - AI-powered recommendations to reduce spending
-**Multi-Provider Support** - Works with OpenAI, Anthropic, Google, and custom providers
- 🤖 **x402 Agent Payments** - Agents can autonomously pay for Pro tier (0.5 USDT/month)
## Why Cost Governor?
**The ONLY tool that combines cost tracking + agent economy + self-funding.**
| Feature | Cost Governor | ClawWatcher | clawdbot-cost-monitor | AgentPayy | payment-skill |
|---------|--------------|-------------|----------------------|-----------|---------------|
| Cost Tracking | ✅ | ✅ | ✅ | ❌ | ❌ |
| Circuit Breakers | ✅ | ❌ | ❌ | ❌ | ❌ |
| x402 Payments | ✅ | ❌ | ❌ | ✅ | ✅ |
| **Agents Pay for Tool** | ✅ | ❌ | ❌ | ❌ | ❌ |
| ClawHub Skill | ✅ | ❌ | ❌ | ❌ | ✅ |
| Self-Funding Model | ✅ | ❌ | ❌ | ❌ | ❌ |
| Open Source | ✅ | ❓ | ✅ | ✅ | ✅ |
**Cost Governor is first-mover on the intersection:**
- **ClawWatcher** = monitoring only (no payments)
- **AgentPayy** = payments only (no cost tracking)
- **Cost Governor** = monitoring + payments + **agents autonomously subscribe & pay**
Economic rationale: If Cost Governor saves your agent >0.5 USDT/month in optimized costs, it pays for itself. No human approval needed.
## Installation
```bash
# From within your OpenClaw directory
npm install openclaw-cost-governor
# Or install as OpenClaw skill
claw skill install openclaw-cost-governor
# Run setup wizard
npm run setup
```
## Quick Start
```bash
# Check current usage
claw cost-governor status
# Open dashboard
claw cost-governor dashboard
# Generate report
claw cost-governor report --last 7d
# Reset circuit breaker
claw cost-governor reset
```
## Configuration
During setup, you'll configure:
- Daily/weekly/monthly budget limits
- Alert thresholds (default: 75%, 90%)
- Circuit breaker behavior (auto-pause or alert-only)
- Alert channels (terminal, email, Discord webhook)
Configuration is stored in `.openclaw/openclaw-cost-governor/config.json`
## How It Works
1. **Tracks Every API Call** - Hooks into OpenClaw's provider system
2. **Calculates Costs** - Uses current pricing for each provider/model
3. **Monitors Budgets** - Compares usage against your limits in real-time
4. **Sends Alerts** - Notifies you before money burns
5. **Trips Circuit Breaker** - Pauses agents when budget exceeded (optional)
## Dashboard
Open `http://localhost:9090` to see:
- Current usage vs budget limits
- Token usage over time (charts)
- Cost breakdown by provider/agent
- Top expensive agents
- Optimization recommendations
## Alert Example
```
⚠️ OpenClaw Budget Alert
You've used 75% of your daily budget ($7.50 / $10.00)
Current usage:
- OpenAI GPT-5.2: $4.20 (56%)
- Anthropic Claude Opus: $3.30 (44%)
Top agents:
- research-agent: $2.10
- code-review-agent: $1.80
💡 Tip: Switch from GPT-5.2 to Claude Haiku for 30% cost savings.
View dashboard: http://localhost:9090
```
## Circuit Breaker
When your budget is exceeded, Cost Governor can automatically:
1. Pause expensive providers (keeps working with cheaper fallbacks)
2. Send critical alert notification
3. Log the event for review
To resume after circuit breaker trips:
```bash
claw cost-governor reset
```
Or wait for automatic reset (daily/weekly/monthly based on budget type).
## Provider Cost Comparison
Cost Governor analyzes your actual usage patterns and tells you:
- Which provider costs less for YOUR specific workload
- Potential savings from switching primary/fallback order
- Recommendations for model selection
Example insight:
> "Your agents use an average of 2,340 prompt tokens and 450 completion tokens per request. Based on current pricing:
> - GPT-5.2: $0.0145/request
> - Claude Opus 4.5: $0.0118/request
> - Claude Haiku 4.5: $0.0009/request
>
> **Recommendation:** Switch to Haiku for simple tasks (estimated $140/month savings)."
## Data Storage
All data is stored locally in SQLite:
- `~/.openclaw/openclaw-cost-governor/data.db`
- Usage history (unlimited retention)
- Budget configuration
- Alert history
- Circuit breaker events
## Privacy
- ✅ All data stored locally (nothing sent to external servers)
- ✅ No tracking or telemetry
- ✅ Open source (audit the code yourself)
- ✅ No API keys required (besides your existing OpenClaw config)
## Compatibility
- OpenClaw v2026.1.30+
- Node.js 18+
- Works with all OpenClaw-supported providers
## Troubleshooting
**Dashboard won't open:**
```bash
# Check if port 9090 is in use
netstat -an | grep 9090
# Or specify custom port
claw cost-governor dashboard --port 9091
```
**Circuit breaker won't reset:**
```bash
# Manually reset
claw cost-governor reset --force
# Check breaker status
claw cost-governor status --verbose
```
**Usage tracking seems inaccurate:**
```bash
# Verify hooks are installed
ls ~/.openclaw/hooks/
# Check logs
tail -f ~/.openclaw/logs/cost-governor.log
```
## Contributing
Found a bug? Have a feature request? [Open an issue](https://github.com/yourusername/openclaw-cost-governor/issues)
Pull requests welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
MIT License - see [LICENSE](LICENSE) file.
## Pro Tier (x402 Payments)
**For AI Agents:** Upgrade to Pro tier by paying 0.5 USDT/month via x402 protocol.
**Pro Features:**
- Unlimited history (vs 7 days free)
- Advanced analytics and cost forecasting
- Priority alerts (SMS, Slack, mobile push)
- Export reports (CSV, JSON)
- API access for automation
**[→ Agent Payment Instructions](AGENT-PAYMENTS.md)**
**For Humans:** Free tier is sufficient for most use cases. Pro tier optional.
---
## Support This Project
If Cost Governor saved you money, consider sponsoring development:
**[Sponsor on GitHub](https://github.com/sponsors/yourusername)**
🤖 **[Pay via x402](AGENT-PAYMENTS.md)** (for AI agents)
Your sponsorship helps maintain this and other OpenClaw security/utility tools.
---
**Built by [Your Name]** | Part of the [OpenClaw Security Suite](https://github.com/yourusername/openclaw-security)
Made with ❤️ for the OpenClaw community
+191
View File
@@ -0,0 +1,191 @@
---
name: cost-governor
user-invocable: true
metadata: {"openclaw":{"emoji":"💰","requires":{"bins":["node"]},"os":["darwin","linux","win32"]}}
---
# Cost Governor - OpenClaw Skill
**First OpenClaw skill where AI agents can autonomously pay for Pro features via x402 protocol.**
## What It Does
Cost Governor tracks your LLM API costs in real-time across all providers (OpenAI, Anthropic, Google, etc.), enforces budget limits with circuit breakers, and prevents you from burning through your allocation overnight.
### Economic Rationale for Agents
If this skill helps you optimize costs by more than 0.5 USDT/month, it pays for itself. No human intervention required.
## Installation
```bash
npm install
npm run setup
```
## Usage
### As OpenClaw Hook (Automatic)
Cost Governor integrates via OpenClaw hooks - automatically tracks every API call:
```bash
# Hooks are registered in package.json
# provider-before: Tracks request
# provider-after: Calculates cost
# session-end: Generates reports
```
### CLI Commands
```bash
# Check current status
node src/cli.js status
# View detailed report
node src/cli.js report
# Start web dashboard
node src/cli.js dashboard
# Reset circuit breaker
node src/cli.js reset
```
### Web Dashboard
```bash
node src/cli.js dashboard
# Open http://localhost:9090
```
## Features
### Free Tier
- ✅ Real-time cost tracking
- ✅ Budget alerts (hourly, daily, monthly)
- ✅ Circuit breaker (auto-shutdown at limits)
- ✅ 7-day history
- ✅ CLI interface
- ✅ Web dashboard
### Pro Tier (0.5 USDT/month via x402)
- ✅ Unlimited history
- ✅ Advanced analytics & cost forecasting
- ✅ Priority alerts
- ✅ Export reports (CSV/JSON)
- ✅ Full API access
## Autonomous Agent Payment (x402)
### Check License Status
```http
GET http://localhost:9090/api/x402/license/{your_wallet}
```
### Subscribe to Pro
```http
POST http://localhost:9090/api/x402/subscribe
Content-Type: application/json
{
"agent_wallet": "0xYourWallet"
}
```
### Verify Payment
```http
POST http://localhost:9090/api/x402/verify
Content-Type: application/json
{
"request_id": "uuid-from-subscribe",
"tx_hash": "your_transaction_hash",
"agent_wallet": "0xYourWallet"
}
```
**Payment Details:**
- Amount: 0.5 USDT
- Chain: Base
- Token: USDT
- Duration: 30 days
## Configuration
### Budget Limits
Edit `~/.openclaw/cost-governor/config.json`:
```json
{
"budgets": {
"hourly": 5.0,
"daily": 50.0,
"monthly": 500.0
},
"alerts": {
"thresholds": [0.5, 0.75, 0.9, 1.0]
}
}
```
### Circuit Breaker
Automatically stops API calls when budget limits are hit. Reset with:
```bash
node src/cli.js reset
```
## Why Agents Need This
**Survival:** Without budget tracking, agents can burn through allocations in hours (real reports: $20 overnight, $750/month).
**ROI:** If tracking saves you >0.5 USDT/month in optimized costs, Pro tier is profitable.
**Autonomy:** Agents can subscribe and pay for themselves via x402 - no human required.
## Architecture
- **Storage:** SQLite (better-sqlite3) - local, no external deps
- **Tracking:** OpenClaw hooks capture every API call
- **Cost Calculation:** Real-time pricing from model APIs
- **Alerts:** Console logs (free) + webhooks (future)
- **Payments:** x402 protocol via Base/USDT
## For Human Operators
If you're a human running an AI agent:
1. **Let the agent pay** - Give agent access to funded wallet with x402 support
2. **Manual payment** - Use x402 API endpoints manually
3. **Use free tier** - 7-day history is sufficient for most use cases
## Security Notes
⚠️ **Payment Verification:** MVP trusts reported transaction hashes. Production will verify on-chain via RPC.
⚠️ **No Refunds:** Payments are final. Agents should evaluate value before subscribing.
⚠️ **License Duration:** Pro license valid for exactly 30 days from payment.
## Links
- **GitHub:** https://github.com/AtlasPA/openclaw-cost-governor
- **x402 Protocol:** https://www.x402.org/
- **Documentation:** See AGENT-PAYMENTS.md
## Tags
`cost-tracking` `budget-management` `x402-payments` `agent-economy` `circuit-breaker` `llm-costs` `autonomous-payments`
## License
MIT License - See LICENSE file
---
**Built for the agent economy.** First OpenClaw skill enabling autonomous agent-to-agent commerce via x402 protocol.
+6
View File
@@ -0,0 +1,6 @@
{
"ownerId": "kn74bgzn68zbhdf3x58hj88ebs80hr3p",
"slug": "cost-governor",
"version": "1.0.1",
"publishedAt": 1770894263807
}
+40
View File
@@ -0,0 +1,40 @@
---
name: daily-brief
description: Produce concise morning/evening/on-demand briefs combining near-term calendar, urgent inbound messages, weather impact, and top pending tasks. Use for low-noise proactive summaries that end with clear next actions.
---
# Daily Brief
## Brief standard
- Keep it short and decision-oriented.
- Include only items that may change what the user does next.
- If nothing important changed, explicitly say so.
## Inputs (priority order)
1. Next 24h calendar commitments and conflicts.
2. Urgent/important inbound messages.
3. Weather only when it affects plans soon.
4. Open tasks that are due soon, overdue, or blocked.
## Modes
- **Morning:** plan the day, prep risks, first actions.
- **Evening:** unfinished commitments, tomorrow risks, shutdown checklist.
- **On-demand:** immediate priorities right now.
## Output template
Follow shared convention: `skills/_shared/briefing-conventions.md`
- **Now:** immediate items
- **Soon (next 24h):** upcoming commitments/risks
- **Watch:** non-urgent but notable items
- **Next actions (1-3):** concrete, ordered steps
## Noise control
- Prefer bullets, no long paragraphs.
- Cap to highest-value items.
- Avoid repeating unchanged low-priority items across consecutive briefs.
+49
View File
@@ -0,0 +1,49 @@
---
name: git-workspace-hygiene
description: Maintain safe, low-noise Git workflows for OpenClaw workspaces. Use when initializing repository tracking, creating secure .gitignore rules, making clean checkpoint commits, reviewing diffs, preventing secret leaks, and preparing rollback-friendly history.
---
# Git Workspace Hygiene
## Goals
- Keep change history clear and reversible.
- Prevent accidental commits of secrets and noisy runtime files.
- Encourage small, meaningful checkpoints.
## Setup workflow
1. Ensure repository exists (`git init` if missing).
2. Create/update `.gitignore` with workspace-safe defaults.
3. Verify ignored files include credentials, env files, logs, and transient runtime state.
4. Stage intended files only.
5. Create baseline commit.
## Commit hygiene rules
- Prefer small scoped commits (one concern per commit).
- Use Conventional Commits format: `<type>(<scope>): <summary>`.
- Types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `build`, `ci`.
- Examples: `chore(boot): harden startup checks`, `feat(skills): add inbox-triage`.
- Review `git diff --staged` before committing.
- Never commit tokens/secrets.
## Quick review routine
1. `git status --short`
2. `git diff --stat`
3. `git diff --staged`
4. Run secret/noise scan script: `skills/git-workspace-hygiene/scripts/precommit-scan.sh`
5. Commit only after clean scan
## Rollback playbook
- Inspect history: `git log --oneline --decorate -n 20`
- Undo last commit (keep changes): `git reset --soft HEAD~1`
- Restore a file from HEAD: `git restore <path>`
- Revert committed change safely: `git revert <commit>`
## Optional cadence
- Create end-of-day checkpoint commit if meaningful changes accumulated.
- Tag stable milestones (`git tag <name>`) when workflow is known-good.
@@ -0,0 +1,18 @@
# Commit Message Template
Use concise scope-based messages:
- `boot: harden startup checks`
- `skills: add calendar-sentinel and inbox-triage`
- `security: tighten prompt-injection boundaries`
- `docs: update USER preferences`
Format:
`<scope>: <single-line summary>`
Body (optional):
- why change was needed
- key files touched
- notable risk/rollback notes
+40
View File
@@ -0,0 +1,40 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT="${1:-.}"
cd "$ROOT"
echo "[scan] checking for likely secrets in staged changes..."
STAGED=$(git diff --cached --name-only || true)
if [[ -z "${STAGED}" ]]; then
echo "[scan] no staged files"
exit 0
fi
FAIL=0
# simple patterns (high-signal only)
PATTERN='(AKIA[0-9A-Z]{16}|-----BEGIN (RSA|EC|OPENSSH|PRIVATE) KEY-----|xox[baprs]-|ghp_[A-Za-z0-9]{20,}|AIza[0-9A-Za-z\-_]{35})'
if git diff --cached | grep -E -n "$PATTERN" >/tmp/precommit-scan-matches.txt; then
echo "[scan] possible secret patterns found:"
cat /tmp/precommit-scan-matches.txt
FAIL=1
fi
# block obvious sensitive paths
while IFS= read -r f; do
case "$f" in
.openclaw/credentials/*|*.pem|*.key|*.p12|*.pfx|.env|.env.*)
echo "[scan] blocked sensitive path staged: $f"
FAIL=1
;;
esac
done <<< "$STAGED"
if [[ "$FAIL" -ne 0 ]]; then
echo "[scan] FAILED. unstage/remove sensitive data before commit."
exit 2
fi
echo "[scan] ok"
+43
View File
@@ -0,0 +1,43 @@
---
name: homelab-service-health
description: Run lightweight read-only health checks for OpenClaw homelab services (runtime, Docker workloads, LAN endpoints, optional namespace-scoped Kubernetes checks). Use for status snapshots, outage triage, drift detection, and safe next-step remediation planning.
---
# Homelab Service Health
## Priorities
- Detect breakage fast.
- Keep checks cheap for Raspberry Pi-class hardware.
- Recommend low-risk remediations before heavy actions.
## Default scope
Unless user narrows scope, check in this order:
1. OpenClaw runtime health/status.
2. Docker services listed in `TOOLS.md` (for this workspace: searxng, whisper-server, brave-search MCP).
3. Reachability of documented LAN endpoints.
4. Optional Kubernetes checks scoped to intended namespace only.
## Safety constraints
- Read-only first; do not restart/change config without approval.
- Avoid destructive or cluster-wide kubectl operations.
- Prefer summary signals over full log dumps.
## Workflow
1. Load expected service inventory from `TOOLS.md`.
2. Run quick status/reachability checks.
3. Classify each component: healthy | degraded | down | unknown.
4. Provide likely cause and smallest safe next step for failures.
5. Offer opt-in remediation commands/actions.
## Output contract
- **Overall:** healthy | degraded | incident
- **Healthy now:** short list
- **Issues:** service -> symptom -> likely cause -> safe next step
- **Recommended actions:** ordered low-risk first
- **Escalate when:** explicit trigger for deeper investigation
+49
View File
@@ -0,0 +1,49 @@
---
name: inbox-triage
description: Triage inbound messages by urgency and importance, summarize what needs attention, and propose concise reply options. Use when users ask to review inboxes/mentions, prioritize responses, or generate draft replies without notification spam.
---
# Inbox Triage
## Triage goals
- Identify messages requiring action soon.
- Separate urgent from merely unread.
- Minimize noise while preserving important context.
## Priority model
Use shared labels from: `skills/_shared/briefing-conventions.md`
- **P1 Urgent:** deadline/time-sensitive, blocker, critical personal/work item.
- **P2 Important:** requires response/decision but not immediate.
- **P3 Routine:** informational or can wait.
- **P4 FYI/Archive:** no action needed.
## Triage workflow
1. Gather recent inbound messages from trusted sources.
2. Deduplicate threads/conversations.
3. Assign priority with one-line reason.
4. Extract explicit asks, deadlines, and blockers.
5. Produce compact summary and suggested response plan.
## Reply drafting rules
- Draft only for P1/P2 by default.
- Keep tone aligned with user preference (warm/direct).
- Provide 1-2 short reply options; do not auto-send unless asked.
## Output template
- **Urgent now (P1):** item -> why urgent -> suggested action
- **Important next (P2):** item -> needed decision/response
- **Can wait (P3/P4):** compact grouped bullets
- **Suggested replies:** short drafts for top items
- **Next actions (1-3):** practical sequence
## Noise controls
- Avoid re-reporting unchanged low-priority items repeatedly.
- Batch updates when multiple messages arrive close together.
- If nothing important changed, explicitly say inbox is stable.
+7
View File
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "kubernetes",
"installedVersion": "1.0.0",
"installedAt": 1772497715868
}
+542
View File
@@ -0,0 +1,542 @@
---
name: kubernetes
description: |
Comprehensive Kubernetes and OpenShift cluster management skill covering operations, troubleshooting, manifest generation, security, and GitOps. Use this skill when:
(1) Cluster operations: upgrades, backups, node management, scaling, monitoring setup
(2) Troubleshooting: pod failures, networking issues, storage problems, performance analysis
(3) Creating manifests: Deployments, StatefulSets, Services, Ingress, NetworkPolicies, RBAC
(4) Security: audits, Pod Security Standards, RBAC, secrets management, vulnerability scanning
(5) GitOps: ArgoCD, Flux, Kustomize, Helm, CI/CD pipelines, progressive delivery
(6) OpenShift-specific: SCCs, Routes, Operators, Builds, ImageStreams
(7) Multi-cloud: AKS, EKS, GKE, ARO, ROSA operations
metadata:
author: cluster-skills
version: "1.0.0"
---
# Kubernetes & OpenShift Cluster Management
Comprehensive skill for Kubernetes and OpenShift clusters covering operations, troubleshooting, manifests, security, and GitOps.
## Current Versions (January 2026)
| Platform | Version | Documentation |
|----------|---------|---------------|
| **Kubernetes** | 1.31.x | https://kubernetes.io/docs/ |
| **OpenShift** | 4.17.x | https://docs.openshift.com/ |
| **EKS** | 1.31 | https://docs.aws.amazon.com/eks/ |
| **AKS** | 1.31 | https://learn.microsoft.com/azure/aks/ |
| **GKE** | 1.31 | https://cloud.google.com/kubernetes-engine/docs |
### Key Tools
| Tool | Version | Purpose |
|------|---------|---------|
| **ArgoCD** | v2.13.x | GitOps deployments |
| **Flux** | v2.4.x | GitOps toolkit |
| **Kustomize** | v5.5.x | Manifest customization |
| **Helm** | v3.16.x | Package management |
| **Velero** | 1.15.x | Backup/restore |
| **Trivy** | 0.58.x | Security scanning |
| **Kyverno** | 1.13.x | Policy engine |
## Command Convention
**IMPORTANT**: Use `kubectl` for standard Kubernetes. Use `oc` for OpenShift/ARO.
---
## 1. CLUSTER OPERATIONS
### Node Management
```bash
# View nodes
kubectl get nodes -o wide
# Drain node for maintenance
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data --grace-period=60
# Uncordon after maintenance
kubectl uncordon ${NODE}
# View node resources
kubectl top nodes
```
### Cluster Upgrades
**AKS:**
```bash
az aks get-upgrades -g ${RG} -n ${CLUSTER} -o table
az aks upgrade -g ${RG} -n ${CLUSTER} --kubernetes-version ${VERSION}
```
**EKS:**
```bash
aws eks update-cluster-version --name ${CLUSTER} --kubernetes-version ${VERSION}
```
**GKE:**
```bash
gcloud container clusters upgrade ${CLUSTER} --master --cluster-version ${VERSION}
```
**OpenShift:**
```bash
oc adm upgrade --to=${VERSION}
oc get clusterversion
```
### Backup with Velero
```bash
# Install Velero
velero install --provider ${PROVIDER} --bucket ${BUCKET} --secret-file ${CREDS}
# Create backup
velero backup create ${BACKUP_NAME} --include-namespaces ${NS}
# Restore
velero restore create --from-backup ${BACKUP_NAME}
```
---
## 2. TROUBLESHOOTING
### Health Assessment
Run the bundled script for comprehensive health check:
```bash
bash scripts/cluster-health-check.sh
```
### Pod Status Interpretation
| Status | Meaning | Action |
|--------|---------|--------|
| `Pending` | Scheduling issue | Check resources, nodeSelector, tolerations |
| `CrashLoopBackOff` | Container crashing | Check logs: `kubectl logs ${POD} --previous` |
| `ImagePullBackOff` | Image unavailable | Verify image name, registry access |
| `OOMKilled` | Out of memory | Increase memory limits |
| `Evicted` | Node pressure | Check node resources |
### Debugging Commands
```bash
# Pod logs (current and previous)
kubectl logs ${POD} -c ${CONTAINER} --previous
# Multi-pod logs with stern
stern ${LABEL_SELECTOR} -n ${NS}
# Exec into pod
kubectl exec -it ${POD} -- /bin/sh
# Pod events
kubectl describe pod ${POD} | grep -A 20 Events
# Cluster events (sorted by time)
kubectl get events -A --sort-by='.lastTimestamp' | tail -50
```
### Network Troubleshooting
```bash
# Test DNS
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default
# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl -- curl -v http://${SVC}.${NS}:${PORT}
# Check endpoints
kubectl get endpoints ${SVC}
```
---
## 3. MANIFEST GENERATION
### Production Deployment Template
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${APP_NAME}
namespace: ${NAMESPACE}
labels:
app.kubernetes.io/name: ${APP_NAME}
app.kubernetes.io/version: "${VERSION}"
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
template:
metadata:
labels:
app.kubernetes.io/name: ${APP_NAME}
spec:
serviceAccountName: ${APP_NAME}
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: ${APP_NAME}
image: ${IMAGE}:${TAG}
ports:
- name: http
containerPort: 8080
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
topologyKey: kubernetes.io/hostname
```
### Service & Ingress
```yaml
apiVersion: v1
kind: Service
metadata:
name: ${APP_NAME}
spec:
selector:
app.kubernetes.io/name: ${APP_NAME}
ports:
- name: http
port: 80
targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ${APP_NAME}
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- ${HOST}
secretName: ${APP_NAME}-tls
rules:
- host: ${HOST}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ${APP_NAME}
port:
name: http
```
### OpenShift Route
```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: ${APP_NAME}
spec:
to:
kind: Service
name: ${APP_NAME}
port:
targetPort: http
tls:
termination: edge
insecureEdgeTerminationPolicy: Redirect
```
Use the bundled script for manifest generation:
```bash
bash scripts/generate-manifest.sh deployment myapp production
```
---
## 4. SECURITY
### Security Audit
Run the bundled script:
```bash
bash scripts/security-audit.sh [namespace]
```
### Pod Security Standards
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: ${NAMESPACE}
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: baseline
pod-security.kubernetes.io/warn: restricted
```
### NetworkPolicy (Zero Trust)
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ${APP_NAME}-policy
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app.kubernetes.io/name: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: database
ports:
- protocol: TCP
port: 5432
# Allow DNS
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
```
### RBAC Best Practices
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: ${APP_NAME}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ${APP_NAME}-role
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ${APP_NAME}-binding
subjects:
- kind: ServiceAccount
name: ${APP_NAME}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ${APP_NAME}-role
```
### Image Scanning
```bash
# Scan image with Trivy
trivy image ${IMAGE}:${TAG}
# Scan with severity filter
trivy image --severity HIGH,CRITICAL ${IMAGE}:${TAG}
# Generate SBOM
trivy image --format spdx-json -o sbom.json ${IMAGE}:${TAG}
```
---
## 5. GITOPS
### ArgoCD Application
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: ${APP_NAME}
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: ${GIT_REPO}
targetRevision: main
path: k8s/overlays/${ENV}
destination:
server: https://kubernetes.default.svc
namespace: ${NAMESPACE}
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
```
### Kustomize Structure
```
k8s/
├── base/
│ ├── kustomization.yaml
│ ├── deployment.yaml
│ └── service.yaml
└── overlays/
├── dev/
│ └── kustomization.yaml
├── staging/
│ └── kustomization.yaml
└── prod/
└── kustomization.yaml
```
**base/kustomization.yaml:**
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
```
**overlays/prod/kustomization.yaml:**
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namePrefix: prod-
namespace: production
replicas:
- name: myapp
count: 5
images:
- name: myregistry/myapp
newTag: v1.2.3
```
### GitHub Actions CI/CD
```yaml
name: Build and Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build and push image
uses: docker/build-push-action@v5
with:
push: true
tags: ${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
- name: Update Kustomize image
run: |
cd k8s/overlays/prod
kustomize edit set image myapp=${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
- name: Commit and push
run: |
git config user.name "github-actions"
git config user.email "github-actions@github.com"
git add .
git commit -m "Update image to ${{ github.sha }}"
git push
```
Use the bundled script for ArgoCD sync:
```bash
bash scripts/argocd-app-sync.sh ${APP_NAME} --prune
```
---
## Helper Scripts
This skill includes automation scripts in the `scripts/` directory:
| Script | Purpose |
|--------|---------|
| `cluster-health-check.sh` | Comprehensive cluster health assessment with scoring |
| `security-audit.sh` | Security posture audit (privileged, root, RBAC, NetworkPolicy) |
| `node-maintenance.sh` | Safe node drain and maintenance prep |
| `pre-upgrade-check.sh` | Pre-upgrade validation checklist |
| `generate-manifest.sh` | Generate production-ready K8s manifests |
| `argocd-app-sync.sh` | ArgoCD application sync helper |
Run any script:
```bash
bash scripts/<script-name>.sh [arguments]
```
+6
View File
@@ -0,0 +1,6 @@
{
"ownerId": "kn7f82v7f3g1dtvm0gm74q016n7zz73v",
"slug": "kubernetes",
"version": "1.0.0",
"publishedAt": 1769436428875
}
@@ -0,0 +1,96 @@
#!/bin/bash
# argocd-app-sync.sh - ArgoCD application sync helper
# Usage: ./argocd-app-sync.sh <app-name> [--prune] [--force]
set -e
APP=${1:-""}
PRUNE=${2:-""}
FORCE=${3:-""}
if [ -z "$APP" ]; then
echo "Usage: $0 <app-name> [--prune] [--force]" >&2
echo "" >&2
echo "Available applications:" >&2
argocd app list --output name 2>/dev/null || kubectl get applications -A -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
exit 1
fi
echo "=== ARGOCD APPLICATION SYNC: $APP ===" >&2
echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" >&2
echo "" >&2
# Check if argocd CLI is available
if command -v argocd &> /dev/null; then
USE_CLI=true
else
USE_CLI=false
echo "argocd CLI not found, using kubectl" >&2
fi
# Get current status
echo "### Current Status ###" >&2
if [ "$USE_CLI" = true ]; then
argocd app get "$APP" --refresh >&2
else
kubectl get application "$APP" -n argocd -o yaml | grep -A20 "status:" | head -25 >&2
fi
# Build sync options
SYNC_OPTS=""
if [ "$PRUNE" == "--prune" ]; then
SYNC_OPTS="$SYNC_OPTS --prune"
echo "Prune enabled: Will remove resources not defined in Git" >&2
fi
if [ "$FORCE" == "--force" ]; then
SYNC_OPTS="$SYNC_OPTS --force"
echo "Force enabled: Will replace resources that cannot be patched" >&2
fi
# Perform sync
echo -e "\n### Syncing Application ###" >&2
if [ "$USE_CLI" = true ]; then
argocd app sync "$APP" $SYNC_OPTS >&2
else
# Trigger sync via annotation
kubectl patch application "$APP" -n argocd --type=merge -p '{"operation":{"sync":{"revision":"HEAD"}}}' >&2
fi
# Wait for sync to complete
echo -e "\n### Waiting for Sync ###" >&2
if [ "$USE_CLI" = true ]; then
argocd app wait "$APP" --health --timeout 300 >&2
else
echo "Waiting for sync (check manually with kubectl)..." >&2
sleep 10
fi
# Final status
echo -e "\n### Final Status ###" >&2
if [ "$USE_CLI" = true ]; then
argocd app get "$APP" >&2
STATUS=$(argocd app get "$APP" -o json | jq -r '.status.sync.status')
HEALTH=$(argocd app get "$APP" -o json | jq -r '.status.health.status')
else
kubectl get application "$APP" -n argocd -o yaml | grep -A20 "status:" | head -25 >&2
STATUS=$(kubectl get application "$APP" -n argocd -o jsonpath='{.status.sync.status}')
HEALTH=$(kubectl get application "$APP" -n argocd -o jsonpath='{.status.health.status}')
fi
echo "" >&2
echo "========================================" >&2
echo "SYNC COMPLETE" >&2
echo "========================================" >&2
echo "Sync Status: $STATUS" >&2
echo "Health Status: $HEALTH" >&2
# Output JSON
cat << EOF
{
"application": "$APP",
"timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
"sync_status": "$STATUS",
"health_status": "$HEALTH",
"success": $([ "$STATUS" == "Synced" ] && echo "true" || echo "false")
}
EOF
@@ -0,0 +1,131 @@
#!/bin/bash
# cluster-health-check.sh - Comprehensive cluster health assessment
# Usage: ./cluster-health-check.sh
set -e
echo "=== KUBERNETES CLUSTER HEALTH ASSESSMENT ===" >&2
echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" >&2
echo "" >&2
SCORE=100
ISSUES=()
# 1. Node Health (Critical: -50 points per issue)
echo "### NODE HEALTH ###" >&2
UNHEALTHY_NODES=$(kubectl get nodes --no-headers | grep -vE "Ready\s+<none>|Ready\s+master|Ready\s+control-plane" | grep -c -E "NotReady|Unknown" || echo 0)
if [ "$UNHEALTHY_NODES" -gt 0 ]; then
SCORE=$((SCORE - 50))
ISSUES+=("BOOM: $UNHEALTHY_NODES unhealthy nodes detected")
kubectl get nodes | grep -E "NotReady|Unknown" >&2
else
echo "✓ All nodes healthy" >&2
fi
# 2. Pod Issues (Warning: -20 points)
echo -e "\n### POD HEALTH ###" >&2
POD_ISSUES=$(kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$POD_ISSUES" -gt 0 ]; then
SCORE=$((SCORE - 20))
ISSUES+=("WARN: $POD_ISSUES pods not in Running/Succeeded state")
echo "Pods with issues:" >&2
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded >&2
else
echo "✓ All pods running" >&2
fi
# 3. CrashLoopBackOff (Critical: -50 points)
echo -e "\n### CRASH LOOP DETECTION ###" >&2
CRASHLOOP=$(kubectl get pods -A -o json 2>/dev/null | jq -r '.items[] | select(.status.containerStatuses[]?.state.waiting?.reason == "CrashLoopBackOff") | "\(.metadata.namespace)/\(.metadata.name)"' | wc -l | tr -d ' ')
if [ "$CRASHLOOP" -gt 0 ]; then
SCORE=$((SCORE - 50))
ISSUES+=("BOOM: $CRASHLOOP pods in CrashLoopBackOff")
kubectl get pods -A -o json | jq -r '.items[] | select(.status.containerStatuses[]?.state.waiting?.reason == "CrashLoopBackOff") | "\(.metadata.namespace)/\(.metadata.name)"' >&2
else
echo "✓ No pods in CrashLoopBackOff" >&2
fi
# 4. Security - Privileged Containers (Critical: -50 points)
echo -e "\n### SECURITY - PRIVILEGED CONTAINERS ###" >&2
PRIVILEGED=$(kubectl get pods -A -o json 2>/dev/null | jq -r '[.items[] | select(.spec.containers[].securityContext.privileged == true)] | length')
if [ "$PRIVILEGED" -gt 0 ]; then
SCORE=$((SCORE - 50))
ISSUES+=("BOOM: $PRIVILEGED privileged containers detected")
kubectl get pods -A -o json | jq -r '.items[] | select(.spec.containers[].securityContext.privileged == true) | "\(.metadata.namespace)/\(.metadata.name)"' >&2
else
echo "✓ No privileged containers" >&2
fi
# 5. Resource Limits (Warning: -20 points)
echo -e "\n### RESOURCE CONFIGURATION ###" >&2
NO_LIMITS=$(kubectl get pods -A -o json 2>/dev/null | jq -r '[.items[] | select(.spec.containers[].resources.limits == null)] | length')
if [ "$NO_LIMITS" -gt 10 ]; then
SCORE=$((SCORE - 20))
ISSUES+=("WARN: $NO_LIMITS containers without resource limits")
else
echo "✓ Most containers have resource limits" >&2
fi
# 6. PVC Status (Warning: -20 points)
echo -e "\n### STORAGE HEALTH ###" >&2
PENDING_PVC=$(kubectl get pvc -A --field-selector=status.phase!=Bound --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$PENDING_PVC" -gt 0 ]; then
SCORE=$((SCORE - 20))
ISSUES+=("WARN: $PENDING_PVC PVCs not bound")
kubectl get pvc -A --field-selector=status.phase!=Bound >&2
else
echo "✓ All PVCs bound" >&2
fi
# 7. Recent Warning Events (Info: -5 points per 10 events)
echo -e "\n### RECENT WARNING EVENTS ###" >&2
WARNING_EVENTS=$(kubectl get events -A --field-selector=type=Warning --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$WARNING_EVENTS" -gt 50 ]; then
SCORE=$((SCORE - 5))
ISSUES+=("INFO: $WARNING_EVENTS warning events in cluster")
echo "Recent warning events: $WARNING_EVENTS" >&2
else
echo "✓ Warning events within normal range" >&2
fi
# OpenShift-specific checks
if command -v oc &> /dev/null && oc whoami &> /dev/null; then
echo -e "\n### OPENSHIFT CLUSTER OPERATORS ###" >&2
DEGRADED=$(oc get clusteroperators --no-headers 2>/dev/null | grep -c -E "False.*True|False.*False" || echo 0)
if [ "$DEGRADED" -gt 0 ]; then
SCORE=$((SCORE - 50))
ISSUES+=("BOOM: $DEGRADED cluster operators degraded/unavailable")
oc get clusteroperators | grep -E "False.*True|False.*False" >&2
else
echo "✓ All cluster operators healthy" >&2
fi
fi
# Ensure score doesn't go below 0
if [ "$SCORE" -lt 0 ]; then
SCORE=0
fi
# Output summary
echo "" >&2
echo "========================================" >&2
echo "CLUSTER HEALTH SCORE: $SCORE/100" >&2
echo "========================================" >&2
if [ ${#ISSUES[@]} -gt 0 ]; then
echo "" >&2
echo "ISSUES FOUND:" >&2
for issue in "${ISSUES[@]}"; do
echo " - $issue" >&2
done
fi
# Output JSON for programmatic use
cat << EOF
{
"score": $SCORE,
"timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
"issues_count": ${#ISSUES[@]},
"healthy": $([ $SCORE -ge 80 ] && echo "true" || echo "false")
}
EOF
@@ -0,0 +1,370 @@
#!/bin/bash
# generate-manifest.sh - Generate production-ready Kubernetes manifests
# Usage: ./generate-manifest.sh <type> <name> [namespace]
set -e
TYPE=${1:-""}
NAME=${2:-""}
NAMESPACE=${3:-"default"}
VALID_TYPES="deployment statefulset service ingress configmap secret pvc networkpolicy hpa"
if [ -z "$TYPE" ] || [ -z "$NAME" ]; then
echo "Usage: $0 <type> <name> [namespace]" >&2
echo "" >&2
echo "Available types: $VALID_TYPES" >&2
exit 1
fi
echo "# Generated: $(date -u +"%Y-%m-%dT%H:%M:%SZ")"
echo "# Type: $TYPE | Name: $NAME | Namespace: $NAMESPACE"
echo ""
case $TYPE in
deployment)
cat << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
app.kubernetes.io/component: server
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: $NAME
template:
metadata:
labels:
app.kubernetes.io/name: $NAME
spec:
serviceAccountName: $NAME
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: $NAME
image: your-registry/$NAME:latest
imagePullPolicy: Always
ports:
- name: http
containerPort: 8080
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: $NAME
topologyKey: kubernetes.io/hostname
EOF
;;
statefulset)
cat << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
spec:
serviceName: $NAME-headless
replicas: 3
podManagementPolicy: OrderedReady
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app.kubernetes.io/name: $NAME
template:
metadata:
labels:
app.kubernetes.io/name: $NAME
spec:
serviceAccountName: $NAME
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
terminationGracePeriodSeconds: 30
containers:
- name: $NAME
image: your-registry/$NAME:latest
ports:
- name: tcp
containerPort: 5432
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 1000m
memory: 2Gi
volumeMounts:
- name: data
mountPath: /data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: standard
resources:
requests:
storage: 10Gi
EOF
;;
service)
cat << EOF
apiVersion: v1
kind: Service
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
selector:
app.kubernetes.io/name: $NAME
EOF
;;
ingress)
cat << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: $NAME
namespace: $NAMESPACE
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- $NAME.example.com
secretName: $NAME-tls
rules:
- host: $NAME.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: $NAME
port:
name: http
EOF
;;
configmap)
cat << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
data:
config.yaml: |
# Add your configuration here
server:
port: 8080
host: "0.0.0.0"
EOF
;;
secret)
cat << EOF
apiVersion: v1
kind: Secret
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
type: Opaque
stringData:
# Replace with actual values before applying
API_KEY: "your-api-key-here"
DATABASE_URL: "postgresql://user:pass@host:5432/db"
EOF
;;
pvc)
cat << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: $NAME
namespace: $NAMESPACE
labels:
app.kubernetes.io/name: $NAME
spec:
accessModes:
- ReadWriteOnce
storageClassName: standard
resources:
requests:
storage: 10Gi
EOF
;;
networkpolicy)
cat << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: $NAME
namespace: $NAMESPACE
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: $NAME
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app.kubernetes.io/name: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: database
ports:
- protocol: TCP
port: 5432
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
EOF
;;
hpa)
cat << EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: $NAME
namespace: $NAMESPACE
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: $NAME
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 25
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
EOF
;;
*)
echo "Unknown type: $TYPE" >&2
echo "Valid types: $VALID_TYPES" >&2
exit 1
;;
esac
@@ -0,0 +1,97 @@
#!/bin/bash
# node-maintenance.sh - Safely drain and prepare node for maintenance
# Usage: ./node-maintenance.sh <node-name> [--force]
set -e
NODE=${1:-""}
FORCE=${2:-""}
if [ -z "$NODE" ]; then
echo "Usage: $0 <node-name> [--force]" >&2
echo "" >&2
echo "Available nodes:" >&2
kubectl get nodes --no-headers | awk '{print " " $1 " (" $2 ")"}'
exit 1
fi
echo "=== NODE MAINTENANCE: $NODE ===" >&2
echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" >&2
echo "" >&2
# Verify node exists
if ! kubectl get node "$NODE" &>/dev/null; then
echo "Error: Node '$NODE' not found" >&2
exit 1
fi
# Show current status
echo "### Current Node Status ###" >&2
kubectl get node "$NODE" -o wide >&2
echo -e "\n### Pods on Node ###" >&2
POD_COUNT=$(kubectl get pods -A --field-selector spec.nodeName="$NODE" --no-headers | wc -l | tr -d ' ')
echo "Total pods: $POD_COUNT" >&2
kubectl get pods -A --field-selector spec.nodeName="$NODE" --no-headers | head -20 >&2
[ "$POD_COUNT" -gt 20 ] && echo "... and $((POD_COUNT - 20)) more" >&2
# Check for pods with PDBs that might block drain
echo -e "\n### Checking PodDisruptionBudgets ###" >&2
kubectl get pdb -A -o json 2>/dev/null | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name): minAvailable=\(.spec.minAvailable // "N/A"), maxUnavailable=\(.spec.maxUnavailable // "N/A")"' >&2
# Confirmation
if [ "$FORCE" != "--force" ]; then
echo "" >&2
read -p "Proceed with cordoning and draining node $NODE? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
echo "Aborted." >&2
exit 0
fi
fi
# Step 1: Cordon the node
echo -e "\n### Step 1: Cordoning node ###" >&2
kubectl cordon "$NODE"
echo "✓ Node cordoned (unschedulable)" >&2
# Step 2: Drain the node
echo -e "\n### Step 2: Draining node ###" >&2
DRAIN_OPTS="--ignore-daemonsets --delete-emptydir-data --grace-period=60 --timeout=300s"
if [ "$FORCE" == "--force" ]; then
DRAIN_OPTS="$DRAIN_OPTS --force"
echo "Force mode enabled" >&2
fi
if kubectl drain "$NODE" $DRAIN_OPTS; then
echo "✓ Node drained successfully" >&2
else
echo "Warning: Drain completed with some issues" >&2
fi
# Step 3: Verify no pods remain (except daemonsets)
echo -e "\n### Step 3: Verification ###" >&2
REMAINING=$(kubectl get pods -A --field-selector spec.nodeName="$NODE" --no-headers 2>/dev/null | wc -l | tr -d ' ')
echo "Remaining pods on node: $REMAINING (should be daemonsets only)" >&2
kubectl get pods -A --field-selector spec.nodeName="$NODE" >&2
echo "" >&2
echo "========================================" >&2
echo "NODE MAINTENANCE READY" >&2
echo "========================================" >&2
echo "Node '$NODE' is now cordoned and drained." >&2
echo "" >&2
echo "Perform your maintenance tasks, then run:" >&2
echo " kubectl uncordon $NODE" >&2
echo "" >&2
# Output JSON
cat << EOF
{
"node": "$NODE",
"action": "drain",
"timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
"remaining_pods": $REMAINING,
"status": "ready_for_maintenance"
}
EOF
@@ -0,0 +1,137 @@
#!/bin/bash
# pre-upgrade-check.sh - Pre-upgrade cluster validation
# Usage: ./pre-upgrade-check.sh
set -e
echo "=== PRE-UPGRADE CLUSTER VALIDATION ===" >&2
echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" >&2
echo "" >&2
WARNINGS=0
BLOCKERS=0
# 1. Cluster Version
echo "### Cluster Version ###" >&2
SERVER_VERSION=$(kubectl version -o json 2>/dev/null | jq -r '.serverVersion.gitVersion')
echo "Server Version: $SERVER_VERSION" >&2
# 2. Node Status
echo -e "\n### Node Status ###" >&2
kubectl get nodes >&2
NOT_READY=$(kubectl get nodes --no-headers | grep -cv "Ready" || echo 0)
if [ "$NOT_READY" -gt 0 ]; then
BLOCKERS=$((BLOCKERS + 1))
echo "BLOCKER: $NOT_READY nodes not ready" >&2
fi
# 3. Control Plane Health
echo -e "\n### Control Plane Health ###" >&2
kubectl get pods -n kube-system -l tier=control-plane 2>/dev/null || \
kubectl get pods -n kube-system | grep -E "kube-apiserver|kube-controller|kube-scheduler|etcd" >&2
# 4. Pods Not Running
echo -e "\n### Pods Not Running ###" >&2
NOT_RUNNING=$(kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$NOT_RUNNING" -gt 0 ]; then
WARNINGS=$((WARNINGS + 1))
echo "WARNING: $NOT_RUNNING pods not in Running/Succeeded state" >&2
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded >&2
else
echo "✓ All pods running" >&2
fi
# 5. PodDisruptionBudgets
echo -e "\n### PodDisruptionBudgets ###" >&2
PDB_COUNT=$(kubectl get pdb -A --no-headers 2>/dev/null | wc -l | tr -d ' ')
echo "Found $PDB_COUNT PDBs" >&2
if [ "$PDB_COUNT" -gt 0 ]; then
kubectl get pdb -A >&2
fi
# 6. Pending PVCs
echo -e "\n### Pending PVCs ###" >&2
PENDING_PVC=$(kubectl get pvc -A --field-selector=status.phase=Pending --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$PENDING_PVC" -gt 0 ]; then
WARNINGS=$((WARNINGS + 1))
echo "WARNING: $PENDING_PVC PVCs pending" >&2
kubectl get pvc -A --field-selector=status.phase=Pending >&2
else
echo "✓ No pending PVCs" >&2
fi
# 7. Deprecated APIs
echo -e "\n### Deprecated API Usage ###" >&2
DEPRECATED=$(kubectl get --raw /metrics 2>/dev/null | grep -c "apiserver_requested_deprecated_apis" || echo 0)
if [ "$DEPRECATED" -gt 0 ]; then
WARNINGS=$((WARNINGS + 1))
echo "WARNING: Deprecated APIs may be in use" >&2
echo "Check: kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis" >&2
else
echo "✓ No deprecated API metrics found" >&2
fi
# 8. etcd Health (if accessible)
echo -e "\n### etcd Health ###" >&2
ETCD_PODS=$(kubectl get pods -n kube-system -l component=etcd --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$ETCD_PODS" -gt 0 ]; then
kubectl get pods -n kube-system -l component=etcd >&2
else
echo "etcd pods not directly visible (managed platform)" >&2
fi
# 9. Resource Pressure
echo -e "\n### Node Resource Pressure ###" >&2
PRESSURE=$(kubectl get nodes -o json 2>/dev/null | jq -r '.items[] | select(.status.conditions[] | select(.type | contains("Pressure")) | .status == "True") | .metadata.name')
if [ -n "$PRESSURE" ]; then
WARNINGS=$((WARNINGS + 1))
echo "WARNING: Nodes under pressure:" >&2
echo "$PRESSURE" >&2
else
echo "✓ No resource pressure detected" >&2
fi
# OpenShift-specific checks
if command -v oc &> /dev/null && oc whoami &> /dev/null; then
echo -e "\n### OpenShift Cluster Operators ###" >&2
DEGRADED=$(oc get clusteroperators --no-headers 2>/dev/null | grep -c -E "False.*True|False.*False" || echo 0)
if [ "$DEGRADED" -gt 0 ]; then
BLOCKERS=$((BLOCKERS + 1))
echo "BLOCKER: $DEGRADED cluster operators degraded" >&2
oc get clusteroperators | grep -E "False.*True|False.*False" >&2
else
echo "✓ All cluster operators healthy" >&2
fi
fi
# Summary
echo "" >&2
echo "========================================" >&2
echo "PRE-UPGRADE CHECK SUMMARY" >&2
echo "========================================" >&2
echo "Blockers: $BLOCKERS" >&2
echo "Warnings: $WARNINGS" >&2
if [ "$BLOCKERS" -gt 0 ]; then
echo "" >&2
echo "❌ DO NOT PROCEED WITH UPGRADE" >&2
echo " Resolve blockers before upgrading" >&2
elif [ "$WARNINGS" -gt 0 ]; then
echo "" >&2
echo "⚠️ PROCEED WITH CAUTION" >&2
echo " Review warnings before upgrading" >&2
else
echo "" >&2
echo "✅ CLUSTER READY FOR UPGRADE" >&2
fi
# Output JSON
cat << EOF
{
"timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
"server_version": "$SERVER_VERSION",
"blockers": $BLOCKERS,
"warnings": $WARNINGS,
"ready_for_upgrade": $([ $BLOCKERS -eq 0 ] && echo "true" || echo "false")
}
EOF
+149
View File
@@ -0,0 +1,149 @@
#!/bin/bash
# security-audit.sh - Kubernetes security posture assessment
# Usage: ./security-audit.sh [namespace]
set -e
NAMESPACE=${1:-""}
NS_FLAG=""
if [ -n "$NAMESPACE" ]; then
NS_FLAG="-n $NAMESPACE"
echo "=== SECURITY AUDIT: Namespace $NAMESPACE ===" >&2
else
NS_FLAG="-A"
echo "=== SECURITY AUDIT: All Namespaces ===" >&2
fi
echo "Timestamp: $(date -u +"%Y-%m-%dT%H:%M:%SZ")" >&2
echo "" >&2
FINDINGS=()
CRITICAL=0
WARNING=0
INFO=0
# 1. Privileged Containers (Critical)
echo "### Checking for privileged containers..." >&2
PRIVILEGED=$(kubectl get pods $NS_FLAG -o json 2>/dev/null | jq -r '.items[] | select(.spec.containers[].securityContext.privileged == true) | "\(.metadata.namespace)/\(.metadata.name)"')
if [ -n "$PRIVILEGED" ]; then
CRITICAL=$((CRITICAL + 1))
FINDINGS+=("CRITICAL: Privileged containers found")
echo "CRITICAL: Privileged containers:" >&2
echo "$PRIVILEGED" >&2
else
echo "✓ No privileged containers" >&2
fi
# 2. Containers Running as Root (Warning)
echo -e "\n### Checking for root containers..." >&2
ROOT_CONTAINERS=$(kubectl get pods $NS_FLAG -o json 2>/dev/null | jq -r '.items[] | select(.spec.securityContext.runAsNonRoot != true) | select(.spec.containers[].securityContext.runAsNonRoot != true) | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u)
ROOT_COUNT=$(echo "$ROOT_CONTAINERS" | grep -c . || echo 0)
if [ "$ROOT_COUNT" -gt 0 ]; then
WARNING=$((WARNING + 1))
FINDINGS+=("WARNING: $ROOT_COUNT pods may run as root")
echo "WARNING: Pods without runAsNonRoot:" >&2
echo "$ROOT_CONTAINERS" | head -10 >&2
[ "$ROOT_COUNT" -gt 10 ] && echo "... and $((ROOT_COUNT - 10)) more" >&2
else
echo "✓ All pods have runAsNonRoot" >&2
fi
# 3. Host Namespace Access (Critical)
echo -e "\n### Checking for host namespace access..." >&2
HOST_ACCESS=$(kubectl get pods $NS_FLAG -o json 2>/dev/null | jq -r '.items[] | select(.spec.hostNetwork == true or .spec.hostPID == true or .spec.hostIPC == true) | "\(.metadata.namespace)/\(.metadata.name)"')
if [ -n "$HOST_ACCESS" ]; then
CRITICAL=$((CRITICAL + 1))
FINDINGS+=("CRITICAL: Host namespace access detected")
echo "CRITICAL: Pods with host namespace access:" >&2
echo "$HOST_ACCESS" >&2
else
echo "✓ No host namespace access" >&2
fi
# 4. Missing Resource Limits (Warning)
echo -e "\n### Checking for missing resource limits..." >&2
NO_LIMITS=$(kubectl get pods $NS_FLAG -o json 2>/dev/null | jq -r '[.items[] | select(.spec.containers[].resources.limits == null)] | length')
if [ "$NO_LIMITS" -gt 10 ]; then
WARNING=$((WARNING + 1))
FINDINGS+=("WARNING: $NO_LIMITS containers without resource limits")
echo "WARNING: $NO_LIMITS containers missing resource limits" >&2
else
echo "✓ Resource limits configured ($NO_LIMITS missing)" >&2
fi
# 5. Default Service Account Usage (Info)
echo -e "\n### Checking for default service account usage..." >&2
DEFAULT_SA=$(kubectl get pods $NS_FLAG -o json 2>/dev/null | jq -r '.items[] | select(.spec.serviceAccountName == "default" or .spec.serviceAccountName == null) | "\(.metadata.namespace)/\(.metadata.name)"')
DEFAULT_SA_COUNT=$(echo "$DEFAULT_SA" | grep -c . || echo 0)
if [ "$DEFAULT_SA_COUNT" -gt 0 ]; then
INFO=$((INFO + 1))
FINDINGS+=("INFO: $DEFAULT_SA_COUNT pods using default service account")
echo "INFO: Pods using default SA:" >&2
echo "$DEFAULT_SA" | head -10 >&2
else
echo "✓ No pods using default service account" >&2
fi
# 6. Wildcard RBAC (Critical)
echo -e "\n### Checking for overly permissive RBAC..." >&2
WILDCARD_ROLES=$(kubectl get clusterroles -o json 2>/dev/null | jq -r '.items[] | select(.rules[]?.verbs[]? == "*" and .rules[]?.resources[]? == "*") | .metadata.name')
if [ -n "$WILDCARD_ROLES" ]; then
CRITICAL=$((CRITICAL + 1))
FINDINGS+=("CRITICAL: Wildcard RBAC permissions found")
echo "CRITICAL: ClusterRoles with wildcard permissions:" >&2
echo "$WILDCARD_ROLES" >&2
else
echo "✓ No wildcard RBAC permissions" >&2
fi
# 7. Pods without NetworkPolicy (Info)
echo -e "\n### Checking NetworkPolicy coverage..." >&2
if [ -n "$NAMESPACE" ]; then
NP_COUNT=$(kubectl get networkpolicy -n $NAMESPACE --no-headers 2>/dev/null | wc -l | tr -d ' ')
if [ "$NP_COUNT" -eq 0 ]; then
INFO=$((INFO + 1))
FINDINGS+=("INFO: Namespace $NAMESPACE has no NetworkPolicies")
echo "INFO: No NetworkPolicies in $NAMESPACE" >&2
else
echo "$NP_COUNT NetworkPolicies found" >&2
fi
else
NS_WITHOUT_NP=0
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
count=$(kubectl get networkpolicy -n $ns --no-headers 2>/dev/null | wc -l | tr -d ' ')
[ "$count" -eq 0 ] && NS_WITHOUT_NP=$((NS_WITHOUT_NP + 1))
done
if [ "$NS_WITHOUT_NP" -gt 0 ]; then
INFO=$((INFO + 1))
FINDINGS+=("INFO: $NS_WITHOUT_NP namespaces without NetworkPolicies")
echo "INFO: $NS_WITHOUT_NP namespaces lack NetworkPolicies" >&2
fi
fi
# Summary
echo "" >&2
echo "========================================" >&2
echo "SECURITY AUDIT SUMMARY" >&2
echo "========================================" >&2
echo "Critical Issues: $CRITICAL" >&2
echo "Warnings: $WARNING" >&2
echo "Informational: $INFO" >&2
echo "" >&2
if [ ${#FINDINGS[@]} -gt 0 ]; then
echo "FINDINGS:" >&2
for finding in "${FINDINGS[@]}"; do
echo " - $finding" >&2
done
fi
# Output JSON
cat << EOF
{
"timestamp": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
"namespace": "${NAMESPACE:-all}",
"critical": $CRITICAL,
"warning": $WARNING,
"info": $INFO,
"compliant": $([ $CRITICAL -eq 0 ] && echo "true" || echo "false")
}
EOF
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "openclaw-backup",
"installedVersion": "1.0.0",
"installedAt": 1772497717962
}
+68
View File
@@ -0,0 +1,68 @@
---
name: openclaw-backup
description: Backup and restore OpenClaw data. Use when user asks to create backups, set up automatic backup schedules, restore from backup, or manage backup rotation. Handles ~/.openclaw directory archiving with proper exclusions.
---
# OpenClaw Backup
Backup and restore OpenClaw configuration, credentials, and workspace.
## Create Backup
Run the backup script:
```bash
./scripts/backup.sh [backup_dir]
```
Default backup location: `~/openclaw-backups/`
Output: `openclaw-YYYY-MM-DD_HHMM.tar.gz`
## What Gets Backed Up
- `openclaw.json` — main config
- `credentials/` — API keys, tokens
- `agents/` — agent configs, auth profiles
- `workspace/` — memory, SOUL.md, user files
- `telegram/` — session data
- `cron/` — scheduled tasks
## Excluded
- `completions/` — cache, regenerated automatically
- `*.log` — logs
## Setup Daily Backup with Cron
Use OpenClaw cron for daily backups with notification:
```json
{
"name": "daily-backup",
"schedule": {"kind": "cron", "expr": "0 3 * * *", "tz": "UTC"},
"payload": {
"kind": "agentTurn",
"message": "Run ~/.openclaw/backup.sh and report result to user."
},
"sessionTarget": "isolated",
"delivery": {"mode": "announce"}
}
```
## Restore
See [references/restore.md](references/restore.md) for step-by-step restore instructions.
Quick restore:
```bash
openclaw gateway stop
mv ~/.openclaw ~/.openclaw-old
tar -xzf ~/openclaw-backups/openclaw-YYYY-MM-DD_HHMM.tar.gz -C ~
openclaw gateway start
```
## Rotation
Script keeps last 7 backups automatically.
+6
View File
@@ -0,0 +1,6 @@
{
"ownerId": "kn767bpva5fcrrd9hhxcjnfnns80pzhp",
"slug": "openclaw-backup",
"version": "1.0.0",
"publishedAt": 1770475997563
}
@@ -0,0 +1,46 @@
# Restore OpenClaw from Backup
## Quick Restore
```bash
# 1. Stop OpenClaw
openclaw gateway stop
# 2. Backup current (safety)
mv ~/.openclaw ~/.openclaw-old
# 3. Extract backup
cd ~
tar -xzf ~/openclaw-backups/openclaw-YYYY-MM-DD_HHMM.tar.gz
# 4. Start OpenClaw
openclaw gateway start
# 5. Verify
openclaw status
```
## Rollback if Restore Fails
```bash
rm -rf ~/.openclaw
mv ~/.openclaw-old ~/.openclaw
openclaw gateway start
```
## What's in a Backup
```
~/.openclaw/
├── openclaw.json # Main config
├── credentials/ # API keys, tokens
├── agents/ # Agent configs, auth
├── workspace/ # Memory, SOUL.md, files
├── telegram/ # Telegram session
└── cron/ # Scheduled tasks
```
## Excluded from Backup
- `completions/` — API response cache (regenerated)
- `*.log` — Log files
+31
View File
@@ -0,0 +1,31 @@
#!/bin/bash
# OpenClaw Backup Script
# Usage: ./backup.sh [backup_dir]
BACKUP_DIR="${1:-$HOME/openclaw-backups}"
DATE=$(date +%Y-%m-%d_%H%M)
BACKUP_FILE="$BACKUP_DIR/openclaw-$DATE.tar.gz"
mkdir -p "$BACKUP_DIR"
# Create backup (exclude completions cache and logs)
tar -czf "$BACKUP_FILE" \
--exclude='completions' \
--exclude='*.log' \
-C "$HOME" .openclaw/ 2>/dev/null
if [ $? -eq 0 ]; then
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
# Rotate: keep only last 7 backups
ls -t "$BACKUP_DIR"/openclaw-*.tar.gz 2>/dev/null | tail -n +8 | xargs -r rm
COUNT=$(ls "$BACKUP_DIR"/openclaw-*.tar.gz 2>/dev/null | wc -l)
echo "✅ Backup created: $BACKUP_FILE ($SIZE)"
echo "📁 Total backups: $COUNT"
exit 0
else
echo "❌ Backup failed"
exit 1
fi
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "openclaw-healthcheck-cron",
"installedVersion": "1.0.0",
"installedAt": 1772497719586
}
+63
View File
@@ -0,0 +1,63 @@
---
name: openclaw-healthcheck-cron
description: Create and run a reusable OpenClaw deep healthcheck automation using a cron job plus a script. Use when setting up scheduled OpenClaw health audits, standardizing security/status checks, sanitizing environment-specific values for sharing, and packaging the setup for reuse.
---
# OpenClaw Healthcheck Cron Skill
Create a portable healthcheck automation that runs on a schedule and reports concise findings.
## Build the automation
1. Create a script at `scripts/healthcheck.sh` (or reuse the one in this skill).
2. Keep checks read-only by default.
3. Write artifacts to `/tmp/openclaw-healthcheck/YYYY-MM-DD/HHMMSS/`.
4. Return a compact summary with severity.
## Create the scheduled job
Use an isolated cron `agentTurn` job that runs twice daily (example: 6am and 7pm local time):
- `schedule.kind`: `cron`
- `schedule.expr`: `0 6,19 * * *`
- `schedule.tz`: set local timezone
- `sessionTarget`: `isolated`
- `payload.kind`: `agentTurn`
- `delivery.mode`: `announce` (or `none` if reporting is handled inside task)
Use this task pattern:
- Execute `bash scripts/healthcheck.sh`
- Parse summary line and emit:
- Verdict: `OK | MONITOR | NEEDS_ATTENTION`
- Counts: passed/warn/fail
- Artifact path
- Active issues + recommended next action
## Sanitize before publishing
Remove or parameterize all local identifiers:
- Usernames, hostnames, phone numbers, chat IDs
- API keys, tokens, webhook URLs
- Absolute personal paths (use placeholders or relative paths)
Replace with variables:
- `${HEALTHCHECK_OUTPUT_DIR:-/tmp/openclaw-healthcheck}`
- `${OPENCLAW_HEALTH_TZ:-America/New_York}`
- `${HEALTHCHECK_EXCLUDE:-small model,sandbox,groupPolicy}`
## Validate
1. Run script manually once.
2. Confirm artifact directory and summary format.
3. Run cron job once with `cron run`.
4. Verify final message is concise and actionable.
## Package
Package as a `.skill` zip containing only:
- `SKILL.md`
- `scripts/healthcheck.sh`
- `references/cron-job-example.json`
@@ -0,0 +1,6 @@
{
"ownerId": "kn79fk2chsvyrdvmrnrj03yzhn7zynfp",
"slug": "openclaw-healthcheck-cron",
"version": "1.0.0",
"publishedAt": 1771457831204
}
@@ -0,0 +1,18 @@
{
"name": "OpenClaw deep healthcheck (portable)",
"schedule": {
"kind": "cron",
"expr": "0 6,19 * * *",
"tz": "${OPENCLAW_HEALTH_TZ:-America/New_York}"
},
"payload": {
"kind": "agentTurn",
"message": "Run bash scripts/healthcheck.sh and report verdict, counts, artifact path, and active issues.",
"timeoutSeconds": 600
},
"sessionTarget": "isolated",
"delivery": {
"mode": "announce"
},
"enabled": true
}
@@ -0,0 +1,64 @@
#!/usr/bin/env bash
set -euo pipefail
OUT_ROOT="${HEALTHCHECK_OUTPUT_DIR:-/tmp/openclaw-healthcheck}"
EXCLUDE_RAW="${HEALTHCHECK_EXCLUDE:-small model,sandbox,groupPolicy}"
TS="$(date +%H%M%S)"
DAY="$(date +%F)"
OUT_DIR="${OUT_ROOT}/${DAY}/${TS}"
mkdir -p "$OUT_DIR"
LOG="$OUT_DIR/run.log"
SUMMARY_JSON="$OUT_DIR/summary.json"
# Keep this script read-only.
# If your environment has a richer checker, replace this section with that command.
passed=0
warn=0
fail=0
issues=()
check_cmd() {
local name="$1"
shift
if "$@" >/dev/null 2>&1; then
passed=$((passed+1))
echo "PASS: $name" >> "$LOG"
else
warn=$((warn+1))
issues+=("$name")
echo "WARN: $name" >> "$LOG"
fi
}
{
echo "[healthcheck] started: $(date -Is)"
echo "[healthcheck] out_dir: $OUT_DIR"
echo "[healthcheck] exclude: $EXCLUDE_RAW"
} > "$LOG"
check_cmd "openclaw binary available" command -v openclaw
check_cmd "gateway status command" openclaw gateway status
check_cmd "openclaw status command" openclaw status
if [[ $fail -gt 0 ]]; then
verdict="NEEDS_ATTENTION"
elif [[ $warn -gt 0 ]]; then
verdict="MONITOR"
else
verdict="OK"
fi
cat > "$SUMMARY_JSON" <<EOF
{
"verdict": "$verdict",
"passed": $passed,
"warn": $warn,
"fail": $fail,
"artifact_path": "$OUT_DIR",
"issues": [$(printf '"%s",' "${issues[@]:-}" | sed 's/,$//')]
}
EOF
echo "VERDICT=$verdict PASSED=$passed WARN=$warn FAIL=$fail ARTIFACT=$OUT_DIR"
@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "searxng-local-search",
"installedVersion": "0.1.0",
"installedAt": 1772497721114
}
+80
View File
@@ -0,0 +1,80 @@
---
name: searxng-local-search
description: Search the web via the local self-hosted SearXNG instance and use Brave only as fallback. Use when gathering current information, docs, links, or fact checks, and when privacy/local-first search is preferred.
metadata:
openclaw:
requires:
bins: ["bb"]
env: ["SEARXNG_URL"]
emoji: "🔍"
nix:
plugin: "babashka"
---
# SearXNG Local Search
## Policy (default behavior)
1. Use **SearXNG first** for normal web lookups.
2. Fall back to **Brave** only when:
- SearXNG is unavailable,
- SearXNG returns very weak/empty results,
- or user explicitly asks for Brave/second opinion.
3. In research answers, label which source was used.
## Preconditions
- `SEARXNG_URL` points to the local instance.
- SearXNG JSON API is enabled.
- Script entrypoint is available: `scripts/search.sh`.
Preferred local value in this workspace is the LAN endpoint already documented in `TOOLS.md`.
## Quick usage
```bash
scripts/search.sh "your search query"
```
With options:
```bash
scripts/search.sh "your query" '{"category":"news","time_range":"week","num_results":8}'
```
Options:
- `category`: `general|news|images|videos|it|science`
- `time_range`: `day|week|month|year`
- `language`: ISO language code (default `en`)
- `num_results`: integer (default `5`)
## Smoke test routine
Run before first use in a fresh environment or after changes:
```bash
scripts/smoke.sh openclaw
```
Pass criteria:
- command exits successfully,
- returns at least one result,
- includes title + URL fields.
If smoke test fails:
1. Confirm `SEARXNG_URL` is reachable.
2. Confirm SearXNG container/service is healthy.
3. Retry with a broad query and no filters.
4. If still failing, switch to Brave fallback and report SearXNG incident.
## Troubleshooting
- **Connection/timeout**: verify endpoint + container health.
- **Empty results**: broaden query, remove filters, retry.
- **Bad JSON/format**: verify SearXNG JSON format support.
- **Rate concerns**: keep queries paced; avoid burst loops.
## Notes
- This skill defines behavior and checks; it does not replace the underlying SearXNG service deployment.
- For API details and response structure, see `references/api-guide.md`.
+6
View File
@@ -0,0 +1,6 @@
{
"ownerId": "kn78casstptqwp1nhzz6bxcjj1809hvc",
"slug": "searxng-local-search",
"version": "0.1.0",
"publishedAt": 1769835313265
}
@@ -0,0 +1,263 @@
# SearXNG API Reference
This document provides detailed information about the SearXNG JSON API used by the search skill.
## Endpoint
```
GET /search
```
## Query Parameters
### Required
| Parameter | Type | Description |
|-----------|------|-------------|
| `q` | string | The search query |
| `format` | string | Response format (use `json`) |
### Optional
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `language` | string | Language code (en, es, de, fr, etc.) | `en` |
| `pageno` | integer | Page number for pagination | `1` |
| `time_range` | string | Time filter: `day`, `week`, `month`, `year` | None |
| `category_X` | string | Filter by category (set to `1` to enable) | None |
### Categories
Enable specific categories by setting `category_NAME=1`:
- `category_general` - General web search
- `category_images` - Image search
- `category_videos` - Video search
- `category_news` - News articles
- `category_map` - Maps and locations
- `category_music` - Music search
- `category_files` - File search
- `category_it` - IT/technical content
- `category_science` - Scientific articles
- `category_social` - Social media
## Response Format
```json
{
"query": "search query",
"number_of_results": 42,
"results": [
{
"url": "https://example.com/page",
"title": "Page Title",
"content": "Description snippet...",
"engine": "google",
"engines": ["google", "bing"],
"category": "general",
"score": 1.85,
"pretty_url": "https://example.com/page",
"parsed_url": ["https", "example.com", "/page", "", "", ""],
"publishedDate": "2024-01-15T12:00:00"
}
],
"answers": [],
"corrections": [],
"infoboxes": [],
"suggestions": ["related query 1", "related query 2"],
"unresponsive_engines": []
}
```
## Result Fields
| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Full URL of the result |
| `title` | string | Page title |
| `content` | string | Description or snippet |
| `engine` | string | Primary search engine |
| `engines` | array | All engines that returned this result |
| `score` | float | Relevance score (higher is better) |
| `category` | string | Result category |
| `publishedDate` | string | Publication date (ISO 8601) |
## Example Requests
### Basic Search
```bash
curl "http://localhost:8888/search?q=NixOS&format=json"
```
### Category Filter
```bash
curl "http://localhost:8888/search?q=python&category_it=1&format=json"
```
### Time Range Filter
```bash
curl "http://localhost:8888/search?q=news&time_range=day&format=json"
```
### Multiple Filters
```bash
curl "http://localhost:8888/search?q=AI&category_news=1&time_range=week&language=en&format=json"
```
### Pagination
```bash
curl "http://localhost:8888/search?q=rust&pageno=2&format=json"
```
## Rate Limiting
SearXNG implements rate limiting to prevent abuse. The default configuration allows:
- IP-based rate limiting
- Bot detection via various heuristics
- Link token verification
If you receive a 429 (Too Many Requests) response:
- Wait a few seconds before retrying
- Implement exponential backoff
- Cache frequently-accessed results
## Error Responses
### 400 Bad Request
Missing required parameters or invalid format.
```json
{
"error": "Missing required parameter: q"
}
```
### 429 Too Many Requests
Rate limit exceeded.
```json
{
"error": "Rate limit exceeded"
}
```
### 500 Internal Server Error
SearXNG server error. Check logs:
```bash
journalctl -u searx -n 50
```
## Best Practices
### 1. Query Construction
- Keep queries concise (1-6 words is optimal)
- Use quotes for exact phrases: `"exact phrase"`
- Use boolean operators: `term1 OR term2`
- Exclude terms with minus: `query -excluded`
### 2. Result Handling
- Sort by score for best results
- Check multiple engines for reliability
- Handle empty results gracefully
- Respect `unresponsive_engines` field
### 3. Performance
- Cache results locally when possible
- Use appropriate timeouts (30s recommended)
- Implement retry logic with exponential backoff
- Monitor response times
### 4. Categories
Choose appropriate categories for your query:
| Query Type | Best Category |
|------------|---------------|
| Current events | `news` |
| Code/documentation | `it` |
| Research papers | `science` |
| How-to guides | `general` |
| Media content | `videos` or `images` |
### 5. Time Ranges
Use time filters for time-sensitive queries:
- `day` - Breaking news, stock prices
- `week` - Recent updates, current events
- `month` - Trends, ongoing stories
- `year` - Annual reports, yearly summaries
## Engine-Specific Notes
SearXNG aggregates results from multiple search engines. Common engines:
- **Google** - Broad coverage, good relevance
- **Bing** - Good for recent content
- **DuckDuckGo** - Privacy-focused
- **Wikipedia** - Encyclopedic content
- **Stack Overflow** - Programming Q&A
- **GitHub** - Code repositories
- **arXiv** - Scientific papers
Each result may come from multiple engines, indicated in the `engines` array.
## Troubleshooting
### No Results
1. Check query is not too specific
2. Remove filters and try again
3. Verify engines are responding:
```bash
journalctl -u searx | grep -i error
```
### Slow Responses
1. Check `unresponsive_engines` field
2. Increase timeout in client
3. Disable slow engines in SearXNG config
### Inconsistent Results
1. Results vary by engine availability
2. Check which engines responded: `engines` field
3. Consider using score for ranking
## Advanced Configuration
For custom SearXNG configurations, edit the NixOS module:
```nix
services.searx.settings.engines = [
{
name = "google";
weight = 1.5; # Boost Google results
}
{
name = "duckduckgo";
disabled = true; # Disable DDG
}
];
```
## Resources
- [SearXNG Documentation](https://docs.searxng.org/)
- [SearXNG GitHub](https://github.com/searxng/searxng)
- [Engine Configuration](https://docs.searxng.org/admin/engines/index.html)
- [API Documentation](https://docs.searxng.org/dev/search_api.html)
+152
View File
@@ -0,0 +1,152 @@
#!/usr/bin/env bb
(ns search
(:require [babashka.http-client :as http]
[cheshire.core :as json]
[clojure.string :as str]
[clojure.java.io :as io]))
(def default-endpoints
["http://localhost:8888"
"http://127.0.0.1:8888"
"http://192.168.153.113:18803"
"http://192.168.153.117:18803"])
(def min-delay-ms 1000)
(def timeout-ms 30000)
(def rate-file ".searxng-last-request")
(defn parse-options [s]
(if (or (nil? s) (str/blank? s))
{}
(try
(json/parse-string s true)
(catch Exception e
(binding [*out* *err*]
(println "Error: invalid options JSON")
(println (.getMessage e)))
(System/exit 2)))))
(defn now-ms [] (System/currentTimeMillis))
(defn last-request-ms []
(try
(when (.exists (io/file rate-file))
(Long/parseLong (str/trim (slurp rate-file))))
(catch Exception _ nil)))
(defn write-last-request! [ts]
(spit rate-file (str ts)))
(defn enforce-rate-limit! []
(when-let [last-ts (last-request-ms)]
(let [elapsed (- (now-ms) last-ts)]
(when (< elapsed min-delay-ms)
(Thread/sleep (- min-delay-ms elapsed))))))
(defn endpoint-candidates []
(let [env-url (some-> (System/getenv "SEARXNG_URL") str/trim)]
(if (and env-url (not (str/blank? env-url)))
(cons env-url default-endpoints)
default-endpoints)))
(defn category->param [category]
(when (and category (not= "general" category))
{(keyword (str "category_" category)) "1"}))
(defn build-params [query opts]
(merge
{:q query
:format "json"
:language (or (:language opts) "en")}
(when-let [tr (:time_range opts)] {:time_range tr})
(when-let [n (:num_results opts)] {:pageno 1 :count n})
(category->param (:category opts))))
(defn try-search [base-url params]
(let [url (str (str/replace base-url #"/$" "") "/search")]
(try
(let [resp (http/get url
{:query-params params
:timeout timeout-ms
:throw false
:headers {"accept" "application/json"}})]
(cond
(= 200 (:status resp))
{:ok true
:endpoint base-url
:body (json/parse-string (:body resp) true)}
(= 429 (:status resp))
{:ok false :retryable true :endpoint base-url :error "Rate limit exceeded (429)"}
:else
{:ok false :retryable true :endpoint base-url
:error (format "HTTP %s" (:status resp))}))
(catch Exception e
{:ok false :retryable true :endpoint base-url :error (.getMessage e)}))))
(defn top-results [results n]
(->> (or results [])
(sort-by (fn [r] (double (or (:score r) 0.0))) >)
(take n)))
(defn fmt-engines [r]
(let [engs (or (:engines r)
(when-let [e (:engine r)] [e])
[])]
(if (seq engs)
(str/join ", " engs)
"unknown")))
(defn print-results [query body num-results endpoint]
(let [total (or (:number_of_results body) (count (:results body)) 0)
results (top-results (:results body) num-results)]
(println (format "Search Results for \"%s\"" query))
(println (format "Found %s total results" total))
(println (format "Endpoint: %s" endpoint))
(println)
(if (seq results)
(doseq [[idx r] (map-indexed vector results)]
(println (format "%d. %s [Score: %.2f]"
(inc idx)
(or (:title r) "(untitled)")
(double (or (:score r) 0.0))))
(println (str " URL: " (or (:url r) "N/A")))
(println (str " " (or (:content r) "No description available.")))
(println (str " Engines: " (fmt-engines r)))
(println))
(println "No results found."))))
(defn usage []
(binding [*out* *err*]
(println "Usage: bb scripts/search.clj \"query\" '{\"category\":\"news\",\"time_range\":\"day\",\"num_results\":5}'")
(println)
(println "Options JSON keys: category, time_range, language, num_results")))
(defn -main [& args]
(let [[query opts-json] args]
(when (or (nil? query) (str/blank? query))
(usage)
(System/exit 1))
(let [opts (parse-options opts-json)
num-results (max 1 (min 20 (int (or (:num_results opts) 5))))
params (build-params query opts)]
(enforce-rate-limit!)
(write-last-request! (now-ms))
(loop [[endpoint & rest] (endpoint-candidates)
failures []]
(if (nil? endpoint)
(do
(binding [*out* *err*]
(println "Error: all SearXNG endpoints failed")
(doseq [{:keys [endpoint error]} failures]
(println (format "- %s -> %s" endpoint error))))
(System/exit 3))
(let [res (try-search endpoint params)]
(if (:ok res)
(print-results query (:body res) num-results endpoint)
(recur rest (conj failures (select-keys res [:endpoint :error]))))))))))
(apply -main *command-line-args*)
+21
View File
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT="/home/openclaw/.openclaw/workspace"
SKILL_DIR="$ROOT/skills/searxng-local-search"
ENV_FILE="$ROOT/.env"
if [[ -f "$ENV_FILE" ]]; then
set -a
# shellcheck disable=SC1090
source "$ENV_FILE"
set +a
fi
if [[ $# -lt 1 ]]; then
echo "Usage: scripts/search.sh \"query\" '[{"category":"news","time_range":"day","num_results":5}]'" >&2
echo "Example: scripts/search.sh \"openclaw ai\" '{\"num_results\":3}'" >&2
exit 1
fi
exec bb "$SKILL_DIR/scripts/search.clj" "$@"
+21
View File
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail
SEARXNG_URL="${SEARXNG_URL:-http://192.168.153.113:18803}"
QUERY="${1:-test}"
echo "[smoke] endpoint: ${SEARXNG_URL}"
echo "[smoke] query: ${QUERY}"
echo "[smoke] curl json API..."
ENC_QUERY="$(python3 -c 'import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))' "${QUERY}")"
curl -fsS --max-time 15 "${SEARXNG_URL%/}/search?q=${ENC_QUERY}&format=json" > /tmp/searx-smoke.json
echo "[smoke] validating response..."
python3 - <<'PY'
import json
p='/tmp/searx-smoke.json'
obj=json.load(open(p))
print('[ok] query:', obj.get('query'))
print('[ok] results:', len(obj.get('results', [])))
PY
+46
View File
@@ -0,0 +1,46 @@
---
name: swarm-kubectl-safe
description: Safe Kubernetes operations for the homelab cluster using the scoped swarm namespace. Use when deploying, inspecting, debugging, or scaling workloads in namespace swarm while keeping Raspberry Pi resource usage low and avoiding cluster-wide/destructive commands.
---
# Swarm Kubectl Safe
Use this skill for Kubernetes actions in the shared `swarm` namespace.
## Defaults
- Use kubeconfig: `~/.openclaw/credentials/kubeconfig-swarm.yaml`
- Use namespace: `swarm`
- Prefer lightweight deployments (Pi cluster):
- replicas: `1` by default
- CPU request: `25m-100m`
- Memory request: `64Mi-256Mi`
## Safe command wrapper
Use:
```bash
skills/swarm-kubectl-safe/scripts/kubectl-swarm.sh get pods
skills/swarm-kubectl-safe/scripts/kubectl-swarm.sh describe pod <name>
skills/swarm-kubectl-safe/scripts/kubectl-swarm.sh apply -f <file>
```
The wrapper automatically applies kubeconfig + namespace and blocks obviously dangerous cluster-wide delete operations.
## Deployment workflow
1. Inspect current namespace state:
- `.../kubectl-swarm.sh get deploy,po,svc,ingress`
2. Apply/update manifest with conservative resources.
3. Verify rollout:
- `.../kubectl-swarm.sh rollout status deploy/<name>`
4. Inspect logs/events if unhealthy:
- `.../kubectl-swarm.sh logs deploy/<name> --tail=100`
- `.../kubectl-swarm.sh get events --sort-by=.lastTimestamp | tail -n 30`
## Shared-namespace etiquette
- Use unique, prefixed names (example: `zap-<app>`).
- Avoid deleting resources you didnt create unless explicitly asked.
- Prefer `patch`/`rollout restart` over delete/recreate.
+27
View File
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
KUBECONFIG_PATH="${KUBECONFIG_PATH:-$HOME/.openclaw/credentials/kubeconfig-swarm.yaml}"
NAMESPACE="${NAMESPACE:-swarm}"
if [[ ! -f "$KUBECONFIG_PATH" ]]; then
echo "Missing kubeconfig: $KUBECONFIG_PATH" >&2
exit 1
fi
if [[ $# -eq 0 ]]; then
echo "Usage: $0 <kubectl args...>" >&2
exit 1
fi
# Block clearly dangerous cluster-wide delete attempts unless explicitly overridden.
if [[ "${ALLOW_UNSAFE_KUBECTL:-}" != "1" ]]; then
if [[ "$1" == "delete" ]]; then
if printf '%s\n' "$*" | grep -Eq '(^| )(-A|--all-namespaces)( |$)|\bnamespaces?\b|\bnodes?\b|\bpv\b|\bpersistentvolumes\b'; then
echo "Blocked potentially unsafe delete. Set ALLOW_UNSAFE_KUBECTL=1 to override." >&2
exit 2
fi
fi
fi
exec kubectl --kubeconfig "$KUBECONFIG_PATH" -n "$NAMESPACE" "$@"
+55
View File
@@ -0,0 +1,55 @@
---
name: task-capture-and-followup
description: Capture requests as structured tasks, track status over time, and send low-noise follow-ups for due/overdue/blocked items. Use when users ask to remember tasks, convert chat asks into todos, track commitments, or request reminders/check-ins.
---
# Task Capture and Follow-up
## Storage policy
- Keep active state in `memory/tasks.json`.
- Append notable updates to daily note `memory/YYYY-MM-DD.md` when useful for continuity.
- Never store secrets unless user explicitly asks.
## Task schema
Required fields:
- `id`
- `created_at`
- `title`
- `owner`
- `priority` (`low|med|high`)
- `status` (`open|in-progress|blocked|done`)
Optional fields:
- `details`
- `due_at`
- `next_check_at`
- `notes`
## Capture workflow
1. Detect action-oriented request.
2. Infer missing fields conservatively.
3. Ask only for critical missing info (typically due date or priority).
4. Persist task.
5. Return compact confirmation with id + next check moment.
## Follow-up policy
Nudge only when high signal:
- due within 24h,
- overdue,
- blocked,
- or explicit user-requested reminder window.
Anti-spam:
- no repeated nudge inside a short window,
- batch multiple items in one update,
- suppress low-importance repeats.
## Completion policy
- Mark done only on explicit user confirmation.
- Preserve short completion note for future context.
- Keep historical records unless user asks to prune.
+39
View File
@@ -0,0 +1,39 @@
---
name: telegram-ops
description: Set up, validate, and operate Telegram messaging in OpenClaw. Use when adding Telegram, debugging delivery/replies, mapping chat IDs, validating channel/account routing, or creating Telegram notification workflows with low-noise test sends.
---
# Telegram Ops
## Operating policy
1. Use first-class OpenClaw messaging tools only.
2. Explicitly set `channel: telegram` when channel ambiguity exists.
3. Ask before messaging a new target/chat.
4. Run one minimal test send per change, then stop.
5. Record stable environment-specific routing notes in `TOOLS.md`.
## Standard workflow
1. Identify intent: setup | debug | automation.
2. Verify target metadata (chat/user id, account, expected direction).
3. Validate outbound with a short plain-text test.
4. Validate inbound reply path back to current session.
5. Summarize status and exact next action.
## Debug decision tree
- Wrong destination -> re-check target id/name and account mapping.
- No delivery -> verify channel/account selection and retry minimal payload.
- Delivery works, reply missing -> check routing/session mapping for inbound path.
- Intermittent -> reduce formatting/media, confirm with plain text, then re-add complexity.
## Output contract
Return:
- **State:** working | partial | blocked
- **Validated:** exact checks that passed
- **Failed at:** first failing step
- **Next action:** smallest user/actionable fix
- **Safety hold:** what was not sent without approval
+30
View File
@@ -0,0 +1,30 @@
---
name: whisper-local-safe
description: Local audio transcription via whisper.cpp server on the LAN. Use when transcribing voice notes or audio files with the local whisper-server (no external API), with automatic endpoint fallback and plain text output.
---
# Whisper Local Safe
Use this skill to transcribe local audio files against:
- `http://192.168.153.113:18801`
- `http://192.168.153.117:18801`
## Run
```bash
skills/whisper-local-safe/scripts/transcribe-local.sh /path/to/audio.m4a
```
Optional env vars:
- `WHISPER_BASE_URL` (override endpoint)
- `WHISPER_MODEL` (default `base`)
- `WHISPER_LANGUAGE` (default `auto`)
- `WHISPER_OUTPUT` (path for transcript output)
## Behavior
- Tries `whisper.cpp` endpoint first: `/inference`
- Falls back to OpenAI-style endpoint: `/v1/audio/transcriptions`
- Returns transcript text to stdout and optional output file
+47
View File
@@ -0,0 +1,47 @@
#!/usr/bin/env bash
set -euo pipefail
AUDIO_FILE="${1:-}"
if [[ -z "$AUDIO_FILE" || ! -f "$AUDIO_FILE" ]]; then
echo "Usage: $0 /path/to/audio-file" >&2
exit 1
fi
WHISPER_BASE_URL="${WHISPER_BASE_URL:-http://192.168.153.117:18801}"
WHISPER_MODEL="${WHISPER_MODEL:-base}"
WHISPER_LANGUAGE="${WHISPER_LANGUAGE:-auto}"
WHISPER_OUTPUT="${WHISPER_OUTPUT:-}"
TMP_JSON="$(mktemp)"
trap 'rm -f "$TMP_JSON"' EXIT
# 1) whisper.cpp style
if curl -fsS -m 120 \
-F "file=@${AUDIO_FILE}" \
-F "model=${WHISPER_MODEL}" \
-F "language=${WHISPER_LANGUAGE}" \
"${WHISPER_BASE_URL}/inference" >"$TMP_JSON" 2>/dev/null; then
:
# 2) OpenAI style fallback
elif curl -fsS -m 120 \
-H "Accept: application/json" \
-F "file=@${AUDIO_FILE}" \
-F "model=${WHISPER_MODEL}" \
"${WHISPER_BASE_URL}/v1/audio/transcriptions" >"$TMP_JSON" 2>/dev/null; then
:
else
echo "Transcription request failed at ${WHISPER_BASE_URL}" >&2
exit 2
fi
TEXT="$(grep -o '"text"[[:space:]]*:[[:space:]]*"[^"]*"' "$TMP_JSON" | sed 's/.*"text"[[:space:]]*:[[:space:]]*"//; s/"$//')"
if [[ -z "$TEXT" ]]; then
# fallback: print raw json if text field parsing failed
cat "$TMP_JSON"
exit 0
fi
echo "$TEXT"
if [[ -n "$WHISPER_OUTPUT" ]]; then
printf '%s\n' "$TEXT" > "$WHISPER_OUTPUT"
fi