Compare commits

...

6 Commits

Author SHA1 Message Date
OpenCode Test
9ae8ff85c3 Update local config and plugin metadata
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 13:01:02 -08:00
OpenCode Test
f9e9be62bc Add pi50 resource optimization plan, mark monitoring design complete
- New plan: Improve pi50 control plane resource usage
- Completed: Workstation monitoring design status file

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 13:00:57 -08:00
OpenCode Test
5b9a85cd37 Update state: format future-considerations, add session history
- future-considerations: Pretty-print JSON, update fc-001 to pending status
- history/index: Add recent session entries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 13:00:52 -08:00
OpenCode Test
91733f5460 Fix gtasks OAuth scope handling and add ArgoCD docs to RAG
- gtasks: Add force_reauth option to recover from invalid_scope errors
- rag-search: Index ArgoCD documentation for semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 13:00:46 -08:00
OpenCode Test
380e2005c8 Regenerate morning report for 2026-01-05
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 13:00:40 -08:00
OpenCode Test
62050faedc Add workstation monitoring design 2026-01-05 01:31:10 -08:00
11 changed files with 1067 additions and 26 deletions

View File

@@ -1,6 +1,6 @@
 ---
 active: true
-iteration: 16
+iteration: 36
 max_iterations: 0
 completion_promise: "The morning-report skill is fully implemented, tested, and registered"
 started_at: "2026-01-03T08:16:44Z"

View File

@@ -0,0 +1,17 @@
{
"plan": "2026-01-05-workstation-monitoring-design.md",
"status": "COMPLETE",
"completed_at": "2026-01-05T14:09:00Z",
"implementation": {
"node_exporter": "installed and running (v1.10.2-1)",
"scrape_config": "deployed (workstation-scrape)",
"prometheus_rule": "deployed (workstation-alerts, 12 rules)",
"prometheus_target": "UP and scraping",
"git_commit": "9d17ac8",
"network_solution": "Tailscale (100.90.159.78:9100)"
},
"verification": {
"all_success_criteria_met": true,
"verified_at": "2026-01-05T14:09:19Z"
}
}

View File

@@ -0,0 +1,296 @@
# Workstation Monitoring Design
## Overview
Deploy comprehensive monitoring for the Arch Linux workstation (willlaptop) by integrating with the existing k8s monitoring stack. This will enable proactive alerting for resource exhaustion, long-term capacity planning, and performance debugging.
**Reference:** Future consideration `fc-001` (workstation monitoring)
## Current Infrastructure
- **Workstation:** Arch Linux on MacBookPro9,2 (hostname: willlaptop)
- **K8s Cluster:** kube-prometheus-stack deployed with Prometheus, Alertmanager, Grafana
- **Network:** Direct network connectivity between workstation and cluster nodes
- **Existing Monitoring:** 3 node_exporters on cluster nodes, cluster-level alerts configured
## Architecture
### Components
```
┌─────────────────┐      HTTP/9100      ┌──────────────────────┐
│   Workstation   │ ──────────────────> │    K8s Prometheus    │
│  (willlaptop)   │  scrape every 15s   │    (monitoring ns)   │
│                 │                     │                      │
│  node_exporter  │                     │  workstation rules   │
│ systemd service │                     │  + scrape config     │
└─────────────────┘                     └──────────────────────┘
                                                   v
                                        ┌──────────────────────┐
                                        │     Alertmanager     │
                                        │   (existing setup)   │
                                        │   unified routing    │
                                        └──────────────────────┘
```
### Data Flow
1. **node_exporter** exposes metrics on `http://willlaptop:9100/metrics`
2. **Prometheus** scrapes metrics every 15s via static target configuration
3. **PrometheusRule** evaluates workstation-specific alert rules
4. **Alertmanager** routes alerts to existing notification channels
## Workstation Deployment
### node_exporter Service
**Installation:**
```bash
pacman -S prometheus-node-exporter
```
**Systemd Configuration:**
- Service: `node_exporter.service`
- User: `node_exporter` (created by package)
- Listen address: `0.0.0.0:9100`
- Restart policy: `always` with 10s delay
- Logging: systemd journal (`journalctl -u node_exporter`)
**ExecStart flags:**
```bash
/usr/bin/node_exporter --collector.filesystem.mount-points-exclude='^/(sys|proc|dev|host|etc)($|/)'
```
Excludes system mounts to reduce noise.
**Firewall Configuration:**
- Allow TCP 9100 from cluster nodes
- Use ufw or iptables to restrict access
**Metrics Collected:**
All default collectors except resource-intensive ones:
- CPU, memory, filesystem, network
- System stats (uptime, load average, systemd)
- Thermal (if available on hardware)
- Disk I/O
## Prometheus Configuration
### Static Scrape Target
**Job configuration:**
- Job name: `workstation/willlaptop`
- Target: `willlaptop:9100` (DNS resolution) or workstation IP
- Scrape interval: `15s` (matches cluster node_exporter)
- Scrape timeout: `10s`
- Metrics path: `/metrics`
- Honor labels: `true`
**Relabeling rules:**
- Add `env: "workstation"` label for identification
- Preserve `instance: "willlaptop"` from target
**Integration:**
Add the scrape job to the Prometheus configuration managed by kube-prometheus-stack. This can be done via:
- an `additionalScrapeConfigs` entry in the kube-prometheus-stack values (a PrometheusRule only carries alerting/recording rules, not scrape configs), as sketched below, or
- direct modification of the Prometheus custom resource configuration
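A minimal values-override sketch (assuming the chart exposes `prometheus.prometheusSpec.additionalScrapeConfigs`; key names should be checked against the deployed kube-prometheus-stack version), with relabeling that produces the `env` and `instance` labels described above:
```yaml
# Hedged sketch: extra scrape job for the workstation target.
# The key path (prometheus.prometheusSpec.additionalScrapeConfigs) is an
# assumption -- verify against the installed kube-prometheus-stack values.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: workstation/willlaptop
        metrics_path: /metrics
        scrape_interval: 15s
        scrape_timeout: 10s
        honor_labels: true
        static_configs:
          - targets: ["willlaptop:9100"]
        relabel_configs:
          # Tag every series from this job with env="workstation".
          - target_label: env
            replacement: workstation
          # Drop the :9100 port so that instance="willlaptop".
          - source_labels: [__address__]
            regex: '([^:]+):\d+'
            target_label: instance
            replacement: '$1'
```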
## Alert Rules
### PrometheusRule Resource
**Namespace:** `monitoring`
**Kind:** `PrometheusRule`
**Labels:** Standard discovery labels for Prometheus operator
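For orientation, a skeleton of the resource might look like the following. The `release: kube-prometheus-stack` label is an assumption about which discovery label the operator's `ruleSelector` expects; match it to the actual deployment:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workstation-alerts
  namespace: monitoring
  labels:
    # Assumed discovery label; must match the Prometheus ruleSelector in use.
    release: kube-prometheus-stack
spec:
  groups:
    - name: workstation.alerts
      rules: []   # filled in with the alert categories listed below
```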
### Alert Categories
#### Critical Alerts (Paging)
1. **WorkstationDiskSpaceCritical**
- Condition: `<5%` free on any mounted filesystem
- Duration: 5m
- Severity: critical
2. **WorkstationMemoryCritical**
- Condition: `>95%` memory usage
- Duration: 5m
- Severity: critical
3. **WorkstationCPUCritical**
- Condition: `>95%` CPU usage
- Duration: 10m
- Severity: critical
4. **WorkstationSystemdFailed**
- Condition: Failed systemd units detected
- Duration: 5m
- Severity: critical
#### Warning Alerts (Email/Slack)
1. **WorkstationDiskSpaceWarning**
- Condition: `<10%` free on any mounted filesystem
- Duration: 10m
- Severity: warning
2. **WorkstationMemoryWarning**
- Condition: `>85%` memory usage
- Duration: 10m
- Severity: warning
3. **WorkstationCPUWarning**
- Condition: `>80%` CPU usage
- Duration: 15m
- Severity: warning
4. **WorkstationLoadHigh**
- Condition: 5m load average above the number of CPU cores
- Duration: 10m
- Severity: warning
5. **WorkstationDiskInodeWarning**
- Condition: `<10%` inodes free
- Duration: 10m
- Severity: warning
6. **WorkstationNetworkErrors**
- Condition: High packet loss or error rate
- Duration: 10m
- Severity: warning
#### Info Alerts (Log Only)
1. **WorkstationDiskSpaceInfo**
- Condition: `<20%` free on any mounted filesystem
- Duration: 15m
- Severity: info
2. **WorkstationUptime**
- Condition: System uptime metric (recording rule)
- Severity: info
### Alert Annotations
Each alert includes:
- `summary`: Brief description
- `description`: Detailed explanation with metric values
- `runbook_url`: Link to troubleshooting documentation (if available)
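As a concrete illustration, the first critical alert above could be expressed roughly as follows (a sketch only; exact filesystem filters and label names depend on the relabeling actually deployed):
```yaml
# Goes under spec.groups[].rules in the PrometheusRule sketched earlier.
- alert: WorkstationDiskSpaceCritical
  expr: |
    (node_filesystem_avail_bytes{env="workstation", fstype!~"tmpfs|overlay"}
      / node_filesystem_size_bytes{env="workstation", fstype!~"tmpfs|overlay"}) * 100 < 5
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Disk space critically low on {{ $labels.instance }}"
    description: "Mount {{ $labels.mountpoint }} on {{ $labels.instance }} has {{ $value | humanize }}% free."
```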
## Versioning
### Repository Structure
```
~/.claude/repos/homelab/charts/willlaptop-monitoring/
├── prometheus-rules.yaml # PrometheusRule for workstation alerts
├── values.yaml # Configuration values
└── README.md # Documentation
```
### Values.yaml Configuration
Configurable parameters:
```yaml
workstation:
hostname: willlaptop
ip: <workstation_ip> # optional, fallback to DNS
scrape:
interval: 15s
timeout: 10s
alerts:
disk:
critical_percent: 5
warning_percent: 10
info_percent: 20
memory:
critical_percent: 95
warning_percent: 85
cpu:
critical_percent: 95
critical_duration: 10m
warning_percent: 80
warning_duration: 15m
```
### Integration with ArgoCD
Follows existing GitOps pattern (charts/kube-prometheus-stack). Can be added to ArgoCD for automated deployments if desired.
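If the ArgoCD route is taken, a minimal Application manifest might look like this sketch. The application name, project, and repo URL are placeholders; the path follows the repository structure above:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: willlaptop-monitoring          # hypothetical name
  namespace: argocd
spec:
  project: default                     # placeholder project
  source:
    repoURL: https://example.com/homelab.git   # placeholder; use the real homelab repo URL
    targetRevision: main
    path: charts/willlaptop-monitoring
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```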
## Testing and Verification
### Phase 1 - Workstation Deployment
1. Verify node_exporter installation:
```bash
pacman -Q prometheus-node-exporter
```
2. Check systemd service status:
```bash
systemctl status node_exporter
```
3. Verify metrics endpoint locally:
```bash
curl http://localhost:9100/metrics | head -20
```
4. Test accessibility from cluster:
```bash
kubectl run -it --rm debug --image=curlimages/curl -- curl willlaptop:9100/metrics
```
### Phase 2 - Prometheus Integration
1. Verify Prometheus target:
- Access Prometheus UI → Targets → workstation/willlaptop
- Confirm target is UP
2. Verify metric ingestion:
```bash
# Query in Prometheus UI
up{instance="willlaptop"}
```
3. Verify label injection:
- Confirm `env="workstation"` label appears on metrics
### Phase 3 - Alert Verification
1. Review PrometheusRule:
```bash
kubectl get prometheusrule workstation-alerts -n monitoring -o yaml
```
2. Verify rule evaluation:
- Access Prometheus UI → Rules
- Confirm workstation rules are active
3. Test critical alert:
- Temporarily trigger a low disk alert (or simulate)
- Verify alert fires in Prometheus UI
4. Verify Alertmanager integration:
- Check Alertmanager UI → Alerts
- Confirm workstation alerts are received
## Success Criteria
- [ ] node_exporter running on workstation
- [ ] Metrics accessible from cluster nodes
- [ ] Prometheus scraping workstation metrics
- [ ] Alert rules evaluated and firing correctly
- [ ] Alerts routing through Alertmanager
- [ ] Configuration versioned in homelab repository
- [ ] Documentation complete
## Future Enhancements
- Grafana dashboards for workstation metrics
- Alert tuning based on observed patterns
- Additional collectors (e.g., temperature sensors if available)
- Integration with morning-report skill for health status

View File

@@ -0,0 +1,171 @@
# Plan: Improve pi50 (Control Plane) Resource Usage
## Problem Summary
pi50 (control plane) is running at **73% CPU / 81% memory** while worker nodes have significant headroom:
- pi3: 7% CPU / 65% memory (but only 800MB RAM - memory constrained)
- pi51: 18% CPU / 64% memory (8GB RAM - plenty of capacity)
**Root cause**: pi50 has **NO control-plane taint**, so the scheduler treats it as a general worker node. It currently runs ~85 pods vs 38 on pi51.
## Current State
| Node | Role | CPUs | Memory | CPU Used | Mem Used | Pods |
|------|------|------|--------|----------|----------|------|
| pi50 | control-plane | 4 | 8GB | 73% | 81% | ~85 |
| pi3 | worker | 4 | 800MB | 7% | 65% | 13 |
| pi51 | worker | 4 | 8GB | 18% | 64% | 38 |
## Recommended Approach
### Option A: Add PreferNoSchedule Taint (Recommended)
Add a soft taint to pi50 that tells the scheduler to prefer other nodes for new workloads, while allowing existing pods to remain.
```bash
kubectl taint nodes pi50 node-role.kubernetes.io/control-plane=:PreferNoSchedule
```
**Pros:**
- Non-disruptive - existing pods continue running
- New pods will prefer pi51/pi3
- Gradual rebalancing as pods are recreated
- Easy to remove if needed
**Cons:**
- Won't immediately reduce load
- Existing pods stay where they are
### Option B: Move Heavy Workloads Immediately
Identify and relocate the heaviest workloads from pi50 to pi51:
**Top CPU consumers on pi50:**
1. ArgoCD application-controller (157m CPU, 364Mi) - should stay (manages cluster)
2. Longhorn instance-manager (139m CPU, 707Mi) - must stay (storage)
3. ai-stack workloads (ollama, litellm, open-webui, etc.)
**Candidates to move to pi51:**
- `ai-stack/ollama` - can run on any node with storage
- `ai-stack/litellm` - stateless, can move
- `ai-stack/open-webui` - can move
- `ai-stack/claude-code`, `codex`, `gemini-cli`, `opencode` - can move
- `minio` - can move (uses PVC)
- `pihole2` - can move
**Method**: Add `nodeSelector` or `nodeAffinity` to deployments:
```yaml
spec:
template:
spec:
nodeSelector:
kubernetes.io/hostname: pi51
```
Or use anti-affinity to avoid pi50:
```yaml
spec:
template:
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: DoesNotExist
```
### Option C: Combined Approach (Best)
1. Add `PreferNoSchedule` taint to pi50 (prevents future imbalance)
2. Immediately move 2-3 heaviest moveable workloads to pi51
3. Let remaining workloads naturally migrate over time
## Execution Steps
### Step 1: Add taint to pi50
```bash
kubectl taint nodes pi50 node-role.kubernetes.io/control-plane=:PreferNoSchedule
```
### Step 2: Verify existing workloads still running
```bash
kubectl get pods -A -o wide --field-selector spec.nodeName=pi50 | grep -v Running
```
### Step 3: Move heavy ai-stack workloads (optional, for immediate relief)
For each deployment to move, patch with node anti-affinity or selector:
```bash
kubectl patch deployment -n ai-stack ollama --type=merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"pi51"}}}}}'
```
Or delete pods to trigger rescheduling (if PreferNoSchedule taint is set):
```bash
kubectl delete pod -n ai-stack <pod-name>
```
### Step 4: Monitor
```bash
kubectl top nodes
```
## Workloads That MUST Stay on pi50
- `kube-system/*` - Core cluster components
- `longhorn-system/csi-*` - Storage controllers
- `longhorn-system/longhorn-driver-deployer` - Storage management
- `local-path-storage/*` - Local storage provisioner
## Expected Outcome
After changes:
- pi50: ~50-60% CPU, ~65-70% memory (control plane + essential services)
- pi51: ~40-50% CPU, ~70-75% memory (absorbs application workloads)
- New pods prefer pi51 automatically
## Risks
- **Low**: PreferNoSchedule is a soft taint - pods with tolerations can still schedule on pi50
- **Low**: Moving workloads may cause brief service interruption during pod recreation
- **Note**: pi3 cannot absorb much due to 800MB RAM limit
## Selected Approach: A + B (Combined)
User selected combined approach:
1. Add `PreferNoSchedule` taint to pi50
2. Move heavy ai-stack workloads to pi51 immediately
## Execution Plan
### Phase 1: Add Taint
```bash
kubectl taint nodes pi50 node-role.kubernetes.io/control-plane=:PreferNoSchedule
```
### Phase 2: Move Heavy Workloads to pi51
Target workloads (heaviest on pi50):
- `ai-stack/ollama`
- `ai-stack/open-webui`
- `ai-stack/litellm`
- `ai-stack/claude-code`
- `ai-stack/codex`
- `ai-stack/gemini-cli`
- `ai-stack/opencode`
- `ai-stack/searxng`
- `minio/minio`
Method: Delete pods to trigger rescheduling (taint will push them to pi51):
```bash
kubectl delete pod -n ai-stack -l app.kubernetes.io/name=ollama
# etc for each workload
```
### Phase 3: Verify
```bash
kubectl top nodes
kubectl get pods -A -o wide | grep -E "ollama|open-webui|litellm"
```

View File

@@ -5,7 +5,7 @@
"repo": "anthropics/claude-plugins-official" "repo": "anthropics/claude-plugins-official"
}, },
"installLocation": "/home/will/.claude/plugins/marketplaces/claude-plugins-official", "installLocation": "/home/will/.claude/plugins/marketplaces/claude-plugins-official",
"lastUpdated": "2026-01-05T07:22:46.460Z" "lastUpdated": "2026-01-05T20:44:45.874Z"
}, },
"superpowers-marketplace": { "superpowers-marketplace": {
"source": { "source": {

View File

@@ -0,0 +1,27 @@
# Morning Report - Mon Jan 05, 2026
## 🌤 Weather
Overcast, 44°F (feels 41°F), rain likely—bring umbrella ☔
## 📧 Email
⚠️ Could not fetch emails: No module named 'pydantic_core._pydantic_core'
## 📅 Today
⚠️ Could not fetch calendar: No module named 'pydantic_core._pydantic_core'
## 📈 Stocks
CRWV $77.64 ▼2.1% | NVDA $187.84 ▼0.5% | MSFT $473.50 ▲0.1%
## 🖥 Infrastructure
K8s: 🟡 | Workstation: 🟢
└ K8s: 2 pods not running
## 📰 Tech News
• O-Ring Automation (Hacker News)
• Novo Nordisk launches Wegovy weight-loss pill in US, triggering price war (Hacker News)
• Refactoring Not on the backlog (Hacker News)
• It's hard to justify Tahoe icons (Lobsters)
• Databases in 2025: A Year in Review (Lobsters)
---
*Generated: 2026-01-05 12:44:47 PT*

View File

@@ -1,30 +1,27 @@
-# Morning Report - Sun Jan 04, 2026
+# Morning Report - Mon Jan 05, 2026
## 🌤 Weather
-Seattle: 51°F, Partly cloudy | High 52° Low 43°
+Overcast, 44°F (feels 41°F), rain likely—bring umbrella ☔
## 📧 Email
-15 unread
-• Capital One | Quicks - Your requested balance summary
-• Uber Receipts - [Personal] Your Saturday evening trip wi
-• Experian - William, it's time to check your utiliza
-• Experteer Search Age - William, we have 2 new opportunities for
-• Chase - You can start your mortgage preapproval
+⚠️ Could not fetch emails: No module named 'pydantic_core._pydantic_core'
## 📅 Today
-• 2:00 PM - Seattle Saturday (SAM + QED + Lecosho) (5h)
+⚠️ Could not fetch calendar: No module named 'pydantic_core._pydantic_core'
## 📈 Stocks
-CRWV $79.32 +10.8% ▲ NVDA $188.85 +1.3% ▲ MSFT $472.94 -2.2% ▼
+CRWV $77.64 ▼2.1% | NVDA $187.84 ▼0.5% | MSFT $473.50 ▲0.1%
## 🖥 Infrastructure
-K8s: 🟢 | Workstation: 🟢
+K8s: 🟡 | Workstation: 🟢
+└ K8s: 2 pods not running
## 📰 Tech News
-• C-Sentinel: System prober that captures "system fingerprints... (Hacker News)
-• Show HN: An LLM-Powered PCB Schematic Checker (Major Update) (Hacker News)
-• Can I finally start using Wayland in 2026? (Lobsters)
-• Saying goodbye to the servers at our physical datacenter (Lobsters)
+• O-Ring Automation (Hacker News)
+• Novo Nordisk launches Wegovy weight-loss pill in US, triggering price war (Hacker News)
+• Refactoring Not on the backlog (Hacker News)
+• It's hard to justify Tahoe icons (Lobsters)
+• Databases in 2025: A Year in Review (Lobsters)
---
-*Generated: 2026-01-04 14:40:48 PT*
+*Generated: 2026-01-05 12:44:47 PT*

View File

@@ -29,17 +29,26 @@ TOKEN_PATH = Path.home() / ".gmail-mcp/tasks_token.json"
CREDS_PATH = Path.home() / ".gmail-mcp/credentials.json"
-def get_credentials():
-    """Get or refresh Google credentials for Tasks API."""
+def get_credentials(force_reauth: bool = False):
+    """Get or refresh Google credentials for Tasks API.
+
+    If ``force_reauth`` is True, skip refresh and run a new OAuth flow.
+    This is useful when a stored refresh token is bound to a different
+    scope set and refresh keeps failing with invalid_scope.
+    """
    creds = None
    if TOKEN_PATH.exists():
        creds = Credentials.from_authorized_user_file(str(TOKEN_PATH), SCOPES)
-    if not creds or not creds.valid:
-        if creds and creds.expired and creds.refresh_token:
-            creds.refresh(Request())
-        else:
+    if not creds or not creds.valid or force_reauth:
+        if not force_reauth and creds and creds.expired and creds.refresh_token:
+            try:
+                creds.refresh(Request())
+            except Exception:
+                creds = None
+        if not creds or not creds.valid:
            if not CREDS_PATH.exists():
                return None
            flow = InstalledAppFlow.from_client_secrets_file(str(CREDS_PATH), SCOPES)
@@ -168,7 +177,9 @@ if __name__ == "__main__":
    if "--auth" in sys.argv:
        print("Starting Tasks API authentication...")
-        creds = get_credentials()
+        # Force a fresh OAuth flow so we can recover from invalid_scope
+        # errors caused by stale refresh tokens.
+        creds = get_credentials(force_reauth=True)
        if creds:
            print(f"✅ Authentication successful! Token saved to {TOKEN_PATH}")
        else:

View File

@@ -9,6 +9,15 @@
"glob": "**/*.md", "glob": "**/*.md",
"version": "main", "version": "main",
"last_indexed": "2026-01-04T23:27:40.175671" "last_indexed": "2026-01-04T23:27:40.175671"
},
{
"id": "argocd",
"name": "ArgoCD Documentation",
"type": "git",
"url": "https://github.com/argoproj/argo-cd.git",
"path": "docs/",
"glob": "**/*.md",
"last_indexed": "2026-01-05T01:04:53.930441"
}
]
}

File diff suppressed because one or more lines are too long

View File

@@ -224,6 +224,48 @@
"ended": null, "ended": null,
"summarized": false, "summarized": false,
"topics": [] "topics": []
},
{
"id": "2026-01-04_23-56-03",
"started": "2026-01-04T23:56:03-08:00",
"ended": null,
"summarized": false,
"topics": []
},
{
"id": "2026-01-05_00-28-11",
"started": "2026-01-05T00:28:11-08:00",
"ended": null,
"summarized": false,
"topics": []
},
{
"id": "2026-01-05_00-50-41",
"started": "2026-01-05T00:50:41-08:00",
"ended": null,
"summarized": false,
"topics": []
},
{
"id": "2026-01-05_00-51-05",
"started": "2026-01-05T00:51:05-08:00",
"ended": null,
"summarized": false,
"topics": []
},
{
"id": "2026-01-05_12-10-41",
"started": "2026-01-05T12:10:41-08:00",
"ended": null,
"summarized": false,
"topics": []
},
{
"id": "2026-01-05_12-11-40",
"started": "2026-01-05T12:11:40-08:00",
"ended": null,
"summarized": false,
"topics": []
}
]
}