# SPEC-1: Classy Perplexity-style News Aggregator (Raspberry Pi 5 K8s)
## Background
You want a Perplexity-style web app that aggregates news from a defined pool of reference websites and presents results in a classy, attractive, highly responsive UI. The target runtime is a Raspberry Pi 5 Kubernetes cluster, so the system must be lightweight, ARM64-friendly, and resilient to node churn or SD-card fragility. The product should feel like a modern AI assistant for news discovery: fast search, crisp summaries, clear source attributions, and mobile-first ergonomics.
Initial working assumptions (to be confirmed):
* Content sources are a curated list of reputable outlets and blogs that permit aggregation with proper linking and snippet-length quoting.
* We will index headlines, metadata, and short excerpts; full-text storage will be minimized or avoided unless licensed.
* The app will support semantic search + conversational Q&A over the indexed corpus, with citations to original articles.
* Real-time(ish) freshness target: new articles discoverable within 2-5 minutes of publication.
* UI aims to echo Perplexity's clean card layout, with source badges, inline citations, and a composer panel for queries.
* Deployment must fit on 2-4 ARM64 nodes, using lightweight containers and a small replicated datastore.
## Requirements
**Scope for MVP**: Start with **Reuters** as the single source. Use official **RSS/Atom feeds and daily sitemaps** when available; gracefully fall back to HTML scraping for sections without feeds, storing only metadata/snippets with links. Freshness target: 2-5 minutes. UI mirrors Perplexity's card+chat layout with inline citations.
### MoSCoW
**Must-have**
* Aggregate from Reuters via RSS/Atom + sitemaps; fallback HTML scraper with a robots.txt compliance toggle.
* ARM64-ready containers deployable on a Raspberry Pi 5 Kubernetes cluster (k0s).
* Ingest pipeline with deduplication, canonical URL normalization, and rate-limit/backoff.
* Index headlines, authors, timestamps, topics, short excerpt (<= 320 chars), and source URL.
* Full-text search over stored fields; semantic-search embeddings over titles+snippets.
* Summarization and on-page Q&A with **clear citations** to source URLs.
* Classy, responsive UI with Perplexity-style query composer, result cards, and source badges.
* Observability: structured logs, basic metrics (ingest latency, queue depth, p95 response), and alerting.
* Legal safety rails: configurable snippet length, per-domain robots policy, and a kill-switch per source.
**Should-have**
* Topic taxonomy and tags (World, Business, Tech, etc.).
* Incremental sitemap polling (by date) + change-list RSS polling with jitter to avoid burst load.
* Reader-mode extraction (readability-style) used **only for summarization** in memory, not stored.
* Caching layer (HTTP + summary cache) to keep Raspberry Pi costs low.
* Multi-node HA for index and queue; rolling updates.
**Could-have**
* User accounts for saved searches and daily digests.
* Multi-source expansion via declarative YAML for new sites.
* Related-story clustering and timeline views.
* Basic mobile PWA installability and offline read-later for snippets.
**Won't-have (MVP)**
* Paywall bypassing or full-text storage of copyrighted articles.
* Personalized recommendations or email digests.
* Editorial curation tooling beyond tags and pinning.
## Method
### High-level architecture
```plantuml
@startuml
skinparam componentStyle rectangle
skinparam shadowing false
skinparam ArrowColor #888
skinparam DefaultFontName Inter
rectangle "k0s Cluster (ARM64 Raspberry Pi 5)" as K8S {
  node "Namespace: news" as NS {
    component "Ingest Scheduler\n(CronJobs)" as SCHED
    component "Feed+Sitemap Poller\n(FastAPI worker)" as POLL
    component "HTML Scraper\n(worker, Trafilatura)" as SCRAPE
    component "Normalizer/Dedupe\n(worker)" as NORM
    component "Embedder\n(worker -> OpenAI embeddings / Gemini flash)" as EMB
    component "Summarizer\n(worker -> OpenAI gpt-4o-mini / Gemini pro)" as SUMM
    database "PostgreSQL + pgvector" as PG
    component "Redis\n(cache + queue)" as REDIS
    component "API Gateway\n(FastAPI)" as API
    component "Web UI\n(Next.js, Tailwind, shadcn)" as WEB
  }
}
cloud "OpenAI / Gemini APIs" as LLM
SCHED --> POLL
POLL --> SCRAPE
SCRAPE --> NORM
NORM --> PG
EMB --> PG
SUMM --> PG
EMB --> LLM
SUMM --> LLM
API --> PG
API --> REDIS
WEB --> API
@enduml
```
**Why these choices (MVP):**
* **Source**: Start with **Reuters** using news sitemaps (with pagination parameters) and RSS; where feeds don't exist, scrape respectfully with robots awareness.
* **Storage**: **PostgreSQL + pgvector** keeps the stack compact (one DB for metadata, text search, and vectors). Postgres full-text covers keyword search; pgvector powers semantic search.
* **Workers**: Python **FastAPI** workers using **Trafilatura** for robust article extraction and metadata parsing. **Redis** as the lightweight queue/cache (Dramatiq or RQ).
* **Summaries/Q&A**: On-demand summaries and answer synthesis via **gpt-4o-mini or Gemini pro** with **inline citations**. Embeddings via **text-embedding-3-small or Gemini flash**. Both accessed through API keys/secrets in Kubernetes.
* **UI**: **Next.js 14 App Router**, Tailwind + shadcn for a Perplexity-style, low-latency interface.
* **k0s**: ARM64-friendly. Use **nginx-ingress** for HTTP routing, with optional **HAProxy Ingress** for TCP/advanced policies.
### Data model (PostgreSQL)
```sql
-- Sources (static for MVP)
CREATE TABLE sources (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE, -- e.g., 'Reuters'
base_url TEXT NOT NULL, -- e.g., https://www.reuters.com
rss_urls TEXT[] NOT NULL DEFAULT '{}',
sitemap_urls TEXT[] NOT NULL DEFAULT '{}',
robots_txt TEXT,
enabled BOOLEAN NOT NULL DEFAULT true
);
-- Raw fetch jobs (observability + retries)
CREATE TABLE fetch_jobs (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id),
url TEXT NOT NULL,
kind TEXT NOT NULL CHECK (kind IN ('rss','sitemap','article')),
status TEXT NOT NULL CHECK (status IN ('queued','fetched','parsed','failed')),
http_status INT,
etag TEXT,
last_modified TIMESTAMPTZ,
attempts INT NOT NULL DEFAULT 0,
error TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON fetch_jobs (status, created_at);
-- Canonical articles (no copyrighted full text stored)
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id) NOT NULL,
canonical_url TEXT NOT NULL,
url_hash BYTEA NOT NULL, -- SHA-256 of canonical_url
title TEXT NOT NULL,
author TEXT,
category TEXT, -- World, Business, Tech, etc.
published_at TIMESTAMPTZ,
fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
snippet TEXT, -- <= 320 chars, from feed/lede
summary TEXT, -- model-generated abstract
image_url TEXT,
language TEXT DEFAULT 'en',
UNIQUE (source_id, url_hash)
);
CREATE INDEX ON articles (published_at DESC);
CREATE INDEX ON articles USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(snippet,'')));
-- Embeddings for semantic search (title+snippet)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE article_embeddings (
article_id BIGINT PRIMARY KEY REFERENCES articles(id) ON DELETE CASCADE,
embedding vector(1536) -- 1536 dims matches text-embedding-3-small; adjust if using a different embedding model
);
CREATE INDEX ON article_embeddings USING ivfflat (embedding vector_cosine_ops); -- (re)build after the initial backfill; tune ivfflat lists for corpus size
-- Tags and mapping (optional but handy)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE article_tags (
article_id BIGINT REFERENCES articles(id) ON DELETE CASCADE,
tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (article_id, tag_id)
);
```
### Ingestion flow
1. **Discovery**
   * Poll **RSS/Atom** endpoints with ETag/Last-Modified to minimize bandwidth.
   * Poll **news sitemaps** using incremental parameters (e.g., `from=` offsets when supported). Maintain per-endpoint cursors.
   * For sections without feeds, enqueue **HTML pages** discovered from site index pages (rate-limited) and respect `robots.txt` (configurable).
2. **Fetch & Extract**
   * HTTP client with retry + exponential backoff and per-host concurrency caps (e.g., 2-4). Respect `Cache-Control` where present.
   * Use **Trafilatura** with `favor_precision=true` to extract main content for **in-memory summarization only**; do not persist full text.
   * Generate a **canonical URL** (resolve redirects, strip tracking params) and compute `url_hash` (see the sketch after this list).
3. **Normalize & Deduplicate**
   * If `(source_id, url_hash)` exists, skip the insert; otherwise create an `articles` row with metadata and a **snippet** (<= 320 chars).
   * Classify category using rule-based hints (URL path, RSS category) with a lightweight fallback classifier.
4. **Summaries & Embeddings**
   * Create a short **summary** (60-90 words, neutral tone) with an inline citation marker `[1]` → canonical URL.
   * Compute an **embedding** over `title + "\n" + snippet` and upsert into `article_embeddings`.
5. **Indexing & Cache**
   * Postgres GIN index supports keyword search; pgvector handles ANN semantic search.
   * Cache hot queries and summaries in Redis for 5-15 minutes.
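A minimal sketch of the canonicalization step in (2), assuming `httpx` for redirect resolution; the tracking-parameter list is illustrative rather than exhaustive:

```python
# Sketch: canonical URL normalization + dedupe hash for the ingest pipeline.
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

import httpx

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}

def canonicalize(url: str) -> str:
    # Resolve redirects so syndicated links collapse to one canonical URL
    # (HEAD keeps it cheap; some servers may require a GET instead).
    resp = httpx.head(url, follow_redirects=True, timeout=10)
    scheme, netloc, path, query, _ = urlsplit(str(resp.url))
    # Drop tracking params and the fragment; keep meaningful query args.
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path.rstrip("/") or "/",
                       urlencode(kept), ""))

def url_hash(canonical_url: str) -> bytes:
    # SHA-256 digest stored in articles.url_hash (BYTEA)
    return hashlib.sha256(canonical_url.encode("utf-8")).digest()
```

The digest feeds the `UNIQUE (source_id, url_hash)` constraint, so dedupe is enforced at insert time rather than in application logic.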
### API design (FastAPI)
* `GET /v1/search?q=&mode=hybrid&page=` — Hybrid search (keyword + vector rerank), returns cards with title, snippet, badges, and citations.
* `GET /v1/articles/{id}` — Metadata + summary.
* `POST /v1/ask` — Conversational answer over top-k retrieved articles, always with citations.
* `POST /v1/feedback` — Thumbs up/down and optional comment.
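A sketch of the Pydantic request/response shapes the endpoints above imply; field names here are illustrative, not a final contract:

```python
# Sketch: schemas for /v1/search and /v1/ask (Pydantic, as used by FastAPI).
from datetime import datetime
from pydantic import BaseModel

class Citation(BaseModel):
    index: int            # the [n] marker used in the answer text
    title: str
    url: str

class SearchResult(BaseModel):
    id: int
    title: str
    snippet: str
    canonical_url: str
    source: str           # e.g. "Reuters", rendered as a badge
    published_at: datetime | None = None

class AskRequest(BaseModel):
    query: str
    top_k: int = 8

class AskResponse(BaseModel):
    answer: str           # sentences carry [n] citation markers
    citations: list[Citation]
```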
### UI flows (Next.js 14)
* **Home**: Center composer, query suggestions, trending topics.
* **Results**: Perplexity-style answer at top with source chips; below, cards for each cited article; sticky composer for follow-ups.
* **Interactions**: Cmd/Ctrl+K global search, `?` keyboard help, skeleton loaders, optimistic UI.
### Kubernetes (k0s) deployment sketch
* **Namespaces**: `news`, `news-observe`.
* **Ingress**: `nginx-ingress` for HTTPS; optional parallel **HAProxy Ingress** for TCP/advanced use. Certs via cert-manager + DNS-01 or HTTP-01.
* **Deployments** (ARM64 images):
  * `api` (FastAPI, Uvicorn workers under Gunicorn): 2 replicas, HPA on CPU 60% & p95 latency SLI.
  * `web` (Next.js): 2 replicas, static export (optional) behind a Node adapter.
  * `worker` (ingest/summarize/embed): 2-4 replicas, separate queues for `poll`, `scrape`, `summ`, `embed`.
  * `postgres` (Bitnami ARM64) with a persistent volume; enable the `pgvector` extension.
  * `redis` (Bitnami ARM64) for cache/queue.
* **RBAC/Secrets**: Kubernetes Secrets for API keys; service accounts per deployment.
* **Resources (starting)**: api 200m/512Mi; web 100m/256Mi; worker 300m/1Gi; redis 50m/256Mi; postgres 250m/2Gi.
* **Autoscaling**: HPA + VPA recommendations; cluster metrics via metrics-server.
### Ranking & answer synthesis
* **Hybrid search**: BM25-style keyword recall via Postgres full-text (`ts_rank`) → take top 50; compute cosine similarity on vectors → rerank → top 8.
* **Answer**: Prompt the model with the top 6 snippets + titles and URLs; enforce a **citation after each sentence** where evidence exists (a prompt sketch follows). Refuse to answer beyond the source material.
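A sketch of how the synthesis prompt might be assembled to enforce the per-sentence `[n]` citations; the wording and the `docs` item shape (`title`, `snippet`, `canonical_url` keys) are illustrative:

```python
# Sketch: build the answer-synthesis prompt with numbered sources so the
# model can emit [n] markers after each supported sentence.
def build_answer_prompt(query: str, docs: list[dict]) -> str:
    sources = "\n".join(
        f"[{i}] {d['title']} | {d['snippet']} ({d['canonical_url']})"
        for i, d in enumerate(docs[:6], start=1)  # max 6 articles per answer
    )
    return (
        "Answer the question using ONLY the numbered sources below. "
        "After each sentence, cite the supporting source as [n]. "
        "If the sources are insufficient, say so instead of guessing.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )
```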
### Rate limiting & ethics
* Per-source QPS caps (e.g., 0.5-1 rps) and adaptive backoff.
* Honor robots.txt by default; switchable per your policy. Always link prominently to the original.
* Snippets limited; no storage of full article text.
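To make the QPS caps concrete, a minimal in-process sketch; `client` is assumed to be an `httpx.AsyncClient`, and a shared (e.g., Redis-backed) token bucket would be needed once multiple worker replicas scrape the same host:

```python
# Sketch: per-host QPS cap plus jittered exponential backoff on throttling.
import asyncio
import random
import time

class HostRateLimiter:
    def __init__(self, qps: float = 0.5):
        self.min_interval = 1.0 / qps   # 0.5 qps -> one request per 2s
        self.last_request = 0.0
        self.lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self.lock:
            delta = time.monotonic() - self.last_request
            if delta < self.min_interval:
                await asyncio.sleep(self.min_interval - delta)
            self.last_request = time.monotonic()

async def fetch_with_backoff(client, url: str, limiter: HostRateLimiter,
                             retries: int = 4):
    for attempt in range(retries):
        await limiter.wait()
        resp = await client.get(url)
        if resp.status_code not in (429, 503):
            return resp
        # Exponential backoff with jitter when the host throttles us.
        await asyncio.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"giving up on {url}")
```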
## Implementation
### 0) Repo layout
```
news-agg/
  apps/
    api/                      # FastAPI (Python 3.11)
    web/                      # Next.js 14 UI
    workers/                  # poll/scrape/summarize/embed (FastAPI tasks + RQ/Dramatiq)
  deploy/
    base/                     # K8s Kustomize base (namespaces, RBAC, NetworkPolicies)
    overlays/
      pi-prod/
        kustomization.yaml
        postgres.yaml
        redis.yaml
        api.yaml
        web.yaml
        workers.yaml
        cron-poller.yaml
        ingress-nginx.yaml
        ingress-haproxy.yaml  # optional
        secrets.example.yaml
  ops/
    helm-values/
      bitnami-postgresql.yaml
      bitnami-redis.yaml
  scripts/
    build.sh                  # multi-arch docker buildx
    db_migrate.sql            # tables + pgvector
```
### 1) Container images (ARM64)
* **Python base**: `python:3.11-slim` + `uv`/`pip-tools`; compile wheels at build time.
* **Node**: `node:18-alpine` → `next build` then run with `node` or export static.
* Use **`docker buildx`** to produce `linux/arm64` images. Example:
```
docker buildx build --platform linux/arm64 -t registry/pi/news-api:0.1 -f apps/api/Dockerfile --push .
```
**apps/api/Dockerfile** (snippet)
```Dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential libpq-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY apps/api/pyproject.toml apps/api/uv.lock ./
RUN pip install -U pip && pip install uv
# Install pinned dependencies from the project manifest
RUN uv pip install --system -r pyproject.toml
COPY apps/api/ .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```
### 2) k0s cluster prep (once)
* Install **nginx-ingress** and (optionally) **HAProxy Ingress** via manifests/Helm.
* Install **cert-manager** for TLS if exposing publicly.
* Add **metrics-server** for HPA and **KEDA** (optional) for queue-based scaling.
### 3) Datastores
**PostgreSQL (Bitnami, pgvector)**
```yaml
# deploy/overlays/pi-prod/postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata, namespace: news }
spec:
accessModes: ["ReadWriteOnce"]
resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: ConfigMap
metadata: { name: pg-init, namespace: news }
data:
00-init.sql: |
CREATE EXTENSION IF NOT EXISTS vector;
-- migrations applied by apps on startup too
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: pg, namespace: kube-system }
spec:
chart: oci://registry-1.docker.io/bitnamicharts/postgresql
targetNamespace: news
version: 15.x.x
valuesContent: |
image:
repository: bitnami/postgresql
tag: 15-debian-12
primary:
extraVolumes:
- name: pg-init
configMap: { name: pg-init }
extraVolumeMounts:
- name: pg-init
mountPath: /docker-entrypoint-initdb.d
persistence:
existingClaim: pgdata
auth:
username: news
password: ${PG_PASSWORD}
database: news
```
**Redis (Bitnami)**
```yaml
# deploy/overlays/pi-prod/redis.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: redis, namespace: kube-system }
spec:
chart: oci://registry-1.docker.io/bitnamicharts/redis
targetNamespace: news
version: 18.x.x
valuesContent: |
architecture: standalone
auth:
enabled: false
```
### 4) Secrets & Config
```yaml
# deploy/overlays/pi-prod/secrets.example.yaml (copy to secrets.yaml and fill)
apiVersion: v1
kind: Secret
metadata: { name: app-secrets, namespace: news }
type: Opaque
data:
  OPENAI_API_KEY: <base64>
  GEMINI_API_KEY: <base64>
  APP_SIGNING_KEY: <base64>
  PG_PASSWORD: <base64>  # expanded via $(PG_PASSWORD) into DATABASE_URL below
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config, namespace: news }
data:
SNIPPET_MAX: "320"
SOURCES: |
- name: Reuters
base_url: https://www.reuters.com
rss:
- https://www.reuters.com/rss/worldNews
sitemaps:
- https://www.reuters.com/sitemap_news.xml
robots_policy: honor
RANKING: "hybrid"
```
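A sketch of how a worker might parse the `SOURCES` block into typed objects, assuming PyYAML plus Pydantic; the keys mirror the ConfigMap above:

```python
# Sketch: load the SOURCES env var (injected from app-config) into typed models.
import os

import yaml
from pydantic import BaseModel

class SourceConfig(BaseModel):
    name: str
    base_url: str
    rss: list[str] = []
    sitemaps: list[str] = []
    robots_policy: str = "honor"

def load_sources() -> list[SourceConfig]:
    raw = yaml.safe_load(os.environ["SOURCES"])  # a YAML list of source entries
    return [SourceConfig(**entry) for entry in raw]
```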
### 5) Workers (poll, scrape, summarize, embed)
```yaml
# deploy/overlays/pi-prod/workers.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: workers, namespace: news }
spec:
replicas: 3
selector: { matchLabels: { app: workers } }
template:
metadata: { labels: { app: workers } }
spec:
containers:
- name: workers
image: registry/pi/news-workers:0.1
envFrom:
- secretRef: { name: app-secrets }
- configMapRef: { name: app-config }
env:
- { name: REDIS_URL, value: redis://redis-master.news.svc.cluster.local:6379/0 }
- { name: DATABASE_URL, value: postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news }
resources:
requests: { cpu: "300m", memory: "1Gi" }
limits: { cpu: "900m", memory: "2Gi" }
livenessProbe: { httpGet: { path: /healthz, port: 8080 }, initialDelaySeconds: 15 }
readinessProbe: { httpGet: { path: /readyz, port: 8080 }, initialDelaySeconds: 5 }
```
**Cron: feed/sitemap polling**
```yaml
apiVersion: batch/v1
kind: CronJob
metadata: { name: poller, namespace: news }
spec:
schedule: "*/2 * * * *" # every 2 minutes
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: poll
image: registry/pi/news-workers:0.1
args: ["poll"]
envFrom:
- secretRef: { name: app-secrets }
- configMapRef: { name: app-config }
```
### 6) API service (FastAPI)
```yaml
# deploy/overlays/pi-prod/api.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, namespace: news }
spec:
replicas: 2
selector: { matchLabels: { app: api } }
template:
metadata: { labels: { app: api } }
spec:
containers:
- name: api
image: registry/pi/news-api:0.1
ports: [{ containerPort: 8080 }]
envFrom:
- secretRef: { name: app-secrets }
- configMapRef: { name: app-config }
env:
- { name: REDIS_URL, value: redis://redis-master.news.svc.cluster.local:6379/0 }
- { name: DATABASE_URL, value: postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news }
resources:
requests: { cpu: "200m", memory: "512Mi" }
limits: { cpu: "600m", memory: "1Gi" }
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: news }
spec:
selector: { app: api }
ports:
- name: http
port: 80
targetPort: 8080
```
**FastAPI search (sketch)**
```python
# apps/api/search.py
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

EMBED_DIM = 1536

def hybrid_search(conn: psycopg.Connection, q: str, k: int = 8):
    register_vector(conn)  # enable numpy <-> vector adaptation on this connection
    with conn.cursor() as cur:
        # 1) Embed the query (OpenAI embeddings or Gemini; see sketch below)
        v = embed(q)
        # 2) Keyword recall: Postgres full-text search, top 50
        cur.execute("""
            SELECT id, title, snippet, canonical_url,
                   ts_rank(to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')),
                           plainto_tsquery(%s)) AS rank
            FROM articles
            WHERE to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,''))
                  @@ plainto_tsquery(%s)
            ORDER BY rank DESC
            LIMIT 50
        """, (q, q))
        ids = [r[0] for r in cur.fetchall()] or [-1]  # sentinel keeps ANY() valid
        # 3) Vector rerank by cosine similarity, return top k
        cur.execute("""
            SELECT a.id, a.title, a.snippet, a.canonical_url,
                   1 - (e.embedding <=> %s::vector) AS sim
            FROM articles a
            JOIN article_embeddings e ON e.article_id = a.id
            WHERE a.id = ANY(%s)
            ORDER BY sim DESC
            LIMIT %s
        """, (np.array(v), ids, k))
        return cur.fetchall()
```
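A sketch of the `embed()` helper assumed above, using the official `openai` package (the Gemini client can be swapped in per the integration notes below):

```python
# Sketch: embeddings helper backing hybrid_search().
from openai import OpenAI

_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    resp = _client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding  # 1536-dim vector, matching EMBED_DIM
```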
### 7) Web UI (Next.js 14)
* App Router, Tailwind, shadcn/ui. Server actions call API.
* Components: `Composer`, `AnswerBox` (with sentence-level citations), `ResultCard`, `SourceChip`.
* Add **PWA** manifest + basic offline cache for shell.
### 8) Ingress (nginx primary, HAProxy optional)
```yaml
# deploy/overlays/pi-prod/ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: news
namespace: news
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  ingressClassName: nginx
tls:
- hosts: [news.local]
secretName: news-tls
rules:
- host: news.local
http:
paths:
- path: /
pathType: Prefix
backend: { service: { name: web, port: { number: 80 } } }
- path: /v1
pathType: Prefix
backend: { service: { name: api, port: { number: 80 } } }
```
### 9) Observability
* **Logging**: JSON logs via `structlog` (API/workers), `stdout` aggregated by k0s.
* **Metrics**: Prometheus scraping (use `prometheus-fastapi-instrumentator`), Grafana dashboards.
* **Tracing**: OpenTelemetry SDK exporting to Tempo/OTLP (optional).
* SLOs: p95 search < 600ms (warm); ingest freshness p95 < 5 min.
### 10) CI/CD (GitHub Actions)
* Build multi-arch images with `setup-buildx-action`, push to your registry.
* Deploy via `kubectl` or ArgoCD (optional). Gate with manual approval.
### 11) Prompts & safety rails
* **Summary prompt**: 60-90 words, neutral tone, forbid speculation, 1-2 citations with URLs.
* **Answer prompt**: Use only retrieved snippets; every claim must cite `[n]`. If there is insufficient evidence, say so.
* **Guardrails**: Max 6 articles per answer; truncate inputs to the token budget.
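One possible summary-prompt template encoding these rails; the exact wording is illustrative:

```python
# Sketch: summarizer prompt (60-90 words, neutral, no speculation, 1-2 citations).
SUMMARY_PROMPT = """You are a neutral news summarizer.
Summarize the article below in 60-90 words. Do not speculate or add
information that is not in the text. End with 1-2 citations in the form
[1] <URL>.

Title: {title}
URL: {url}
Article text:
{body}
"""
```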
## Gemini LLM Integration
As an alternative to OpenAI models, this project supports Google's Gemini LLM for both embeddings and conversational tasks:
### Available Models
- **gemini-2.5-flash**: Lightweight model optimized for fast responses and high throughput
- **gemini-2.5-pro**: Advanced "thinking" model with enhanced reasoning capabilities
### Command Usage
Use the following commands to interact with Gemini models:
```bash
# For fast, lightweight responses (embeddings, quick summaries)
gemini --model gemini-2.5-flash -p "<PROMPT>"
# For complex reasoning and detailed analysis (conversational answers)
gemini --model gemini-2.5-pro -p "<PROMPT>"
```
### Integration Notes
- Gemini models can be used as drop-in replacements for OpenAI equivalents
- Flash model recommended for embeddings worker (text-embedding-3-small equivalent)
- Pro model recommended for summarizer worker (gpt-4o-mini equivalent)
- Configure via GEMINI_API_KEY in Kubernetes secrets alongside OPENAI_API_KEY
- Network policies should allow egress to generativelanguage.googleapis.com
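A sketch of calling Gemini from the workers rather than the CLI, assuming the `google-generativeai` package; model names follow the list above:

```python
# Sketch: Gemini calls for the summarizer/answer workers.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def gemini_summarize(prompt: str) -> str:
    # Pro model for reasoning-heavy tasks (summaries, answer synthesis)
    model = genai.GenerativeModel("gemini-2.5-pro")
    return model.generate_content(prompt).text

def gemini_quick(prompt: str) -> str:
    # Flash model for fast, lightweight tasks
    model = genai.GenerativeModel("gemini-2.5-flash")
    return model.generate_content(prompt).text
```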
### 12) Performance knobs (Raspberry Pi friendly)
* Enable HTTP caching (ETag/If-Modified-Since).
* Redis cache TTL 10m for hot queries.
* Per-host concurrency: 2 (scraper); global QPS: 0.5-1 for Reuters.
* Use gzip/deflate when fetching; strip images when scraping.
### 13) Data retention
* Keep `articles` 30 days rolling (configurable). Older rows archived to `articles_archive` without embeddings.
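A sketch of that retention job, assuming `articles_archive` mirrors the `articles` columns; embeddings disappear automatically via the `ON DELETE CASCADE` on `article_embeddings`:

```python
# Sketch: daily retention job moving old rows into the archive table.
RETENTION_SQL = """
WITH moved AS (
    DELETE FROM articles
    WHERE published_at < now() - interval '30 days'  -- window is configurable
    RETURNING *
)
INSERT INTO articles_archive SELECT * FROM moved;
"""

def run_retention(conn) -> None:
    with conn.cursor() as cur:
        cur.execute(RETENTION_SQL)
    conn.commit()
```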
### 14) Security
* NetworkPolicies: only API/worker → DB/Redis; web → API; deny egress by default except OpenAI and Gemini domains (api.openai.com, generativelanguage.googleapis.com).
* Secrets from Kubernetes; rotate quarterly. Read-only service accounts for web. Include both OPENAI_API_KEY and GEMINI_API_KEY in secret management.
* TLS everywhere; CSP headers on web.
## Milestones
**MVP timeline: 2 weeks (LAN only, no TLS)**
### Week 1 — Foundations & ingest
* **Day 1-2**: Cluster prep (k0s), namespaces, nginx Ingress (HTTP only), metrics-server. Registry access + buildx pipeline.
* **Day 3**: Postgres (pgvector) + Redis live; migrations applied.
* **Day 4**: Workers scaffolded (poll, scrape) with Reuters RSS + sitemap pollers; ETag/Last-Modified implemented; robots policy set to *honor*.
* **Day 5**: Normalizer/dedupe; article schema writes; minimal admin page to view ingest logs.
**Exit criteria**: Reuters articles flowing into the DB with title/snippet/category/published_at; p95 freshness under 10 min.
### Week 2 — Search, summaries, UI polish
* **Day 6**: Embeddings worker + index (pgvector ivfflat). Hybrid search in API.
* **Day 7**: Summarizer worker; store 60-90-word summaries; cache.
* **Day 8**: Next.js UI (composer, answer box, cards, source chips). Basic keyboard nav.
* **Day 9**: Observability: Prometheus scrape + Grafana dashboard; SLOs wired.
* **Day 10**: Hardening (quotas, retries), data retention job; smoke tests; cut **MVP v0.1.0**.
**Exit criteria**: Query returns an answer with citations in < 800ms warm path; summaries stable; LAN users can search and read cited sources.
## Gathering Results
### KPIs (Primary)
* **Freshness (p95)**: time from article publication → available in search. Target: ≤ 5 minutes; stretch ≤ 2 minutes.
* **Answer Accuracy**: % of answer sentences that have at least one valid citation to the retrieved set. Target: ≥ 95%.
### KPIs (Secondary)
* **Coverage**: % of Reuters articles discovered vs. listed in sitemaps over last 24h. Target: ≥ 98%.
* **Latency (p95)**: query → first contentful paint (UI) and API response time. Targets: API ≤ 600ms warm; UI FCP ≤ 1.5s on LAN.
* **Stability**: worker error rate < 1%; scraper retry rate < 10%.
### Instrumentation
* **Prometheus metrics**
* `ingest_freshness_seconds{source=…}` (histogram)
* `ingest_discovered_total{kind=rss|sitemap|scrape}`
* `scrape_http_status_total{code=…}`
* `search_latency_seconds` (histogram)
* `answer_citation_coverage_ratio` (gauge)
* `worker_queue_depth{queue=…}`
* **Structured logs** (JSON): include `trace_id`, `job_id`, and normalized URL.
* **Dashboards (Grafana)**: Freshness, Search Latency, Coverage vs Sitemap, Error budget burn.
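A sketch of registering two of the metrics above with `prometheus_client`; bucket boundaries are illustrative:

```python
# Sketch: Prometheus instrumentation for freshness and citation coverage.
from prometheus_client import Gauge, Histogram

INGEST_FRESHNESS = Histogram(
    "ingest_freshness_seconds", "Publication-to-searchable lag",
    ["source"], buckets=(30, 60, 120, 300, 600, 1800))
CITATION_COVERAGE = Gauge(
    "answer_citation_coverage_ratio",
    "Share of answer sentences with at least one citation")

def observe_freshness(source: str, published_ts: float, indexed_ts: float) -> None:
    INGEST_FRESHNESS.labels(source=source).observe(indexed_ts - published_ts)
```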
### Accuracy evaluation
* **Automatic**:
  * Parse the answer into sentences; verify each sentence has at least one citation (a sketch follows this list).
  * Check that citation URLs match the top-k retrieved set and that snippets contain supporting tokens (simple ROUGE-like overlap).
  * Flag low-evidence sentences for review.
* **Human review** (1-2×/week):
  * 50 sampled answers; label: correct / partially supported / unsupported / off-topic.
  * Compute the **hallucination rate** (unsupported sentences ÷ total) and track the trend.
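A sketch of the automatic citation-coverage check referenced above; the sentence splitter is deliberately naive and would need hardening:

```python
# Sketch: fraction of answer sentences carrying a valid [n] citation.
import re

def citation_coverage(answer: str, valid_indices: set[int]) -> float:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = 0
    for s in sentences:
        marks = {int(m) for m in re.findall(r"\[(\d+)\]", s)}
        # Count a sentence only if every marker points into the retrieved set.
        if marks and marks <= valid_indices:
            cited += 1
    return cited / len(sentences)
```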
### Feedback loop
* UI **thumbs up/down** with optional comment saved to `feedback` table:
```sql
CREATE TABLE feedback (
id BIGSERIAL PRIMARY KEY,
query TEXT NOT NULL,
answer_id TEXT,
verdict TEXT CHECK (verdict IN ('up','down')),
comment TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
* Downvotes auto-create a JIRA/GitHub issue if `answer_citation_coverage_ratio < 0.9`.
### Experimentation
* **Prompt variants**: A/B tested via a header flag in the API (e.g., `x-prompt=v2`).
* **Ranking tweaks**: switch BM25 weight vs. vector weight; record NDCG@10 on labeled queries.
### Postmortems & safety
* Blameless postmortem for any incident where the hallucination rate > 10% in a day or freshness p95 > 10 min for > 1h.
* Daily data-retention job verified; no full text persists beyond the in-memory summary context.