SPEC-1 – Classy Perplexity‑style News Aggregator (Raspberry Pi 5 K8s)
Background
You want a Perplexity‑style web app that aggregates news from a defined pool of reference websites and presents results in a classy, attractive, highly responsive UI. The target runtime is a Raspberry Pi 5 Kubernetes cluster, so the system must be lightweight, ARM64‑friendly, and resilient to node churn or SD‑card fragility. The product should feel like a modern AI assistant for news discovery: fast search, crisp summaries, clear source attributions, and mobile‑first ergonomics.
Initial working assumptions (to be confirmed):
- Content sources are a curated list of reputable outlets and blogs that permit aggregation with proper linking and snippet‑length quoting.
- We will index headlines, metadata, and short excerpts; full‑text storage will be minimized or avoided unless licensed.
- The app will support semantic search + conversational Q&A over the indexed corpus, with citations to original articles.
- Real‑time(ish) freshness target: new articles discoverable within 2–5 minutes of publication.
- UI aims to echo Perplexity’s clean card layout, with source badges, inline citations, and a composer panel for queries.
- Deployment must fit on 2–4 ARM64 nodes, using lightweight containers and a small replicated datastore.
Requirements
Scope for MVP: Start with Reuters as the single source. Use official RSS/Atom feeds and daily sitemaps when available; gracefully fall back to HTML scraping for sections without feeds, storing only metadata/snippets with links. Freshness target 2–5 minutes. UI mirrors Perplexity’s card+chat layout with inline citations.
MoSCoW
Must‑have
- Aggregate from Reuters via RSS/Atom + sitemaps; fallback HTML scraper with robots.txt compliance toggle.
- ARM64‑ready containers deployable on a Raspberry Pi 5 Kubernetes cluster (k0s in this design; k3s or MicroK8s are also viable).
- Ingest pipeline with deduplication, canonical URL normalization, and rate‑limit/backoff.
- Index headlines, authors, timestamps, topics, short excerpt (<= 320 chars), and source URL.
- Full‑text search over stored fields; semantic search embeddings over titles+snippets.
- Summarization and on‑page Q&A with clear citations to source URLs.
- Classy, responsive UI with Perplexity‑style query composer, results cards, and source badges.
- Observability: structured logs, basic metrics (ingest latency, queue depth, p95 response time), and alerting.
- Legal safety rails: configurable snippet length, per‑domain robots policy, and kill‑switch per source.
Should‑have
- Topic taxonomy and tags (World, Business, Tech, etc.).
- Incremental sitemap polling (by date) + change‑list RSS polling with jitter to avoid burst load.
- Reader mode extraction (readability‑style) used only for summarization in memory, not stored.
- Caching layer (HTTP + summary cache) to keep Raspberry Pi costs low.
- Multi‑node HA for index and queue; rolling updates.
Could‑have
- User accounts for saved searches and daily digests.
- Multi‑source expansion via declarative YAML for new sites.
- Related‑story clustering and timeline views.
- Basic mobile PWA installability and offline read‑later for snippets.
Won’t‑have (MVP)
- Paywalled content bypassing or full‑text storage of copyrighted articles.
- Personalized recommendations or email digests.
- Editorial curation tooling beyond tags and pinning.
Method
High‑level architecture
@startuml
skinparam componentStyle rectangle
skinparam shadowing false
skinparam ArrowColor #888
skinparam DefaultFontName Inter
rectangle "k0s Cluster (ARM64 Raspberry Pi 5)" as K8S {
node "Namespace: news" as NS {
[Ingest Scheduler]
(CronJobs)
[Feed+Sitemap Poller]
(FastAPI Worker)
[HTML Scraper]
(Worker, Trafilatura)
[Normalizer/Dedupe]
(Worker)
[Embedder]
(Worker -> OpenAI embeddings/Gemini flash)
[Summarizer]
(Worker -> OpenAI gpt-4o-mini/Gemini pro)
database "PostgreSQL + pgvector" as PG
[Redis]
(Cache + Queue)
[API Gateway]
(FastAPI)
[Web UI]
(Next.js, Tailwind, shadcn)
}
}
[Feed+Sitemap Poller] --> [HTML Scraper]
[HTML Scraper] --> [Normalizer/Dedupe]
[Normalizer/Dedupe] --> PG
[Embedder] --> PG
[Summarizer] --> PG
[Ingest Scheduler] --> [Feed+Sitemap Poller]
[Embedder] --> [OpenAI Embeddings API/Gemini API]
[Summarizer] --> [OpenAI Chat Completions/Gemini API]
[API Gateway] --> PG
[API Gateway] --> Redis
[Web UI] --> [API Gateway]
@enduml
Why these choices (MVP):
- Source: Start with Reuters using news sitemaps (with pagination parameters) and RSS; where feeds don’t exist, scrape respectfully with robots awareness.
- Storage: PostgreSQL + pgvector keeps the stack compact (one DB for metadata, text search, and vectors). Postgres full‑text covers keyword search; pgvector powers semantic search.
- Workers: Python FastAPI workers using Trafilatura for robust article extraction and metadata parsing. Redis as the lightweight queue/cache (Dramatiq or RQ).
- Summaries/Q&A: On‑demand summaries and answer synthesis via gpt‑4o‑mini or Gemini pro with inline citations. Embeddings via text‑embedding‑3‑small or Gemini flash. Both accessed through API keys/secrets in Kubernetes.
- UI: Next.js 14 App Router, Tailwind + shadcn for a Perplexity‑style, low‑latency interface.
- k0s: ARM64‑friendly. Use nginx‑ingress for HTTP routing, with optional HAProxy Ingress for TCP/advanced policies.
Data model (PostgreSQL)
-- Sources (static for MVP)
CREATE TABLE sources (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE, -- e.g., 'Reuters'
base_url TEXT NOT NULL, -- e.g., https://www.reuters.com
rss_urls TEXT[] NOT NULL DEFAULT '{}',
sitemap_urls TEXT[] NOT NULL DEFAULT '{}',
robots_txt TEXT,
enabled BOOLEAN NOT NULL DEFAULT true
);
-- Raw fetch jobs (observability + retries)
CREATE TABLE fetch_jobs (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id),
url TEXT NOT NULL,
kind TEXT NOT NULL CHECK (kind IN ('rss','sitemap','article')),
status TEXT NOT NULL CHECK (status IN ('queued','fetched','parsed','failed')),
http_status INT,
etag TEXT,
last_modified TIMESTAMPTZ,
attempts INT NOT NULL DEFAULT 0,
error TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON fetch_jobs (status, created_at);
-- Canonical articles (no copyrighted full text stored)
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id) NOT NULL,
canonical_url TEXT NOT NULL,
url_hash BYTEA NOT NULL, -- SHA-256 of canonical_url
title TEXT NOT NULL,
author TEXT,
category TEXT, -- World, Business, Tech, etc.
published_at TIMESTAMPTZ,
fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
snippet TEXT, -- <= 320 chars, from feed/lede
summary TEXT, -- model-generated abstract
image_url TEXT,
language TEXT DEFAULT 'en',
UNIQUE (source_id, url_hash)
);
CREATE INDEX ON articles (published_at DESC);
CREATE INDEX ON articles USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(snippet,'')));
-- Embeddings for semantic search (title+snippet)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE article_embeddings (
article_id BIGINT PRIMARY KEY REFERENCES articles(id) ON DELETE CASCADE,
embedding vector(1536) -- 1536 matches text-embedding-3-small; set to the chosen embedding model's dimension
);
CREATE INDEX ON article_embeddings USING ivfflat (embedding vector_cosine_ops);
-- Tags and mapping (optional but handy)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE article_tags (
article_id BIGINT REFERENCES articles(id) ON DELETE CASCADE,
tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (article_id, tag_id)
);
Ingestion flow
- Discovery
  - Poll RSS/Atom endpoints with ETag/Last‑Modified to minimize bandwidth.
  - Poll news sitemaps using incremental parameters (e.g., from= offsets when supported). Maintain per‑endpoint cursors.
  - For sections without feeds, enqueue HTML pages discovered from site index pages (rate‑limited) and respect robots.txt (configurable).
- Fetch & Extract
  - HTTP client with retry + exponential backoff and per‑host concurrency caps (e.g., 2–4). Respect Cache-Control where present.
  - Use Trafilatura with favor_precision=true to extract main content for in‑memory summarization only; do not persist full text.
  - Generate a canonical URL (resolve redirects, strip tracking params) and compute url_hash (a normalization sketch follows this list).
- Normalize & Deduplicate
  - If (source_id, url_hash) exists, skip the insert; otherwise create an articles row with metadata and a snippet (<= 320 chars).
  - Classify the category using rule‑based hints (URL path, RSS category) with a fallback lightweight classifier.
- Summaries & Embeddings
  - Create a short summary (60–90 words, neutral tone) with inline citation markers ([1] → canonical URL).
  - Compute an embedding on (title + " " + snippet) and upsert into article_embeddings.
- Indexing & Cache
  - The Postgres GIN index supports keyword search; pgvector handles ANN semantic search.
  - Cache hot queries and summaries in Redis for 5–15 minutes.
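A minimal sketch of the canonicalization and hashing step, assuming redirects are already resolved by the fetcher; the tracking‑parameter list and module path are illustrative, not from the repo:
# workers/normalize.py (illustrative sketch)
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid", "mc_")  # assumption: query params treated as tracking noise

def canonicalize_url(url: str) -> str:
    """Lowercase scheme/host, drop fragments and tracking params, normalize the trailing slash."""
    scheme, netloc, path, query, _fragment = urlsplit(url.strip())
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme.lower(), netloc.lower(), path, urlencode(kept), ""))

def url_hash(canonical_url: str) -> bytes:
    """SHA-256 digest stored in articles.url_hash (BYTEA)."""
    return hashlib.sha256(canonical_url.encode("utf-8")).digest()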
API design (FastAPI)
- GET /v1/search?q=&mode=hybrid&page= — Hybrid search (keyword + vector rerank); returns cards with title, snippet, badges, and citations (a route sketch follows this list).
- GET /v1/articles/{id} — Metadata + summary.
- POST /v1/ask — Conversational answer over the top‑k retrieved articles, always with citations.
- POST /v1/feedback — Thumbs up/down and optional comment.
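A minimal sketch of the /v1/search contract, assuming Pydantic response models named here only for illustration (the real schemas live in apps/api) and a router included into the main app:
# apps/api/routes.py (illustrative sketch)
from fastapi import APIRouter, Query
from pydantic import BaseModel

router = APIRouter(prefix="/v1")

class Citation(BaseModel):
    index: int              # matches [n] markers in the answer/summary
    url: str
    title: str

class SearchCard(BaseModel):
    id: int
    title: str
    snippet: str
    source: str             # e.g. "Reuters" badge
    published_at: str | None = None
    citations: list[Citation] = []

class SearchResponse(BaseModel):
    query: str
    mode: str
    page: int
    results: list[SearchCard]

@router.get("/search", response_model=SearchResponse)
def search(q: str = Query(..., min_length=1), mode: str = "hybrid", page: int = 1):
    # hybrid_search() is the pgvector/BM25 routine sketched later in this spec
    cards: list[SearchCard] = []  # placeholder: map hybrid_search rows into SearchCard objects
    return SearchResponse(query=q, mode=mode, page=page, results=cards)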
UI flows (Next.js 14)
- Home: Center composer, query suggestions, trending topics.
- Results: Perplexity‑style answer at top with source chips; below, cards for each cited article; sticky composer for follow‑ups.
- Interactions: Cmd/Ctrl‑K global search, ? for keyboard help, skeleton loaders, optimistic UI.
Kubernetes (k0s) deployment sketch
- Namespaces: news, news-observe.
- Ingress: nginx-ingress for HTTPS; optional parallel HAProxy Ingress for TCP/advanced use. Certs via cert‑manager + DNS‑01 or HTTP‑01.
- Deployments (ARM64 images):
  - api (FastAPI, Uvicorn workers under Gunicorn): 2 replicas, HPA on CPU 60% and a p95 latency SLI.
  - web (Next.js): 2 replicas, static export (optional) behind the Node adapter.
  - worker (ingest/summarize/embed): 2–4 replicas, separate queues for poll, scrape, summ, embed.
  - postgres (Bitnami ARM64) with a persistent volume; enable the pgvector extension.
  - redis (Bitnami ARM64) for cache/queue.
- RBAC/Secrets: Kubernetes Secrets for API keys; a service account per deployment.
- Resources (starting): api 200m/512Mi; web 100m/256Mi; worker 300m/1Gi; redis 50m/256Mi; postgres 250m/2Gi.
- Autoscaling: HPA + VPA recommendations; cluster metrics via metrics‑server.
Ranking & answer synthesis
- Hybrid search: BM25 (Postgres full‑text) for recall → take top 50; compute cosine similarity on vectors → rerank → top 8.
- Answer: Prompt model with the top 6 snippets + titles and URLs; enforce citation after each sentence where evidence exists. Refuse to answer beyond source material.
Rate limiting & ethics
- Per‑source QPS caps (e.g., 0.5–1 rps) and adaptive backoff (see the sketch after this list).
- Honor robots.txt by default; switchable per your policy. Always link prominently to original.
- Snippets limited; no storage of full article text.
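A minimal sketch of the per‑host politeness controls, assuming an async httpx client; the class, parameter names, and the 0.5 rps default are illustrative:
# workers/politeness.py (illustrative sketch)
import asyncio, random, time
import httpx

class HostLimiter:
    """Caps per-host request rate (QPS) and concurrency; used with exponential backoff on errors."""
    def __init__(self, qps: float = 0.5, max_concurrency: int = 2):
        self.min_interval = 1.0 / qps
        self.sem = asyncio.Semaphore(max_concurrency)
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self):
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()

async def polite_get(client: httpx.AsyncClient, limiter: HostLimiter, url: str, retries: int = 4):
    for attempt in range(retries):
        async with limiter.sem:
            await limiter.wait()
            try:
                resp = await client.get(url, follow_redirects=True, timeout=20)
                if resp.status_code not in (429, 503):
                    return resp
            except httpx.TransportError:
                pass  # transient network error: fall through to backoff
        if attempt < retries - 1:
            # exponential backoff with jitter before the next attempt
            await asyncio.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"giving up on {url} after {retries} attempts")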
Implementation
0) Repo layout
news-agg/
  apps/
    api/        # FastAPI (Python 3.11)
    web/        # Next.js 14 UI
    workers/    # poll/scrape/summarize/embed (FastAPI tasks + RQ/Dramatiq)
  deploy/
    base/       # K8s Kustomize base (namespaces, RBAC, NetworkPolicies)
    overlays/
      pi-prod/
        kustomization.yaml
        postgres.yaml
        redis.yaml
        api.yaml
        web.yaml
        workers.yaml
        cron-poller.yaml
        ingress-nginx.yaml
        ingress-haproxy.yaml   (optional)
        secrets.example.yaml
  ops/
    helm-values/
      bitnami-postgresql.yaml
      bitnami-redis.yaml
  scripts/
    build.sh        # multi-arch docker buildx
    db_migrate.sql  # tables + pgvector
1) Container images (ARM64)
- Python base: python:3.11-slim + uv/pip-tools; compile wheels at build time.
- Node: node:18-alpine → next build, then run with node or export static.
- Use docker buildx to produce linux/arm64 images. Example:
docker buildx build --platform linux/arm64 -t registry/pi/news-api:0.1 -f apps/api/Dockerfile --push .
apps/api/Dockerfile (snippet)
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential libpq-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY apps/api/pyproject.toml apps/api/uv.lock ./
RUN pip install -U pip uv
# install the dependencies declared in pyproject.toml (no requirements.txt is copied into this image)
RUN uv pip install --system -r pyproject.toml
COPY apps/api/ .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
2) k0s cluster prep (once)
- Install nginx‑ingress and (optionally) HAProxy Ingress via manifests/Helm.
- Install cert-manager for TLS if exposing publicly.
- Add metrics‑server for HPA; optionally KEDA for queue‑based scaling.
3) Datastores
PostgreSQL (Bitnami, pgvector)
# deploy/overlays/pi-prod/postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata, namespace: news }
spec:
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: ConfigMap
metadata: { name: pg-init, namespace: news }
data:
  00-init.sql: |
    CREATE EXTENSION IF NOT EXISTS vector;
    -- migrations applied by apps on startup too
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: pg, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/postgresql
  targetNamespace: news
  version: 15.x.x
  valuesContent: |
    image:
      repository: bitnami/postgresql
      tag: 15-debian-12  # must provide the pgvector extension; stock Bitnami images do not ship it
    primary:
      extraVolumes:
        - name: pg-init
          configMap: { name: pg-init }
      extraVolumeMounts:
        - name: pg-init
          mountPath: /docker-entrypoint-initdb.d
      persistence:
        existingClaim: pgdata
    auth:
      username: news
      password: ${PG_PASSWORD}
      database: news
Redis (Bitnami)
# deploy/overlays/pi-prod/redis.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: redis, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/redis
  targetNamespace: news
  version: 18.x.x
  valuesContent: |
    architecture: standalone
    auth:
      enabled: false
4) Secrets & Config
# deploy/overlays/pi-prod/secrets.example.yaml (copy to secrets.yaml and fill)
apiVersion: v1
kind: Secret
metadata: { name: app-secrets, namespace: news }
type: Opaque
data:
  OPENAI_API_KEY: <base64>
  GEMINI_API_KEY: <base64>
  APP_SIGNING_KEY: <base64>
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config, namespace: news }
data:
  SNIPPET_MAX: "320"
  SOURCES: |
    - name: Reuters
      base_url: https://www.reuters.com
      rss:
        - https://www.reuters.com/rss/worldNews
      sitemaps:
        - https://www.reuters.com/sitemap_news.xml
      robots_policy: honor
  RANKING: "hybrid"
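The workers consume SNIPPET_MAX and the SOURCES block above. A minimal loader sketch, assuming the ConfigMap is injected via envFrom (as in the manifests below) and PyYAML is available; the module path and dataclass name are illustrative:
# workers/config.py (illustrative sketch)
import os
from dataclasses import dataclass, field
import yaml

@dataclass
class SourceConfig:
    name: str
    base_url: str
    rss: list[str] = field(default_factory=list)
    sitemaps: list[str] = field(default_factory=list)
    robots_policy: str = "honor"

def load_sources(raw: str | None = None) -> list[SourceConfig]:
    """Parse the SOURCES YAML from the app-config ConfigMap (env var or mounted file)."""
    raw = raw or os.environ.get("SOURCES", "")
    entries = yaml.safe_load(raw) or []
    return [SourceConfig(**entry) for entry in entries]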
5) Workers (poll, scrape, summarize, embed)
# deploy/overlays/pi-prod/workers.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: workers, namespace: news }
spec:
  replicas: 3
  selector: { matchLabels: { app: workers } }
  template:
    metadata: { labels: { app: workers } }
    spec:
      containers:
        - name: workers
          image: registry/pi/news-workers:0.1
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - { name: REDIS_URL, value: "redis://redis-master.news.svc.cluster.local:6379/0" }
            - { name: DATABASE_URL, value: "postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news" }
          resources:
            requests: { cpu: "300m", memory: "1Gi" }
            limits: { cpu: "900m", memory: "2Gi" }
          livenessProbe: { httpGet: { path: /healthz, port: 8080 }, initialDelaySeconds: 15 }
          readinessProbe: { httpGet: { path: /readyz, port: 8080 }, initialDelaySeconds: 5 }
Cron: feed/sitemap polling
apiVersion: batch/v1
kind: CronJob
metadata: { name: poller, namespace: news }
spec:
  schedule: "*/2 * * * *"   # every 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: poll
              image: registry/pi/news-workers:0.1
              args: ["poll"]
              envFrom:
                - secretRef: { name: app-secrets }
                - configMapRef: { name: app-config }
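The workers Deployment above runs consumers for the four stage queues (poll, scrape, summ, embed). A minimal sketch of the queue wiring, assuming RQ (Dramatiq would be analogous); module paths and task names are illustrative:
# workers/queues.py (illustrative sketch, RQ variant)
import os
from redis import Redis
from rq import Queue

redis_conn = Redis.from_url(os.environ["REDIS_URL"])

# one queue per stage so consumer replicas can be tuned independently
poll_q = Queue("poll", connection=redis_conn)
scrape_q = Queue("scrape", connection=redis_conn)
summ_q = Queue("summ", connection=redis_conn)
embed_q = Queue("embed", connection=redis_conn)

def enqueue_article(url: str, source_id: int):
    """Called by the poller when a new article URL is discovered."""
    scrape_q.enqueue("workers.tasks.scrape_article", url, source_id)
Each worker replica can then consume all four queues (e.g., rq worker poll scrape summ embed) or be pinned to a single queue per Deployment for independent scaling.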
6) API service (FastAPI)
# deploy/overlays/pi-prod/api.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, namespace: news }
spec:
  replicas: 2
  selector: { matchLabels: { app: api } }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
        - name: api
          image: registry/pi/news-api:0.1
          ports: [{ containerPort: 8080 }]
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - { name: REDIS_URL, value: "redis://redis-master.news.svc.cluster.local:6379/0" }
            - { name: DATABASE_URL, value: "postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news" }
          resources:
            requests: { cpu: "200m", memory: "512Mi" }
            limits: { cpu: "600m", memory: "1Gi" }
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: news }
spec:
  selector: { app: api }
  ports:
    - name: http
      port: 80
      targetPort: 8080
FastAPI search (sketch)
# apps/api/search.py
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

EMBED_DIM = 1536  # must match article_embeddings.embedding vector(1536)

def hybrid_search(conn, q, k=8):
    register_vector(conn)  # enables passing numpy arrays as pgvector parameters
    with conn.cursor() as cur:
        # 1) Embed the query (OpenAI embeddings or a Gemini embedding model)
        v = embed(q)  # embed() is the provider call, defined elsewhere
        # 2) Keyword recall: Postgres full-text search, top 50 by ts_rank
        cur.execute("""
            SELECT id, title, snippet, canonical_url,
                   ts_rank(to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')),
                           plainto_tsquery(%s)) AS rank
            FROM articles
            WHERE to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')) @@ plainto_tsquery(%s)
            ORDER BY rank DESC
            LIMIT 50
        """, (q, q))
        rows = cur.fetchall()
        ids = [r[0] for r in rows] or [-1]
        # 3) Vector rerank of the recalled set by cosine similarity
        cur.execute("""
            SELECT a.id, a.title, a.snippet, a.canonical_url,
                   1 - (e.embedding <=> %s) AS sim
            FROM articles a
            JOIN article_embeddings e ON e.article_id = a.id
            WHERE a.id = ANY(%s)
            ORDER BY sim DESC
            LIMIT %s
        """, (np.array(v), ids, k))
        return cur.fetchall()
7) Web UI (Next.js 14)
- App Router, Tailwind, shadcn/ui. Server actions call API.
- Components:
Composer,AnswerBox(with sentence-level citations),ResultCard,SourceChip. - Add PWA manifest + basic offline cache for shell.
8) Ingress (nginx primary, HAProxy optional)
# deploy/overlays/pi-prod/ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: news
  namespace: news
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  tls:
    - hosts: [news.local]
      secretName: news-tls
  rules:
    - host: news.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }
          - path: /v1
            pathType: Prefix
            backend: { service: { name: api, port: { number: 80 } } }
9) Observability
- Logging: JSON logs via structlog (API/workers) to stdout, aggregated by k0s.
- Metrics: Prometheus scraping (use prometheus-fastapi-instrumentator), Grafana dashboards (an instrumentation sketch follows this list).
- Tracing: OpenTelemetry SDK exporting to Tempo/OTLP (optional).
- SLOs: p95 search < 600ms (warm); ingest freshness p95 < 5 min.
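A minimal instrumentation sketch for the API side, assuming prometheus-fastapi-instrumentator for request metrics and prometheus_client for the custom histograms listed under Gathering Results; bucket boundaries are illustrative:
# apps/api/metrics.py (illustrative sketch)
from fastapi import FastAPI
from prometheus_client import Histogram
from prometheus_fastapi_instrumentator import Instrumentator

ingest_freshness_seconds = Histogram(
    "ingest_freshness_seconds",
    "Seconds from article publication to availability in search",
    ["source"],
    buckets=(30, 60, 120, 300, 600, 1800),
)

def setup_metrics(app: FastAPI) -> None:
    # exposes /metrics and records default request latency histograms
    Instrumentator().instrument(app).expose(app)

def observe_freshness(source: str, published_ts: float, indexed_ts: float) -> None:
    ingest_freshness_seconds.labels(source=source).observe(indexed_ts - published_ts)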
10) CI/CD (GitHub Actions)
- Build multi-arch images with setup-buildx-action and push to your registry.
- Deploy via kubectl or Argo CD (optional). Gate with manual approval.
11) Prompts & safety rails
- Summary prompt: 60–90 words, neutral tone, forbid speculation, 1–2 citations with URLs.
- Answer prompt: use only retrieved snippets; every sentence that makes a claim must cite [n]. If evidence is insufficient, say so (a prompt‑assembly sketch follows this list).
- Guardrails: max 6 articles per answer; truncate inputs to the token budget.
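A minimal prompt‑assembly sketch for the answer path, assuming the rows returned by hybrid_search; the instruction wording is illustrative, not the production prompt:
# apps/api/prompts.py (illustrative sketch)
MAX_ARTICLES = 6

def build_answer_prompt(question: str, articles: list[dict]) -> str:
    """articles: dicts with 'title', 'snippet', 'canonical_url' from hybrid search."""
    articles = articles[:MAX_ARTICLES]
    sources = "\n".join(
        f"[{i + 1}] {a['title']} - {a['snippet']} ({a['canonical_url']})"
        for i, a in enumerate(articles)
    )
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite a source as [n] after every sentence that makes a claim.\n"
        "If the sources do not contain enough evidence, say so instead of guessing.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )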
Gemini LLM Integration
As an alternative to OpenAI models, this project supports Google's Gemini LLM for both embeddings and conversational tasks:
Available Models
- gemini-2.5-flash: Lightweight model optimized for fast responses and high throughput
- gemini-2.5-pro: Advanced "thinking" model with enhanced reasoning capabilities
Command Usage
Use the following commands to interact with Gemini models:
# For fast, lightweight responses (embeddings, quick summaries)
gemini --model gemini-2.5-flash -p "<PROMPT>"
# For complex reasoning and detailed analysis (conversational answers)
gemini --model gemini-2.5-pro -p "<PROMPT>"
Integration Notes
- Gemini models can be used as drop-in replacements for the OpenAI equivalents.
- Embeddings worker: use a dedicated Gemini embedding model (e.g., text-embedding-004) as the text-embedding-3-small equivalent, and size the pgvector column to that model's dimension.
- Pro model recommended for the summarizer worker (gpt-4o-mini equivalent).
- Configure via GEMINI_API_KEY in Kubernetes secrets alongside OPENAI_API_KEY.
- Network policies should allow egress to generativelanguage.googleapis.com (a call sketch follows this list).
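A minimal provider sketch using the google-generativeai Python SDK; the embedding model name is an assumption to verify against current Gemini docs, and whichever model is chosen, the pgvector column dimension must match its output size:
# workers/llm_gemini.py (illustrative sketch)
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def gemini_summarize(prompt: str) -> str:
    # gemini-2.5-pro for reasoning-heavy answers; swap in gemini-2.5-flash for cheaper summaries
    model = genai.GenerativeModel("gemini-2.5-pro")
    return model.generate_content(prompt).text

def gemini_embed(text: str) -> list[float]:
    # assumption: a dedicated Gemini embedding model (not Flash/Pro) produces the vectors
    result = genai.embed_content(model="models/text-embedding-004", content=text)
    return result["embedding"]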
12) Performance knobs (Raspberry Pi friendly)
- Enable HTTP caching (ETag/If‑Modified‑Since).
- Redis cache TTL 10m for hot queries.
- Per‑host concurrency: 2 (scraper); global QPS: 0.5–1 for Reuters.
- Use gzip/deflate when fetching; strip images when scraping.
13) Data retention
- Keep articles 30 days rolling (configurable). Older rows are archived to articles_archive without embeddings.
14) Security
- NetworkPolicies: only API/worker → DB/Redis; web → API; deny egress by default except OpenAI and Gemini domains (api.openai.com, generativelanguage.googleapis.com).
- Secrets from Kubernetes; rotate quarterly. Read‑only service accounts for web. Include both OPENAI_API_KEY and GEMINI_API_KEY in secret management.
- TLS everywhere; CSP headers on web.
Milestones
MVP timeline: 2 weeks (LAN only, no TLS)
Week 1 — Foundations & ingest
- Day 1–2: Cluster prep (k0s), namespaces, nginx Ingress (HTTP only), metrics‑server. Registry access + buildx pipeline.
- Day 3: Postgres (pgvector) + Redis live; migrations applied.
- Day 4: Workers scaffolded (poll, scrape) with Reuters RSS + sitemap pollers; ETag/Last‑Modified implemented; robots policy set to honor.
- Day 5: Normalizer/dedupe; article schema writes; minimal admin page to view ingest logs.
Exit criteria: Reuters articles flowing into DB with title/snippet/category/published_at; p95 freshness under 10 min.
Week 2 — Search, summaries, UI polish
- Day 6: Embeddings worker + index (pgvector ivfflat). Hybrid search in API.
- Day 7: Summarizer worker; store 60–90 word summaries; cache.
- Day 8: Next.js UI (composer, answer box, cards, source chips). Basic keyboard nav.
- Day 9: Observability: Prometheus scrape + Grafana dashboard; SLOs wired.
- Day 10: Hardening (quotas, retries), data retention job; smoke tests; cut MVP v0.1.0.
Exit criteria: Query returns an answer with citations in < 800ms warm path; summaries stable; LAN users can search and read cited sources.
Gathering Results
KPIs (Primary)
- Freshness (p95): time from article publication → available in search. Target: ≤ 5 minutes; stretch ≤ 2 minutes.
- Answer Accuracy: % of answer sentences that have at least one valid citation to the retrieved set. Target: ≥ 95%.
KPIs (Secondary)
- Coverage: % of Reuters articles discovered vs. listed in sitemaps over last 24h. Target: ≥ 98%.
- Latency (p95): query → first contentful paint (UI) and API response time. Targets: API ≤ 600ms warm; UI FCP ≤ 1.5s on LAN.
- Stability: worker error rate < 1%; scraper retry rate < 10%.
Instrumentation
- Prometheus metrics:
  - ingest_freshness_seconds{source=…} (histogram)
  - ingest_discovered_total{kind=rss|sitemap|scrape}
  - scrape_http_status_total{code=…}
  - search_latency_seconds (histogram)
  - answer_citation_coverage_ratio (gauge)
  - worker_queue_depth{queue=…}
- Structured logs (JSON): include trace_id, job_id, and the normalized URL.
- Dashboards (Grafana): Freshness, Search Latency, Coverage vs Sitemap, Error budget burn.
Accuracy evaluation
- Automatic (a checker sketch follows this list):
  - Parse the answer into sentences; verify each sentence has at least one citation.
  - Check that citation URLs match the top‑k retrieved set and that snippets contain supporting tokens (simple ROUGE‑like overlap).
  - Flag low‑evidence sentences for review.
- Human review (1–2×/week):
  - 50 sampled answers; label: correct / partially supported / unsupported / off‑topic.
  - Compute the hallucination rate (unsupported sentences ÷ total) and track the trend.
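A minimal sketch of the automatic citation check, assuming answers use [n] markers and a mapping from marker number to cited URL; the regex sentence splitter is a simplification:
# ops/eval/citation_check.py (illustrative sketch)
import re

def citation_coverage(answer: str, retrieved_urls: list[str], citations: dict[int, str]) -> float:
    """Fraction of sentences carrying at least one [n] marker that points into the retrieved set."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    ok = 0
    for sentence in sentences:
        markers = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if any(citations.get(n) in retrieved_urls for n in markers):
            ok += 1
    return ok / len(sentences)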
Feedback loop
- UI thumbs up/down with an optional comment saved to the feedback table:
CREATE TABLE feedback (
id BIGSERIAL PRIMARY KEY,
query TEXT NOT NULL,
answer_id TEXT,
verdict TEXT CHECK (verdict IN ('up','down')),
comment TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
- Downvotes auto‑create a Jira/GitHub issue if answer_citation_coverage_ratio < 0.9.
Experimentation
- Prompt variants A/B via a header flag in the API (e.g., x-prompt=v2).
- Ranking tweaks: adjust BM25 vs. vector weighting; record NDCG@10 on labeled queries (a sketch follows this list).
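For the ranking experiments, NDCG@10 can be computed from graded relevance labels; a minimal sketch using the linear‑gain DCG variant:
# ops/eval/ndcg.py (illustrative sketch)
import math

def dcg(rels: list[int]) -> float:
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_10(ranked_rels: list[int]) -> float:
    """ranked_rels: graded relevance (e.g., 0-3) of results in ranked order."""
    ideal = sorted(ranked_rels, reverse=True)[:10]
    actual = ranked_rels[:10]
    return dcg(actual) / dcg(ideal) if dcg(ideal) > 0 else 0.0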
Post‑mortems & safety
- Blameless post‑mortem for any incident where hallucination rate > 10% in a day or freshness p95 > 10 min for >1h.
- Daily data retention job verified; no full‑text persists beyond in‑memory summary context.