SPEC-1 – Classy Perplexity‑style News Aggregator (Raspberry Pi 5 K8s)
Background
You want a Perplexity‑style web app that aggregates news from a defined pool of reference websites and presents results in a classy, attractive, highly responsive UI. The target runtime is a Raspberry Pi 5 Kubernetes cluster, so the system must be lightweight, ARM64‑friendly, and resilient to node churn or SD‑card fragility. The product should feel like a modern AI assistant for news discovery: fast search, crisp summaries, clear source attributions, and mobile‑first ergonomics.
Initial working assumptions (to be confirmed):
- Content sources are a curated list of reputable outlets and blogs that permit aggregation with proper linking and snippet‑length quoting.
- We will index headlines, metadata, and short excerpts; full‑text storage will be minimized or avoided unless licensed.
- The app will support semantic search + conversational Q&A over the indexed corpus, with citations to original articles.
- Real‑time(ish) freshness target: new articles discoverable within 2–5 minutes of publication.
- UI aims to echo Perplexity’s clean card layout, with source badges, inline citations, and a composer panel for queries.
- Deployment must fit on 2–4 ARM64 nodes, using lightweight containers and a small replicated datastore.
Requirements
Scope for MVP: Start with Reuters as the single source. Use official RSS/Atom feeds and daily sitemaps when available; gracefully fall back to HTML scraping for sections without feeds, storing only metadata/snippets with links. Freshness target 2–5 minutes. UI mirrors Perplexity’s card+chat layout with inline citations.
MoSCoW
Must‑have
- Aggregate from Reuters via RSS/Atom + sitemaps; fallback HTML scraper with robots.txt compliance toggle.
- ARM64‑ready containers deployable on a Raspberry Pi 5 Kubernetes cluster (k0s in this design; k3s or MicroK8s are also viable).
- Ingest pipeline with deduplication, canonical URL normalization, and rate‑limit/backoff.
- Index headlines, authors, timestamps, topics, short excerpt (<= 320 chars), and source URL.
- Full‑text search over stored fields; semantic search embeddings over titles+snippets.
- Summarization and on‑page Q&A with clear citations to source URLs.
- Classy, responsive UI with Perplexity‑style query composer, results cards, and source badges.
- Observability: structured logs, basic metrics (ingest latency, queue depth, p95 response time), and alerting.
- Legal safety rails: configurable snippet length, per‑domain robots policy, and kill‑switch per source.
Should‑have
- Topic taxonomy and tags (World, Business, Tech, etc.).
- Incremental sitemap polling (by date) + change‑list RSS polling with jitter to avoid burst load.
- Reader mode extraction (readability‑style) used only for summarization in memory, not stored.
- Caching layer (HTTP + summary cache) to keep Raspberry Pi costs low.
- Multi‑node HA for index and queue; rolling updates.
Could‑have
- User accounts for saved searches and daily digests.
- Multi‑source expansion via declarative YAML for new sites.
- Related‑story clustering and timeline views.
- Basic mobile PWA installability and offline read‑later for snippets.
Won’t‑have (MVP)
- Paywalled content bypassing or full‑text storage of copyrighted articles.
- Personalized recommendations or email digests.
- Editorial curation tooling beyond tags and pinning.
Method
High‑level architecture
@startuml
skinparam componentStyle rectangle
skinparam shadowing false
skinparam ArrowColor #888
skinparam DefaultFontName Inter
rectangle "k0s Cluster (ARM64 Raspberry Pi 5)" as K8S {
node "Namespace: news" as NS {
[Ingest Scheduler]
(CronJobs)
[Feed+Sitemap Poller]
(FastAPI Worker)
[HTML Scraper]
(Worker, Trafilatura)
[Normalizer/Dedupe]
(Worker)
[Embedder]
(Worker -> OpenAI embeddings/Gemini flash)
[Summarizer]
(Worker -> OpenAI gpt-4o-mini/Gemini pro)
database "PostgreSQL + pgvector" as PG
[Redis]
(Cache + Queue)
[API Gateway]
(FastAPI)
[Web UI]
(Next.js, Tailwind, shadcn)
}
}
[Feed+Sitemap Poller] --> [HTML Scraper]
[HTML Scraper] --> [Normalizer/Dedupe]
[Normalizer/Dedupe] --> PG
[Embedder] --> PG
[Summarizer] --> PG
[Ingest Scheduler] --> [Feed+Sitemap Poller]
[Embedder] --> [OpenAI Embeddings API/Gemini API]
[Summarizer] --> [OpenAI Chat Completions/Gemini API]
[API Gateway] --> PG
[API Gateway] --> Redis
[Web UI] --> [API Gateway]
@enduml
Why these choices (MVP):
- Source: Start with Reuters using news sitemaps (with pagination parameters) and RSS; where feeds don’t exist, scrape respectfully with robots awareness.
- Storage: PostgreSQL + pgvector keeps the stack compact (one DB for metadata, text search, and vectors). Postgres full‑text covers keyword search; pgvector powers semantic search.
- Workers: Python FastAPI workers using Trafilatura for robust article extraction and metadata parsing. Redis as the lightweight queue/cache (Dramatiq or RQ).
- Summaries/Q&A: On‑demand summaries and answer synthesis via gpt‑4o‑mini or Gemini pro with inline citations. Embeddings via text‑embedding‑3‑small or Gemini flash. Both accessed through API keys/secrets in Kubernetes.
- UI: Next.js 14 App Router, Tailwind + shadcn for a Perplexity‑style, low‑latency interface.
- k0s: ARM64‑friendly. Use nginx‑ingress for HTTP routing, with optional HAProxy Ingress for TCP/advanced policies.
Data model (PostgreSQL)
-- Sources (static for MVP)
CREATE TABLE sources (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE, -- e.g., 'Reuters'
base_url TEXT NOT NULL, -- e.g., https://www.reuters.com
rss_urls TEXT[] NOT NULL DEFAULT '{}',
sitemap_urls TEXT[] NOT NULL DEFAULT '{}',
robots_txt TEXT,
enabled BOOLEAN NOT NULL DEFAULT true
);
-- Raw fetch jobs (observability + retries)
CREATE TABLE fetch_jobs (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id),
url TEXT NOT NULL,
kind TEXT NOT NULL CHECK (kind IN ('rss','sitemap','article')),
status TEXT NOT NULL CHECK (status IN ('queued','fetched','parsed','failed')),
http_status INT,
etag TEXT,
last_modified TIMESTAMPTZ,
attempts INT NOT NULL DEFAULT 0,
error TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON fetch_jobs (status, created_at);
-- Canonical articles (no copyrighted full text stored)
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
source_id INT REFERENCES sources(id) NOT NULL,
canonical_url TEXT NOT NULL,
url_hash BYTEA NOT NULL, -- SHA-256 of canonical_url
title TEXT NOT NULL,
author TEXT,
category TEXT, -- World, Business, Tech, etc.
published_at TIMESTAMPTZ,
fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
snippet TEXT, -- <= 320 chars, from feed/lede
summary TEXT, -- model-generated abstract
image_url TEXT,
language TEXT DEFAULT 'en',
UNIQUE (source_id, url_hash)
);
CREATE INDEX ON articles (published_at DESC);
CREATE INDEX ON articles USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(snippet,'')));
-- Embeddings for semantic search (title+snippet)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE article_embeddings (
article_id BIGINT PRIMARY KEY REFERENCES articles(id) ON DELETE CASCADE,
embedding vector(1536) -- 1536 matches text-embedding-3-small; set to the chosen embedding model's dimension
);
CREATE INDEX ON article_embeddings USING ivfflat (embedding vector_cosine_ops);
-- Tags and mapping (optional but handy)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE article_tags (
article_id BIGINT REFERENCES articles(id) ON DELETE CASCADE,
tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (article_id, tag_id)
);
Ingestion flow
- Discovery
  - Poll RSS/Atom endpoints with ETag/Last‑Modified to minimize bandwidth.
  - Poll news sitemaps using incremental parameters (e.g., from= offsets when supported). Maintain per‑endpoint cursors.
  - For sections without feeds, enqueue HTML pages discovered from site index pages (rate‑limited) and respect robots.txt (configurable).
- Fetch & Extract
  - HTTP client with retry + exponential backoff and per‑host concurrency caps (e.g., 2–4). Respect Cache-Control where present.
  - Use Trafilatura with favor_precision=true to extract main content for in‑memory summarization only; do not persist full text.
  - Generate a canonical URL (resolve redirects, strip tracking params) and compute url_hash (a normalization sketch follows this list).
- Normalize & Deduplicate
  - If (source_id, url_hash) exists, skip the insert; otherwise create an articles row with metadata and a snippet (<= 320 chars).
  - Classify the category using rule‑based hints (URL path, RSS category) with a fallback lightweight classifier.
- Summaries & Embeddings
  - Create a short summary (60–90 words, neutral tone) with inline citation markers ([1] → canonical URL).
  - Compute an embedding on (title + " " + snippet) and upsert into article_embeddings.
- Indexing & Cache
  - The Postgres GIN index supports keyword search; pgvector handles ANN semantic search.
  - Cache hot queries and summaries in Redis for 5–15 minutes.
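A minimal sketch of the canonicalization and hashing step, assuming redirects are already resolved by the fetcher; the tracking‑parameter list and module path are illustrative, not from the repo:
# workers/normalize.py (illustrative sketch)
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid", "mc_")  # assumption: query params treated as tracking noise

def canonicalize_url(url: str) -> str:
    """Lowercase scheme/host, drop fragments and tracking params, normalize the trailing slash."""
    scheme, netloc, path, query, _fragment = urlsplit(url.strip())
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme.lower(), netloc.lower(), path, urlencode(kept), ""))

def url_hash(canonical_url: str) -> bytes:
    """SHA-256 digest stored in articles.url_hash (BYTEA)."""
    return hashlib.sha256(canonical_url.encode("utf-8")).digest()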
API design (FastAPI)
- GET /v1/search?q=&mode=hybrid&page= — Hybrid search (keyword + vector rerank); returns cards with title, snippet, badges, and citations (a route sketch follows this list).
- GET /v1/articles/{id} — Metadata + summary.
- POST /v1/ask — Conversational answer over the top‑k retrieved articles, always with citations.
- POST /v1/feedback — Thumbs up/down and optional comment.
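A minimal sketch of the /v1/search contract, assuming Pydantic response models named here only for illustration (the real schemas live in apps/api) and a router included into the main app:
# apps/api/routes.py (illustrative sketch)
from fastapi import APIRouter, Query
from pydantic import BaseModel

router = APIRouter(prefix="/v1")

class Citation(BaseModel):
    index: int              # matches [n] markers in the answer/summary
    url: str
    title: str

class SearchCard(BaseModel):
    id: int
    title: str
    snippet: str
    source: str             # e.g. "Reuters" badge
    published_at: str | None = None
    citations: list[Citation] = []

class SearchResponse(BaseModel):
    query: str
    mode: str
    page: int
    results: list[SearchCard]

@router.get("/search", response_model=SearchResponse)
def search(q: str = Query(..., min_length=1), mode: str = "hybrid", page: int = 1):
    # hybrid_search() is the pgvector/BM25 routine sketched later in this spec
    cards: list[SearchCard] = []  # placeholder: map hybrid_search rows into SearchCard objects
    return SearchResponse(query=q, mode=mode, page=page, results=cards)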
UI flows (Next.js 14)
- Home: Center composer, query suggestions, trending topics.
- Results: Perplexity‑style answer at top with source chips; below, cards for each cited article; sticky composer for follow‑ups.
- Interactions: Cmd/Ctrl‑K global search, ? for keyboard help, skeleton loaders, optimistic UI.
Kubernetes (k0s) deployment sketch
- Namespaces: news, news-observe.
- Ingress: nginx-ingress for HTTPS; optional parallel HAProxy Ingress for TCP/advanced use. Certs via cert‑manager + DNS‑01 or HTTP‑01.
- Deployments (ARM64 images):
  - api (FastAPI, Uvicorn workers under Gunicorn): 2 replicas, HPA on CPU 60% and a p95 latency SLI.
  - web (Next.js): 2 replicas, static export (optional) behind the Node adapter.
  - worker (ingest/summarize/embed): 2–4 replicas, separate queues for poll, scrape, summ, embed.
  - postgres (Bitnami ARM64) with a persistent volume; enable the pgvector extension.
  - redis (Bitnami ARM64) for cache/queue.
- RBAC/Secrets: Kubernetes Secrets for API keys; a service account per deployment.
- Resources (starting): api 200m/512Mi; web 100m/256Mi; worker 300m/1Gi; redis 50m/256Mi; postgres 250m/2Gi.
- Autoscaling: HPA + VPA recommendations; cluster metrics via metrics‑server.
Ranking & answer synthesis
- Hybrid search: BM25 (Postgres full‑text) for recall → take top 50; compute cosine similarity on vectors → rerank → top 8.
- Answer: Prompt model with the top 6 snippets + titles and URLs; enforce citation after each sentence where evidence exists. Refuse to answer beyond source material.
Rate limiting & ethics
- Per‑source QPS caps (e.g., 0.5–1 rps) and adaptive backoff (see the sketch after this list).
- Honor robots.txt by default; switchable per your policy. Always link prominently to original.
- Snippets limited; no storage of full article text.
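A minimal sketch of the per‑host politeness controls, assuming an async httpx client; the class, parameter names, and the 0.5 rps default are illustrative:
# workers/politeness.py (illustrative sketch)
import asyncio, random, time
import httpx

class HostLimiter:
    """Caps per-host request rate (QPS) and concurrency; used with exponential backoff on errors."""
    def __init__(self, qps: float = 0.5, max_concurrency: int = 2):
        self.min_interval = 1.0 / qps
        self.sem = asyncio.Semaphore(max_concurrency)
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self):
        async with self._lock:
            delay = self.min_interval - (time.monotonic() - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()

async def polite_get(client: httpx.AsyncClient, limiter: HostLimiter, url: str, retries: int = 4):
    for attempt in range(retries):
        async with limiter.sem:
            await limiter.wait()
            try:
                resp = await client.get(url, follow_redirects=True, timeout=20)
                if resp.status_code not in (429, 503):
                    return resp
            except httpx.TransportError:
                pass  # transient network error: fall through to backoff
        if attempt < retries - 1:
            # exponential backoff with jitter before the next attempt
            await asyncio.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"giving up on {url} after {retries} attempts")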
Implementation
0) Repo layout
news-agg/
  apps/
    api/        # FastAPI (Python 3.11)
    web/        # Next.js 14 UI
    workers/    # poll/scrape/summarize/embed (FastAPI tasks + RQ/Dramatiq)
  deploy/
    base/       # K8s Kustomize base (namespaces, RBAC, NetworkPolicies)
    overlays/
      pi-prod/
        kustomization.yaml
        postgres.yaml
        redis.yaml
        api.yaml
        web.yaml
        workers.yaml
        cron-poller.yaml
        ingress-nginx.yaml
        ingress-haproxy.yaml   (optional)
        secrets.example.yaml
  ops/
    helm-values/
      bitnami-postgresql.yaml
      bitnami-redis.yaml
  scripts/
    build.sh        # multi-arch docker buildx
    db_migrate.sql  # tables + pgvector
1) Container images (ARM64)
- Python base: python:3.11-slim + uv/pip-tools; compile wheels at build time.
- Node: node:18-alpine → next build, then run with node or export static.
- Use docker buildx to produce linux/arm64 images. Example:
docker buildx build --platform linux/arm64 -t registry/pi/news-api:0.1 -f apps/api/Dockerfile --push .
apps/api/Dockerfile (snippet)
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential libpq-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY apps/api/pyproject.toml apps/api/uv.lock ./
RUN pip install -U pip uv
# install the dependencies declared in pyproject.toml (no requirements.txt is copied into this image)
RUN uv pip install --system -r pyproject.toml
COPY apps/api/ .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
2) k0s cluster prep (once)
- Install nginx‑ingress and (optionally) HAProxy Ingress via manifests/Helm.
- Install cert-manager for TLS if exposing publicly.
- Add metrics‑server for HPA; optionally KEDA for queue‑based scaling.
3) Datastores
PostgreSQL (Bitnami, pgvector)
# deploy/overlays/pi-prod/postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata, namespace: news }
spec:
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: ConfigMap
metadata: { name: pg-init, namespace: news }
data:
  00-init.sql: |
    CREATE EXTENSION IF NOT EXISTS vector;
    -- migrations applied by apps on startup too
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: pg, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/postgresql
  targetNamespace: news
  version: 15.x.x
  valuesContent: |
    image:
      repository: bitnami/postgresql
      tag: 15-debian-12  # must provide the pgvector extension; stock Bitnami images do not ship it
    primary:
      extraVolumes:
        - name: pg-init
          configMap: { name: pg-init }
      extraVolumeMounts:
        - name: pg-init
          mountPath: /docker-entrypoint-initdb.d
      persistence:
        existingClaim: pgdata
    auth:
      username: news
      password: ${PG_PASSWORD}
      database: news
Redis (Bitnami)
# deploy/overlays/pi-prod/redis.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: redis, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/redis
  targetNamespace: news
  version: 18.x.x
  valuesContent: |
    architecture: standalone
    auth:
      enabled: false
4) Secrets & Config
# deploy/overlays/pi-prod/secrets.example.yaml (copy to secrets.yaml and fill)
apiVersion: v1
kind: Secret
metadata: { name: app-secrets, namespace: news }
type: Opaque
data:
  OPENAI_API_KEY: <base64>
  GEMINI_API_KEY: <base64>
  APP_SIGNING_KEY: <base64>
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config, namespace: news }
data:
  SNIPPET_MAX: "320"
  SOURCES: |
    - name: Reuters
      base_url: https://www.reuters.com
      rss:
        - https://www.reuters.com/rss/worldNews
      sitemaps:
        - https://www.reuters.com/sitemap_news.xml
      robots_policy: honor
  RANKING: "hybrid"
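The workers consume SNIPPET_MAX and the SOURCES block above. A minimal loader sketch, assuming the ConfigMap is injected via envFrom (as in the manifests below) and PyYAML is available; the module path and dataclass name are illustrative:
# workers/config.py (illustrative sketch)
import os
from dataclasses import dataclass, field
import yaml

@dataclass
class SourceConfig:
    name: str
    base_url: str
    rss: list[str] = field(default_factory=list)
    sitemaps: list[str] = field(default_factory=list)
    robots_policy: str = "honor"

def load_sources(raw: str | None = None) -> list[SourceConfig]:
    """Parse the SOURCES YAML from the app-config ConfigMap (env var or mounted file)."""
    raw = raw or os.environ.get("SOURCES", "")
    entries = yaml.safe_load(raw) or []
    return [SourceConfig(**entry) for entry in entries]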
5) Workers (poll, scrape, summarize, embed)
# deploy/overlays/pi-prod/workers.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: workers, namespace: news }
spec:
  replicas: 3
  selector: { matchLabels: { app: workers } }
  template:
    metadata: { labels: { app: workers } }
    spec:
      containers:
        - name: workers
          image: registry/pi/news-workers:0.1
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - { name: REDIS_URL, value: "redis://redis-master.news.svc.cluster.local:6379/0" }
            - { name: DATABASE_URL, value: "postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news" }
          resources:
            requests: { cpu: "300m", memory: "1Gi" }
            limits: { cpu: "900m", memory: "2Gi" }
          livenessProbe: { httpGet: { path: /healthz, port: 8080 }, initialDelaySeconds: 15 }
          readinessProbe: { httpGet: { path: /readyz, port: 8080 }, initialDelaySeconds: 5 }
Cron: feed/sitemap polling
apiVersion: batch/v1
kind: CronJob
metadata: { name: poller, namespace: news }
spec:
  schedule: "*/2 * * * *"   # every 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: poll
              image: registry/pi/news-workers:0.1
              args: ["poll"]
              envFrom:
                - secretRef: { name: app-secrets }
                - configMapRef: { name: app-config }
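The workers Deployment above runs consumers for the four stage queues (poll, scrape, summ, embed). A minimal sketch of the queue wiring, assuming RQ (Dramatiq would be analogous); module paths and task names are illustrative:
# workers/queues.py (illustrative sketch, RQ variant)
import os
from redis import Redis
from rq import Queue

redis_conn = Redis.from_url(os.environ["REDIS_URL"])

# one queue per stage so consumer replicas can be tuned independently
poll_q = Queue("poll", connection=redis_conn)
scrape_q = Queue("scrape", connection=redis_conn)
summ_q = Queue("summ", connection=redis_conn)
embed_q = Queue("embed", connection=redis_conn)

def enqueue_article(url: str, source_id: int):
    """Called by the poller when a new article URL is discovered."""
    scrape_q.enqueue("workers.tasks.scrape_article", url, source_id)
Each worker replica can then consume all four queues (e.g., rq worker poll scrape summ embed) or be pinned to a single queue per Deployment for independent scaling.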
6) API service (FastAPI)
# deploy/overlays/pi-prod/api.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, namespace: news }
spec:
  replicas: 2
  selector: { matchLabels: { app: api } }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
        - name: api
          image: registry/pi/news-api:0.1
          ports: [{ containerPort: 8080 }]
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - { name: REDIS_URL, value: "redis://redis-master.news.svc.cluster.local:6379/0" }
            - { name: DATABASE_URL, value: "postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news" }
          resources:
            requests: { cpu: "200m", memory: "512Mi" }
            limits: { cpu: "600m", memory: "1Gi" }
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: news }
spec:
  selector: { app: api }
  ports:
    - name: http
      port: 80
      targetPort: 8080
FastAPI search (sketch)
# apps/api/search.py
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

EMBED_DIM = 1536  # must match article_embeddings.embedding vector(1536)

def hybrid_search(conn, q, k=8):
    register_vector(conn)  # enables passing numpy arrays as pgvector parameters
    with conn.cursor() as cur:
        # 1) Embed the query (OpenAI embeddings or a Gemini embedding model)
        v = embed(q)  # embed() is the provider call, defined elsewhere
        # 2) Keyword recall: Postgres full-text search, top 50 by ts_rank
        cur.execute("""
            SELECT id, title, snippet, canonical_url,
                   ts_rank(to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')),
                           plainto_tsquery(%s)) AS rank
            FROM articles
            WHERE to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')) @@ plainto_tsquery(%s)
            ORDER BY rank DESC
            LIMIT 50
        """, (q, q))
        rows = cur.fetchall()
        ids = [r[0] for r in rows] or [-1]
        # 3) Vector rerank of the recalled set by cosine similarity
        cur.execute("""
            SELECT a.id, a.title, a.snippet, a.canonical_url,
                   1 - (e.embedding <=> %s) AS sim
            FROM articles a
            JOIN article_embeddings e ON e.article_id = a.id
            WHERE a.id = ANY(%s)
            ORDER BY sim DESC
            LIMIT %s
        """, (np.array(v), ids, k))
        return cur.fetchall()
7) Web UI (Next.js 14)
- App Router, Tailwind, shadcn/ui. Server actions call API.
- Components:
Composer,AnswerBox(with sentence-level citations),ResultCard,SourceChip. - Add PWA manifest + basic offline cache for shell.
8) Ingress (nginx primary, HAProxy optional)
# deploy/overlays/pi-prod/ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: news
  namespace: news
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  tls:
    - hosts: [news.local]
      secretName: news-tls
  rules:
    - host: news.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }
          - path: /v1
            pathType: Prefix
            backend: { service: { name: api, port: { number: 80 } } }
9) Observability
- Logging: JSON logs via structlog (API/workers) to stdout, aggregated by k0s.
- Metrics: Prometheus scraping (use prometheus-fastapi-instrumentator), Grafana dashboards (an instrumentation sketch follows this list).
- Tracing: OpenTelemetry SDK exporting to Tempo/OTLP (optional).
- SLOs: p95 search < 600ms (warm); ingest freshness p95 < 5 min.
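A minimal instrumentation sketch for the API side, assuming prometheus-fastapi-instrumentator for request metrics and prometheus_client for the custom histograms listed under Gathering Results; bucket boundaries are illustrative:
# apps/api/metrics.py (illustrative sketch)
from fastapi import FastAPI
from prometheus_client import Histogram
from prometheus_fastapi_instrumentator import Instrumentator

ingest_freshness_seconds = Histogram(
    "ingest_freshness_seconds",
    "Seconds from article publication to availability in search",
    ["source"],
    buckets=(30, 60, 120, 300, 600, 1800),
)

def setup_metrics(app: FastAPI) -> None:
    # exposes /metrics and records default request latency histograms
    Instrumentator().instrument(app).expose(app)

def observe_freshness(source: str, published_ts: float, indexed_ts: float) -> None:
    ingest_freshness_seconds.labels(source=source).observe(indexed_ts - published_ts)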
10) CI/CD (GitHub Actions)
- Build multi-arch images with setup-buildx-action and push to your registry.
- Deploy via kubectl or Argo CD (optional). Gate with manual approval.
11) Prompts & safety rails
- Summary prompt: 60–90 words, neutral tone, forbid speculation, 1–2 citations with URLs.
- Answer prompt: use only retrieved snippets; every sentence that makes a claim must cite [n]. If evidence is insufficient, say so (a prompt‑assembly sketch follows this list).
- Guardrails: max 6 articles per answer; truncate inputs to the token budget.
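A minimal prompt‑assembly sketch for the answer path, assuming the rows returned by hybrid_search; the instruction wording is illustrative, not the production prompt:
# apps/api/prompts.py (illustrative sketch)
MAX_ARTICLES = 6

def build_answer_prompt(question: str, articles: list[dict]) -> str:
    """articles: dicts with 'title', 'snippet', 'canonical_url' from hybrid search."""
    articles = articles[:MAX_ARTICLES]
    sources = "\n".join(
        f"[{i + 1}] {a['title']} - {a['snippet']} ({a['canonical_url']})"
        for i, a in enumerate(articles)
    )
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite a source as [n] after every sentence that makes a claim.\n"
        "If the sources do not contain enough evidence, say so instead of guessing.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )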
Gemini LLM Integration
As an alternative to OpenAI models, this project supports Google's Gemini LLM for both embeddings and conversational tasks:
Available Models
- gemini-2.5-flash: Lightweight model optimized for fast responses and high throughput
- gemini-2.5-pro: Advanced "thinking" model with enhanced reasoning capabilities
Command Usage
Use the following commands to interact with Gemini models:
# For fast, lightweight responses (embeddings, quick summaries)
gemini --model gemini-2.5-flash -p "<PROMPT>"
# For complex reasoning and detailed analysis (conversational answers)
gemini --model gemini-2.5-pro -p "<PROMPT>"
Integration Notes
- Gemini models can be used as drop-in replacements for the OpenAI equivalents.
- Embeddings worker: use a dedicated Gemini embedding model (e.g., text-embedding-004) as the text-embedding-3-small equivalent, and size the pgvector column to that model's dimension.
- Pro model recommended for the summarizer worker (gpt-4o-mini equivalent).
- Configure via GEMINI_API_KEY in Kubernetes secrets alongside OPENAI_API_KEY.
- Network policies should allow egress to generativelanguage.googleapis.com (a call sketch follows this list).
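A minimal provider sketch using the google-generativeai Python SDK; the embedding model name is an assumption to verify against current Gemini docs, and whichever model is chosen, the pgvector column dimension must match its output size:
# workers/llm_gemini.py (illustrative sketch)
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def gemini_summarize(prompt: str) -> str:
    # gemini-2.5-pro for reasoning-heavy answers; swap in gemini-2.5-flash for cheaper summaries
    model = genai.GenerativeModel("gemini-2.5-pro")
    return model.generate_content(prompt).text

def gemini_embed(text: str) -> list[float]:
    # assumption: a dedicated Gemini embedding model (not Flash/Pro) produces the vectors
    result = genai.embed_content(model="models/text-embedding-004", content=text)
    return result["embedding"]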
12) Performance knobs (Raspberry Pi friendly)
- Enable HTTP caching (ETag/If‑Modified‑Since).
- Redis cache TTL 10m for hot queries.
- Per‑host concurrency: 2 (scraper); global QPS: 0.5–1 for Reuters.
- Use gzip/deflate when fetching; strip images when scraping.
13) Data retention
- Keep articles 30 days rolling (configurable). Older rows are archived to articles_archive without embeddings.
14) Security
- NetworkPolicies: only API/worker → DB/Redis; web → API; deny egress by default except OpenAI and Gemini domains (api.openai.com, generativelanguage.googleapis.com).
- Secrets from Kubernetes; rotate quarterly. Read‑only service accounts for web. Include both OPENAI_API_KEY and GEMINI_API_KEY in secret management.
- TLS everywhere; CSP headers on web.
Milestones
MVP timeline: 2 weeks (LAN only, no TLS)
Week 1 — Foundations & ingest
- Day 1–2: Cluster prep (k0s), namespaces, nginx Ingress (HTTP only), metrics‑server. Registry access + buildx pipeline.
- Day 3: Postgres (pgvector) + Redis live; migrations applied.
- Day 4: Workers scaffolded (poll, scrape) with Reuters RSS + sitemap pollers; ETag/Last‑Modified implemented; robots policy set to honor.
- Day 5: Normalizer/dedupe; article schema writes; minimal admin page to view ingest logs.
Exit criteria: Reuters articles flowing into DB with title/snippet/category/published_at; p95 freshness under 10 min.
Week 2 — Search, summaries, UI polish
- Day 6: Embeddings worker + index (pgvector ivfflat). Hybrid search in API.
- Day 7: Summarizer worker; store 60–90 word summaries; cache.
- Day 8: Next.js UI (composer, answer box, cards, source chips). Basic keyboard nav.
- Day 9: Observability: Prometheus scrape + Grafana dashboard; SLOs wired.
- Day 10: Hardening (quotas, retries), data retention job; smoke tests; cut MVP v0.1.0.
Exit criteria: Query returns an answer with citations in < 800ms warm path; summaries stable; LAN users can search and read cited sources.
Gathering Results
KPIs (Primary)
- Freshness (p95): time from article publication → available in search. Target: ≤ 5 minutes; stretch ≤ 2 minutes.
- Answer Accuracy: % of answer sentences that have at least one valid citation to the retrieved set. Target: ≥ 95%.
KPIs (Secondary)
- Coverage: % of Reuters articles discovered vs. listed in sitemaps over last 24h. Target: ≥ 98%.
- Latency (p95): query → first contentful paint (UI) and API response time. Targets: API ≤ 600ms warm; UI FCP ≤ 1.5s on LAN.
- Stability: worker error rate < 1%; scraper retry rate < 10%.
Instrumentation
- Prometheus metrics:
  - ingest_freshness_seconds{source=…} (histogram)
  - ingest_discovered_total{kind=rss|sitemap|scrape}
  - scrape_http_status_total{code=…}
  - search_latency_seconds (histogram)
  - answer_citation_coverage_ratio (gauge)
  - worker_queue_depth{queue=…}
- Structured logs (JSON): include trace_id, job_id, and the normalized URL.
- Dashboards (Grafana): Freshness, Search Latency, Coverage vs Sitemap, Error budget burn.
Accuracy evaluation
- Automatic (a checker sketch follows this list):
  - Parse the answer into sentences; verify each sentence has at least one citation.
  - Check that citation URLs match the top‑k retrieved set and that snippets contain supporting tokens (simple ROUGE‑like overlap).
  - Flag low‑evidence sentences for review.
- Human review (1–2×/week):
  - 50 sampled answers; label: correct / partially supported / unsupported / off‑topic.
  - Compute the hallucination rate (unsupported sentences ÷ total) and track the trend.
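A minimal sketch of the automatic citation check, assuming answers use [n] markers and a mapping from marker number to cited URL; the regex sentence splitter is a simplification:
# ops/eval/citation_check.py (illustrative sketch)
import re

def citation_coverage(answer: str, retrieved_urls: list[str], citations: dict[int, str]) -> float:
    """Fraction of sentences carrying at least one [n] marker that points into the retrieved set."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    ok = 0
    for sentence in sentences:
        markers = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if any(citations.get(n) in retrieved_urls for n in markers):
            ok += 1
    return ok / len(sentences)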
Feedback loop
- UI thumbs up/down with an optional comment saved to the feedback table:
CREATE TABLE feedback (
id BIGSERIAL PRIMARY KEY,
query TEXT NOT NULL,
answer_id TEXT,
verdict TEXT CHECK (verdict IN ('up','down')),
comment TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
- Downvotes auto‑create a Jira/GitHub issue if answer_citation_coverage_ratio < 0.9.
Experimentation
- Prompt variants A/B via a header flag in the API (e.g., x-prompt=v2).
- Ranking tweaks: adjust BM25 vs. vector weighting; record NDCG@10 on labeled queries (a sketch follows this list).
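For the ranking experiments, NDCG@10 can be computed from graded relevance labels; a minimal sketch using the linear‑gain DCG variant:
# ops/eval/ndcg.py (illustrative sketch)
import math

def dcg(rels: list[int]) -> float:
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_10(ranked_rels: list[int]) -> float:
    """ranked_rels: graded relevance (e.g., 0-3) of results in ranked order."""
    ideal = sorted(ranked_rels, reverse=True)[:10]
    actual = ranked_rels[:10]
    return dcg(actual) / dcg(ideal) if dcg(ideal) > 0 else 0.0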
Post‑mortems & safety
- Blameless post‑mortem for any incident where hallucination rate > 10% in a day or freshness p95 > 10 min for >1h.
- Daily data retention job verified; no full‑text persists beyond in‑memory summary context.