SPEC-1: Classy Perplexity-style News Aggregator (Raspberry Pi 5 K8s)

Background

You want a Perplexity-style web app that aggregates news from a defined pool of reference websites and presents results in a classy, attractive, highly responsive UI. The target runtime is a Raspberry Pi 5 Kubernetes cluster, so the system must be lightweight, ARM64-friendly, and resilient to node churn or SD-card fragility. The product should feel like a modern AI assistant for news discovery: fast search, crisp summaries, clear source attributions, and mobile-first ergonomics.

Initial working assumptions (to be confirmed):

  • Content sources are a curated list of reputable outlets and blogs that permit aggregation with proper linking and snippet-length quoting.
  • We will index headlines, metadata, and short excerpts; full-text storage will be minimized or avoided unless licensed.
  • The app will support semantic search + conversational Q&A over the indexed corpus, with citations to original articles.
  • Real-time(ish) freshness target: new articles discoverable within 2–5 minutes of publication.
  • UI aims to echo Perplexity's clean card layout, with source badges, inline citations, and a composer panel for queries.
  • Deployment must fit on 2–4 ARM64 nodes, using lightweight containers and a small replicated datastore.

Requirements

Scope for MVP: Start with Reuters as the single source. Use official RSS/Atom feeds and daily sitemaps when available; gracefully fall back to HTML scraping for sections without feeds, storing only metadata/snippets with links. Freshness target 2–5 minutes. UI mirrors Perplexity's card+chat layout with inline citations.

MoSCoW

Must-have

  • Aggregate from Reuters via RSS/Atom + sitemaps; fallback HTML scraper with robots.txt compliance toggle.
  • ARM64-ready containers deployable on Raspberry Pi 5 K8s (k0s, k3s, or MicroK8s).
  • Ingest pipeline with deduplication, canonical URL normalization, and rate-limit/backoff.
  • Index headlines, authors, timestamps, topics, short excerpt (<= 320 chars), and source URL.
  • Full-text search over stored fields; semantic search embeddings over titles+snippets.
  • Summarization and on-page Q&A with clear citations to source URLs.
  • Classy, responsive UI with Perplexity-style query composer, results cards, and source badges.
  • Observability: structured logs, basic metrics (ingest latency, queue depth, p95 response), and alerting.
  • Legal safety rails: configurable snippet length, per-domain robots policy, and kill-switch per source.

Should-have

  • Topic taxonomy and tags (World, Business, Tech, etc.).
  • Incremental sitemap polling (by date) + changelist RSS polling with jitter to avoid burst load.
  • Reader mode extraction (readability-style) used only for summarization in memory, not stored.
  • Caching layer (HTTP + summary cache) to keep Raspberry Pi costs low.
  • Multi-node HA for index and queue; rolling updates.

Could-have

  • User accounts for saved searches and daily digests.
  • Multi-source expansion via declarative YAML for new sites.
  • Related-story clustering and timeline views.
  • Basic mobile PWA installability and offline read-later for snippets.

Won't-have (MVP)

  • Paywall bypassing or full-text storage of copyrighted articles.
  • Personalized recommendations or email digests.
  • Editorial curation tooling beyond tags and pinning.

Method

High-level architecture

@startuml
skinparam componentStyle rectangle
skinparam shadowing false
skinparam ArrowColor #888
skinparam DefaultFontName Inter

rectangle "k0s Cluster (ARM64 Raspberry Pi 5)" as K8S {
  node "Namespace: news" as NS {
    [Ingest Scheduler]
(CronJobs)
    [Feed+Sitemap Poller]
(FastAPI Worker)
    [HTML Scraper]
(Worker, Trafilatura)
    [Normalizer/Dedupe]
(Worker)
    [Embedder]
(Worker -> OpenAI embeddings/Gemini flash)
    [Summarizer]
(Worker -> OpenAI gpt-4o-mini/Gemini pro)

    database "PostgreSQL + pgvector" as PG
    [Redis]
(Cache + Queue)

    [API Gateway]
(FastAPI)
    [Web UI]
(Next.js, Tailwind, shadcn)
  }
}

[Feed+Sitemap Poller] --> [HTML Scraper]
[HTML Scraper] --> [Normalizer/Dedupe]
[Normalizer/Dedupe] --> PG
[Embedder] --> PG
[Summarizer] --> PG

[Ingest Scheduler] --> [Feed+Sitemap Poller]
[Embedder] --> [OpenAI Embeddings API/Gemini API]
[Summarizer] --> [OpenAI Chat Completions/Gemini API]

[API Gateway] --> PG
[API Gateway] --> Redis
[Web UI] --> [API Gateway]
@enduml

Why these choices (MVP):

  • Source: Start with Reuters using news sitemaps (with pagination parameters) and RSS; where feeds don't exist, scrape respectfully with robots awareness.
  • Storage: PostgreSQL + pgvector keeps the stack compact (one DB for metadata, text search, and vectors). Postgres full-text covers keyword search; pgvector powers semantic search.
  • Workers: Python FastAPI workers using Trafilatura for robust article extraction and metadata parsing. Redis as the lightweight queue/cache (Dramatiq or RQ; see the sketch after this list).
  • Summaries/Q&A: On-demand summaries and answer synthesis via gpt-4o-mini or Gemini Pro with inline citations. Embeddings via text-embedding-3-small or Gemini Flash. Both accessed through API keys/secrets in Kubernetes.
  • UI: Next.js 14 App Router, Tailwind + shadcn for a Perplexity-style, low-latency interface.
  • k0s: ARM64-friendly. Use nginx-ingress for HTTP routing, with optional HAProxy Ingress for TCP/advanced policies.
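
For the queue choice, a minimal Dramatiq wiring sketch (RQ would look similar; the Redis URL assumes the in-cluster service name used later in this spec):

import dramatiq
from dramatiq.brokers.redis import RedisBroker

# Point Dramatiq at the cluster Redis instance used for cache + queue.
dramatiq.set_broker(RedisBroker(url="redis://redis-master.news.svc.cluster.local:6379/0"))

@dramatiq.actor(queue_name="scrape", max_retries=3)
def scrape_article(url: str) -> None:
    # Fetch + extract with Trafilatura, then hand off to normalize/dedupe.
    ...

Producers enqueue work with scrape_article.send("https://www.reuters.com/...") (URL illustrative); separate actors per queue (poll, scrape, summ, embed) keep backpressure visible per stage.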

Data model (PostgreSQL)

-- Sources (static for MVP)
CREATE TABLE sources (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,        -- e.g., 'Reuters'
  base_url TEXT NOT NULL,           -- e.g., https://www.reuters.com
  rss_urls TEXT[] NOT NULL DEFAULT '{}',
  sitemap_urls TEXT[] NOT NULL DEFAULT '{}',
  robots_txt TEXT,
  enabled BOOLEAN NOT NULL DEFAULT true
);

-- Raw fetch jobs (observability + retries)
CREATE TABLE fetch_jobs (
  id BIGSERIAL PRIMARY KEY,
  source_id INT REFERENCES sources(id),
  url TEXT NOT NULL,
  kind TEXT NOT NULL CHECK (kind IN ('rss','sitemap','article')),
  status TEXT NOT NULL CHECK (status IN ('queued','fetched','parsed','failed')),
  http_status INT,
  etag TEXT,
  last_modified TIMESTAMPTZ,
  attempts INT NOT NULL DEFAULT 0,
  error TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON fetch_jobs (status, created_at);

-- Canonical articles (no copyrighted full text stored)
CREATE TABLE articles (
  id BIGSERIAL PRIMARY KEY,
  source_id INT REFERENCES sources(id) NOT NULL,
  canonical_url TEXT NOT NULL,
  url_hash BYTEA NOT NULL,          -- SHA-256 of canonical_url
  title TEXT NOT NULL,
  author TEXT,
  category TEXT,                    -- World, Business, Tech, etc.
  published_at TIMESTAMPTZ,
  fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  snippet TEXT,                     -- <= 320 chars, from feed/lede
  summary TEXT,                     -- model-generated abstract
  image_url TEXT,
  language TEXT DEFAULT 'en',
  UNIQUE (source_id, url_hash)
);
CREATE INDEX ON articles (published_at DESC);
CREATE INDEX ON articles USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(snippet,'')));

-- Embeddings for semantic search (title+snippet)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE article_embeddings (
  article_id BIGINT PRIMARY KEY REFERENCES articles(id) ON DELETE CASCADE,
  embedding vector(1536) -- dimension for text-embedding-3-small or Gemini flash
);
CREATE INDEX ON article_embeddings USING ivfflat (embedding vector_cosine_ops);

-- Tags and mapping (optional but handy)
CREATE TABLE tags (
  id SERIAL PRIMARY KEY,
  name TEXT UNIQUE
);
CREATE TABLE article_tags (
  article_id BIGINT REFERENCES articles(id) ON DELETE CASCADE,
  tag_id INT REFERENCES tags(id) ON DELETE CASCADE,
  PRIMARY KEY (article_id, tag_id)
);
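
The embedder worker upserts into article_embeddings; a minimal sketch, assuming psycopg 3 with pgvector's register_vector applied to the connection:

import numpy as np

def upsert_embedding(conn, article_id: int, embedding: list[float]) -> None:
    # Assumes register_vector(conn) was called, so numpy arrays adapt to vector.
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO article_embeddings (article_id, embedding)
            VALUES (%s, %s)
            ON CONFLICT (article_id) DO UPDATE SET embedding = EXCLUDED.embedding
            """,
            (article_id, np.array(embedding)),
        )
    conn.commit()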

Ingestion flow

  1. Discovery
  • Poll RSS/Atom endpoints with ETag/Last-Modified to minimize bandwidth.
  • Poll news sitemaps using incremental parameters (e.g., from= offsets when supported). Maintain per-endpoint cursors.
  • For sections without feeds, enqueue HTML pages discovered from site index pages (rate-limited) and respect robots.txt (configurable).
  2. Fetch & Extract
  • HTTP client with retry + exponential backoff and per-host concurrency caps (e.g., 2–4). Respect Cache-Control where present.
  • Use Trafilatura with favor_precision=true to extract main content for in-memory summarization only; do not persist full text.
  • Generate a canonical URL (resolve redirects, strip tracking params) and compute url_hash (see the sketch after this list).
  3. Normalize & Deduplicate
  • If (source_id, url_hash) exists, skip insert; else create articles row with metadata and snippet (<= 320 chars).
  • Classify category using rule-based hints (URL path, RSS category) with a fallback lightweight classifier.
  4. Summaries & Embeddings
  • Create a short summary (60–90 words, neutral tone) with inline citation marker [1] → canonical URL.
  • Compute embedding on (title + " " + snippet) and upsert into article_embeddings.
  5. Indexing & Cache
  • Postgres GIN index supports keyword search; pgvector handles ANN semantic search.
  • Cache hot queries and summaries in Redis for 5–15 minutes.
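
A minimal sketch of the canonicalization step above, assuming redirects were already resolved by the HTTP client (the tracking-parameter list is illustrative, not exhaustive):

import hashlib
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}

def canonicalize(url: str) -> str:
    """Lowercase the host, drop fragments and tracking params, trim trailing slash."""
    p = urlparse(url)
    query = urlencode([(k, v) for k, v in parse_qsl(p.query)
                       if k not in TRACKING_PARAMS])
    path = p.path.rstrip("/") or "/"
    return urlunparse((p.scheme, p.netloc.lower(), path, "", query, ""))

def url_hash(canonical_url: str) -> bytes:
    """SHA-256 digest stored in articles.url_hash (BYTEA)."""
    return hashlib.sha256(canonical_url.encode("utf-8")).digest()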

API design (FastAPI)

  • GET /v1/search?q=&mode=hybrid&page= — Hybrid search (keyword + vector rerank), returns cards with title, snippet, badges, and citations.
  • GET /v1/articles/{id} — Metadata + summary.
  • POST /v1/ask — Conversational answer over top-k retrieved articles, always with citations (request/response sketch below).
  • POST /v1/feedback — Thumbs up/down and optional comment.
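
Illustrative Pydantic shapes for POST /v1/ask; the field names are assumptions, not the final contract:

from pydantic import BaseModel

class AskRequest(BaseModel):
    query: str
    k: int = 8                 # top-k articles to retrieve

class Citation(BaseModel):
    index: int                 # the [n] marker used in the answer text
    title: str
    url: str

class AskResponse(BaseModel):
    answer: str                # sentences carry [n] citation markers
    citations: list[Citation]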

UI flows (Next.js 14)

  • Home: Center composer, query suggestions, trending topics.
  • Results: Perplexity-style answer at top with source chips; below, cards for each cited article; sticky composer for follow-ups.
  • Interactions: Cmd/Ctrl-K global search, ? keyboard help, skeleton loaders, optimistic UI.

Kubernetes (k0s) deployment sketch

  • Namespaces: news, news-observe.

  • Ingress: nginx-ingress for HTTPS; optional parallel HAProxy Ingress for TCP/advanced use. Certs via cert-manager + DNS-01 or HTTP-01.

  • Deployments (ARM64 images):

    • api (FastAPI, Gunicorn with Uvicorn workers): 2 replicas, HPA on CPU 60% & p95 latency SLI.
    • web (Next.js): 2 replicas, static export (optional) behind Node adapter.
    • worker (ingest/summarize/embed): 2–4 replicas, separate queues for poll, scrape, summ, embed.
    • postgres (Bitnami ARM64) with persistent volume; enable pgvector extension.
    • redis (Bitnami ARM64) for cache/queue.
  • RBAC/Secrets: Kubernetes Secrets for API keys; service accounts per deployment.

  • Resources (starting): api 200m/512Mi; web 100m/256Mi; worker 300m/1Gi; redis 50m/256Mi; postgres 250m/2Gi.

  • Autoscaling: HPA + VPA recommendations; cluster metrics via metrics-server.

Ranking & answer synthesis

  • Hybrid search: BM25 (Postgres full-text) for recall → take top 50; compute cosine similarity on vectors → rerank → top 8.
  • Answer: Prompt model with the top 6 snippets + titles and URLs; enforce citation after each sentence where evidence exists. Refuse to answer beyond source material. (Prompt sketch below.)
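
A sketch of how the answer prompt could be assembled under these rules; the wording and dict keys are illustrative:

def build_answer_prompt(question: str, articles: list[dict]) -> str:
    # Cap at 6 articles per answer (see guardrails); number them for [n] citations.
    sources = "\n".join(
        f"[{i+1}] {a['title']}: {a['snippet']} ({a['canonical_url']})"
        for i, a in enumerate(articles[:6])
    )
    return (
        "Answer using ONLY the numbered sources below. "
        "Add a citation like [n] after every sentence supported by a source. "
        "If the sources are insufficient, say so instead of speculating.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )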

Rate limiting & ethics

  • Per-source QPS caps (e.g., 0.5–1 rps) and adaptive backoff (see the limiter sketch after this list).
  • Honor robots.txt by default; switchable per your policy. Always link prominently to the original.
  • Snippets limited; no storage of full article text.
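
A minimal per-host politeness sketch (asyncio; the defaults are illustrative values matching the caps above):

import asyncio, time

class HostLimiter:
    """Cap concurrency per host and enforce a minimum interval between requests."""
    def __init__(self, max_concurrency: int = 2, min_interval: float = 1.5):
        self.sem = asyncio.Semaphore(max_concurrency)
        self.lock = asyncio.Lock()
        self.min_interval = min_interval
        self.last_request = 0.0

    async def __aenter__(self):
        await self.sem.acquire()
        async with self.lock:
            wait = self.min_interval - (time.monotonic() - self.last_request)
            if wait > 0:
                await asyncio.sleep(wait)
            self.last_request = time.monotonic()

    async def __aexit__(self, *exc):
        self.sem.release()

Usage: keep one HostLimiter per hostname and wrap each fetch in async with limiter: so bursts never exceed the per-source budget.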

Implementation

0) Repo layout

news-agg/
  apps/
    api/            # FastAPI (Python 3.11)
    web/            # Next.js 14 UI
    workers/        # poll/scrape/summarize/embed (FastAPI tasks + RQ/Dramatiq)
  deploy/
    base/           # K8s Kustomize base (namespaces, RBAC, NetworkPolicies)
    overlays/
      pi-prod/
        kustomization.yaml
        postgres.yaml
        redis.yaml
        api.yaml
        web.yaml
        workers.yaml
        cron-poller.yaml
        ingress-nginx.yaml
        ingress-haproxy.yaml (optional)
        secrets.example.yaml
  ops/
    helm-values/
      bitnami-postgresql.yaml
      bitnami-redis.yaml
  scripts/
    build.sh        # multi-arch docker buildx
    db_migrate.sql  # tables + pgvector

1) Container images (ARM64)

  • Python base: python:3.11-slim + uv/pip-tools; compile wheels at build time.
  • Node: node:18-alpine; run next build, then serve with node or export static.
  • Use docker buildx to produce linux/arm64 images. Example:
docker buildx build --platform linux/arm64 -t registry/pi/news-api:0.1 -f apps/api/Dockerfile --push .

apps/api/Dockerfile (snippet)

FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential libpq-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN pip install -U pip uv
COPY apps/api/ .
# Install the project and its pinned dependencies into the system interpreter
RUN uv pip install --system .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

2) k0s cluster prep (once)

  • Install nginx-ingress and (optionally) HAProxy Ingress via manifests/Helm.
  • Install cert-manager for TLS if exposing publicly.
  • Add metricsserver for HPA and KEDA (optional) for queue-based scaling.

3) Datastores

PostgreSQL (Bitnami, pgvector)

# deploy/overlays/pi-prod/postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: pgdata, namespace: news }
spec:
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 20Gi } }
---
apiVersion: v1
kind: ConfigMap
metadata: { name: pg-init, namespace: news }
data:
  00-init.sql: |
    CREATE EXTENSION IF NOT EXISTS vector;
    -- migrations applied by apps on startup too
---
# Note: the helm.cattle.io/v1 HelmChart CRD ships with the k3s Helm controller;
# on k0s, install an equivalent Helm controller or apply these charts with helm install.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: pg, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/postgresql
  targetNamespace: news
  version: 15.x.x
  valuesContent: |
    image:
      repository: bitnami/postgresql
      tag: 15-debian-12
    primary:
      extraVolumes:
        - name: pg-init
          configMap: { name: pg-init }
      extraVolumeMounts:
        - name: pg-init
          mountPath: /docker-entrypoint-initdb.d
      persistence:
        existingClaim: pgdata
    auth:
      username: news
      password: ${PG_PASSWORD}
      database: news

Redis (Bitnami)

# deploy/overlays/pi-prod/redis.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: redis, namespace: kube-system }
spec:
  chart: oci://registry-1.docker.io/bitnamicharts/redis
  targetNamespace: news
  version: 18.x.x
  valuesContent: |
    architecture: standalone
    auth:
      enabled: false

4) Secrets & Config

# deploy/overlays/pi-prod/secrets.example.yaml (copy to secrets.yaml and fill)
apiVersion: v1
kind: Secret
metadata: { name: app-secrets, namespace: news }
type: Opaque
data:
  OPENAI_API_KEY: <base64>
  GEMINI_API_KEY: <base64>
  PG_PASSWORD: <base64>   # referenced by $(PG_PASSWORD) in DATABASE_URL below
  APP_SIGNING_KEY: <base64>
---
apiVersion: v1
kind: ConfigMap
metadata: { name: app-config, namespace: news }
data:
  SNIPPET_MAX: "320"
  SOURCES: |
    - name: Reuters
      base_url: https://www.reuters.com
      rss:
        - https://www.reuters.com/rss/worldNews
      sitemaps:
        - https://www.reuters.com/sitemap_news.xml
      robots_policy: honor
  RANKING: "hybrid"

5) Workers (poll, scrape, summarize, embed)

# deploy/overlays/pi-prod/workers.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: workers, namespace: news }
spec:
  replicas: 3
  selector: { matchLabels: { app: workers } }
  template:
    metadata: { labels: { app: workers } }
    spec:
      containers:
        - name: workers
          image: registry/pi/news-workers:0.1
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - name: PG_PASSWORD
              valueFrom: { secretKeyRef: { name: app-secrets, key: PG_PASSWORD } }
            - { name: REDIS_URL, value: redis://redis-master.news.svc.cluster.local:6379/0 }
            - { name: DATABASE_URL, value: postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news }
          resources:
            requests: { cpu: "300m", memory: "1Gi" }
            limits:   { cpu: "900m", memory: "2Gi" }
          livenessProbe:  { httpGet: { path: /healthz, port: 8080 }, initialDelaySeconds: 15 }
          readinessProbe: { httpGet: { path: /readyz,  port: 8080 }, initialDelaySeconds: 5 }

Cron: feed/sitemap polling

apiVersion: batch/v1
kind: CronJob
metadata: { name: poller, namespace: news }
spec:
  schedule: "*/2 * * * *"  # every 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: poll
              image: registry/pi/news-workers:0.1
              args: ["poll"]
              envFrom:
                - secretRef: { name: app-secrets }
                - configMapRef: { name: app-config }

6) API service (FastAPI)

# deploy/overlays/pi-prod/api.yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: api, namespace: news }
spec:
  replicas: 2
  selector: { matchLabels: { app: api } }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
        - name: api
          image: registry/pi/news-api:0.1
          ports: [{ containerPort: 8080 }]
          envFrom:
            - secretRef: { name: app-secrets }
            - configMapRef: { name: app-config }
          env:
            - name: PG_PASSWORD
              valueFrom: { secretKeyRef: { name: app-secrets, key: PG_PASSWORD } }
            - { name: REDIS_URL, value: redis://redis-master.news.svc.cluster.local:6379/0 }
            - { name: DATABASE_URL, value: postgresql://news:$(PG_PASSWORD)@pg-postgresql.news.svc.cluster.local:5432/news }
          resources:
            requests: { cpu: "200m", memory: "512Mi" }
            limits:   { cpu: "600m", memory: "1Gi" }
---
apiVersion: v1
kind: Service
metadata: { name: api, namespace: news }
spec:
  selector: { app: api }
  ports:
    - name: http
      port: 80
      targetPort: 8080

FastAPI search (sketch)

# apps/api/search.py
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

EMBED_DIM = 1536  # text-embedding-3-small

def hybrid_search(conn, q, k=8):
    register_vector(conn)  # adapt numpy arrays to the vector type
    with conn.cursor() as cur:
        # 1) Embed the query (call OpenAI embeddings or Gemini; returns a float list)
        v = embed(q)
        # 2) Keyword recall: Postgres full-text search, top 50
        cur.execute("""
          SELECT id, title, snippet, canonical_url,
                 ts_rank(to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')),
                         plainto_tsquery(%s)) AS rank
          FROM articles
          WHERE to_tsvector('english', coalesce(title,'')||' '||coalesce(snippet,'')) @@ plainto_tsquery(%s)
          ORDER BY rank DESC
          LIMIT 50
        """, (q, q))
        rows = cur.fetchall()
        ids = [r[0] for r in rows] or [-1]  # -1 sentinel keeps ANY(%s) valid when recall is empty
        # 3) Vector rerank of the recalled set by cosine similarity
        cur.execute("""
          SELECT a.id, a.title, a.snippet, a.canonical_url,
                 1 - (e.embedding <=> %s::vector) AS sim
          FROM articles a
          JOIN article_embeddings e ON e.article_id = a.id
          WHERE a.id = ANY(%s)
          ORDER BY sim DESC LIMIT %s
        """, (np.array(v), ids, k))
        return cur.fetchall()

7) Web UI (Next.js 14)

  • App Router, Tailwind, shadcn/ui. Server actions call API.
  • Components: Composer, AnswerBox (with sentence-level citations), ResultCard, SourceChip.
  • Add PWA manifest + basic offline cache for shell.

8) Ingress (nginx primary, HAProxy optional)

# deploy/overlays/pi-prod/ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: news
  namespace: news
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [news.local]
      secretName: news-tls
  rules:
    - host: news.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: web, port: { number: 80 } } }
          - path: /v1
            pathType: Prefix
            backend: { service: { name: api, port: { number: 80 } } }

9) Observability

  • Logging: JSON logs via structlog (API/workers), stdout aggregated by k0s.
  • Metrics: Prometheus scraping (prometheus-fastapi-instrumentator; wiring sketched after this list), Grafana dashboards.
  • Tracing: OpenTelemetry SDK exporting to Tempo/OTLP (optional).
  • SLOs: p95 search < 600ms (warm); ingest freshness p95 < 5 min.
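
Wiring the FastAPI metrics endpoint is a one-liner with prometheus-fastapi-instrumentator (minimal sketch):

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
# Adds default HTTP metrics and serves them on /metrics for Prometheus to scrape.
Instrumentator().instrument(app).expose(app)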

10) CI/CD (GitHub Actions)

  • Build multi-arch images with setup-buildx-action, push to your registry.
  • Deploy via kubectl or ArgoCD (optional). Gate with manual approval.

11) Prompts & safety rails

  • Summary prompt: 60–90 words, neutral tone, forbid speculation, 1–2 citations with URLs.
  • Answer prompt: Use only retrieved snippets; every sentence that makes a claim must cite [n]. If insufficient evidence, say so.
  • Guardrails: Max 6 articles per answer; truncate inputs to token budget.

Gemini LLM Integration

As an alternative to OpenAI models, this project supports Google's Gemini LLM for both embeddings and conversational tasks:

Available Models

  • gemini-2.5-flash: Lightweight model optimized for fast responses and high throughput
  • gemini-2.5-pro: Advanced "thinking" model with enhanced reasoning capabilities

Command Usage

Use the following commands to interact with Gemini models:

# For fast, lightweight responses (embeddings, quick summaries)
gemini --model gemini-2.5-flash -p "<PROMPT>"

# For complex reasoning and detailed analysis (conversational answers)
gemini --model gemini-2.5-pro -p "<PROMPT>"

Integration Notes

  • Gemini models can be used as drop-in replacements for OpenAI equivalents
  • Flash model recommended for embeddings worker (text-embedding-3-small equivalent)
  • Pro model recommended for summarizer worker (gpt-4o-mini equivalent)
  • Configure via GEMINI_API_KEY in Kubernetes secrets alongside OPENAI_API_KEY
  • Network policies should allow egress to generativelanguage.googleapis.com

12) Performance knobs (Raspberry Pi friendly)

  • Enable HTTP caching (ETag/If-Modified-Since); see the fetch sketch after this list.
  • Redis cache TTL 10m for hot queries.
  • Per-host concurrency: 2 (scraper); global QPS: 0.5–1 for Reuters.
  • Use gzip/deflate when fetching; strip images when scraping.
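
A conditional-fetch sketch (httpx assumed; etag and last_modified come from the stored fetch_jobs row):

import httpx

def conditional_get(client: httpx.Client, url: str,
                    etag: str | None, last_modified: str | None) -> httpx.Response:
    headers = {"Accept-Encoding": "gzip, deflate"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    resp = client.get(url, headers=headers)
    # 304 Not Modified: skip parsing and keep the stored cursor/ETag as-is.
    return resp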

13) Data retention

  • Keep articles 30 days rolling (configurable). Older rows archived to articles_archive without embeddings (sketch below).
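
A retention sketch, assuming articles_archive mirrors the articles columns; deleting from articles cascades to article_embeddings, which keeps the archive embedding-free:

RETENTION_SQL = """
WITH moved AS (
    DELETE FROM articles
    WHERE published_at < now() - interval '30 days'
    RETURNING *
)
INSERT INTO articles_archive SELECT * FROM moved;
"""

def run_retention(conn) -> None:
    # Run daily (e.g., from a CronJob); the '30 days' window should come from config.
    with conn.cursor() as cur:
        cur.execute(RETENTION_SQL)
    conn.commit()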

14) Security

  • NetworkPolicies: only API/worker → DB/Redis; web → API; deny egress by default except OpenAI and Gemini domains (api.openai.com, generativelanguage.googleapis.com).
  • Secrets from Kubernetes; rotate quarterly. Read-only service accounts for web. Include both OPENAI_API_KEY and GEMINI_API_KEY in secret management.
  • TLS everywhere; CSP headers on web.

Milestones

MVP timeline: 2 weeks (LAN only, no TLS)

Week 1 — Foundations & ingest

  • Day 1–2: Cluster prep (k0s), namespaces, nginx Ingress (HTTP only), metrics-server. Registry access + buildx pipeline.
  • Day 3: Postgres (pgvector) + Redis live; migrations applied.
  • Day 4: Workers scaffolded (poll, scrape) with Reuters RSS + sitemap pollers; ETag/Last-Modified implemented; robots policy set to honor.
  • Day 5: Normalizer/dedupe; article schema writes; minimal admin page to view ingest logs.

Exit criteria: Reuters articles flowing into DB with title/snippet/category/published_at; p95 freshness under 10 min.

Week 2 — Search, summaries, UI polish

  • Day 6: Embeddings worker + index (pgvector ivfflat). Hybrid search in API.
  • Day 7: Summarizer worker; store 60–90-word summaries; cache.
  • Day 8: Next.js UI (composer, answer box, cards, source chips). Basic keyboard nav.
  • Day 9: Observability: Prometheus scrape + Grafana dashboard; SLOs wired.
  • Day 10: Hardening (quotas, retries), data retention job; smoke tests; cut MVP v0.1.0.

Exit criteria: Query returns an answer with citations in < 800ms warm path; summaries stable; LAN users can search and read cited sources.

Gathering Results

KPIs (Primary)

  • Freshness (p95): time from article publication → available in search. Target: ≤ 5 minutes; stretch ≤ 2 minutes.
  • Answer Accuracy: % of answer sentences that have at least one valid citation to the retrieved set. Target: ≥ 95%.

KPIs (Secondary)

  • Coverage: % of Reuters articles discovered vs. listed in sitemaps over last 24h. Target: ≥ 98%.
  • Latency (p95): query → first contentful paint (UI) and API response time. Targets: API ≤ 600ms warm; UI FCP ≤ 1.5s on LAN.
  • Stability: worker error rate < 1%; scraper retry rate < 10%.

Instrumentation

  • Prometheus metrics

    • ingest_freshness_seconds{source=…} (histogram)
    • ingest_discovered_total{kind=rss|sitemap|scrape}
    • scrape_http_status_total{code=…}
    • search_latency_seconds (histogram)
    • answer_citation_coverage_ratio (gauge)
    • worker_queue_depth{queue=…}
  • Structured logs (JSON): include trace_id, job_id, and normalized URL.

  • Dashboards (Grafana): Freshness, Search Latency, Coverage vs Sitemap, Error budget burn.

Accuracy evaluation

  • Automatic:

    • Parse answer into sentences; verify each sentence has at least one citation (see the sketch after this list).
    • Check that citation URLs match the top-k retrieved set and that snippets contain supporting tokens (simple ROUGE-like overlap).
    • Flag low-evidence sentences for review.
  • Human review (1–2×/week):

    • 50 sampled answers; label: correct / partially supported / unsupported / off-topic.
    • Compute hallucination rate (unsupported sentences ÷ total) and track trend.
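
A sketch of the automatic citation check; the sentence splitting and regex are deliberate simplifications:

import re

CITE = re.compile(r"\[(\d+)\]")

def citation_coverage(answer: str, retrieved_urls: list[str]) -> float:
    """Fraction of sentences carrying at least one [n] marker that maps into
    the retrieved set; feeds the answer_citation_coverage_ratio gauge."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = 0
    for s in sentences:
        marks = [int(m) for m in CITE.findall(s)]
        if any(1 <= n <= len(retrieved_urls) for n in marks):
            cited += 1
    return cited / len(sentences)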

Feedback loop

  • UI thumbs up/down with optional comment saved to feedback table:
CREATE TABLE feedback (
  id BIGSERIAL PRIMARY KEY,
  query TEXT NOT NULL,
  answer_id TEXT,
  verdict TEXT CHECK (verdict IN ('up','down')),
  comment TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
  • Downvotes auto-create a Jira/GitHub issue if answer_citation_coverage_ratio < 0.9.

Experimentation

  • Prompt variants A/B via header flag in API (e.g., x-prompt=v2).
  • Ranking tweaks: switch BM25 weight vs vector weight; record NDCG@10 on labeled queries.

Postmortems & safety

  • Blameless postmortem for any incident where hallucination rate > 10% in a day or freshness p95 > 10 min for > 1h.
  • Daily data retention job verified; no full text persists beyond in-memory summary context.