chore(workspace): add hardened startup/security workflows and skill suite

zap
2026-03-04 19:13:33 +00:00
parent 4903e9d75d
commit 808af5ee13
58 changed files with 3787 additions and 3 deletions


@@ -0,0 +1,7 @@
{
"version": 1,
"registry": "https://clawhub.ai",
"slug": "searxng-local-search",
"installedVersion": "0.1.0",
"installedAt": 1772497721114
}


@@ -0,0 +1,80 @@
---
name: searxng-local-search
description: Search the web via the local self-hosted SearXNG instance and use Brave only as fallback. Use when gathering current information, docs, links, or fact checks, and when privacy/local-first search is preferred.
metadata:
openclaw:
requires:
bins: ["bb"]
env: ["SEARXNG_URL"]
emoji: "🔍"
nix:
plugin: "babashka"
---
# SearXNG Local Search
## Policy (default behavior)
1. Use **SearXNG first** for normal web lookups.
2. Fall back to **Brave** only when:
   - SearXNG is unavailable,
   - SearXNG returns very weak or empty results,
   - or the user explicitly asks for Brave or a second opinion.
3. In research answers, label which source was used.
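
The policy above can be sketched as a small decision function. This is a minimal illustration in Python, not code shipped with this skill: `choose_source`, `label_answer`, and the `min_results` threshold are hypothetical names, and the threshold is an assumption because the policy does not define what counts as "very weak" results.

```python
from typing import Optional, Sequence


def choose_source(searx_available: bool,
                  searx_results: Optional[Sequence[dict]],
                  user_requested_brave: bool = False,
                  min_results: int = 1) -> str:
    """Return "searxng" or "brave" according to the fallback policy."""
    if user_requested_brave:
        return "brave"  # explicit request for Brave or a second opinion
    if not searx_available:
        return "brave"  # SearXNG is unreachable
    if searx_results is None or len(searx_results) < min_results:
        return "brave"  # empty or too few results (assumed threshold)
    return "searxng"


def label_answer(text: str, source: str) -> str:
    """Append the source label required for research answers."""
    return f"{text} (source: {source})"
```

For example, `choose_source(True, [])` selects Brave because SearXNG returned nothing, while a non-empty result list keeps the answer on SearXNG.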
## Preconditions
- `SEARXNG_URL` points to the local instance.
- SearXNG JSON API is enabled (`json` is listed under `search.formats` in the SearXNG `settings.yml`).
- Script entrypoint is available: `scripts/search.sh`.
Preferred local value in this workspace is the LAN endpoint already documented in `TOOLS.md`.
## Quick usage
```bash
scripts/search.sh "your search query"
```
With options:
```bash
scripts/search.sh "your query" '{"category":"news","time_range":"week","num_results":8}'
```
Options:
- `category`: `general|news|images|videos|it|science`
- `time_range`: `day|week|month|year`
- `language`: ISO language code (default `en`)
- `num_results`: integer (default `5`)
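
As an illustration of how the options JSON could be validated before a request is sent, here is a minimal Python sketch. It is not how `scripts/search.sh` handles options; `parse_options` and the two allow-lists are hypothetical, and treating a missing `category` as `general` is an assumption.

```python
import json
from typing import Optional

# Values taken from the option list above.
ALLOWED_CATEGORIES = {"general", "news", "images", "videos", "it", "science"}
ALLOWED_TIME_RANGES = {"day", "week", "month", "year"}


def parse_options(raw: Optional[str]) -> dict:
    """Parse the options JSON string and apply the documented defaults."""
    opts = json.loads(raw) if raw and raw.strip() else {}
    if not isinstance(opts, dict):
        raise ValueError("options must be a JSON object")
    category = opts.get("category", "general")  # assumed default
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    time_range = opts.get("time_range")
    if time_range is not None and time_range not in ALLOWED_TIME_RANGES:
        raise ValueError(f"unknown time_range: {time_range}")
    return {
        "category": category,
        "time_range": time_range,
        "language": opts.get("language", "en"),         # documented default
        "num_results": int(opts.get("num_results", 5)),  # documented default
    }
```

Rejecting unknown values early gives a clearer error than an empty result set from a silently ignored filter.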
## Smoke test routine
Run before first use in a fresh environment or after changes:
```bash
scripts/smoke.sh openclaw
```
Pass criteria:
- command exits successfully,
- returns at least one result,
- includes title + URL fields.
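
The pass criteria can be expressed as one predicate. The sketch below is a hypothetical helper (`smoke_passes` is not part of the skill) and assumes that "includes title + URL fields" means at least one result carries both a non-empty `title` and a non-empty `url`.

```python
def smoke_passes(exit_code: int, response: dict) -> bool:
    """Apply the three pass criteria to a finished smoke run."""
    if exit_code != 0:
        return False  # the command must exit successfully
    results = response.get("results") or []
    # At least one result must exist and carry both title and url.
    return any(r.get("title") and r.get("url") for r in results)
```
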
If smoke test fails:
1. Confirm `SEARXNG_URL` is reachable.
2. Confirm SearXNG container/service is healthy.
3. Retry with a broad query and no filters.
4. If still failing, switch to Brave fallback and report SearXNG incident.
## Troubleshooting
- **Connection/timeout**: verify endpoint + container health.
- **Empty results**: broaden query, remove filters, retry.
- **Bad JSON/format**: verify that the instance allows the `json` output format (`search.formats` in the SearXNG `settings.yml`).
- **Rate concerns**: keep queries paced; avoid burst loops.
## Notes
- This skill defines behavior and checks; it does not replace the underlying SearXNG service deployment.
- For API details and response structure, see `references/api-guide.md`.


@@ -0,0 +1,6 @@
{
"ownerId": "kn78casstptqwp1nhzz6bxcjj1809hvc",
"slug": "searxng-local-search",
"version": "0.1.0",
"publishedAt": 1769835313265
}


@@ -0,0 +1,263 @@
# SearXNG API Reference
This document provides detailed information about the SearXNG JSON API used by the search skill.
## Endpoint
```
GET /search
```
## Query Parameters
### Required
| Parameter | Type | Description |
|-----------|------|-------------|
| `q` | string | The search query |
| `format` | string | Response format; must be `json` for this skill (SearXNG returns HTML when it is omitted) |
### Optional
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `language` | string | Language code (en, es, de, fr, etc.) | `en` |
| `pageno` | integer | Page number for pagination | `1` |
| `time_range` | string | Time filter: `day`, `week`, `month`, `year` | None |
| `category_X` | string | Filter by category (set to `1` to enable) | None |
### Categories
Enable specific categories by setting `category_NAME=1`:
- `category_general` - General web search
- `category_images` - Image search
- `category_videos` - Video search
- `category_news` - News articles
- `category_map` - Maps and locations
- `category_music` - Music search
- `category_files` - File search
- `category_it` - IT/technical content
- `category_science` - Scientific articles
- `category_social media` - Social media (the upstream category name contains a space; URL-encode it as `category_social%20media`)
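
To show how the parameters and category flags combine into one request URL, here is a minimal Python sketch using only the standard library. `build_search_url` is a hypothetical helper, and the base URL in the usage note is just the `localhost:8888` address used in this guide's examples.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_search_url(base_url: str,
                     query: str,
                     categories: Iterable[str] = (),
                     time_range: Optional[str] = None,
                     language: str = "en",
                     pageno: int = 1) -> str:
    """Build a /search URL that asks SearXNG for a JSON response."""
    params = [
        ("q", query),
        ("format", "json"),
        ("language", language),
        ("pageno", str(pageno)),
    ]
    if time_range:
        params.append(("time_range", time_range))
    for name in categories:
        # Each category is enabled with its own category_NAME=1 flag.
        params.append((f"category_{name}", "1"))
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

Calling `build_search_url("http://localhost:8888", "python", categories=["it"])` yields a URL with `category_it=1` appended after the common parameters; `urlencode` takes care of escaping spaces and special characters in the query.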
## Response Format
```json
{
"query": "search query",
"number_of_results": 42,
"results": [
{
"url": "https://example.com/page",
"title": "Page Title",
"content": "Description snippet...",
"engine": "google",
"engines": ["google", "bing"],
"category": "general",
"score": 1.85,
"pretty_url": "https://example.com/page",
"parsed_url": ["https", "example.com", "/page", "", "", ""],
"publishedDate": "2024-01-15T12:00:00"
}
],
"answers": [],
"corrections": [],
"infoboxes": [],
"suggestions": ["related query 1", "related query 2"],
"unresponsive_engines": []
}
```
## Result Fields
| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Full URL of the result |
| `title` | string | Page title |
| `content` | string | Description or snippet |
| `engine` | string | Primary search engine |
| `engines` | array | All engines that returned this result |
| `score` | float | Relevance score (higher is better) |
| `category` | string | Result category |
| `publishedDate` | string | Publication date (ISO 8601) |
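
A client may want to map these fields onto a typed structure. The following Python sketch is one possible shape, not part of the skill: `SearchResult` and `parse_result` are hypothetical, and it assumes that every field except `url` and `title` can be missing from a result.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SearchResult:
    url: str
    title: str
    content: str = ""
    engine: str = ""
    engines: List[str] = field(default_factory=list)
    score: float = 0.0
    category: str = ""
    published: Optional[datetime] = None


def parse_result(raw: dict) -> SearchResult:
    """Convert one entry of the `results` array into a SearchResult."""
    published = None
    if raw.get("publishedDate"):
        try:
            published = datetime.fromisoformat(raw["publishedDate"])
        except ValueError:
            published = None  # keep the result even if the date is malformed
    return SearchResult(
        url=raw.get("url", ""),
        title=raw.get("title", ""),
        content=raw.get("content") or "",
        engine=raw.get("engine") or "",
        engines=list(raw.get("engines") or []),
        score=float(raw.get("score") or 0.0),
        category=raw.get("category") or "",
        published=published,
    )
```
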
## Example Requests
### Basic Search
```bash
curl "http://localhost:8888/search?q=NixOS&format=json"
```
### Category Filter
```bash
curl "http://localhost:8888/search?q=python&category_it=1&format=json"
```
### Time Range Filter
```bash
curl "http://localhost:8888/search?q=news&time_range=day&format=json"
```
### Multiple Filters
```bash
curl "http://localhost:8888/search?q=AI&category_news=1&time_range=week&language=en&format=json"
```
### Pagination
```bash
curl "http://localhost:8888/search?q=rust&pageno=2&format=json"
```
## Rate Limiting
SearXNG includes an optional limiter to prevent abuse (controlled by `server.limiter` in the SearXNG settings). When it is enabled, it can apply:
- IP-based rate limiting
- Bot detection via various heuristics
- Link token verification
If you receive a 429 (Too Many Requests) response:
- Wait a few seconds before retrying
- Implement exponential backoff
- Cache frequently-accessed results
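
The retry advice above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `backoff_delays` and `get_with_backoff` are hypothetical helpers, the base delay of 1 second and the 30 second cap are arbitrary choices, and `fetch` stands in for whatever function performs the HTTP request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays that double after every attempt, limited to `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def get_with_backoff(fetch: Callable[[], Tuple[int, str]],
                     retries: int = 4,
                     base: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `fetch` until it stops returning 429 or the retries run out."""
    for delay in backoff_delays(retries, base):
        status, body = fetch()
        if status != 429:
            return status, body
        sleep(delay)  # wait longer after each rate-limited attempt
    return fetch()    # final attempt; the caller handles a remaining 429
```

Passing `sleep` as a parameter keeps the function testable without real waiting.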
## Error Responses
### 400 Bad Request
Missing required parameters or invalid format.
```json
{
"error": "Missing required parameter: q"
}
```
### 429 Too Many Requests
Rate limit exceeded.
```json
{
"error": "Rate limit exceeded"
}
```
### 500 Internal Server Error
SearXNG server error. Check logs:
```bash
journalctl -u searx -n 50
```
## Best Practices
### 1. Query Construction
- Keep queries concise (1-6 words is optimal)
- Use quotes for exact phrases: `"exact phrase"`
- Use boolean operators: `term1 OR term2`
- Exclude terms with minus: `query -excluded`
### 2. Result Handling
- Sort by score for best results
- Check multiple engines for reliability
- Handle empty results gracefully
- Respect `unresponsive_engines` field
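
A minimal Python sketch of these result-handling points follows. `rank_results` and `unresponsive_engine_names` are hypothetical helpers; the second one assumes that entries in `unresponsive_engines` may be either plain engine names or `[engine, reason]` pairs, so it accepts both.

```python
from typing import List


def rank_results(body: dict, limit: int = 5) -> List[dict]:
    """Return up to `limit` results, highest score first; empty list if none."""
    results = body.get("results") or []
    ranked = sorted(results, key=lambda r: float(r.get("score") or 0.0), reverse=True)
    return ranked[:limit]


def unresponsive_engine_names(body: dict) -> List[str]:
    """Extract engine names from the `unresponsive_engines` field."""
    names = []
    for entry in body.get("unresponsive_engines") or []:
        if isinstance(entry, (list, tuple)):
            if entry:
                names.append(str(entry[0]))  # assumed [engine, reason] pair
        else:
            names.append(str(entry))
    return names
```

If `unresponsive_engine_names` returns a non-empty list, the ranking was computed from a partial set of engines, which is worth mentioning when reporting results.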
### 3. Performance
- Cache results locally when possible
- Use appropriate timeouts (30s recommended)
- Implement retry logic with exponential backoff
- Monitor response times
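
Local caching can be as simple as a dictionary with expiry times. The Python sketch below is one hypothetical way to do it: `TTLCache` is not part of the skill, and the 300 second default lifetime is an arbitrary assumption that should be tuned to how fresh the results need to be.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
```

A natural cache key is the full request URL, so that different filters for the same query do not share an entry.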
### 4. Categories
Choose appropriate categories for your query:
| Query Type | Best Category |
|------------|---------------|
| Current events | `news` |
| Code/documentation | `it` |
| Research papers | `science` |
| How-to guides | `general` |
| Media content | `videos` or `images` |
### 5. Time Ranges
Use time filters for time-sensitive queries:
- `day` - Breaking news, stock prices
- `week` - Recent updates, current events
- `month` - Trends, ongoing stories
- `year` - Annual reports, yearly summaries
## Engine-Specific Notes
SearXNG aggregates results from multiple search engines. Common engines:
- **Google** - Broad coverage, good relevance
- **Bing** - Good for recent content
- **DuckDuckGo** - Privacy-focused
- **Wikipedia** - Encyclopedic content
- **Stack Overflow** - Programming Q&A
- **GitHub** - Code repositories
- **arXiv** - Scientific papers
Each result may come from multiple engines, indicated in the `engines` array.
## Troubleshooting
### No Results
1. Check that the query is not too specific
2. Remove filters and try again
3. Verify engines are responding:
```bash
journalctl -u searx | grep -i error
```
### Slow Responses
1. Check `unresponsive_engines` field
2. Increase timeout in client
3. Disable slow engines in SearXNG config
### Inconsistent Results
1. Results vary by engine availability
2. Check which engines responded: `engines` field
3. Consider using score for ranking
## Advanced Configuration
For custom SearXNG configurations, edit the NixOS module:
```nix
services.searx.settings.engines = [
{
name = "google";
weight = 1.5; # Boost Google results
}
{
name = "duckduckgo";
disabled = true; # Disable DDG
}
];
```
## Resources
- [SearXNG Documentation](https://docs.searxng.org/)
- [SearXNG GitHub](https://github.com/searxng/searxng)
- [Engine Configuration](https://docs.searxng.org/admin/engines/index.html)
- [API Documentation](https://docs.searxng.org/dev/search_api.html)


@@ -0,0 +1,152 @@
#!/usr/bin/env bb
(ns search
(:require [babashka.http-client :as http]
[cheshire.core :as json]
[clojure.string :as str]
[clojure.java.io :as io]))
(def default-endpoints
["http://localhost:8888"
"http://127.0.0.1:8888"
"http://192.168.153.113:18803"
"http://192.168.153.117:18803"])
(def min-delay-ms 1000)
(def timeout-ms 30000)
(def rate-file ".searxng-last-request")
(defn parse-options [s]
(if (or (nil? s) (str/blank? s))
{}
(try
(json/parse-string s true)
(catch Exception e
(binding [*out* *err*]
(println "Error: invalid options JSON")
(println (.getMessage e)))
(System/exit 2)))))
(defn now-ms [] (System/currentTimeMillis))
(defn last-request-ms []
(try
(when (.exists (io/file rate-file))
(Long/parseLong (str/trim (slurp rate-file))))
(catch Exception _ nil)))
(defn write-last-request! [ts]
(spit rate-file (str ts)))
(defn enforce-rate-limit! []
(when-let [last-ts (last-request-ms)]
(let [elapsed (- (now-ms) last-ts)]
(when (< elapsed min-delay-ms)
(Thread/sleep (- min-delay-ms elapsed))))))
(defn endpoint-candidates []
(let [env-url (some-> (System/getenv "SEARXNG_URL") str/trim)]
(if (and env-url (not (str/blank? env-url)))
(cons env-url default-endpoints)
default-endpoints)))
(defn category->param [category]
(when (and category (not= "general" category))
{(keyword (str "category_" category)) "1"}))
(defn build-params [query opts]
(merge
{:q query
:format "json"
:language (or (:language opts) "en")}
(when-let [tr (:time_range opts)] {:time_range tr})
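;; The SearXNG search API has no per-request result-count parameter;
;; num_results is applied client-side in top-results, so only the page is sent.
{:pageno 1}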
(category->param (:category opts))))
(defn try-search [base-url params]
(let [url (str (str/replace base-url #"/$" "") "/search")]
(try
(let [resp (http/get url
{:query-params params
:timeout timeout-ms
:throw false
:headers {"accept" "application/json"}})]
(cond
(= 200 (:status resp))
{:ok true
:endpoint base-url
:body (json/parse-string (:body resp) true)}
(= 429 (:status resp))
{:ok false :retryable true :endpoint base-url :error "Rate limit exceeded (429)"}
:else
{:ok false :retryable true :endpoint base-url
:error (format "HTTP %s" (:status resp))}))
(catch Exception e
{:ok false :retryable true :endpoint base-url :error (.getMessage e)}))))
(defn top-results [results n]
(->> (or results [])
(sort-by (fn [r] (double (or (:score r) 0.0))) >)
(take n)))
(defn fmt-engines [r]
(let [engs (or (:engines r)
(when-let [e (:engine r)] [e])
[])]
(if (seq engs)
(str/join ", " engs)
"unknown")))
(defn print-results [query body num-results endpoint]
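;; SearXNG may report number_of_results as 0, and 0 is truthy in Clojure,
;; so take the larger of the reported total and the number of results returned.
(let [total (max (or (:number_of_results body) 0) (count (:results body)))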
results (top-results (:results body) num-results)]
(println (format "Search Results for \"%s\"" query))
(println (format "Found %s total results" total))
(println (format "Endpoint: %s" endpoint))
(println)
(if (seq results)
(doseq [[idx r] (map-indexed vector results)]
(println (format "%d. %s [Score: %.2f]"
(inc idx)
(or (:title r) "(untitled)")
(double (or (:score r) 0.0))))
(println (str " URL: " (or (:url r) "N/A")))
(println (str " " (or (:content r) "No description available.")))
(println (str " Engines: " (fmt-engines r)))
(println))
(println "No results found."))))
(defn usage []
(binding [*out* *err*]
(println "Usage: bb scripts/search.clj \"query\" '{\"category\":\"news\",\"time_range\":\"day\",\"num_results\":5}'")
(println)
(println "Options JSON keys: category, time_range, language, num_results")))
(defn -main [& args]
(let [[query opts-json] args]
(when (or (nil? query) (str/blank? query))
(usage)
(System/exit 1))
(let [opts (parse-options opts-json)
num-results (max 1 (min 20 (int (or (:num_results opts) 5))))
params (build-params query opts)]
(enforce-rate-limit!)
(write-last-request! (now-ms))
(loop [[endpoint & rest] (endpoint-candidates)
failures []]
(if (nil? endpoint)
(do
(binding [*out* *err*]
(println "Error: all SearXNG endpoints failed")
(doseq [{:keys [endpoint error]} failures]
(println (format "- %s -> %s" endpoint error))))
(System/exit 3))
(let [res (try-search endpoint params)]
(if (:ok res)
(print-results query (:body res) num-results endpoint)
(recur rest (conj failures (select-keys res [:endpoint :error]))))))))))
(apply -main *command-line-args*)


@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT="/home/openclaw/.openclaw/workspace"
SKILL_DIR="$ROOT/skills/searxng-local-search"
ENV_FILE="$ROOT/.env"
if [[ -f "$ENV_FILE" ]]; then
set -a
# shellcheck disable=SC1090
source "$ENV_FILE"
set +a
fi
if [[ $# -lt 1 ]]; then
echo "Usage: scripts/search.sh \"query\" ['{\"category\":\"news\",\"time_range\":\"day\",\"num_results\":5}']" >&2
echo "Example: scripts/search.sh \"openclaw ai\" '{\"num_results\":3}'" >&2
exit 1
fi
exec bb "$SKILL_DIR/scripts/search.clj" "$@"


@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail
SEARXNG_URL="${SEARXNG_URL:-http://192.168.153.113:18803}"
QUERY="${1:-test}"
echo "[smoke] endpoint: ${SEARXNG_URL}"
echo "[smoke] query: ${QUERY}"
echo "[smoke] curl json API..."
ENC_QUERY="$(python3 -c 'import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))' "${QUERY}")"
curl -fsS --max-time 15 "${SEARXNG_URL%/}/search?q=${ENC_QUERY}&format=json" > /tmp/searx-smoke.json
echo "[smoke] validating response..."
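python3 - <<'PY'
import json
import sys
p = '/tmp/searx-smoke.json'
with open(p) as fh:
    obj = json.load(fh)
results = obj.get('results', [])
print('[ok] query:', obj.get('query'))
print('[ok] results:', len(results))
# Enforce the documented pass criteria: at least one result with title and URL.
if not any(r.get('title') and r.get('url') for r in results):
    sys.exit('[fail] expected at least one result with title and url fields')
PY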