chore(workspace): add hardened startup/security workflows and skill suite
7
skills/searxng-local-search/.clawhub/origin.json
Normal file
@@ -0,0 +1,7 @@
{
  "version": 1,
  "registry": "https://clawhub.ai",
  "slug": "searxng-local-search",
  "installedVersion": "0.1.0",
  "installedAt": 1772497721114
}
80
skills/searxng-local-search/SKILL.md
Normal file
@@ -0,0 +1,80 @@
---
name: searxng-local-search
description: Search the web via the local self-hosted SearXNG instance and use Brave only as fallback. Use when gathering current information, docs, links, or fact checks, and when privacy/local-first search is preferred.
metadata:
  openclaw:
    requires:
      bins: ["bb"]
      env: ["SEARXNG_URL"]
    emoji: "🔍"
    nix:
      plugin: "babashka"
---

# SearXNG Local Search

## Policy (default behavior)

1. Use **SearXNG first** for normal web lookups.
2. Fall back to **Brave** only when:
   - SearXNG is unavailable,
   - SearXNG returns very weak/empty results,
   - or the user explicitly asks for Brave or a second opinion.
3. In research answers, label which source was used.
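
The policy above can be read as a small decision function. The following is a minimal sketch, not part of the skill: `SearchOutcome`, `choose_source`, `label`, and the `min_results` threshold for "very weak" results are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchOutcome:
    """Outcome of one SearXNG attempt (hypothetical shape for this sketch)."""
    reachable: bool    # False when the request failed or timed out
    result_count: int  # number of results SearXNG returned


def choose_source(outcome: SearchOutcome, user_wants_brave: bool = False,
                  min_results: int = 1) -> str:
    """Return "searxng" or "brave" following the three fallback conditions."""
    if user_wants_brave:
        return "brave"      # explicit request for Brave or a second opinion
    if not outcome.reachable:
        return "brave"      # SearXNG is unavailable
    if outcome.result_count < min_results:
        return "brave"      # weak or empty results (threshold is an assumption)
    return "searxng"


def label(answer: str, source: str) -> str:
    """Append the source label that rule 3 asks for."""
    return f"{answer} [source: {source}]"
```

Raising `min_results` makes the fallback more eager; the policy text does not fix a number, so the default of `1` only covers the "empty" case.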

## Preconditions

- `SEARXNG_URL` points to the local instance.
- SearXNG JSON API is enabled (`json` is listed under `search.formats` in the instance settings).
- Script entrypoint is available: `scripts/search.sh`.

Preferred local value in this workspace is the LAN endpoint already documented in `TOOLS.md`.

## Quick usage

```bash
scripts/search.sh "your search query"
```

With options:

```bash
scripts/search.sh "your query" '{"category":"news","time_range":"week","num_results":8}'
```

Options:
- `category`: `general|news|images|videos|it|science`
- `time_range`: `day|week|month|year`
- `language`: ISO language code (default `en`)
- `num_results`: integer (default `5`, clamped to 1-20 by the script)
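
To make the option rules concrete, here is a minimal Python sketch of parsing and validating the options JSON. It is stricter than `scripts/search.clj`, which does not reject unknown `category` or `time_range` values; `parse_options` and the two constant sets are names made up for this example.

```python
import json
from typing import Optional

CATEGORIES = {"general", "news", "images", "videos", "it", "science"}
TIME_RANGES = {"day", "week", "month", "year"}


def parse_options(raw: Optional[str]) -> dict:
    """Parse the options JSON string; raise ValueError on bad input."""
    opts = json.loads(raw) if raw and raw.strip() else {}
    if not isinstance(opts, dict):
        raise ValueError("options must be a JSON object")
    category = opts.get("category")
    if category is not None and category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    time_range = opts.get("time_range")
    if time_range is not None and time_range not in TIME_RANGES:
        raise ValueError(f"unknown time_range: {time_range!r}")
    num_results = int(opts.get("num_results", 5))
    return {
        "category": category,
        "time_range": time_range,
        "language": opts.get("language", "en"),
        # Same 1..20 clamp that the script applies.
        "num_results": max(1, min(20, num_results)),
    }
```

Invalid JSON also surfaces as `ValueError`, because `json.JSONDecodeError` is a subclass of it.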

## Smoke test routine

Run before first use in a fresh environment or after changes:

```bash
scripts/smoke.sh openclaw
```

Pass criteria:
- command exits successfully,
- returns at least one result,
- includes title + URL fields.
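
The three pass criteria can be checked mechanically. A minimal sketch, assuming the JSON response has already been parsed into a dict and that "includes title + URL fields" means at least one result carries both; `passes_smoke` is a hypothetical helper, not part of the skill.

```python
def passes_smoke(exit_code: int, payload: dict) -> bool:
    """Return True when all three smoke-test criteria hold."""
    if exit_code != 0:
        return False  # the command did not exit successfully
    results = payload.get("results") or []
    if not results:
        return False  # no results returned
    # At least one result must carry a non-empty title and URL.
    return any(r.get("title") and r.get("url") for r in results)
```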

If smoke test fails:
1. Confirm `SEARXNG_URL` is reachable.
2. Confirm SearXNG container/service is healthy.
3. Retry with a broad query and no filters.
4. If still failing, switch to Brave fallback and report SearXNG incident.

## Troubleshooting

- **Connection/timeout**: verify endpoint + container health.
- **Empty results**: broaden query, remove filters, retry.
- **Bad JSON/format**: verify SearXNG JSON format support.
- **Rate concerns**: keep queries paced; avoid burst loops.

## Notes

- This skill defines behavior and checks; it does not replace the underlying SearXNG service deployment.
- For API details and response structure, see `references/api-guide.md`.
6
skills/searxng-local-search/_meta.json
Normal file
@@ -0,0 +1,6 @@
{
  "ownerId": "kn78casstptqwp1nhzz6bxcjj1809hvc",
  "slug": "searxng-local-search",
  "version": "0.1.0",
  "publishedAt": 1769835313265
}
263
skills/searxng-local-search/references/api-guide.md
Normal file
@@ -0,0 +1,263 @@
# SearXNG API Reference

This document provides detailed information about the SearXNG JSON API used by the search skill.

## Endpoint

```
GET /search
```

## Query Parameters

### Required

| Parameter | Type | Description |
|-----------|------|-------------|
| `q` | string | The search query |
| `format` | string | Response format (use `json`) |

### Optional

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `language` | string | Language code (en, es, de, fr, etc.) | `en` |
| `pageno` | integer | Page number for pagination | `1` |
| `time_range` | string | Time filter: `day`, `week`, `month`, `year` | None |
| `category_X` | string | Filter by category (set to `1` to enable) | None |

### Categories

Enable specific categories by setting `category_NAME=1`:

- `category_general` - General web search
- `category_images` - Image search
- `category_videos` - Video search
- `category_news` - News articles
- `category_map` - Maps and locations
- `category_music` - Music search
- `category_files` - File search
- `category_it` - IT/technical content
- `category_science` - Scientific articles
- `category_social` - Social media
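
The parameters and category flags above combine into a single request URL. The sketch below builds one with Python's standard library; `build_search_url` is a hypothetical helper, and the base URL in the usage line is only the example host used elsewhere in this guide.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str,
                     categories: Iterable[str] = (),
                     time_range: Optional[str] = None,
                     language: str = "en", pageno: int = 1) -> str:
    """Assemble a /search URL with format=json and optional filters."""
    params = [
        ("q", query),
        ("format", "json"),
        ("language", language),
        ("pageno", str(pageno)),
    ]
    if time_range:
        params.append(("time_range", time_range))
    for name in categories:
        # Each enabled category becomes category_NAME=1.
        params.append((f"category_{name}", "1"))
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8888", "AI", categories=["news"], time_range="week")` yields a URL equivalent to a request with `category_news=1` and `time_range=week`.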

## Response Format

```json
{
  "query": "search query",
  "number_of_results": 42,
  "results": [
    {
      "url": "https://example.com/page",
      "title": "Page Title",
      "content": "Description snippet...",
      "engine": "google",
      "engines": ["google", "bing"],
      "category": "general",
      "score": 1.85,
      "pretty_url": "https://example.com/page",
      "parsed_url": ["https", "example.com", "/page", "", "", ""],
      "publishedDate": "2024-01-15T12:00:00"
    }
  ],
  "answers": [],
  "corrections": [],
  "infoboxes": [],
  "suggestions": ["related query 1", "related query 2"],
  "unresponsive_engines": []
}
```

## Result Fields

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Full URL of the result |
| `title` | string | Page title |
| `content` | string | Description or snippet |
| `engine` | string | Primary search engine |
| `engines` | array | All engines that returned this result |
| `score` | float | Relevance score (higher is better) |
| `category` | string | Result category |
| `publishedDate` | string | Publication date (ISO 8601) |
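
As an illustration of the fields in the table, the following sketch turns one raw result object into a typed record. The `Result` dataclass and `parse_result` are hypothetical names; the sketch assumes any field other than `url` and `title` may be missing and falls back to neutral defaults.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Result:
    url: str
    title: str
    content: str = ""
    engines: List[str] = field(default_factory=list)
    score: float = 0.0
    category: str = ""
    published: Optional[datetime] = None


def parse_result(raw: dict) -> Result:
    """Convert one entry of the `results` array into a Result."""
    # Prefer the `engines` array; fall back to the single `engine` field.
    engines = raw.get("engines") or ([raw["engine"]] if raw.get("engine") else [])
    published = None
    if raw.get("publishedDate"):
        try:
            published = datetime.fromisoformat(raw["publishedDate"])
        except ValueError:
            published = None  # keep going if the date is not ISO 8601
    return Result(
        url=raw.get("url", ""),
        title=raw.get("title", ""),
        content=raw.get("content") or "",
        engines=list(engines),
        score=float(raw.get("score") or 0.0),
        category=raw.get("category", ""),
        published=published,
    )
```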

## Example Requests

### Basic Search

```bash
curl "http://localhost:8888/search?q=NixOS&format=json"
```

### Category Filter

```bash
curl "http://localhost:8888/search?q=python&category_it=1&format=json"
```

### Time Range Filter

```bash
curl "http://localhost:8888/search?q=news&time_range=day&format=json"
```

### Multiple Filters

```bash
curl "http://localhost:8888/search?q=AI&category_news=1&time_range=week&language=en&format=json"
```

### Pagination

```bash
curl "http://localhost:8888/search?q=rust&pageno=2&format=json"
```

## Rate Limiting

SearXNG ships an optional limiter to prevent abuse. When it is enabled, it combines:

- IP-based rate limiting
- Bot detection via various heuristics
- Link token verification

If you receive a 429 (Too Many Requests) response:
- Wait a few seconds before retrying
- Implement exponential backoff
- Cache frequently-accessed results
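
The backoff advice above can be sketched as follows. This is an illustrative helper, not code from the skill: `fetch_with_backoff` is a made-up name, `fetch` stands for any zero-argument callable that performs the request and returns `(status, body)`, and the delays (1 s, doubling) are assumptions.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(fetch: Callable[[], Tuple[int, str]],
                       max_attempts: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Retry on HTTP 429, doubling the wait after each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 429:
            return status, body   # success or a non-retryable error
        if attempt < max_attempts:
            sleep(delay)          # wait before the next attempt
            delay *= 2            # exponential growth: 1s, 2s, 4s, ...
    return status, body           # still 429 after the final attempt
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.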

## Error Responses

### 400 Bad Request

Missing required parameters or invalid format.

```json
{
  "error": "Missing required parameter: q"
}
```

### 429 Too Many Requests

Rate limit exceeded.

```json
{
  "error": "Rate limit exceeded"
}
```

### 500 Internal Server Error

SearXNG server error. Check logs:

```bash
journalctl -u searx -n 50
```
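
A client can map the three status codes documented in this section to a next step. The sketch below is one possible mapping; `classify_status` and the action strings are hypothetical and only mirror the advice given here.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status from /search to a suggested client action."""
    if status == 200:
        return "ok"
    if status == 400:
        return "fix-request"    # missing or invalid parameters
    if status == 429:
        return "retry-later"    # rate limited: back off, then retry
    if 500 <= status < 600:
        return "check-server"   # inspect the SearXNG service logs
    return "unexpected"
```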

## Best Practices

### 1. Query Construction

- Keep queries concise (1-6 words is optimal)
- Use quotes for exact phrases: `"exact phrase"`
- Use boolean operators: `term1 OR term2`
- Exclude terms with minus: `query -excluded`

### 2. Result Handling

- Sort by score for best results
- Check multiple engines for reliability
- Handle empty results gracefully
- Respect `unresponsive_engines` field
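
The result-handling points above fit in one small function. A minimal sketch, assuming a parsed response dict; `summarize` is a hypothetical name, and because entries of `unresponsive_engines` may be plain names or `[name, reason]` pairs depending on the instance, the sketch accepts both.

```python
def summarize(payload: dict, limit: int = 5) -> dict:
    """Rank results by score and surface engines that did not respond."""
    results = payload.get("results") or []
    # Missing scores are treated as 0.0 so they sort last.
    ranked = sorted(results, key=lambda r: float(r.get("score") or 0.0), reverse=True)
    unresponsive = [
        entry[0] if isinstance(entry, (list, tuple)) else entry
        for entry in payload.get("unresponsive_engines") or []
    ]
    return {
        "top": ranked[:limit],
        "empty": not ranked,          # lets the caller handle "no results" explicitly
        "unresponsive": unresponsive,
    }
```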

### 3. Performance

- Cache results locally when possible
- Use appropriate timeouts (30s recommended)
- Implement retry logic with exponential backoff
- Monitor response times
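
Local caching can be as simple as an in-memory map with an expiry time. The class below is an illustrative sketch: `TTLCache` is a made-up name, and the five-minute default lifetime is an assumption rather than a recommendation from SearXNG.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]   # expired: drop it and report a miss
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock(), value)
```

A natural cache key is the tuple of query and filters, so the same search with a different `time_range` is stored separately.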

### 4. Categories

Choose appropriate categories for your query:

| Query Type | Best Category |
|------------|---------------|
| Current events | `news` |
| Code/documentation | `it` |
| Research papers | `science` |
| How-to guides | `general` |
| Media content | `videos` or `images` |

### 5. Time Ranges

Use time filters for time-sensitive queries:

- `day` - Breaking news, stock prices
- `week` - Recent updates, current events
- `month` - Trends, ongoing stories
- `year` - Annual reports, yearly summaries

## Engine-Specific Notes

SearXNG aggregates results from multiple search engines. Common engines:

- **Google** - Broad coverage, good relevance
- **Bing** - Good for recent content
- **DuckDuckGo** - Privacy-focused
- **Wikipedia** - Encyclopedic content
- **Stack Overflow** - Programming Q&A
- **GitHub** - Code repositories
- **arXiv** - Scientific papers

Each result may come from multiple engines, indicated in the `engines` array.
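
The `engines` array makes it possible to see which engines contributed and which results were confirmed by more than one of them. A small sketch under that reading; `engine_counts`, `corroborated`, and the `min_engines` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def engine_counts(results: List[Dict]) -> Counter:
    """Count how many results each engine contributed."""
    counts: Counter = Counter()
    for result in results:
        counts.update(result.get("engines") or [])
    return counts


def corroborated(results: List[Dict], min_engines: int = 2) -> List[Dict]:
    """Keep results that at least min_engines distinct engines returned."""
    return [r for r in results if len(set(r.get("engines") or [])) >= min_engines]
```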

## Troubleshooting

### No Results

1. Check query is not too specific
2. Remove filters and try again
3. Verify engines are responding:
   ```bash
   journalctl -u searx | grep -i error
   ```

### Slow Responses

1. Check `unresponsive_engines` field
2. Increase timeout in client
3. Disable slow engines in SearXNG config

### Inconsistent Results

1. Results vary by engine availability
2. Check which engines responded: `engines` field
3. Consider using score for ranking

## Advanced Configuration

For custom SearXNG configurations, edit the NixOS module:

```nix
services.searx.settings.engines = [
  {
    name = "google";
    weight = 1.5; # Boost Google results
  }
  {
    name = "duckduckgo";
    disabled = true; # Disable DDG
  }
];
```

## Resources

- [SearXNG Documentation](https://docs.searxng.org/)
- [SearXNG GitHub](https://github.com/searxng/searxng)
- [Engine Configuration](https://docs.searxng.org/admin/engines/index.html)
- [API Documentation](https://docs.searxng.org/dev/search_api.html)
152
skills/searxng-local-search/scripts/search.clj
Executable file
@@ -0,0 +1,152 @@
#!/usr/bin/env bb
(ns search
  (:require [babashka.http-client :as http]
            [cheshire.core :as json]
            [clojure.string :as str]
            [clojure.java.io :as io]))

;; Fallback endpoints, tried in order after SEARXNG_URL (when set).
(def default-endpoints
  ["http://localhost:8888"
   "http://127.0.0.1:8888"
   "http://192.168.153.113:18803"
   "http://192.168.153.117:18803"])

(def min-delay-ms 1000)                  ; minimum gap between two requests
(def timeout-ms 30000)                   ; per-request timeout
(def rate-file ".searxng-last-request")  ; last request timestamp, relative to the cwd

(defn parse-options
  "Parse the optional JSON options argument into a keywordized map.
  Exits with status 2 on invalid JSON."
  [s]
  (if (or (nil? s) (str/blank? s))
    {}
    (try
      (json/parse-string s true)
      (catch Exception e
        (binding [*out* *err*]
          (println "Error: invalid options JSON")
          (println (.getMessage e)))
        (System/exit 2)))))

(defn now-ms [] (System/currentTimeMillis))

(defn last-request-ms
  "Read the last request timestamp, or nil when it is missing or unreadable."
  []
  (try
    (when (.exists (io/file rate-file))
      (Long/parseLong (str/trim (slurp rate-file))))
    (catch Exception _ nil)))

(defn write-last-request! [ts]
  (spit rate-file (str ts)))

(defn enforce-rate-limit!
  "Sleep until at least min-delay-ms has passed since the last request."
  []
  (when-let [last-ts (last-request-ms)]
    (let [elapsed (- (now-ms) last-ts)]
      (when (< elapsed min-delay-ms)
        (Thread/sleep (- min-delay-ms elapsed))))))

(defn endpoint-candidates
  "SEARXNG_URL first (when set), then the defaults, without duplicates."
  []
  (let [env-url (some-> (System/getenv "SEARXNG_URL") str/trim)]
    (if (and env-url (not (str/blank? env-url)))
      (distinct (cons env-url default-endpoints))
      default-endpoints)))

(defn category->param
  "Map a category name to its category_NAME=1 query parameter.
  The general category is left to the server default."
  [category]
  (when (and category (not= "general" category))
    {(keyword (str "category_" category)) "1"}))

(defn build-params [query opts]
  (merge
   {:q query
    :format "json"
    :language (or (:language opts) "en")}
   (when-let [tr (:time_range opts)] {:time_range tr})
   ;; num_results is applied client-side in top-results (the API reference
   ;; lists no result-count parameter), so only page 1 is requested here.
   (when (:num_results opts) {:pageno 1})
   (category->param (:category opts))))

(defn try-search
  "Query one endpoint. Returns {:ok true :body ...} or {:ok false :error ...}."
  [base-url params]
  (let [url (str (str/replace base-url #"/$" "") "/search")]
    (try
      (let [resp (http/get url
                           {:query-params params
                            :timeout timeout-ms
                            :throw false
                            :headers {"accept" "application/json"}})]
        (cond
          (= 200 (:status resp))
          {:ok true
           :endpoint base-url
           :body (json/parse-string (:body resp) true)}

          (= 429 (:status resp))
          {:ok false :retryable true :endpoint base-url :error "Rate limit exceeded (429)"}

          :else
          {:ok false :retryable true :endpoint base-url
           :error (format "HTTP %s" (:status resp))}))
      (catch Exception e
        {:ok false :retryable true :endpoint base-url :error (.getMessage e)}))))

(defn top-results
  "Return the n highest-scoring results; a missing score counts as 0.0."
  [results n]
  (->> (or results [])
       (sort-by (fn [r] (double (or (:score r) 0.0))) >)
       (take n)))

(defn fmt-engines [r]
  (let [engs (or (:engines r)
                 (when-let [e (:engine r)] [e])
                 [])]
    (if (seq engs)
      (str/join ", " engs)
      "unknown")))

(defn print-results [query body num-results endpoint]
  (let [;; number_of_results may be 0 even when results are present,
        ;; so report the larger of the two counts.
        total (max (or (:number_of_results body) 0) (count (:results body)))
        results (top-results (:results body) num-results)]
    (println (format "Search Results for \"%s\"" query))
    (println (format "Found %s total results" total))
    (println (format "Endpoint: %s" endpoint))
    (println)
    (if (seq results)
      (doseq [[idx r] (map-indexed vector results)]
        (println (format "%d. %s [Score: %.2f]"
                         (inc idx)
                         (or (:title r) "(untitled)")
                         (double (or (:score r) 0.0))))
        (println (str " URL: " (or (:url r) "N/A")))
        (println (str " " (or (:content r) "No description available.")))
        (println (str " Engines: " (fmt-engines r)))
        (println))
      (println "No results found."))))

(defn usage []
  (binding [*out* *err*]
    (println "Usage: bb scripts/search.clj \"query\" '{\"category\":\"news\",\"time_range\":\"day\",\"num_results\":5}'")
    (println)
    (println "Options JSON keys: category, time_range, language, num_results")))

(defn -main [& args]
  (let [[query opts-json] args]
    (when (or (nil? query) (str/blank? query))
      (usage)
      (System/exit 1))

    (let [opts (parse-options opts-json)
          num-results (max 1 (min 20 (int (or (:num_results opts) 5))))
          params (build-params query opts)]
      (enforce-rate-limit!)
      (write-last-request! (now-ms))

      ;; Try each endpoint in order; exit with status 3 when all of them fail.
      (loop [[endpoint & more] (endpoint-candidates)
             failures []]
        (if (nil? endpoint)
          (do
            (binding [*out* *err*]
              (println "Error: all SearXNG endpoints failed")
              (doseq [{:keys [endpoint error]} failures]
                (println (format "- %s -> %s" endpoint error))))
            (System/exit 3))
          (let [res (try-search endpoint params)]
            (if (:ok res)
              (print-results query (:body res) num-results endpoint)
              (recur more (conj failures (select-keys res [:endpoint :error]))))))))))

(apply -main *command-line-args*)
21
skills/searxng-local-search/scripts/search.sh
Executable file
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail

ROOT="/home/openclaw/.openclaw/workspace"
SKILL_DIR="$ROOT/skills/searxng-local-search"
ENV_FILE="$ROOT/.env"

# Export every variable defined in the workspace .env (e.g. SEARXNG_URL).
if [[ -f "$ENV_FILE" ]]; then
  set -a
  # shellcheck disable=SC1090
  source "$ENV_FILE"
  set +a
fi

if [[ $# -lt 1 ]]; then
  # Inner double quotes are escaped so the JSON sample prints intact;
  # the square brackets mark the options argument as optional.
  echo "Usage: scripts/search.sh \"query\" ['{\"category\":\"news\",\"time_range\":\"day\",\"num_results\":5}']" >&2
  echo "Example: scripts/search.sh \"openclaw ai\" '{\"num_results\":3}'" >&2
  exit 1
fi

exec bb "$SKILL_DIR/scripts/search.clj" "$@"
21
skills/searxng-local-search/scripts/smoke.sh
Executable file
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -euo pipefail

SEARXNG_URL="${SEARXNG_URL:-http://192.168.153.113:18803}"
QUERY="${1:-test}"
# Use a private temp file instead of a fixed /tmp path; remove it on exit.
OUT="$(mktemp)"
trap 'rm -f "$OUT"' EXIT

echo "[smoke] endpoint: ${SEARXNG_URL}"
echo "[smoke] query: ${QUERY}"

echo "[smoke] curl json API..."
ENC_QUERY="$(python3 -c 'import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))' "${QUERY}")"
curl -fsS --max-time 15 "${SEARXNG_URL%/}/search?q=${ENC_QUERY}&format=json" > "$OUT"

echo "[smoke] validating response..."
python3 - "$OUT" <<'PY'
import json
import sys

with open(sys.argv[1]) as fh:
    obj = json.load(fh)
results = obj.get('results', [])
print('[ok] query:', obj.get('query'))
print('[ok] results:', len(results))
# Enforce the documented pass criteria: at least one result with title and URL.
if not results:
    sys.exit('[fail] no results returned')
if not any(r.get('title') and r.get('url') for r in results):
    sys.exit('[fail] no result has both title and url')
PY