# Website Bot
This repository includes a floating bot embedded into the Knockoff Pipeline website. It is designed as a lightweight RAG service rather than a full agent system.
## Goals
- answer questions grounded in the published knockoff documentation
- stay attached to the existing Hugo site instead of replacing it
- support both local development and hosted deployment
- fall back gracefully when an LLM provider is unavailable
## Stack
- Frontend: Hugo + Hextra + vanilla JavaScript + CSS
- Backend: FastAPI + Uvicorn
- Retrieval: local text parsing, chunking, token matching, TF-IDF style scoring
- LLM interface: OpenAI-compatible `/chat/completions`
- Supported providers: Gemini, OpenAI, Groq, OpenRouter, Ollama
- Hosting:
  - frontend on GitHub Pages
  - backend on Hugging Face Spaces (Docker)
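Because all the listed providers expose the same OpenAI-style `/chat/completions` shape, one request builder can serve every backend. A minimal sketch; the function name, `base_url`, and `api_key` parameters are illustrative, not the repository's actual code:

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       question: str, evidence: str) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": f"Answer only from this evidence:\n{evidence}"},
            {"role": "user", "content": question},
        ],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Switching providers only changes base_url and model, e.g.:
#   OpenAI: https://api.openai.com/v1
#   Groq:   https://api.groq.com/openai/v1
#   Ollama: http://localhost:11434/v1
```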
## Main Files
- `layouts/_partials/scripts.html` - injects the floating bot root into the site layout
- `static/bot-launcher.js` - handles drag, open/close, request submission, and message rendering
- `assets/css/custom.css` - styles the floating icon and chat panel
- `backend/app.py` - exposes `/api/health`, `/api/chat`, and `/api/reindex`
- `backend/rag.py` - parses documents, chunks content, retrieves evidence, and builds extractive fallback answers
- `backend/llm.py` - sends grounded evidence to the configured LLM provider
- `backend/settings.py` - loads provider settings and knowledge source paths
## Runtime Flow
```mermaid
flowchart TD
    A["User asks a question"] --> B["Floating bot UI"]
    B --> C["POST /api/chat"]
    C --> D["Retrieve matching chunks from content/"]
    D --> E{"LLM configured?"}
    E -->|Yes| F["Call provider with grounded evidence"]
    E -->|No| G["Return extractive fallback answer"]
    F --> H["Answer with citations"]
    G --> H
```
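The branch in the flowchart can be sketched as a small handler. This is illustrative only (the function and field names are assumptions, not the actual contents of `backend/app.py`):

```python
def answer(question: str, retrieve, llm_call=None) -> dict:
    """Mirror the flowchart: retrieve evidence, then either call the
    configured LLM or fall back to an extractive answer."""
    chunks = retrieve(question)            # top-ranked chunks from content/
    if llm_call is not None:               # "LLM configured?" branch
        text = llm_call(question, chunks)  # grounded synthesis by the provider
    else:
        # extractive fallback: return the best-matching chunk verbatim
        text = chunks[0]["text"] if chunks else "No matching documentation found."
    return {"answer": text, "citations": [c["source"] for c in chunks]}
```

Either branch ends in the same response shape, so the frontend renders citations identically whether or not a provider was reachable.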
## Retrieval Strategy
The current retrieval layer is intentionally simple:
- source files come from `content/` by default
- markdown is stripped to plain text
- documents are split into overlapping chunks
- the question is tokenized and matched against chunk term frequencies
- results are ranked with a TF-IDF style score
- the top chunks become evidence for either fallback output or LLM synthesis
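The tokenize-match-rank steps above can be illustrated with a compact TF-IDF style scorer. This is a sketch of the general technique, not the exact code in `backend/rag.py`:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Score each chunk by the summed TF-IDF weight of the question's terms."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(chunks)
    scores = []
    for i, tf in enumerate(docs):
        score = 0.0
        for term in set(tokenize(question)):
            df = sum(1 for d in docs if term in d)  # document frequency
            if df:
                # term frequency * inverse document frequency
                score += tf[term] * math.log(1 + n / df)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [chunks[i] for _, i in scores[:top_k]]
```

Rare terms (high IDF) dominate the score, so a chunk that mentions a distinctive word from the question outranks one that only shares common words.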
The bot also prefers evidence in the same language as the question:
- English questions prefer `content/en/`
- Chinese questions prefer `content/zh/`
This prevents mixed-language answers when both language trees contain related material.
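The language routing can be approximated by checking the question for CJK characters. A hedged sketch; the repository's actual heuristic may differ:

```python
def preferred_tree(question: str) -> str:
    """Route Chinese questions to content/zh/ and everything else to content/en/."""
    # Any character in the CJK Unified Ideographs block marks the question as Chinese.
    if any("\u4e00" <= ch <= "\u9fff" for ch in question):
        return "content/zh/"
    return "content/en/"
```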
## Deployment Route
The production deployment is split:
- GitHub Pages builds and serves the static Hugo site.
- GitHub Actions syncs an API-only bundle to a Hugging Face Docker Space.
- The public site calls the deployed Space API.
- For local development, the launcher switches back to `http://127.0.0.1:8000/api`.
Relevant deployment files:
- `.github/workflows/pages.yaml`
- `.github/workflows/hf-space.yaml`
- `Dockerfile`
- `requirements.txt`
- `space/README.md`
## Local Cleanup Safety
The bot does not depend on your locally generated cache surviving a cleanup:
- `runtime/bot/index.json` is rebuildable cache data
- if you delete `runtime/`, the backend recreates the index from `content/` on startup
- the public Hugging Face Space has its own bundled copy of `content/`
So the bot keeps working after you clean locally generated data.
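The rebuild-on-startup behaviour amounts to a load-or-build check on the cache file. An illustrative sketch (the function name and `build` callback are hypothetical; only the paths mirror the repository):

```python
import json
from pathlib import Path


def load_or_build_index(index_path="runtime/bot/index.json",
                        build=lambda: {"chunks": []}) -> dict:
    """Return the cached index, rebuilding it when the cache file is missing."""
    path = Path(index_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    index = build()  # re-parse and re-chunk content/
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index), encoding="utf-8")
    return index
```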
What you should not delete from the repository is `content/`, because that is the actual knowledge source.
## Current Limits
- retrieval is lexical, not embedding-based
- there is no persistent conversation memory
- `/api/reindex` is still open and should be restricted before broader public exposure
- index freshness is not yet automatically invalidated when source files change
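One lightweight way to close the reindex gap is a shared-secret header check that the endpoint consults before rebuilding. A sketch only; the `REINDEX_TOKEN` variable and header name are assumptions, not something the repository defines:

```python
import hmac
import os


def authorize_reindex(request_headers: dict) -> bool:
    """Allow /api/reindex only when the caller presents the shared secret.

    REINDEX_TOKEN is a hypothetical env var for this sketch.
    """
    expected = os.environ.get("REINDEX_TOKEN")
    if not expected:
        return False  # closed by default when no token is configured
    supplied = request_headers.get("x-reindex-token", "")
    # constant-time comparison avoids leaking the token via timing
    return hmac.compare_digest(supplied, expected)
```

Defaulting to "deny when unconfigured" means a freshly deployed Space is safe even before the secret is set.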
## Next Improvements
- fingerprint-aware index refresh
- embedding or rerank-based retrieval
- tighter API protection
- streaming responses in the frontend
- provider/model status shown in the chat panel