docs(project): add production-ready documentation
Add a top-level README plus production configuration, API, and rollout documentation. Also align deployment and runbook docs with the current runtime semantics, ports, and daily pipeline entrypoints.
@@ -6,3 +6,12 @@ OPENROUTER_API_KEY=

# Local PostgreSQL connection (user `long` connects directly via the local Unix socket)
DATABASE_URL="host=/var/run/postgresql dbname=llm_intelligence user=long sslmode=disable"

# API Server listen port (default 8080)
PORT=8080

# Failure alerts for the official daily report (optional)
FEISHU_WEBHOOK=

# Daily report output directory (optional, default reports/daily)
REPORT_OUTPUT_DIR="reports/daily"

@@ -1,8 +1,14 @@
# LLM Intelligence Hub - Deployment Guide

> Version: v1.1 (previously v1.0)
> Date: 2026-05-14 (previously 2026-05-10)
> Applies to: Phase 3 / Phase 5 (previously Phase 1)

Related documents:

- `README.md`: project entry point and common commands
- `docs/CONFIGURATION.md`: environment variables and runtime semantics
- `docs/PRODUCTION_CHECKLIST.md`: pre-release checks, release, and rollback

---

@@ -60,7 +66,11 @@ npm run dev
### 6. Configure the scheduled job

```bash
crontab -e

# In the editor, add the official daily report schedule:
0 8 * * * cd /path/to/llm-intelligence && bash scripts/run_daily.sh >> /tmp/llm_hub_cron.log 2>&1

# Not a crontab entry: run this in a shell to manually re-run
# real collection + database write + report generation
cd /path/to/llm-intelligence && bash scripts/run_real_pipeline.sh
```

---
@@ -84,7 +94,7 @@ docker-compose up -d
| DATABASE_URL | ✅ | PostgreSQL connection string |
| OPENROUTER_API_KEY | ✅ | OpenRouter API key |
| FEISHU_WEBHOOK | ❌ | Feishu alert webhook |
| PORT | ❌ | API Server listen port, default 8080 |

---

@@ -95,10 +105,14 @@ docker-compose up -d
curl http://localhost:8080/health

# Collector test (non-strict mode may fall back to mock data)
go run scripts/fetch_openrouter.go
# Strict mode: requires real data and does not fall back to mock data
go run scripts/fetch_openrouter.go -strict-real

# Daily report generation
go run scripts/generate_daily_report.go

# Acceptance gates
bash scripts/verify_phase3.sh
bash scripts/verify_phase5.sh
```

---
@@ -112,7 +126,13 @@ go run scripts/generate_daily_report.go
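Check that Node.js is >= 20 and npm is >= 10.

### Q: Why does the collector return mock data?

Without `OPENROUTER_API_KEY`, the collector uses mock data; once the key is provided, it fetches real data.

In non-strict mode, `fetch_openrouter.go` falls back to mock data. The official schedule and the real pipeline, by default, require `OPENROUTER_API_KEY` and a successful real database write, and they record `run_kind / trigger_source / is_official_daily` in the run audit.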
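
The fallback rule above can be pictured as a small decision function. This is a hypothetical sketch, not the actual code of `fetch_openrouter.go`: the function name `chooseSource` and the `"openrouter"` / `"mock"` labels are illustrative assumptions; only the `-strict-real` flag name and the mock-data fallback come from this guide.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// errMissingKey is returned when strict mode is requested without an API key.
var errMissingKey = errors.New("OPENROUTER_API_KEY is required in -strict-real mode")

// chooseSource (hypothetical) decides where model data comes from:
// real OpenRouter data when a key is present, mock data in non-strict
// mode without a key, and a hard error in strict mode without a key.
func chooseSource(apiKey string, strictReal bool) (string, error) {
	switch {
	case apiKey != "":
		return "openrouter", nil
	case strictReal:
		return "", errMissingKey
	default:
		return "mock", nil
	}
}

func main() {
	strict := flag.Bool("strict-real", false, "fail instead of falling back to mock data")
	flag.Parse()

	src, err := chooseSource(os.Getenv("OPENROUTER_API_KEY"), *strict)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("data source:", src)
}
```

Failing hard in strict mode is what keeps a missing key from silently producing an "official" report built on mock data.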

### Q: How do I run a historical rebuild?

```bash
bash scripts/rebuild_historical_report.sh 2025-08-07
```

A historical rebuild records its own run semantics in the audit fields; it never poses as the official scheduled output for that day.

---

160
README.md
Normal file
@@ -0,0 +1,160 @@
# LLM Intelligence Hub

An intelligence-collection project for LLM models, pricing, and daily report output. The repository currently provides:

- Go collection scripts: collect data from OpenRouter, supplementary multi-source data, and manually backfilled official data
- PostgreSQL data layer: stores models, regional pricing, subscription plans, daily reports, and run audits
- Go HTTP API: serves the model list, the subscription plan list, and the latest official daily report
- Vite + React frontend: two read-only pages, Dashboard and Explorer
- Shell ops scripts: migrations, scheduling, backup, restore, acceptance checks, and performance gates

## Current scope and limits

- The real production path is "collection/import scripts + PostgreSQL + daily report generator + API Server + Nginx"
- The latest official daily report is produced by `scripts/run_daily.sh` and recorded in `daily_report` / `report_runs`
- Manual re-runs use `scripts/run_real_pipeline.sh` and never mark their output as the official daily report
- Historical backfills use `scripts/rebuild_historical_report.sh YYYY-MM-DD`
- The HTTP API currently has no built-in authentication, authorization, or rate limiting; add these at the gateway before any public exposure

## Directory overview

```text
cmd/server/       Go API Server
internal/         Shared internal libraries (collector, retry)
scripts/          Collection, import, daily report, acceptance, and ops scripts
db/migrations/    PostgreSQL migrations
frontend/         Vite + React frontend
reports/daily/    Daily report artifacts and archives
ops/              Ops configuration (e.g. logrotate)
docs/             Supplementary and release documentation
```

## Local setup

### 1. Prepare the environment

```bash
cp .env.example .env
```

At minimum, configure:

- `DATABASE_URL`
- `OPENROUTER_API_KEY` (only needed for real collection)

See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for a detailed description of each variable.

### 2. Apply database migrations

```bash
bash scripts/apply_migration.sh
```

### 3. Start the API Server

```bash
go run ./cmd/server
```

The default port is `8080`; override it with `PORT`.
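
For illustration only, resolving the listen address with the `8080` fallback might look like the sketch below. The real logic lives in `cmd/server/main.go` and may differ; `listenAddr` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
)

// listenAddr (hypothetical) returns ":<PORT>", falling back to ":8080"
// when PORT is unset or empty.
func listenAddr(getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		port = "8080"
	}
	return ":" + port
}

func main() {
	// Prints ":8080" unless PORT is set, e.g. PORT=9090 prints ":9090".
	fmt.Println(listenAddr(os.Getenv))
}
```

Passing `getenv` as a parameter keeps the helper testable without mutating the process environment.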

### 4. Start the frontend dev server

```bash
cd frontend
npm install
npm run dev
```

## Production runtime path

### Official daily report schedule

```bash
bash scripts/run_daily.sh
```

This script handles:

1. Real OpenRouter collection
2. Supplementary multi-source sync
3. Official import scripts
4. Data quality checks
5. Markdown / HTML daily report generation
6. Daily report archiving
7. Audit writes to `daily_report` / `report_runs`
8. On failure, falling back to a copy of the previous day's report and optionally sending a Feishu alert

### Manual real re-run

```bash
bash scripts/run_real_pipeline.sh
```

Use this for integration testing, troubleshooting, and manual post-release verification. This entry point writes:

- `run_kind=manual`
- `trigger_source=pipeline`
- `is_official_daily=false`

### Historical backfill

```bash
bash scripts/rebuild_historical_report.sh 2026-05-13
```

This entry point writes:

- `run_kind=historical_rebuild`
- `trigger_source=rebuild_script`
- `is_official_daily=false`

## Common commands

```bash
go test ./...
bash scripts/test.sh
bash scripts/verify_pre_phase6.sh
bash scripts/verify_phase6.sh
bash healthcheck.sh
# Subshells keep the working directory unchanged, so these lines can run in sequence
(cd frontend && npm run test -- --run)
(cd frontend && npm run build)
```

## API overview

- `GET /health`
- `GET /api/v1/models`
- `GET /api/v1/subscription-plans`
- `GET /api/v1/reports/latest`
- `GET /api/v1/reports/latest/markdown`
- `GET /api/v1/reports/latest/html`

See [docs/API_REFERENCE.md](docs/API_REFERENCE.md) for full field descriptions and examples.

## Documentation index

- [docs/CONFIGURATION.md](docs/CONFIGURATION.md): environment variables, runtime semantics, configuration constraints
- [docs/API_REFERENCE.md](docs/API_REFERENCE.md): API endpoints, response bodies, and troubleshooting notes
- [docs/PRODUCTION_CHECKLIST.md](docs/PRODUCTION_CHECKLIST.md): pre-release checks, release, and rollback procedures
- [DEPLOYMENT.md](DEPLOYMENT.md): deployment steps and quick start
- [RUNBOOK.md](RUNBOOK.md): routine inspection, troubleshooting, backup and restore
- [TECHNICAL_DESIGN.md](TECHNICAL_DESIGN.md): detailed technical design and background on how the data model evolved
- [docs/PERFORMANCE_TEST.md](docs/PERFORMANCE_TEST.md): performance baseline

## Minimum production release gates

Treat the following checks as hard gates before every release:

```bash
bash scripts/verify_pre_phase6.sh
bash scripts/verify_phase6.sh
```

The first post-release smoke test should cover at least:

```bash
curl -fsS http://127.0.0.1:8080/health
curl -fsS http://127.0.0.1:8080/api/v1/models
curl -fsS http://127.0.0.1:8080/api/v1/reports/latest
```
32
RUNBOOK.md
@@ -1,8 +1,14 @@
# LLM Intelligence Hub - Operations Runbook

> Version: v1.1 (previously v1.0)
> Date: 2026-05-14 (previously 2026-05-10)
> Applies to: Phase 3 / Phase 5 (previously Phase 1)

Related documents:

- `docs/PRODUCTION_CHECKLIST.md`: pre-release gates, release steps, rollback procedure
- `docs/CONFIGURATION.md`: environment variables and artifact path conventions
- `docs/API_REFERENCE.md`: health check and read-only endpoint reference

---

@@ -32,11 +38,15 @@ docker-compose logs -f db
```bash
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM models WHERE deleted_at IS NULL"
psql "$DATABASE_URL" -c "SELECT source, success, created_at FROM collector_stats ORDER BY created_at DESC LIMIT 5"
psql "$DATABASE_URL" -c "SELECT report_date, run_kind, trigger_source, is_official_daily, status FROM daily_report ORDER BY updated_at DESC LIMIT 5"
psql "$DATABASE_URL" -c "SELECT report_date, run_kind, trigger_source, is_official_daily, status FROM report_runs ORDER BY report_date DESC, created_at DESC LIMIT 5"
```

### Daily report checks

```bash
ls -la reports/daily/daily_report_$(date +%Y-%m-%d).md
ls -la reports/daily/html/daily_report_$(date +%Y-%m-%d).html
ls -la reports/daily/$(date +%Y)/$(date +%m)/daily_report_$(date +%Y-%m-%d).md
```

### Disk space
@@ -63,6 +73,13 @@ df -h /tmp
1. Check cron: `crontab -l | grep llm-intelligence`
2. Run manually: `bash scripts/run_daily.sh`
3. Check the fallback report: `ls reports/daily/*.md | tail -1`
4. For a historical backfill, set `REPORT_RUN_KIND=historical_rebuild` and `REPORT_TRIGGER_SOURCE=rebuild_script`, and do not read the result as official scheduled output

### Official daily reports and historical rebuilds

- Official scheduled output is produced by `scripts/run_daily.sh` with `is_official_daily=true`
- Real re-runs are handled by `scripts/run_real_pipeline.sh`, typically to manually verify real collection + real database writes + report generation
- Historical rebuilds run via `scripts/rebuild_historical_report.sh <date>` and must keep `run_kind=historical_rebuild`
- `/api/v1/reports/latest`, which the frontend reads, returns only official daily reports and never treats a historical rebuild as the latest official output

### Frontend unreachable

1. Check Nginx: `docker-compose ps nginx`
@@ -99,6 +116,15 @@ gunzip < backup_file.sql.gz | psql "$DATABASE_URL"
| Database connection | Failure | `pg_isready` |
| Disk space | > 80% | `df -h` |

## Run audit

Official daily reports and historical rebuilds now record runtime-semantics fields; check these first when troubleshooting:

- `run_kind`: `scheduled` / `historical_rebuild` / `manual`
- `trigger_source`: `cron` / `rebuild_script` / `pipeline`
- `is_official_daily`: whether the run is the official scheduled output for that day
- `summary_md`: real-run audit prefix + report summary

---

## Scaling guide

200
docs/API_REFERENCE.md
Normal file
@@ -0,0 +1,200 @@
# API Reference

The server entry point is `cmd/server/main.go`; it exposes only read-only query endpoints and a health check.

## Common conventions

- Base URL: `http://<host>:<port>`
- Default port: `8080`
- Response format: successful JSON endpoints under `/api/v1/` wrap their payload as `{ "data": ... }`; `/health` returns `{ "status": "ok" }`, and the `/markdown` and `/html` endpoints return the file content directly
- Error format: errors are currently returned as plain-text messages, not a unified JSON error structure
- Authentication: the repository has no built-in authentication, authorization, or rate limiting; add them at the gateway or reverse proxy before public exposure

## `GET /health`

Checks database connectivity.

### Success

```json
{
  "status": "ok"
}
```

### Failure

- `503 database not configured`: `DATABASE_URL` is not set
- `503 database unavailable`: the database ping failed

### Example

```bash
curl -fsS http://127.0.0.1:8080/health
```

## `GET /api/v1/models`

Returns the model list, built from `models`, `model_provider`, and the latest price snapshot in `region_pricing`.

### Response body

```json
{
  "data": [
    {
      "id": "openai/gpt-4o",
      "name": "gpt-4o",
      "provider": "OpenAI",
      "providerCN": "OpenAI",
      "modality": "text",
      "contextLength": 128000,
      "inputPrice": 2.5,
      "outputPrice": 10,
      "currency": "USD",
      "isFree": false,
      "stale": false,
      "dataConfidence": "official"
    }
  ]
}
```

### Field descriptions

| Field | Description |
|------|------|
| `id` | External model ID, usually `provider/model` |
| `name` | Model name; falls back to `external_id` when empty |
| `provider` | Provider name in English |
| `providerCN` | Provider name in Chinese; falls back to the English name or the `external_id` prefix when missing |
| `modality` | Modality type |
| `contextLength` | Context window |
| `inputPrice` | Input price in the unit given by `currency`, per million tokens by default |
| `outputPrice` | Output price |
| `currency` | Currency |
| `isFree` | Whether the model is free |
| `stale` | Whether the data is stale; currently derived from `dataConfidence == "stale"` |
| `dataConfidence` | Data confidence level |

### Failure

- `503 database not configured`
- `500 query failed`

## `GET /api/v1/subscription-plans`

Returns subscription-style plans; currently these mainly cover Tencent Cloud plans.

### Response body

```json
{
  "data": [
    {
      "planFamily": "token_plan",
      "planCode": "token-plan-lite",
      "planName": "通用 Token Plan Lite",
      "tier": "Lite",
      "provider": "Tencent",
      "providerCN": "腾讯",
      "operator": "Tencent Cloud",
      "operatorCN": "腾讯云",
      "currency": "CNY",
      "listPrice": 39,
      "priceUnit": "CNY/month",
      "quotaValue": 35000000,
      "quotaUnit": "tokens/month",
      "contextWindow": 0,
      "modelScope": ["tc-code-latest", "glm-5", "glm-5.1"],
      "sourceUrl": "https://cloud.tencent.com/document/product/1823/130060",
      "publishedAt": "2026-04-27T00:00:00",
      "effectiveDate": "2026-04-27"
    }
  ]
}
```

### Failure

- `503 database not configured`
- `500 query failed`

## `GET /api/v1/reports/latest`

Returns metadata for the latest "official daily report". The query filters `daily_report` on:

- `status = 'generated'`
- `output_path` is non-empty
- `is_official_daily = true`
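
The selection rule can be expressed as a filter over `daily_report` rows. This in-memory Go sketch is illustrative only: the server applies the rule in SQL, and the ordering used here (latest `report_date`, then latest `updated_at`) is an assumption, since this reference does not state how "latest" is ordered. `reportRow` and `latestOfficial` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// reportRow holds the daily_report columns that the selection rule reads.
type reportRow struct {
	ReportDate      string // YYYY-MM-DD, so string order equals date order
	Status          string
	OutputPath      string
	IsOfficialDaily bool
	UpdatedAt       time.Time
}

// latestOfficial returns the newest row that passes all three conditions.
// The ordering (report date, then updated_at) is an assumption.
func latestOfficial(rows []reportRow) (reportRow, bool) {
	var best reportRow
	found := false
	for _, r := range rows {
		if r.Status != "generated" || r.OutputPath == "" || !r.IsOfficialDaily {
			continue
		}
		if !found || r.ReportDate > best.ReportDate ||
			(r.ReportDate == best.ReportDate && r.UpdatedAt.After(best.UpdatedAt)) {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	rows := []reportRow{
		{ReportDate: "2026-05-13", Status: "generated", OutputPath: "reports/daily/daily_report_2026-05-13.md", IsOfficialDaily: true},
		// A manual re-run for a later date is ignored: is_official_daily=false.
		{ReportDate: "2026-05-14", Status: "generated", OutputPath: "reports/daily/daily_report_2026-05-14.md", IsOfficialDaily: false},
	}
	if r, ok := latestOfficial(rows); ok {
		fmt.Println("latest official report:", r.ReportDate)
	} else {
		fmt.Println("404 latest report not found")
	}
}
```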

### Response body

```json
{
  "data": {
    "reportDate": "2026-05-13",
    "status": "generated",
    "modelCount": 504,
    "summaryMD": "runtime_audit ...",
    "markdownPath": "reports/daily/daily_report_2026-05-13.md",
    "htmlPath": "reports/daily/html/daily_report_2026-05-13.html",
    "archiveMarkdownPath": "reports/daily/2026/05/daily_report_2026-05-13.md",
    "archiveHtmlPath": "reports/daily/2026/05/daily_report_2026-05-13.html",
    "markdownUrl": "/api/v1/reports/latest/markdown",
    "htmlUrl": "/api/v1/reports/latest/html",
    "updatedAt": "2026-05-13T08:00:00"
  }
}
```

### Failure

- `503 database not configured`
- `404 latest report not found`
- `500 query failed`

## `GET /api/v1/reports/latest/markdown`

Returns the Markdown content of the latest official daily report directly.

### Success

- `200`
- `Content-Type: text/markdown; charset=utf-8`

### Failure

- `404 latest report not found`: no official daily report matching the conditions exists in the database
- `404 report artifact not found`: the metadata exists, but the file on disk is missing

## `GET /api/v1/reports/latest/html`

Returns the HTML content of the latest official daily report directly.

### Success

- `200`
- `Content-Type: text/html; charset=utf-8`

### Failure

- `404 latest report not found`
- `404 report artifact not found`

## Smoke check commands

```bash
curl -fsS http://127.0.0.1:8080/health
curl -fsS http://127.0.0.1:8080/api/v1/models | jq '.data | length'
curl -fsS http://127.0.0.1:8080/api/v1/subscription-plans | jq '.data | length'
curl -fsS http://127.0.0.1:8080/api/v1/reports/latest | jq '.data.reportDate'
curl -fsS http://127.0.0.1:8080/api/v1/reports/latest/html > /tmp/latest_report.html
```
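
The same smoke checks can be scripted in Go for CI or a post-release hook. This is a standard-library sketch; the endpoint list comes from this reference, while the names `smoke` and `checkResponse`, the 5-second timeout, and the "non-null `data` field" criterion are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// checkResponse validates one endpoint's status code and JSON body.
func checkResponse(path string, status int, body []byte) error {
	if status != http.StatusOK {
		return fmt.Errorf("%s: status %d", path, status)
	}
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return fmt.Errorf("%s: invalid JSON: %w", path, err)
	}
	if path == "/health" {
		return nil // /health returns {"status":"ok"}, not a data envelope
	}
	if d, ok := obj["data"]; !ok || string(d) == "null" {
		return fmt.Errorf("%s: missing data field", path)
	}
	return nil
}

// smoke requests each JSON endpoint and stops at the first failure.
func smoke(client *http.Client, base string) error {
	for _, p := range []string{"/health", "/api/v1/models", "/api/v1/subscription-plans", "/api/v1/reports/latest"} {
		resp, err := client.Get(base + p)
		if err != nil {
			return fmt.Errorf("%s: %w", p, err)
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return fmt.Errorf("%s: %w", p, err)
		}
		if err := checkResponse(p, resp.StatusCode, body); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	base := "http://127.0.0.1:8080"
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	client := &http.Client{Timeout: 5 * time.Second}
	if err := smoke(client, base); err != nil {
		fmt.Fprintln(os.Stderr, "smoke failed:", err)
		os.Exit(1)
	}
	fmt.Println("smoke ok")
}
```

Separating `checkResponse` from the HTTP calls keeps the pass/fail rules testable without a running server.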
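
## Production exposure recommendations

- Add access control, rate limiting, and timeout settings at Nginx / the gateway
- Expose `/health` only to the load balancer and monitoring systems
- If the frontend and API share a domain, have Nginx forward `/api/` and `/health`
- If public access is required, add at least Basic Auth, OIDC, or an internal-network-only entry point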
130
docs/CONFIGURATION.md
Normal file
@@ -0,0 +1,130 @@
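# Configuration

This document describes the key configuration items for `llm-intelligence` in local, CI, and production environments, along with the runtime semantics of each script.

## Configuration principles

- In production, inject environment variables through the container platform, systemd, or CI/CD; do not rely on the in-repo `.env`
- `.env.example` is only an example and must not contain real secrets
- Avoid defining the same variable in both `.env.local` and `.env`
  - Scripts load these files differently, so the precedence of duplicated variables is not consistent
  - Shell scripts usually `source` `.env.local` and then `.env`, so values in `.env` may override those in `.env.local`
  - `generate_daily_report.go` keeps variables that already exist in the environment and, for duplicates, keeps the value injected earlier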

Production recommendation: inject all key variables centrally from the deployment system and use the in-repo `.env*` files for development only.

## Key environment variables

| Variable | Required | Used by | Default | Description |
|--------|------|--------|--------|------|
| `DATABASE_URL` | Yes | API Server, migrations, collection, daily report, backup/restore, acceptance scripts | none | PostgreSQL connection string; most core scripts fail immediately when it is missing |
| `OPENROUTER_API_KEY` | Conditional | `fetch_openrouter.go`, `run_real_pipeline.sh`, `run_daily.sh` | none | Required for real collection; can be omitted when only viewing historical data or running only the frontend |
| `PORT` | No | `cmd/server/main.go` | `8080` | API Server listen port |
| `FEISHU_WEBHOOK` | No | `run_daily.sh`, `feishu_alert.sh` | empty | Sends a Feishu alert when the official daily report fails |
| `REPORT_OUTPUT_DIR` | No | `generate_daily_report.go` | `reports/daily` | Output directory for the primary daily report artifacts |
| `REPORT_DATE` | No | `generate_daily_report.go`, `rebuild_historical_report.sh` | today's date | Date to generate the report for, in `YYYY-MM-DD` format |
| `REPORT_RUN_KIND` | No | `generate_daily_report.go` | `manual` | Run semantics, e.g. `scheduled` / `manual` / `historical_rebuild` |
| `REPORT_TRIGGER_SOURCE` | No | `generate_daily_report.go` | `cli` | Trigger source, e.g. `cron` / `pipeline` / `rebuild_script` |
| `REPORT_IS_OFFICIAL_DAILY` | No | `generate_daily_report.go` | `false` | Whether the run counts as the official daily report |
| `REPORT_RUNTIME_AUDIT` | No | `generate_daily_report.go` | empty | Per-source runtime audit summary, usually injected by the pipeline scripts |
| `PHASE6_PORT` | No | `verify_phase6.sh` | auto-selected from `18080-18120` | Port of the API Server that the Phase 6 acceptance check starts temporarily |
| `LIGHTHOUSE_PORT` | No | `verify_lighthouse.sh` | `4173` | Lighthouse preview port |
| `LIGHTHOUSE_SCORE_THRESHOLD` | No | `verify_lighthouse.sh` | `80` | Minimum frontend performance score |
| `LIGHTHOUSE_FCP_THRESHOLD_MS` | No | `verify_lighthouse.sh` | `2000` | First Contentful Paint threshold (ms) |
| `VERIFY_DB_NAME` | No | `verify_common.sh` | `llm_intelligence` | Database name that SQL-based acceptance scripts connect to by default |

## Recommended production injection

### API Server

```bash
export DATABASE_URL="postgres://app_user:***@db:5432/llm_intelligence?sslmode=disable"
export PORT="8080"
# Binary built by `go build -o bin/server ./cmd/server`
./bin/server
```

### Official daily report schedule

```bash
export DATABASE_URL="postgres://app_user:***@db:5432/llm_intelligence?sslmode=disable"
export OPENROUTER_API_KEY="***"
export FEISHU_WEBHOOK="https://open.feishu.cn/..."
bash scripts/run_daily.sh
```

### Manual real re-run

```bash
export DATABASE_URL="postgres://app_user:***@db:5432/llm_intelligence?sslmode=disable"
export OPENROUTER_API_KEY="***"
bash scripts/run_real_pipeline.sh
```

## Daily report runtime semantics

The project uses the following fields to distinguish official daily reports, manual re-runs, and historical backfills:

| Field | Description | Typical values |
|------|------|--------|
| `run_kind` | Run type | `scheduled` / `manual` / `historical_rebuild` |
| `trigger_source` | Trigger source | `cron` / `pipeline` / `rebuild_script` / `cli` |
| `is_official_daily` | Whether the run counts as the latest official daily report | `true` / `false` |
| `summary_md` | Run summary and audit | Includes the concatenated `REPORT_RUNTIME_AUDIT` |
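
The three entry points map onto these fields as sketched below. The values come from the README and RUNBOOK (pairing `scheduled` with `cron` follows the RUNBOOK's run-audit list); encoding them as a Go lookup table is only an illustration, and the `validate` rule is an assumed invariant inferred from those documents, not a documented database constraint. `runSemantics`, `semanticsByScript`, and `validate` are hypothetical names.

```go
package main

import "fmt"

// runSemantics is the triple recorded in daily_report / report_runs.
type runSemantics struct {
	RunKind         string
	TriggerSource   string
	IsOfficialDaily bool
}

// semanticsByScript lists what each entry point is documented to write.
var semanticsByScript = map[string]runSemantics{
	"scripts/run_daily.sh":                 {"scheduled", "cron", true},
	"scripts/run_real_pipeline.sh":         {"manual", "pipeline", false},
	"scripts/rebuild_historical_report.sh": {"historical_rebuild", "rebuild_script", false},
}

// validate (assumed invariant) rejects any non-scheduled run that claims
// to be the official daily report, which would pollute /api/v1/reports/latest.
func validate(s runSemantics) error {
	if s.IsOfficialDaily && s.RunKind != "scheduled" {
		return fmt.Errorf("run_kind=%s must not set is_official_daily=true", s.RunKind)
	}
	return nil
}

func main() {
	for script, s := range semanticsByScript {
		fmt.Printf("%-40s %+v valid=%v\n", script, s, validate(s) == nil)
	}
}
```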

`/api/v1/reports/latest` returns only rows where:

- `status='generated'`
- `output_path` is non-empty
- `is_official_daily=true`

This means:

- Manual re-runs never overwrite the "latest official daily report"
- Historical backfills never pose as that day's official result
- If an official daily report is written to the database but its on-disk artifact is lost, the metadata query still succeeds while the file endpoints return `404`

## Artifact path conventions

| Type | Path |
|------|------|
| Current-day Markdown | `reports/daily/daily_report_YYYY-MM-DD.md` |
| Current-day HTML | `reports/daily/html/daily_report_YYYY-MM-DD.html` |
| Archived Markdown | `reports/daily/YYYY/MM/daily_report_YYYY-MM-DD.md` |
| Archived HTML | `reports/daily/YYYY/MM/daily_report_YYYY-MM-DD.html` |
| Daily log | `/tmp/llm_hub_daily_YYYY-MM-DD.log` |
| Backup directory | `/tmp/llm_hub_backups` |

## Minimal working configurations

### API Server only

```bash
DATABASE_URL="host=/var/run/postgresql dbname=llm_intelligence user=long sslmode=disable" \
PORT="8080" \
go run ./cmd/server
```

### Generate the report for a specific date only

```bash
DATABASE_URL="host=/var/run/postgresql dbname=llm_intelligence user=long sslmode=disable" \
REPORT_DATE="2026-05-13" \
go run -tags llm_script ./scripts/generate_daily_report.go
```

### Real collection with database write

```bash
# Export first: in `VAR=value cmd "$VAR"`, the shell expands "$VAR" before the
# prefix assignment takes effect, so the flags would get the old (or empty) value.
export DATABASE_URL="host=/var/run/postgresql dbname=llm_intelligence user=long sslmode=disable"
export OPENROUTER_API_KEY="***"
go run ./scripts/fetch_openrouter.go -strict-real -db "$DATABASE_URL" -api-key "$OPENROUTER_API_KEY"
```

## Typical symptoms of misconfiguration

| Symptom | Likely cause | Where to look |
|------|----------|----------|
| `/health` returns `503 database not configured` | `DATABASE_URL` is not injected into the API Server | Check the process environment |
| `run_real_pipeline.sh` exits immediately | `OPENROUTER_API_KEY` or `DATABASE_URL` is missing | Check `.env` or the deployment config |
| `/api/v1/reports/latest` returns `404` | No official daily report exists, or `is_official_daily=false` | Query the `daily_report` table |
| Latest report metadata exists, but `/html` returns `404` | The file referenced by `output_path` is missing | Check `reports/daily` and the archive directory |
204
docs/PRODUCTION_CHECKLIST.md
Normal file
@@ -0,0 +1,204 @@
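# Production Release Checklist

This document targets the scenario of releasing the current repository as a production service and focuses on pre-release checks, release steps, rollback, and day-to-day safeguards.

## Goals

After release, the minimum usable capabilities should include:

- The database is reachable and all migrations have been applied
- The API Server reliably serves `/health`, `/api/v1/models`, and `/api/v1/reports/latest`
- The scheduling script produces the official daily report every day
- Failures can be rolled back, alerted on, and recovered from

## Recommended production topology

The following minimal topology is recommended:

1. PostgreSQL 16
2. API Server: the binary built from `cmd/server/main.go`
3. Nginx: serves `frontend/dist` and reverse-proxies `/api` and `/health`
4. Cron or a systemd timer: runs `scripts/run_daily.sh`

For container deployments, the repository's `docker-compose.yml` can serve as a single-host reference, but production environments should still:

- Manage database persistence and backups separately
- Handle TLS, rate limiting, and access control at the gateway
- Inject secrets through the deployment system instead of relying on the in-repo `.env`

## Pre-release hard checks

### Infrastructure

- The PostgreSQL database has been created and connectivity verified
- `DATABASE_URL` is available in the environments of the API Server, the scheduling scripts, and the backup scripts
- The disk holding `reports/daily` and its archive directories has enough free space
- `/tmp` is not cleaned too early, since the daily pipeline logs (`/tmp/llm_hub_daily_YYYY-MM-DD.log`) and the default backup directory (`/tmp/llm_hub_backups`) live there

### Data and migrations

- `bash scripts/apply_migration.sh` has been run
- Key tables such as `daily_report`, `report_runs`, `subscription_plan`, and `region_pricing` exist
- The historical data backfill strategy has been confirmed, so the database is not empty on day one

### Application and artifacts

- `go test ./...` passes
- `bash scripts/test.sh` passes
- `cd frontend && npm run test -- --run` passes
- `cd frontend && npm run build` passes
- `go build ./cmd/server` passes

### Scheduling and daily reports

- The official schedule command is settled: `bash scripts/run_daily.sh`
- The manual re-run command is settled: `bash scripts/run_real_pipeline.sh`
- The historical backfill command is settled: `bash scripts/rebuild_historical_report.sh YYYY-MM-DD`
- `OPENROUTER_API_KEY` is available in the official scheduling environment
- `FEISHU_WEBHOOK` is configured, or a decision not to enable alerting has been made explicitly

### Security and access control

- No secrets are committed to the repository
- API paths sit behind the gateway and are not exposed directly to the public internet
- Access control, TLS, rate limiting, and log retention policies are in place
- `scripts/restore.sh` is high-risk; permission to run it is limited to a small number of operators

## Release gate commands

Run them in this order:

```bash
bash scripts/verify_pre_phase6.sh
bash scripts/verify_phase6.sh
bash healthcheck.sh
```

In addition, `verify_phase6.sh` checks:

- The real collection pipeline
- The API Server build and health check
- `/api/v1/models` response time `< 500ms`
- Success rate of the last 7 collection runs `>= 95%`
- That the frontend test entry point exists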

## Release steps

### 1. Pre-release backup

```bash
bash scripts/backup.sh
```

Confirm that:

- The backup file has been created in `/tmp/llm_hub_backups`
- The backup file size is non-zero
- If OSS is integrated, the remote object has been uploaded successfully

### 2. Run migrations

```bash
bash scripts/apply_migration.sh
```

### 3. Build and release the API Server / frontend

```bash
go build -o bin/server ./cmd/server
cd frontend && npm run build
```

### 4. Deploy the reverse proxy

Confirm that Nginx proxies correctly:

- `/` -> `frontend/dist`
- `/api/` -> `app:8080/api/`
- `/health` -> `app:8080/health`

### 5. Perform one manual real re-run

```bash
bash scripts/run_real_pipeline.sh
```

Purpose:

- Verify the full chain: real collection, backfill imports, report generation, and database writes
- Confirm that it does not wrongly overwrite the "latest official daily report"

### 6. Enable the official schedule

```cron
0 8 * * * cd /path/to/llm-intelligence && bash scripts/run_daily.sh >> /tmp/llm_hub_cron.log 2>&1
```

### 7. Production smoke test

```bash
curl -fsS http://127.0.0.1:8080/health
curl -fsS http://127.0.0.1:8080/api/v1/models
curl -fsS http://127.0.0.1:8080/api/v1/reports/latest
```

## Runtime monitoring baseline

Monitor at least the following metrics:

| Metric | Target / alert threshold | Notes |
|------|---------------|------|
| API health check | `200` | `/health` must be reliably reachable |
| `/api/v1/models` response time | `< 500ms` | Phase 6 acceptance threshold |
| Success rate of the last 7 collection runs | `>= 95%` | Phase 6 acceptance threshold |
| Total model count | Alert when `< 300` | From the existing RUNBOOK baseline |
| Today's report generated | Should exist every day after 08:00 | Check `daily_report` and the artifact files |
| Archive completeness | Both Markdown and HTML exist | Check `reports/daily/YYYY/MM/` |
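
## Rollback plan

### When to roll back

- The API Server fails to start, or the health check keeps failing
- The real pipeline fails repeatedly and cannot be fixed within the release window
- The official daily report is generated with wrong semantics, polluting the "latest official daily report"
- A migration causes query failures or breaks key table structures

### Rollback steps

1. Stop the official schedule to avoid writing more bad data
2. Roll back the application version or image
3. If data is corrupted, restore from backup:

```bash
bash scripts/restore.sh --force /path/to/backup.sql.gz
```

4. Re-apply migrations to bring the schema to the state the target version requires
5. After starting the services, run:

```bash
bash healthcheck.sh
curl -fsS http://127.0.0.1:8080/health
curl -fsS http://127.0.0.1:8080/api/v1/reports/latest
```

## Common release oversights

- Only the API is started; the official daily report schedule is never configured
- The `daily_report` row is written, but the artifact output directory is not writable
- After a manual re-run, assuming the "official daily report is ready" even though `is_official_daily=false`
- Exposing the API directly to the public internet without authentication or rate limiting
- Relying on `.env.local`, which does not exist on the production machine
- Running a high-risk restore or migration without running `backup.sh` first

## Recommended release criteria

Mark the release as "ready for production" only when all of the following hold:

- `verify_pre_phase6.sh` passes
- `verify_phase6.sh` passes
- The manual real re-run succeeds
- API / frontend smoke tests pass
- The official schedule is configured and has been rehearsed once
- The backup and restore path has been rehearsed at least once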