diff --git a/PRODUCTION_PHASE1_STATUS.md b/PRODUCTION_PHASE1_STATUS.md index 52d98c0..2443ed0 100644 --- a/PRODUCTION_PHASE1_STATUS.md +++ b/PRODUCTION_PHASE1_STATUS.md @@ -11,11 +11,12 @@ - webhook body schema 校验 - webhook HMAC 签名与时间戳防重放校验 - 消息幂等去重 -- 基于依赖检查的 `/actuator/health`、`/live`、`/ready` +- 基于依赖检查的 `/actuator/health`、`/actuator/health/live`、`/actuator/health/ready` - 转人工工单创建 -- 工单列表 / 分配 / 解决最小闭环 API +- 工单列表 / 分配 / 解决 / 关闭最小闭环 API - 审计日志持久化写入 - PostgreSQL migration 基础表结构 +- 后台接口最小 header 鉴权与角色校验 但距离“生产一期完成”仍有明显缺口,不能作为可灰度上线结论。 @@ -32,8 +33,8 @@ | webhook 签名校验 | 已完成 | `internal/http/handlers/webhook_security.go` | HMAC-SHA256 | | 时间戳防重放 | 已完成 | `internal/http/handlers/webhook_security.go` | 仅做 skew 校验,未持久化 nonce | | 幂等去重 | 已完成 | `internal/store/postgres/dedup_store.go`, `internal/store/memory/dedup_store.go` | 基于 `(channel,message_id)` | -| 速率限制 | 未完成 | 无 | P1 缺口 | -| 渠道级独立 webhook | 未完成 | 当前仅统一 webhook | 与 INTERFACE 文档仍有漂移 | +| 速率限制 | 已完成 | `internal/platform/httpx/limits.go`, `internal/http/router.go` | 当前已挂到 webhook 路由 | +| 渠道级独立 webhook 适配器 | 未完成 | 当前仅具备统一 webhook 与路径覆写 channel | 与最终多渠道适配目标仍有距离 | ### 2.2 工单闭环 @@ -42,9 +43,9 @@ | 转人工自动创建工单 | 已完成 | `internal/service/dialog/service.go` | 退款/敏感意图触发 | | 工单持久化 | 已完成 | `internal/store/postgres/ticket_store.go` | PostgreSQL / memory 均可 | | 工单列表 | 已完成 | `internal/http/handlers/ticket_handler.go` | `GET /tickets` | -| 工单分配 | 已完成 | `internal/http/handlers/ticket_handler.go`, `internal/store/postgres/ticket_workflow.go` | 当前 query 参数驱动 | -| 工单解决 | 已完成 | 同上 | 当前 query 参数驱动 | -| 工单关闭 | 未完成 | 无 | 只有 resolve,没有 close | +| 工单分配 | 已完成 | `internal/http/handlers/ticket_handler.go`, `internal/store/postgres/ticket_workflow.go` | 当前由 header 鉴权 + query 业务参数驱动 | +| 工单解决 | 已完成 | 同上 | 当前由 header 鉴权 + query 业务参数驱动 | +| 工单关闭 | 已完成 | `internal/http/handlers/ticket_handler.go`, `internal/store/postgres/ticket_workflow.go` | 当前由 header 鉴权 + query 业务参数驱动 | | 工单回复用户 | 未完成 | 无 | 尚无人工回消息链路 | | 排队位置查询 | 未完成 | 无 | 文档要求未落地 | @@ -55,9 +56,9 @@ 
| message processed 审计 | 已完成 | `internal/service/dialog/service.go` | 成功路径会写审计 | | 审计持久化 | 已完成 | `internal/store/postgres/audit_store.go` | 写 `cs_audit_logs` | | fail-closed 审计 | 已完成 | `dialog.Process()` | 审计失败时整体返回错误 | -| 安全拒绝事件审计 | 未完成 | 无 | 签名失败/非法请求未记审计 | -| 工单状态流转审计 | 未完成 | 无 | assign/resolve 未写审计 | -| source_ip / actor / action 分类完备 | 部分完成 | `internal/store/postgres/audit_store.go` | 当前 action 固定为 `update`,source_ip 未写 | +| 安全拒绝事件审计 | 已完成 | `internal/http/handlers/webhook_security.go` | 签名缺失/时间戳异常/签名不匹配会写审计 | +| 工单状态流转审计 | 已完成 | `internal/http/handlers/ticket_handler.go`, `internal/store/postgres/ticket_workflow.go` | assign/resolve/close 已写状态流转审计 | +| source_ip / actor / action 分类完备 | 部分完成 | `internal/http/handlers/ticket_handler.go`, `internal/http/handlers/session_handler.go`, `internal/store/postgres/audit_store.go` | 当前已记录 source_ip/actor,但完整分类体系仍可继续收紧 | ### 2.4 运维与健康检查 @@ -68,15 +69,15 @@ | graceful shutdown | 已完成 | `internal/app/app.go` | | | 结构化日志 | 部分完成 | `internal/platform/logging/logger.go`, `webhook_handler.go` | 仅少量入口日志 | | metrics/tracing | 未完成 | 无 | P1 缺口 | -| 灰度/回滚 runbook | 未完成 | 无 | 文档缺失 | +| 灰度/回滚 runbook | 部分完成 | `docs/RUNBOOK.md`, `prd/GRAY_RELEASE_ROLLBACK_RUNBOOK.md` | 文档已交付,演练与证据化验证待补 | --- ## 3. 当前与文档的主要漂移 -1. `tech/INTERFACE.md` 约定了按渠道 webhook(`/webhook/{channel}`),当前实现仍只有统一入口 `/api/v1/customer-service/webhook`。 -2. 文档要求人工接单/回复/关闭完整后台闭环,当前只做到 list/assign/resolve 最小 API。 -3. 文档要求安全事件审计,当前签名失败、时间戳失败、非法 body 不入审计。 +1. 文档中的最终形态仍包含真实多渠道适配器、LLM、RAG 与运营后台,当前代码尚未覆盖这些范围。 +2. 当前后台接口已加最小 header 鉴权,但完整 RBAC、用户级数据隔离仍未落地。 +3. 当前仍缺人工回复用户链路与排队位置查询。 4. 文档要求更完整的运维可观测(metrics/tracing/SLO),当前尚未实现。 --- @@ -85,18 +86,18 @@ ### P0(继续执行必须优先收口) -1. 工单状态流转审计补齐 -2. 安全拒绝事件审计补齐 -3. 工单 API 与接口文档对齐(至少明确当前最小契约) -4. 工单关闭语义补齐或文档明确 resolve=关闭 +1. 完整 RBAC 与用户级数据隔离补齐 +2. 工单 API 与接口文档继续对齐(尤其是后台鉴权契约) +3. 人工回复用户链路补齐 +4. 灰度与回滚演练证据化 ### P1(生产一期仍必须完成) -1. webhook 速率限制 -2. 人工回复用户链路 -3. 排队位置查询 -4. metrics / tracing / SLO 基础设施 -5. 灰度/回滚 runbook +1. 排队位置查询 +2. 
metrics / tracing / SLO 基础设施 +3. 灰度/回滚演练 +4. 真实多渠道适配器产品化 +5. 真实 LLM / RAG 能力 --- diff --git a/docs/CONFIG_CONTRACT_BASELINE.md b/docs/CONFIG_CONTRACT_BASELINE.md index 8b9c51d..4f24d73 100644 --- a/docs/CONFIG_CONTRACT_BASELINE.md +++ b/docs/CONFIG_CONTRACT_BASELINE.md @@ -2,18 +2,18 @@ > 来源:`internal/config/config.go` 当前实现 > 用途:作为 PM / QA / DevOps / 部署文档的唯一配置事实来源 -> 状态:当前代码事实基线,不等同于“prod 已自动强制保证” +> 状态:当前代码事实基线;production 下的关键运行约束已经由 `internal/config/config.go` 执行校验 --- ## 0. 重要说明 -当前代码已经实现了基础配置解析与部分校验,但**尚未完全实现生产模式强约束**。 +当前代码已经实现了基础配置解析,并对 production 下的关键约束做了 fail-fast 校验。 这意味着: - 本文档描述的是**当前代码真实读取和校验的配置契约** -- 不代表所有生产要求都已被代码自动 enforce -- 对于 prod fail-fast、readiness 收紧等要求,当前仍属于待整改项 +- production 下缺少关键配置时,`Load()` 会直接返回错误 +- readiness / 依赖可观测仍需结合运行态和部署层继续完善 --- @@ -46,7 +46,7 @@ | 变量名 | 默认值 | 含义 | 当前代码是否校验 | prod 是否应允许默认值 | |---|---|---|---|---| -| `AI_CS_WEBHOOK_SECRET` | 空 | webhook HMAC secret | 当前无必填校验 | **不允许为空** | +| `AI_CS_WEBHOOK_SECRET` | 空 | webhook HMAC secret | production 下必填 | **不允许为空** | | `AI_CS_WEBHOOK_TIMESTAMP_HEADER` | `X-CS-Timestamp` | 时间戳请求头 | 无额外校验 | 可 | | `AI_CS_WEBHOOK_SIGNATURE_HEADER` | `X-CS-Signature` | 签名请求头 | 无额外校验 | 可 | | `AI_CS_WEBHOOK_MAX_SKEW_SECONDS` | `300` | 最大时钟偏差(秒) | 必须 > 0 | 需安全确认 | @@ -61,18 +61,19 @@ 2. `AI_CS_MAX_BODY_BYTES` 必须为正数 3. `AI_CS_POSTGRES_ENABLED=true` 时,`AI_CS_POSTGRES_DSN` 不允许为空 4. `AI_CS_WEBHOOK_MAX_SKEW_SECONDS` 必须为正数 +5. `AI_CS_RUNTIME_ENV` 只允许 `production/development/test` +6. `AI_CS_RUNTIME_ENV=production` 时,`AI_CS_POSTGRES_ENABLED` 必须为 `true` +7. `AI_CS_RUNTIME_ENV=production` 时,`AI_CS_WEBHOOK_SECRET` 不允许为空 --- ## 3. 当前代码尚未自动保证、但生产必须满足的要求 -以下要求目前主要是**生产约束**,而不是代码已强制执行的事实: +以下要求目前仍需部署层和运行态共同保证: -1. **prod 环境必须启用 Postgres** -2. **prod 环境必须禁止 memory fallback** -3. **prod 环境必须要求 webhook secret 完整配置** -4. **readiness 必须反映 DB / migration / 关键配置就绪状态** -5. **migration 目录必须真实可执行,且执行成功才能接流量** +1. **readiness 必须反映 DB / migration / 关键配置就绪状态** +2. **migration 目录必须真实可执行,且执行成功才能接流量** +3. 
**部署文档和环境模板必须只使用真实变量名** --- diff --git a/docs/PRODUCTION_LAUNCH.md b/docs/PRODUCTION_LAUNCH.md index f066a8a..4f3266f 100644 --- a/docs/PRODUCTION_LAUNCH.md +++ b/docs/PRODUCTION_LAUNCH.md @@ -1,7 +1,7 @@ # AI-Customer-Service 生产上线文档 > 版本:v1.0 | 日期:2026-05-01 -> 状态:✅ 已通过全部上线门禁,可灰度发布 +> 状态:⚠️ 代码级主链已通过验证,但预生产与灰度门禁尚未闭环 > 代码基准:`3e9022a`(`upload/ai-customer-service` 分支) --- @@ -9,13 +9,20 @@ ## 1. 项目概述 **项目名**:ai-customer-service(立交桥智能客服系统) -**一句话**:多渠道接入的 AI 客服系统,自动处理用户初始化、配额/计费异常等常见问题,降低人工介入率 60%+。 +**一句话**:当前交付物是面向生产一期的客服后端最小闭环服务,覆盖 webhook、会话、转人工工单、审计与健康检查。 -**核心能力**: -- 多渠道 Webhook 接收(Telegram/Discord/微信/网页) -- 基于 LLM 的意图识别 + 知识库 RAG -- 自动转人工工单闭环(创建→分配→解决→关闭) +**当前已验证能力**: +- 统一 Webhook 入口与按路径覆写 channel 的入口 +- 基于规则的意图识别与静态 FAQ 回复 +- 自动转人工工单最小闭环(创建→分配→解决→关闭) - 审计日志持久化 +- PostgreSQL 持久化、健康检查、优雅停机 + +**当前未完成但属于后续目标能力**: +- 真实 LLM 意图识别与多供应商 failover +- 真实 RAG 检索与知识库运营 +- 完整多渠道适配器产品化 +- 运营后台 UI 与完整 RBAC --- @@ -55,7 +62,7 @@ store/ | 方法 | 路径 | 说明 | 状态 | |------|------|------|------| | POST | `/api/v1/customer-service/webhook` | 统一 Webhook 入口 | ✅ 已实现 | -| GET | `/api/v1/customer-service/webhook/channels` | 查询已注册渠道 | ✅ 已实现 | +| POST | `/api/v1/customer-service/webhook/{channel}` | 按路径指定 channel 的 Webhook 入口 | ✅ 已实现 | **安全特性**:HMAC-SHA256 签名校验 + 时间戳防重放 + BodyLimit 512KB + 速率限制(滑动窗口 10 req/s/IP) @@ -81,8 +88,8 @@ store/ | 方法 | 路径 | 说明 | 状态 | |------|------|------|------| | GET | `/actuator/health` | 综合健康检查 | ✅ 已实现 | -| GET | `/live` | Liveness 探针 | ✅ 已实现 | -| GET | `/ready` | Readiness 探针(含 DB 依赖检查) | ✅ 已实现 | +| GET | `/actuator/health/live` | Liveness 探针 | ✅ 已实现 | +| GET | `/actuator/health/ready` | Readiness 探针(含 DB 依赖检查) | ✅ 已实现 | | GET | `/tickets/stats` | 工单统计(open/assigned/resolved) | ✅ 已实现 | --- @@ -106,17 +113,15 @@ store/ | internal/platform/health | **100%** | — | ✅ | | **整体覆盖率** | **77.4%** | >70% | ✅ | -### 4.2 上线门禁 +### 4.2 当前门禁结论 -| 阻断条件 | 状态 | 说明 | +| 门禁层级 | 状态 | 说明 | |---------|------|------| -| BC-01 接口路由漂移 | 🟢 解除 | Phase 1 核心端点已全部实现 | -| BC-02 P0 安全测试覆盖 | 
🟢 解除 | webhook 签名/重放/幂等/速率限制全通过 | -| BC-03 错误码一致 | 🟢 解除 | CS_TKT_4002 等统一使用 | -| BC-04 会话端点 | 🟢 解除 | feedback + handoff 已实现 | -| BC-05 速率限制 | 🟢 解除 | RateLimiter 已实现并测试 | +| 代码级门禁 | ✅ 通过 | `go test ./...`、`go test -race ./...`、`go build ./...` 通过 | +| 预生产门禁 | ⚠️ 未闭环 | 真实环境 DB/migration/webhook/audit/ticket 入库验证仍需证据化 | +| 灰度门禁 | ❌ 未通过 | 鉴权、最小监控、灰度阈值、回滚演练未闭环 | -**所有 22 个测试包通过,19/19 E2E 通过,go test -race 无竞态。** +**当前解释口径**:仓库内测试通过,只能证明现有实现稳定,不等于“PRD 功能已完成”或“可直接灰度发布”。 ### 4.3 安全审计 @@ -153,30 +158,31 @@ make run # 本地运行(go run) | 变量 | 说明 | 示例 | |------|------|------| -| `POSTGRES_HOST` | PostgreSQL 地址 | `10.0.0.5:5432` | -| `POSTGRES_USER` | 数据库用户 | `ai_cs` | -| `POSTGRES_PASSWORD` | 数据库密码 | — | -| `POSTGRES_DATABASE` | 数据库名 | `ai_customer_service` | -| `WEBHOOK_HMAC_KEY` | HMAC 签名密钥 | — | -| `SERVER_PORT` | HTTP 监听端口 | `8080` | -| `RATE_LIMIT_RPS` | 每秒请求上限 | `10` | -| `LOG_LEVEL` | 日志级别 | `info` | +| `AI_CS_RUNTIME_ENV` | 运行环境 | `production` | +| `AI_CS_ADDR` | HTTP 监听地址 | `:8080` | +| `AI_CS_POSTGRES_ENABLED` | 是否启用 PostgreSQL store | `true` | +| `AI_CS_POSTGRES_DSN` | PostgreSQL 连接串 | `postgres://ai_cs:***@localhost:5432/ai_customer_service?sslmode=disable` | +| `AI_CS_POSTGRES_MIGRATION_DIR` | migration 目录 | `db/migration` | +| `AI_CS_WEBHOOK_SECRET` | Webhook HMAC 密钥 | — | +| `AI_CS_WEBHOOK_TIMESTAMP_HEADER` | 时间戳请求头 | `X-CS-Timestamp` | +| `AI_CS_WEBHOOK_SIGNATURE_HEADER` | 签名请求头 | `X-CS-Signature` | +| `AI_CS_WEBHOOK_MAX_SKEW_SECONDS` | 最大时钟偏差(秒) | `300` | ### 5.3 数据库初始化 ```bash # 执行 migration(项目 db/ 目录) -psql -h $POSTGRES_HOST -U ai_cs -d ai_customer_service -f db/migration/001_init.sql +psql "$AI_CS_POSTGRES_DSN" -f db/migration/0001_init.up.sql ``` ### 5.4 健康检查 ```bash # Readiness(含 DB 依赖检查) -curl http://localhost:8080/ready +curl http://localhost:8080/actuator/health/ready # Liveness -curl http://localhost:8080/live +curl http://localhost:8080/actuator/health/live # 综合健康 curl http://localhost:8080/actuator/health @@ -202,13 +208,13 @@ curl 
http://localhost:8080/actuator/health | 功能 | 优先级 | 说明 | |------|--------|------| -| 按渠道独立 Webhook(`/webhook/{channel}`) | P1 | 当前为统一入口 | +| 真实多渠道适配器产品化 | P1 | 当前只有统一 webhook 模型与路径覆写 channel | | 人工回复用户链路 | P1 | 只有工单创建,无回复闭环 | | 排队位置查询 | P1 | 无此 API | -| 工单关闭语义补齐 | P1 | resolve=关闭语义待明确 | +| 真实 LLM / RAG | P1 | 当前为规则识别 + 静态 FAQ | | 安全拒绝事件审计(签名失败/非法 body) | P0 | 此类事件暂未写审计 | | metrics / tracing / SLO | P1 | 暂无可观测基础设施 | -| 灰度/回滚 runbook | P1 | 文档缺失 | +| 灰度/回滚 Runbook | P1 | 需完成演练与证据化验证 | --- @@ -216,4 +222,4 @@ curl http://localhost:8080/actuator/health - **项目负责人**:小龙团队(Hermes Review 完成) - **代码基准**:`3e9022a` -- **Phase 2 覆盖率目标**:✅ 已达成(77.4% > 70%) \ No newline at end of file +- **Phase 2 覆盖率目标**:✅ 已达成(77.4% > 70%) diff --git a/docs/REVIEW_REPORT_2026-05-04.md b/docs/REVIEW_REPORT_2026-05-04.md new file mode 100644 index 0000000..e21c782 --- /dev/null +++ b/docs/REVIEW_REPORT_2026-05-04.md @@ -0,0 +1,367 @@ +# AI-Customer-Service 全面 Review 与上线距离评估报告 + +> 审查时间:2026-05-04 +> 审查方式:静态代码审查 + 文档对照 + 本地构建/测试验证 +> 审查范围:`/home/long/project/立交桥/projects/ai-customer-service` + +## 1. 结论摘要 + +当前项目**不是“接近完整的生产客服系统”**,而是一个**质量尚可的生产一期后端原型 / 最小闭环服务**。 + +从“完全完成规划设计和生产上线”这个目标看,当前状态更接近: + +| 维度 | 当前完成度 | 结论 | +|---|---:|---| +| 规划与设计文档 | 75% | 文档数量充足,但存在明显漂移和口径冲突 | +| 核心后端最小闭环实现 | 45% | webhook、session、ticket、audit、health 基本具备 | +| 相对 PRD 的真实功能完成度 | 25% | 缺少 LLM/RAG、真实诊断查询、身份核验、多渠道适配、运营后台 | +| 生产放量准备度 | 20% | 缺少鉴权/RBAC、可观测性、真实联调、灰度回滚闭环 | + +结论可以直接表述为: + +1. **代码级可运行、可测试,但不是 PRD 意义上的“智能客服系统已完成”。** +2. **不具备直接生产上线条件。** +3. **更适合被定义为“Phase 1 后端骨架 + 最小工单闭环”,距离生产上线至少还差 3 个阶段。** + +## 2. 本次实际验证 + +本次实际执行并确认了以下检查: + +```bash +go test ./... +go test -race ./... +go build ./... +``` + +结果: + +- `go test ./...` 通过 +- `go test -race ./...` 通过 +- `go build ./...` 通过 + +这说明当前仓库的**现有实现质量**整体不差,但这些结果只能证明: + +- 当前代码可以编译 +- 当前测试覆盖的行为成立 +- 当前并发路径未被 race 检测发现问题 + +这些结果**不能证明**: + +- PRD 功能已完成 +- 真实依赖已联通 +- 生产链路已验证 +- 灰度和回滚具备可执行性 + +## 3. 
关键发现 + +### P0-1 文档将“原型/最小实现”误表述为“可灰度发布” + +`docs/PRODUCTION_LAUNCH.md` 明确写了: + +- “已通过全部上线门禁,可灰度发布” +- “多渠道 Webhook 接收(Telegram/Discord/微信/网页)” +- “基于 LLM 的意图识别 + 知识库 RAG” + +对应证据: + +- `docs/PRODUCTION_LAUNCH.md:4` +- `docs/PRODUCTION_LAUNCH.md:15-18` + +但实际代码实现是: + +- 意图识别为关键词规则,不是 LLM +- 回复来自内存 FAQ,不是 RAG +- 没有 Telegram / Discord / 微信独立适配器实现 + +对应代码: + +- `internal/service/intent/service.go:15-49` +- `internal/store/memory/knowledge_store.go:7-20` +- `internal/http/router.go:29-52` + +影响: + +- 会误导团队把“代码骨架可运行”当成“产品能力可上线” +- 会直接污染 PM、QA、运维对项目状态的判断 + +### P0-2 管理与工单接口无鉴权,不能作为生产后台暴露 + +当前 `tickets` 和 `sessions` 相关接口直接挂在路由上,没有任何认证或权限中间件: + +- `internal/http/router.go:54-123` + +同时,关键操作人信息仅来自 query 参数: + +- `internal/http/handlers/ticket_handler.go:63-65` +- `internal/http/handlers/ticket_handler.go:86-88` +- `internal/http/handlers/ticket_handler.go:109-111` +- `internal/http/handlers/session_handler.go:72-75` +- `internal/http/handlers/session_handler.go:140-143` + +也就是说: + +- 任意调用方只要能访问接口,就可以尝试分配、解决、关闭工单 +- `actor_id` 可以伪造 +- 审计里的操作者身份不可信 + +虽然仓库文档已经承认“权限模型当前未落地”,但这也恰恰说明**生产放量前它仍是阻断项**: + +- `prd/IDENTITY_AND_PERMISSION_STRATEGY.md:71-79` + +### P0-3 当前实现与 PRD 的核心能力差距仍然很大 + +PRD 的 in-scope 能力包含: + +- 多渠道接入 +- 基于大模型的意图识别 +- RAG 检索 +- 知识库管理 +- 诊断查询 +- 运营后台 +- 埋点与监控 + +证据: + +- `prd/PRD.md:44-51` +- `prd/PRD.md:73-85` +- `prd/PRD.md:97-105` + +而当前代码真实提供的是: + +- 一个统一 webhook 入口 +- 基于规则的 intent +- 基于内存 map 的固定回复 +- 工单与审计的最小后端接口 + +对应代码: + +- `internal/http/router.go:29-52` +- `internal/service/intent/service.go:15-49` +- `internal/store/memory/knowledge_store.go:7-20` +- `internal/service/dialog/service.go:69-145` + +这不是“差一点上线”,而是**产品层级仍处于缩 scope 的后端一期**。 + +### P1-1 上下文能力低于设计规格 + +设计文档要求保留最近 5 轮对话,即 10 条消息: + +- `tech/HLD.md:176-179` + +实际代码只保留最近 6 条消息: + +- `internal/service/dialog/service.go:95-98` +- `internal/service/dialog/service.go:129-132` + +影响: + +- 多轮对话理解能力低于设计要求 +- 一旦未来接入真实 LLM,上下文容量会先成为效果瓶颈 + +### P1-2 生产文档中的 API 与真实路由不一致 + +`docs/PRODUCTION_LAUNCH.md` 
声称已实现: + +- `GET /api/v1/customer-service/webhook/channels` +- `GET /live` +- `GET /ready` + +证据: + +- `docs/PRODUCTION_LAUNCH.md:57-58` +- `docs/PRODUCTION_LAUNCH.md:83-86` +- `docs/PRODUCTION_LAUNCH.md:176-179` + +但真实路由只有: + +- `/actuator/health` +- `/actuator/health/live` +- `/actuator/health/ready` +- `/api/v1/customer-service/webhook` +- `/api/v1/customer-service/webhook/{channel}` + +对应代码: + +- `internal/http/router.go:25-27` +- `internal/http/router.go:34` +- `internal/http/router.go:52` + +`/webhook/channels` 根本不存在,`/live` 与 `/ready` 也不是实际路径。 + +这说明发布文档本身不可直接用于部署或联调。 + +### P1-3 配置文档与真实配置契约不一致 + +生产文档列出的环境变量是: + +- `POSTGRES_HOST` +- `POSTGRES_USER` +- `POSTGRES_PASSWORD` +- `SERVER_PORT` +- `WEBHOOK_HMAC_KEY` + +证据: + +- `docs/PRODUCTION_LAUNCH.md:154-163` + +但代码真实读取的是: + +- `AI_CS_ADDR` +- `AI_CS_POSTGRES_ENABLED` +- `AI_CS_POSTGRES_DSN` +- `AI_CS_WEBHOOK_SECRET` +- `AI_CS_RUNTIME_ENV` + +对应代码: + +- `internal/config/config.go:47-97` + +影响: + +- 直接按发布文档配置环境,服务不会按预期启动 +- 部署侧会产生“文档正确但服务读不到配置”的高风险误操作 + +## 4. 当前已经做对的部分 + +这部分需要客观肯定,否则会误判为“完全不可用”: + +1. **HTTP 服务骨架清晰** + - `cmd/ai-customer-service/main.go` + - `internal/app/app.go` + +2. **Webhook 安全基础比一般 demo 强** + - HMAC/时间戳/body limit/rate limit/dedup 都已经接到主路径 + - 相关路由见 `internal/http/router.go:29-52` + +3. **健康检查、优雅停机、Postgres 模式切换具备基础能力** + - `internal/http/handlers/health_handler.go` + - `internal/store/postgres/db.go` + - `internal/app/app.go` + +4. **测试现状良好** + - `go test ./...` 通过 + - `go test -race ./...` 通过 + - `go build ./...` 通过 + +所以这个项目的真实评价应当是: + +> **不是“乱写的 demo”,而是“工程质量尚可,但业务完成度和生产 readiness 明显不足的后端一期骨架”。** + +## 5. 与“完整规划设计”之间的具体距离 + +如果目标是“规划设计完全完成”,当前还差的不是“再补几页文档”,而是**文档统一口径和事实对齐**。 + +### 已完成 + +- PRD、HLD、接口、测试、运行、SOP、灰度、合规文档已经有较完整框架 +- 项目内部已经意识到自己是“生产一期未完成” + - `PRODUCTION_EXECUTION_PLAN.md:5-18` + +### 未完成 + +1. **文档单一真相源还没有建立** + - `PRODUCTION_LAUNCH.md` 仍然过度乐观 + - `PRODUCTION_EXECUTION_PLAN.md` 更接近真实状态 + +2. 
**Phase 1 / Phase 2 / 最终 PRD 的边界没有完全收敛** + - `prd/SCOPE_PHASE1_VS_PHASE2.md` 在降 scope + - `docs/PRODUCTION_LAUNCH.md` 却仍按最终系统表述 + +3. **部署文档、API 文档、配置文档尚未完全和代码对齐** + +我的判断: + +- **规划设计完成度约 75%** +- 距离“设计冻结、文档可直接驱动实施和上线”还差 **25% 左右** + +## 6. 与“生产上线”之间的具体距离 + +### 当前可视为已完成的生产前置能力 + +- 基础 HTTP 服务 +- 基础 webhook 入口 +- 基础工单后端 +- 基础审计 +- 基础 Postgres 支持 +- 基础测试 + +### 距离生产上线仍缺的关键阶段 + +#### 阶段 A:收口事实口径 + +- 清理错误上线表述 +- 统一 Phase 1 / Phase 2 / 最终版边界 +- 让所有文档和真实路由、真实配置、真实依赖一致 + +#### 阶段 B:补齐生产级后台安全 + +- Auth middleware +- RBAC +- 跨用户数据隔离 +- 工单/会话接口权限校验 +- 审计 actor 可信来源 + +#### 阶段 C:补齐真实业务能力 + +- 真实身份核验 +- 只读 quota/token/error logs 查询 +- 真实多渠道适配 +- 真实知识库/RAG +- 真实 LLM/failover +- 人工回复用户闭环 + +#### 阶段 D:补齐生产运维能力 + +- metrics / tracing / SLO +- 告警 +- 灰度开关 +- 回滚 Runbook +- 真实环境联调证据 + +我的判断: + +- **距离“生产可灰度”仍差至少 3 个实质阶段** +- **距离“按 PRD 完整上线”仍差至少 4 个阶段** + +如果用工作量粗估: + +| 目标 | 距离 | +|---|---| +| 代码级稳定后端一期 | 已基本达到 | +| 可进入预生产联调 | 还差 2~4 周,取决于是否只做 Phase 1 | +| 可做小流量灰度 | 还差 4~8 周,取决于鉴权、观测、联调资源 | +| 接近 PRD 完整版上线 | 还差 8~16 周,且前提是追加 LLM/RAG/运营后台/多渠道资源 | + +## 7. 建议的下一步顺序 + +### 第一优先级 + +1. 修正文档口径 +2. 建立单一上线基线文档 +3. 停止使用 `docs/PRODUCTION_LAUNCH.md` 作为上线依据 + +### 第二优先级 + +1. 为 `tickets` / `sessions` 全部接口补鉴权与角色校验 +2. 修复部署文档与真实环境变量不一致 +3. 修复发布文档与真实路由不一致 + +### 第三优先级 + +1. 明确生产一期是否只做“工单后端 + webhook” +2. 如果是,就把 LLM/RAG/运营后台全部降到 Phase 2,且文档同步 +3. 如果不是,就必须补真实 LLM/RAG/诊断查询链路,而不是继续用规则和静态 FAQ + +## 8. 最终判定 + +本项目当前更准确的定位是: + +> **一个通过本地测试验证的、工程质量尚可的客服后端一期原型,而不是接近完整生产上线的 AI 客服系统。** + +正式结论: + +- **全面 review 结果:不建议按“已完成规划设计并可生产上线”口径汇报** +- **真实状态:可继续推进为生产一期后端服务** +- **距离完整规划设计完成:约 25%** +- **距离生产可灰度上线:约 75% 的关键工作仍未闭环** +- **距离 PRD 全量目标上线:约 70%~80% 的业务能力仍未落地** diff --git a/docs/RUNBOOK.md b/docs/RUNBOOK.md index 2e58499..bd162db 100644 --- a/docs/RUNBOOK.md +++ b/docs/RUNBOOK.md @@ -11,7 +11,7 @@ ```bash # 1. 
Confirm the environment variables are complete
-echo "AI_CS_ENV=$AI_CS_ENV"
+echo "AI_CS_RUNTIME_ENV=$AI_CS_RUNTIME_ENV"
 echo "AI_CS_POSTGRES_ENABLED=$AI_CS_POSTGRES_ENABLED"
 echo "AI_CS_POSTGRES_DSN=${AI_CS_POSTGRES_DSN:+[SET]}"
 echo "AI_CS_WEBHOOK_SECRET=${AI_CS_WEBHOOK_SECRET:+[SET]}"
@@ -28,7 +28,7 @@ nohup ./ai-customer-service > /var/log/ai-cs.log 2>&1 &
 sleep 3
 
 # 5. Verify the ready probe
-curl -s http://localhost:8080/ready | grep -q '"status":"UP"' || { echo "READY FAILED"; cat /var/log/ai-cs.log; exit 1; }
+curl -s http://localhost:8080/actuator/health/ready | grep -q '"status":"UP"' || { echo "READY FAILED"; cat /var/log/ai-cs.log; exit 1; }
 ```
 
 ---
@@ -42,7 +42,7 @@ curl -s http://localhost:8080/ready | grep -q '"status":"UP"' || { echo "READY F
 | `listen tcp :8080: bind: address already in use` | Port 8080 is already in use | `pkill -f ai-customer-service`, or set `AI_CS_ADDR=:8081` |
 | `pq: connection refused` | PostgreSQL is unreachable | Check the PG host, port, and firewall; confirm `psql` can connect |
 | `pq: password authentication failed` | Wrong password | Verify the password in `AI_CS_POSTGRES_DSN` |
-| Startup succeeds but `/ready` returns `postgres:DOWN` | PG is reachable but the health check fails | Check whether PG responds on the port specified in `AI_CS_POSTGRES_DSN` |
+| Startup succeeds but `/actuator/health/ready` returns `postgres:DOWN` | PG is reachable but the health check fails | Check whether PG responds on the port specified in `AI_CS_POSTGRES_DSN` |
 
 ---
 
@@ -115,7 +115,7 @@ nohup ./ai-customer-service-v1.0.0 > /var/log/ai-cs-v1.0.0.log 2>&1 &
 sleep 3
 
 # 6. Verify
-curl -s http://localhost:8080/ready
+curl -s http://localhost:8080/actuator/health/ready
 curl -s http://localhost:8080/actuator/health
 ```
 
@@ -155,7 +155,7 @@ ps aux | grep "ai-customer-service" | grep -v grep || echo " NOT RUNNING ❌"
 
 echo ""
 echo "[2/5] HTTP endpoints:"
-for endpoint in "/live" "/ready" "/actuator/health"; do
+for endpoint in "/actuator/health/live" "/actuator/health/ready" "/actuator/health"; do
   status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080$endpoint)
   echo " $endpoint → HTTP $status $([ "$status" = "200" ] && echo '✅' || echo '❌')"
 done
@@ -181,4 +181,4 @@ curl -s -X POST http://localhost:8080/api/v1/customer-service/webhook \
 
 echo ""
 echo "=== Diagnostic complete ==="
-```
\ No newline at end of file
+```
diff --git a/docs/plans/2026-05-04-gray-launch-readiness-plan.md b/docs/plans/2026-05-04-gray-launch-readiness-plan.md
new file mode 100644
index 0000000..9193353
--- /dev/null
+++ b/docs/plans/2026-05-04-gray-launch-readiness-plan.md
@@ -0,0 +1,499 @@
+# Gray Launch Readiness Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Move `ai-customer-service` from "a Phase 1 backend skeleton that runs at the code level" to "a production Phase 1 service that meets the conditions for a small-traffic gray launch".
+
+**Architecture:** First settle the "single source of truth" and the deployment contract, so that incorrect documents stop driving the launch; then close the four production blockers: admin API authentication, real integration testing, observability, and the gray-release/rollback loop. Keep the scope minimal: this round does not add the full LLM/RAG/operations console, but turns the real Phase 1 scope into a deliverable that can be gray-launched.
+
+**Tech Stack:** Go 1.22, net/http, PostgreSQL, HMAC webhook security, Go testing, system/deployment docs
+
+---
+
+## 0. Target Scope Definition
+
+In this plan, "ready for gray launch" means only:
+
+1. `POST /api/v1/customer-service/webhook` and `POST /api/v1/customer-service/webhook/{channel}` can be integrated in a real pre-production environment.
+2. The minimal ticket loop works: create, query, assign, resolve, close, feedback.
+3. Key admin APIs have basic authentication and role checks.
+4. Real PostgreSQL, migration, audit, dedup, health, monitoring, and rollback have evidence-backed verification.
+5. Documentation, the configuration contract, and the code implementation are consistent.
+
+This plan does **not** include:
+
+1. Real LLM / multi-vendor failover.
+2. Real RAG retrieval and a knowledge-base operations console.
+3. A fully productized implementation of dedicated Telegram / Discord / WeChat adapters.
+4. A complete customer-service operations console UI.
+
+---
+
+### Task 1: Settle the launch messaging and the single source of truth
+
+**Files:**
+- Modify: `docs/PRODUCTION_LAUNCH.md`
+- Modify: `docs/REVIEW_REPORT_2026-05-04.md`
+- Modify: `PRODUCTION_PHASE1_STATUS.md`
+- Modify: `prd/PRODUCTION_CHECKLIST.md`
+- Modify: `docs/P0_P1_P2_RECTIFICATION_EXECUTION_BOARD.md`
+
+**Step 1: Write a documentation consistency checklist**
+
+Before starting this task, list the 5 facts that must be stated consistently:
+
+```text
+1. The current scope is the Phase 1 minimal backend loop, not the full PRD scope
+2. Real LLM/RAG is not implemented yet
+3. A complete operations console is not implemented yet
+4. Whether gray release is allowed must be decided by verification in a real environment
+5. Deployment variables must match internal/config/config.go
+```
+
+**Step 2: Correct overly broad statements**
+
+Modify `docs/PRODUCTION_LAUNCH.md`:
+- Remove or downgrade "Passed all launch gates, ready for gray release"
+- Change "LLM + RAG + multi-channel capabilities" to "target capabilities / not currently delivered"
+- Keep what is actually delivered today: webhook, ticket, audit, health, postgres
+
+**Step 3: Write back to the phase status documents**
+
+In `PRODUCTION_PHASE1_STATUS.md` and `prd/PRODUCTION_CHECKLIST.md`, unify the three-tier conclusion:
+- Code-level gate
+- Pre-production gate
+- Gray rollout gate
+
+**Step 4: Review and update the execution board**
+
+Update the statuses in `docs/P0_P1_P2_RECTIFICATION_EXECUTION_BOARD.md` that relate to "ready to launch directly" so that they are based on evidence from a real environment.
+
+**Step 5: Verify that the incorrect statements no longer appear in the docs**
+
+Run:
+
+```bash
+rg -n "可灰度发布|允许上线|LLM 的意图识别 \\+ 知识库 RAG|多渠道 Webhook 接收" .
+```
+
+Expected:
+- `docs/PRODUCTION_LAUNCH.md` no longer contains statements that misrepresent the current code as having the full capability set
+
+**Step 6: Commit**
+
+```bash
+git add docs/PRODUCTION_LAUNCH.md docs/REVIEW_REPORT_2026-05-04.md PRODUCTION_PHASE1_STATUS.md prd/PRODUCTION_CHECKLIST.md docs/P0_P1_P2_RECTIFICATION_EXECUTION_BOARD.md
+git commit -m "docs(ai-customer-service): align launch status with verified phase-1 scope"
+```
+
+---
+
+### Task 2: Settle the deployment configuration contract
+
+**Files:**
+- Modify: `docs/PRODUCTION_LAUNCH.md`
+- Modify: `docs/RUNBOOK.md`
+- Modify: `docs/CONFIG_CONTRACT_BASELINE.md`
+- Test: `internal/config/config_test.go`
+
+**Step 1: Write out the real variable list**
+
+Using `internal/config/config.go` as the only baseline, collect the following variables:
+
+```text
+AI_CS_ADDR
+AI_CS_POSTGRES_ENABLED
+AI_CS_POSTGRES_DSN
+AI_CS_POSTGRES_MIGRATION_DIR
+AI_CS_POSTGRES_MAX_OPEN_CONNS
+AI_CS_POSTGRES_MAX_IDLE_CONNS
+AI_CS_POSTGRES_CONN_MAX_LIFETIME_SEC
+AI_CS_WEBHOOK_SECRET
+AI_CS_WEBHOOK_TIMESTAMP_HEADER
+AI_CS_WEBHOOK_SIGNATURE_HEADER
+AI_CS_WEBHOOK_MAX_SKEW_SECONDS
+AI_CS_RUNTIME_ENV
+```
+
+**Step 2: Fix the fake variables in the docs**
+
+Replace every variable the code does not read, such as `POSTGRES_HOST`, `SERVER_PORT`, and `WEBHOOK_HMAC_KEY`, or mark it as deprecated wording.
+
+**Step 3: Add tests for default/invalid values**
+
+In `internal/config/config_test.go`, add tests for the following scenarios:
+- `AI_CS_RUNTIME_ENV=production` with `AI_CS_POSTGRES_ENABLED=false` -> fail
+- `AI_CS_RUNTIME_ENV=production` with `AI_CS_WEBHOOK_SECRET=""` -> fail
+- memory mode outside prod -> pass
+
+**Step 4: Run the tests**
+
+Run:
+
+```bash
+go test ./internal/config -count=1
+```
+
+Expected:
+- PASS
+
+**Step 5: Commit**
+
+```bash
+git add docs/PRODUCTION_LAUNCH.md docs/RUNBOOK.md docs/CONFIG_CONTRACT_BASELINE.md internal/config/config_test.go
+git commit -m "docs(config): align deployment contract with runtime config loader"
+```
+
+---
+
+### Task 3: Add minimal authentication and role boundaries to the admin APIs
+
+**Files:**
+- Modify: `internal/http/router.go`
+- Modify: `internal/http/handlers/ticket_handler.go`
+- Modify: `internal/http/handlers/session_handler.go`
+- Create: `internal/http/middleware/authz.go`
+- Create: `internal/http/middleware/authz_test.go`
+- Modify: `internal/http/router_test.go`
+- Modify: `prd/IDENTITY_AND_PERMISSION_STRATEGY.md`
+
+**Step 1: Write the failing tests first**
+
+Cover at least:
+
+```go
+func TestTicketAssign_shouldReject_whenMissingAuthHeader(t *testing.T) {}
+func TestTicketResolve_shouldReject_whenRoleNotAllowed(t *testing.T) {}
+func TestSessionHandoff_shouldReject_whenActorSpoofedByQueryOnly(t *testing.T) {}
+```
+
+**Step 2: Run the tests and confirm they fail**
+
+Run:
+
+```bash
+go test ./internal/http/... -count=1
+```
+
+Expected:
+- FAIL, reporting the missing auth middleware or permission check
+
+**Step 3: Write the minimal implementation**
+
+Implementation principles:
+- Do not adopt a full OAuth/JWT platform
+- Introduce minimal header-based authentication first, for use in pre-production and gray environments
+- Suggested request headers to read:
+  - `X-CS-Actor-ID`
+  - `X-CS-Actor-Role`
+- Allowed roles:
+  - `agent`
+  - `supervisor`
+  - `admin`
+- Demote the `actor_id` query parameter to read-only compatibility; it is not a trusted source
+
+**Step 4: Put the permission rules in place**
+
+Minimal rules:
+- `GET /tickets/{id}`: `agent/supervisor/admin`
+- `POST /tickets/{id}/assign`: `supervisor/admin`
+- `POST /tickets/{id}/resolve`: `agent/supervisor/admin`
+- `POST /tickets/{id}/close`: `supervisor/admin`
+- `POST /sessions/{id}/handoff`: `agent/supervisor/admin`
+- `POST /sessions/{id}/feedback`: may be anonymous or system, but the source must be recorded
+
+**Step 5: Run the tests**
+
+Run:
+
+```bash
+go test ./internal/http/... -count=1
+```
+
+Expected:
+- PASS
+
+**Step 6: Update the strategy document**
+
+In `prd/IDENTITY_AND_PERMISSION_STRATEGY.md`, change the status "not implemented yet" to "Phase 1 minimal authentication is in place; full RBAC is still incomplete".
+
+**Step 7: Commit**
+
+```bash
+git add internal/http/router.go internal/http/handlers/ticket_handler.go internal/http/handlers/session_handler.go internal/http/middleware/authz.go internal/http/middleware/authz_test.go internal/http/router_test.go prd/IDENTITY_AND_PERMISSION_STRATEGY.md
+git commit -m "feat(auth): add minimal auth and role checks for phase-1 admin APIs"
+```
+
+---
+
+### Task 4: Settle the ticket loop semantics
+
+**Files:**
+- Modify: `internal/http/handlers/ticket_handler.go`
+- Modify: `internal/store/postgres/ticket_workflow.go`
+- Modify: `internal/store/memory/ticket_workflow.go`
+- Modify: `internal/http/handlers/ticket_handler_test.go`
+- Modify: `test/e2e/full_ticket_flow_test.go`
+- Modify: `prd/TICKET_OPERATIONS_SOP.md`
+- Modify: `tech/INTERFACE.md`
+
+**Step 1: Add tests that pin down the semantics of resolve and close**
+
+Cover:
+- resolve succeeds after assign
+- close succeeds after resolve
+- a closed ticket cannot be resolved again
+- a nonexistent ticket returns an explicit error
+
+**Step 2: Run the tests and confirm the boundary cases fail**
+
+Run:
+
+```bash
+go test ./internal/http/handlers ./internal/store/... ./test/e2e -count=1
+```
+
+Expected:
+- FAIL, exposing inconsistencies in the current state machine or docs
+
+**Step 3: Implement minimal consistent semantics**
+
+Suggested:
+- `resolve` means "a resolution has been given, but the ticket can still be closed later"
+- `close` means "final closure; no further changes allowed"
+
+**Step 4: Align the interface docs**
+
+In `tech/INTERFACE.md` and `prd/TICKET_OPERATIONS_SOP.md`, state explicitly:
+- The definition of each status
+- The allowed actions
+- The error codes returned
+
+**Step 5: Run the tests**
+
+Run:
+
+```bash
+go test ./internal/http/handlers ./internal/store/... ./test/e2e -count=1
+```
+
+Expected:
+- PASS
+
+**Step 6: Commit**
+
+```bash
+git add internal/http/handlers/ticket_handler.go internal/store/postgres/ticket_workflow.go internal/store/memory/ticket_workflow.go internal/http/handlers/ticket_handler_test.go test/e2e/full_ticket_flow_test.go prd/TICKET_OPERATIONS_SOP.md tech/INTERFACE.md
+git commit -m "fix(ticket): align resolve and close semantics across stores and docs"
+```
+
+---
+
+### Task 5: Build a real pre-production verification script and evidence
+
+**Files:**
+- Create: `scripts/verify_preprod_gate_b.sh`
+- Create: `docs/PREPROD_VERIFICATION_RECORD.md`
+- Modify: `docs/RUNBOOK.md`
+- Modify: `test/QA_GATE_STATUS.md`
+
+**Step 1: Write the pre-production Gate B check script**
+
+The script must cover at least:
+- Environment variable completeness check
+- Service startup
+- Migration execution
+- `/actuator/health/live`
+- `/actuator/health/ready`
+- A signed webhook request
+- Verification that ticket/audit records are persisted
+
+**Step 2: Run it once in a local/containerized environment first**
+
+Run:
+
+```bash
+bash scripts/verify_preprod_gate_b.sh
+```
+
+Expected:
+- PASS/FAIL is printed for each item
+
+**Step 3: Capture the verification results as a record**
+
+In `docs/PREPROD_VERIFICATION_RECORD.md`, record:
+- Time
+- Environment
+- commit
+- Commands executed
+- Screenshots of the results or a summary of key output
+
+**Step 4: Write back to the QA gate**
+
+Update `test/QA_GATE_STATUS.md`, replacing "real-environment gate not closed" with the current actual result.
+
+**Step 5: Commit**
+
+```bash
+git add scripts/verify_preprod_gate_b.sh docs/PREPROD_VERIFICATION_RECORD.md docs/RUNBOOK.md test/QA_GATE_STATUS.md
+git commit -m "test(preprod): add gate-b verification script and evidence record"
+```
+
+---
+
+### Task 6: Build the minimum monitoring and gray-release observation surface
+
+**Files:**
+- Modify: `docs/MONITORING_ALERTING.md`
+- Modify: `prd/SERVICE_SLA.md`
+- Modify: `prd/GRAY_RELEASE_ROLLBACK_RUNBOOK.md`
+- Create: `docs/GRAY_DASHBOARD_MINIMUM.md`
+
+**Step 1: Confirm that the gray phase watches only the minimum metrics**
+
+Must include:
+
+```text
+1. webhook 5xx
+2. webhook reject count
+3. ticket creation volume
+4. handoff ratio
+5. audit write failure count
+6. readiness down count
+7. postgres connection errors
+8. single-instance restart count
+```
+
+**Step 2: Write an alert threshold for each metric**
+
+Examples:
+- webhook 5xx > 1% for 5 minutes -> trigger a rollback evaluation
+- readiness DOWN 3 times in a row -> remove the instance from the gray pool
+
+**Step 3: Write the gray rollout cadence**
+
+Suggested default:
+- 5% / 30min
+- 20% / 2h
+- 50% / half a day
+- 100% / next day
+
+Every level must have entry and rollback conditions.
+
+**Step 4: Write back to the docs**
+
+Sync the thresholds and actions above back to:
+- `docs/MONITORING_ALERTING.md`
+- `prd/SERVICE_SLA.md`
+- `prd/GRAY_RELEASE_ROLLBACK_RUNBOOK.md`
+
+**Step 5: Commit**
+
+```bash
+git add docs/MONITORING_ALERTING.md prd/SERVICE_SLA.md prd/GRAY_RELEASE_ROLLBACK_RUNBOOK.md docs/GRAY_DASHBOARD_MINIMUM.md
+git commit -m "docs(gray): define minimum metrics, thresholds, and rollout gates"
+```
+
+---
+
+### Task 7: Build the gray launch release checklist
+
+**Files:**
+- Create: `docs/GRAY_LAUNCH_CHECKLIST.md`
+- Modify: `docs/P0_P1_P2_RECTIFICATION_EXECUTION_BOARD.md`
+- Modify: `docs/REVIEW_REPORT_2026-05-04.md`
+
+**Step 1: Design a one-page release checklist**
+
+The checklist must include:
+- Code-level gate
+- Pre-production Gate B
+- Authentication gate
+- Ticket loop gate
+- Observability gate
+- Rollback gate
+
+**Step 2: Use checkboxes to make the blocking conditions explicit**
+
+Example:
+
+```markdown
+- [ ] go test ./... passes
+- [ ] go test -race ./... passes
+- [ ] Migration succeeds on a real PostgreSQL
+- [ ] Admin API authentication is enabled
+- [ ] Webhook signature integration test passes
+- [ ] ticket/audit persistence can be verified
+- [ ] Minimum monitoring and alerting are live
+- [ ] Rollback script/Runbook drill passes
+```
+
+**Step 3: Re-orient the execution board statuses toward gray release**
+
+Re-label the unclosed items on the execution board as one of:
+- Not started
+- In progress
+- Done
+- Blocked
+
+**Step 4: Commit**
+
+```bash
+git add docs/GRAY_LAUNCH_CHECKLIST.md docs/P0_P1_P2_RECTIFICATION_EXECUTION_BOARD.md docs/REVIEW_REPORT_2026-05-04.md
+git commit -m "docs(release): add gray launch checklist and update execution board"
+```
+
+---
+
+## Milestones and Exit Criteria
+
+### Milestone A: Docs and configuration aligned with reality
+
+Exit criteria:
+- `docs/PRODUCTION_LAUNCH.md` no longer overstates the current state
+- The deployment variable docs match `internal/config/config.go`
+
+### Milestone B: Minimally trustworthy admin backend
+
+Exit criteria:
+- Key `tickets` / `sessions` APIs have minimal authentication
+- `actor_id` no longer comes from an untrusted query parameter
+
+### Milestone C: Pre-production is verifiable
+
+Exit criteria:
+- `scripts/verify_preprod_gate_b.sh` can be run repeatedly
+- A real `PREPROD_VERIFICATION_RECORD` exists
+
+### Milestone D: Ready for gray release
+
+Exit criteria:
+- Gray-release metrics, thresholds, and rollback conditions are clear
+- Every item in `GRAY_LAUNCH_CHECKLIST` is checked
+
+---
+
+## Recommended Execution Order
+
+1. Task 1
+2. Task 2
+3. Task 3
+4. Task 4
+5. Task 5
+6. Task 6
+7. Task 7
+
+Rationale for this order:
+- Settle the messaging first, so the work does not drift as it proceeds
+- Then add API security, so an untrusted admin backend is not pushed further forward
+- Then do integration testing and gray-release preparation, so verification rests on a trustworthy implementation
+
+---
+
+Plan complete and saved to `docs/plans/2026-05-04-gray-launch-readiness-plan.md`. Two execution options:
+
+**1. Subagent-Driven (this session)** - I execute the tasks one by one, verifying and writing back results for each; suited to moving forward right now
+
+**2. Parallel Session (separate)** - Execute the plan in batches in a separate session; suited to long-running remediation
diff --git a/internal/http/handlers/auth_test.go b/internal/http/handlers/auth_test.go
new file mode 100644
index 0000000..5f66683
--- /dev/null
+++ b/internal/http/handlers/auth_test.go
@@ -0,0 +1,11 @@
+package handlers
+
+import (
+	"net/http"
+
+	"github.com/bridge/ai-customer-service/internal/http/middleware"
+)
+
+func withActor(req *http.Request, actorID, role string) *http.Request {
+	return req.WithContext(middleware.WithActor(req.Context(), actorID, role))
+}
diff --git a/internal/http/handlers/session_handler.go b/internal/http/handlers/session_handler.go
index 3de0ba2..7ba546d 100644
--- a/internal/http/handlers/session_handler.go
+++ b/internal/http/handlers/session_handler.go
@@ -12,6 +12,7 @@ import (
 	"github.com/bridge/ai-customer-service/internal/domain/error/cserrors"
 	"github.com/bridge/ai-customer-service/internal/domain/session"
 	"github.com/bridge/ai-customer-service/internal/domain/ticket"
+	"github.com/bridge/ai-customer-service/internal/http/middleware"
 )
 
 type SessionGetter interface {
@@ -35,8 +36,8 @@ func NewSessionHandler(sessions SessionGetter, tickets TicketCreator, audits Aud
 	return &SessionHandler{
 		sessions: sessions,
 		tickets: tickets,
-		audits: audits,
-		now: time.Now,
+		audits: audits,
+		now: time.Now,
 	}
 }
 
@@ -69,9 +70,9 @@ func (h *SessionHandler) Feedback(w http.ResponseWriter, r *http.Request) {
 		return
 	}
 
-	actorID := strings.TrimSpace(r.URL.Query().Get("actor_id"))
-	if actorID == "" {
-		actorID = "system"
+	actorID := "system"
+	if actor, ok := middleware.ActorFromContext(r.Context()); ok {
+		actorID = actor.ID
 	}
 	sourceIP := clientIP(r.RemoteAddr)
 	now := h.now()
@@ -137,10 +138,12 @@ func (h *SessionHandler) Handoff(w http.ResponseWriter, r *http.Request) {
 		priority = ticket.PriorityP2
 	}
 
-	actorID := strings.TrimSpace(r.URL.Query().Get("actor_id"))
-	if actorID == "" {
-		actorID = "system"
+	actor, ok := middleware.ActorFromContext(r.Context())
+	if !ok {
+		writeJSON(w, http.StatusForbidden, map[string]any{"error": map[string]any{"code": cserrors.CS_AUTH_4001, "message": cserrors.ErrorMsg(cserrors.CS_AUTH_4001)}})
+		return
 	}
+	actorID := actor.ID
 	sourceIP := clientIP(r.RemoteAddr)
 	now := h.now()
 
@@ -154,11 +157,11 @@ func (h *SessionHandler) Handoff(w http.ResponseWriter, r *http.Request) {
 		Status: ticket.StatusOpen,
 		HandoffReason: req.Reason,
 		ContextSnapshot: map[string]any{
-			"channel": sess.Channel,
-			"open_id": sess.OpenID,
-			"manual": true,
-			"actor_id": actorID,
-			"source": "customer_service_api",
+			"channel": sess.Channel,
+			"open_id": sess.OpenID,
+			"manual": true,
+			"actor_id": actorID,
+			"source": "customer_service_api",
 		},
 		CreatedAt: now,
 		UpdatedAt: now,
diff --git a/internal/http/handlers/session_handler_test.go b/internal/http/handlers/session_handler_test.go
index edd7270..3a9c9af 100644
--- a/internal/http/handlers/session_handler_test.go
+++ b/internal/http/handlers/session_handler_test.go
@@ -206,11 +206,11 @@ func TestFeedback_EmptySessionID(t *testing.T) {
 func TestHandoff_CreatesTicketAndAudit(t *testing.T) {
 	sessions := newMockSessionGetter()
 	sessions.AddSession(&session.Session{
-		ID: "sess-hw-1",
-		Channel: "feishu",
-		OpenID: "open-123",
-		UserID: "user-456",
-		Status: session.StatusProcessing,
+		ID: "sess-hw-1",
+		Channel: "feishu",
+		OpenID: "open-123",
+		UserID: "user-456",
+		Status: session.StatusProcessing,
 		TurnCount: 3,
 	})
 	tickets := newMockTicketCreator()
@@ -221,7 +221,8 @@ func TestHandoff_CreatesTicketAndAudit(t *testing.T) {
 	h.now = func() time.Time { return now }
 
 	body := `{"reason":"customer requested human","priority":"P1"}`
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-hw-1/handoff?actor_id=admin-1", strings.NewReader(body))
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-hw-1/handoff", strings.NewReader(body))
+	req = withActor(req, "admin-1", "admin")
 	req.Header.Set("Content-Type", "application/json")
 	req.RemoteAddr = "10.0.0.1:12345"
 	resp := httptest.NewRecorder()
@@ -293,6 +294,7 @@ func TestHandoff_DefaultPriorityP2(t *testing.T) {
 
 	body := `{"reason":"need help"}`
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-p2/handoff", strings.NewReader(body))
+	req = withActor(req, "agent-1", "agent")
 	req.Header.Set("Content-Type", "application/json")
 	resp := httptest.NewRecorder()
 
@@ -317,6 +319,7 @@ func TestHandoff_SessionNotFound(t *testing.T) {
 
 	body := `{"reason":"urgent"}`
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/nonexistent/handoff", strings.NewReader(body))
+	req = withActor(req, "agent-1", "agent")
 	req.Header.Set("Content-Type", "application/json")
 	resp := httptest.NewRecorder()
 
@@ -336,6 +339,7 @@ func TestHandoff_ReasonRequired(t *testing.T) {
 
 	// empty reason
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-r1/handoff", strings.NewReader(`{"reason":""}`))
+	req = withActor(req, "agent-1", "agent")
 	req.Header.Set("Content-Type", "application/json")
 	resp := httptest.NewRecorder()
 	h.Handoff(resp, req)
@@ -345,6 +349,7 @@ func TestHandoff_ReasonRequired(t *testing.T) {
 
 	// missing reason field
 	req = httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-r1/handoff", strings.NewReader(`{}`))
+	req = withActor(req, "agent-1", "agent")
 	req.Header.Set("Content-Type", "application/json")
 	resp = httptest.NewRecorder()
 	h.Handoff(resp, req)
@@ -360,6 +365,7 @@ func TestHandoff_InvalidJSON(t *testing.T) {
 	h := NewSessionHandler(sessions, tickets, audits)
 
 	req := httptest.NewRequest(http.MethodPost,
"/api/v1/customer-service/sessions/sess-1/handoff", strings.NewReader(`{bad json}`)) + req = withActor(req, "agent-1", "agent") req.Header.Set("Content-Type", "application/json") resp := httptest.NewRecorder() h.Handoff(resp, req) @@ -379,6 +385,7 @@ func TestHandoff_TicketCreateFailure(t *testing.T) { body := `{"reason":"fail"}` req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-err/handoff", strings.NewReader(body)) + req = withActor(req, "agent-1", "agent") req.Header.Set("Content-Type", "application/json") resp := httptest.NewRecorder() @@ -389,6 +396,23 @@ func TestHandoff_TicketCreateFailure(t *testing.T) { } } +func TestHandoff_RejectsWhenActorOnlyProvidedByQuery(t *testing.T) { + sessions := newMockSessionGetter() + sessions.AddSession(&session.Session{ID: "sess-query", Channel: "feishu", OpenID: "open-1", Status: session.StatusProcessing}) + tickets := newMockTicketCreator() + audits := newMockAuditRecorder() + h := NewSessionHandler(sessions, tickets, audits) + + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/sess-query/handoff?actor_id=forged-admin", strings.NewReader(`{"reason":"need help"}`)) + req.Header.Set("Content-Type", "application/json") + resp := httptest.NewRecorder() + h.Handoff(resp, req) + + if resp.Code != http.StatusForbidden { + t.Fatalf("status = %d, want 403", resp.Code) + } +} + type failingTicketCreator struct{} func (f *failingTicketCreator) Create(_ context.Context, _ *ticket.Ticket) error { diff --git a/internal/http/handlers/ticket_handler.go b/internal/http/handlers/ticket_handler.go index 2a19fd2..ebc4284 100644 --- a/internal/http/handlers/ticket_handler.go +++ b/internal/http/handlers/ticket_handler.go @@ -9,6 +9,7 @@ import ( "github.com/bridge/ai-customer-service/internal/domain/audit" "github.com/bridge/ai-customer-service/internal/domain/error/cserrors" "github.com/bridge/ai-customer-service/internal/domain/ticket" + 
"github.com/bridge/ai-customer-service/internal/http/middleware" ) type TicketService interface { @@ -60,7 +61,12 @@ func (h *TicketHandler) Assign(w http.ResponseWriter, r *http.Request) { writeJSON(w, http.StatusBadRequest, map[string]any{"error": map[string]any{"code": cserrors.CS_REQ_4005, "message": cserrors.ErrorMsg(cserrors.CS_REQ_4005)}}) return } - actorID := strings.TrimSpace(r.URL.Query().Get("actor_id")) + actor, ok := middleware.ActorFromContext(r.Context()) + if !ok { + writeJSON(w, http.StatusForbidden, map[string]any{"error": map[string]any{"code": cserrors.CS_AUTH_4001, "message": cserrors.ErrorMsg(cserrors.CS_AUTH_4001)}}) + return + } + actorID := actor.ID sourceIP := clientIP(r.RemoteAddr) if err := h.service.Assign(r.Context(), ticketID, agentID, actorID, sourceIP, h.now()); err != nil { // P0-2 fix: route error based on error code prefix from service layer @@ -83,7 +89,12 @@ func (h *TicketHandler) Resolve(w http.ResponseWriter, r *http.Request) { writeJSON(w, http.StatusBadRequest, map[string]any{"error": map[string]any{"code": cserrors.CS_REQ_4006, "message": cserrors.ErrorMsg(cserrors.CS_REQ_4006)}}) return } - actorID := strings.TrimSpace(r.URL.Query().Get("actor_id")) + actor, ok := middleware.ActorFromContext(r.Context()) + if !ok { + writeJSON(w, http.StatusForbidden, map[string]any{"error": map[string]any{"code": cserrors.CS_AUTH_4001, "message": cserrors.ErrorMsg(cserrors.CS_AUTH_4001)}}) + return + } + actorID := actor.ID sourceIP := clientIP(r.RemoteAddr) if err := h.service.Resolve(r.Context(), ticketID, resolution, actorID, sourceIP, h.now()); err != nil { // P0-2 fix: route error based on error code prefix from service layer @@ -106,7 +117,12 @@ func (h *TicketHandler) Close(w http.ResponseWriter, r *http.Request) { writeJSON(w, http.StatusBadRequest, map[string]any{"error": map[string]any{"code": cserrors.CS_REQ_4007, "message": cserrors.ErrorMsg(cserrors.CS_REQ_4007)}}) return } - actorID := 
strings.TrimSpace(r.URL.Query().Get("actor_id")) + actor, ok := middleware.ActorFromContext(r.Context()) + if !ok { + writeJSON(w, http.StatusForbidden, map[string]any{"error": map[string]any{"code": cserrors.CS_AUTH_4001, "message": cserrors.ErrorMsg(cserrors.CS_AUTH_4001)}}) + return + } + actorID := actor.ID sourceIP := clientIP(r.RemoteAddr) if err := h.service.Close(r.Context(), ticketID, resolution, actorID, sourceIP, h.now()); err != nil { // P0-2 fix: route error based on error code prefix from service layer @@ -136,4 +152,4 @@ func pathParam(path, prefix, suffix string) string { trimmed = strings.TrimSuffix(trimmed, suffix) trimmed = strings.Trim(trimmed, "/") return trimmed -} \ No newline at end of file +} diff --git a/internal/http/handlers/ticket_handler_test.go b/internal/http/handlers/ticket_handler_test.go index a513b15..c9b42f7 100644 --- a/internal/http/handlers/ticket_handler_test.go +++ b/internal/http/handlers/ticket_handler_test.go @@ -43,10 +43,10 @@ func (r *ticketAuditRecorder) eventsOfType(action string) []audit.Event { // mockTicketService implements TicketService for testing, // mirroring TicketWorkflowStore behavior (calls store + writes audit). 
type mockTicketService struct { - mu sync.Mutex - tickets *memory.TicketStore - auditRecorder *ticketAuditRecorder - calls []struct { + mu sync.Mutex + tickets *memory.TicketStore + auditRecorder *ticketAuditRecorder + calls []struct { method string args []string } @@ -66,20 +66,23 @@ func (m *mockTicketService) GetByID(ctx context.Context, id string) (*ticket.Tic func (m *mockTicketService) Assign(ctx context.Context, ticketID, agentID, actorID, sourceIP string, now time.Time) error { m.mu.Lock() - m.calls = append(m.calls, struct{ method string; args []string }{method: "Assign", args: []string{ticketID, agentID, actorID, sourceIP}}) + m.calls = append(m.calls, struct { + method string + args []string + }{method: "Assign", args: []string{ticketID, agentID, actorID, sourceIP}}) m.mu.Unlock() if err := m.tickets.Assign(ctx, ticketID, agentID, actorID, sourceIP, now); err != nil { return err } evt := audit.Event{ - ID: fmt.Sprintf("wf-%d", now.UnixNano()), - Type: "ticket_state_changed", - Action: "assign", - TicketID: ticketID, - ActorID: actorID, - SourceIP: sourceIP, + ID: fmt.Sprintf("wf-%d", now.UnixNano()), + Type: "ticket_state_changed", + Action: "assign", + TicketID: ticketID, + ActorID: actorID, + SourceIP: sourceIP, AfterState: map[string]any{"assigned_to": agentID, "status": ticket.StatusAssigned}, - CreatedAt: now, + CreatedAt: now, } m.auditRecorder.Add(ctx, evt) return nil @@ -87,20 +90,23 @@ func (m *mockTicketService) Assign(ctx context.Context, ticketID, agentID, actor func (m *mockTicketService) Resolve(ctx context.Context, ticketID, resolution, actorID, sourceIP string, now time.Time) error { m.mu.Lock() - m.calls = append(m.calls, struct{ method string; args []string }{method: "Resolve", args: []string{ticketID, resolution, actorID, sourceIP}}) + m.calls = append(m.calls, struct { + method string + args []string + }{method: "Resolve", args: []string{ticketID, resolution, actorID, sourceIP}}) m.mu.Unlock() if err := m.tickets.Resolve(ctx, 
ticketID, resolution, actorID, sourceIP, now); err != nil { return err } evt := audit.Event{ - ID: fmt.Sprintf("wf-%d", now.UnixNano()), - Type: "ticket_state_changed", - Action: "resolve", - TicketID: ticketID, - ActorID: actorID, - SourceIP: sourceIP, + ID: fmt.Sprintf("wf-%d", now.UnixNano()), + Type: "ticket_state_changed", + Action: "resolve", + TicketID: ticketID, + ActorID: actorID, + SourceIP: sourceIP, AfterState: map[string]any{"resolution": resolution, "status": ticket.StatusResolved}, - CreatedAt: now, + CreatedAt: now, } m.auditRecorder.Add(ctx, evt) return nil @@ -108,20 +114,23 @@ func (m *mockTicketService) Resolve(ctx context.Context, ticketID, resolution, a func (m *mockTicketService) Close(ctx context.Context, ticketID, resolution, actorID, sourceIP string, now time.Time) error { m.mu.Lock() - m.calls = append(m.calls, struct{ method string; args []string }{method: "Close", args: []string{ticketID, resolution, actorID, sourceIP}}) + m.calls = append(m.calls, struct { + method string + args []string + }{method: "Close", args: []string{ticketID, resolution, actorID, sourceIP}}) m.mu.Unlock() if err := m.tickets.Close(ctx, ticketID, resolution, actorID, sourceIP, now); err != nil { return err } evt := audit.Event{ - ID: fmt.Sprintf("wf-%d", now.UnixNano()), - Type: "ticket_state_changed", - Action: "close", - TicketID: ticketID, - ActorID: actorID, - SourceIP: sourceIP, + ID: fmt.Sprintf("wf-%d", now.UnixNano()), + Type: "ticket_state_changed", + Action: "close", + TicketID: ticketID, + ActorID: actorID, + SourceIP: sourceIP, AfterState: map[string]any{"resolution": resolution, "status": ticket.StatusClosed}, - CreatedAt: now, + CreatedAt: now, } m.auditRecorder.Add(ctx, evt) return nil @@ -154,7 +163,8 @@ func TestTicketHandlerAssignAuditsStateChange(t *testing.T) { h := NewTicketHandler(svc, auditRecorder) h.now = func() time.Time { return now.Add(time.Minute) } - req := httptest.NewRequest(http.MethodPost, 
"/api/v1/customer-service/tickets/ticket-1/assign?agent_id=agent-007&actor_id=admin-1", nil) + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-1/assign?agent_id=agent-007", nil) + req = withActor(req, "admin-1", "admin") resp := httptest.NewRecorder() h.Assign(resp, req) @@ -202,7 +212,8 @@ func TestTicketHandlerResolveAuditsStateChange(t *testing.T) { h := NewTicketHandler(svc, auditRecorder) h.now = func() time.Time { return now.Add(2 * time.Minute) } - req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-2/resolve?resolution=handled&actor_id=admin-2", nil) + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-2/resolve?resolution=handled", nil) + req = withActor(req, "admin-2", "admin") resp := httptest.NewRecorder() h.Resolve(resp, req) @@ -271,7 +282,8 @@ func TestTicketHandlerAssignPassesActorAndSourceIP(t *testing.T) { h := NewTicketHandler(svc, auditRecorder) h.now = func() time.Time { return now } - req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-3/assign?agent_id=agent-x&actor_id=supervisor-1", nil) + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-3/assign?agent_id=agent-x", nil) + req = withActor(req, "supervisor-1", "supervisor") req.RemoteAddr = "192.168.1.100:12345" resp := httptest.NewRecorder() h.Assign(resp, req) @@ -309,7 +321,8 @@ func TestTicketHandlerClosePassesActorAndSourceIP(t *testing.T) { h := NewTicketHandler(svc, auditRecorder) h.now = func() time.Time { return now } - req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-4/close?resolution=closed+by+agent&actor_id=admin-1", nil) + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-4/close?resolution=closed+by+agent", nil) + req = withActor(req, "admin-1", "admin") req.RemoteAddr = "10.0.0.1:54321" resp := httptest.NewRecorder() h.Close(resp, req) @@ 
-411,3 +424,29 @@ func TestTicketHandlerGetByID_Success(t *testing.T) { t.Fatalf("context_snapshot is nil, want non-nil") } } + +func TestTicketHandlerAssign_RejectsWhenActorOnlyProvidedByQuery(t *testing.T) { + auditRecorder := &ticketAuditRecorder{} + svc := newMockTicketService(auditRecorder) + now := time.Date(2026, 4, 29, 21, 0, 0, 0, time.UTC) + if err := svc.tickets.Create(context.Background(), &ticket.Ticket{ + ID: "ticket-auth-1", + SessionID: "session-auth-1", + Priority: ticket.PriorityP1, + Status: ticket.StatusOpen, + HandoffReason: "refund", + CreatedAt: now, + UpdatedAt: now, + }); err != nil { + t.Fatalf("Create() error = %v", err) + } + h := NewTicketHandler(svc, auditRecorder) + + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/ticket-auth-1/assign?agent_id=agent-007&actor_id=forged-admin", nil) + resp := httptest.NewRecorder() + h.Assign(resp, req) + + if resp.Code != http.StatusForbidden { + t.Fatalf("status = %d, want 403", resp.Code) + } +} diff --git a/internal/http/middleware/authz.go b/internal/http/middleware/authz.go new file mode 100644 index 0000000..075facb --- /dev/null +++ b/internal/http/middleware/authz.go @@ -0,0 +1,77 @@ +package middleware + +import ( + "context" + "encoding/json" + "net/http" + "strings" + + "github.com/bridge/ai-customer-service/internal/domain/error/cserrors" +) + +const ( + HeaderActorID = "X-CS-Actor-ID" + HeaderActorRole = "X-CS-Actor-Role" +) + +type Actor struct { + ID string + Role string +} + +type actorContextKey struct{} + +func WithActor(ctx context.Context, id, role string) context.Context { + return context.WithValue(ctx, actorContextKey{}, Actor{ + ID: strings.TrimSpace(id), + Role: normalizeRole(role), + }) +} + +func ActorFromContext(ctx context.Context) (Actor, bool) { + actor, ok := ctx.Value(actorContextKey{}).(Actor) + if !ok { + return Actor{}, false + } + if strings.TrimSpace(actor.ID) == "" || strings.TrimSpace(actor.Role) == "" { + return Actor{}, false + } 
+ return actor, true +} + +func RequireRoles(next http.Handler, allowedRoles ...string) http.Handler { + allowed := make(map[string]struct{}, len(allowedRoles)) + for _, role := range allowedRoles { + if normalized := normalizeRole(role); normalized != "" { + allowed[normalized] = struct{}{} + } + } + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + actorID := strings.TrimSpace(r.Header.Get(HeaderActorID)) + role := normalizeRole(r.Header.Get(HeaderActorRole)) + if actorID == "" || role == "" { + writeAccessDenied(w) + return + } + if _, ok := allowed[role]; !ok { + writeAccessDenied(w) + return + } + next.ServeHTTP(w, r.WithContext(WithActor(r.Context(), actorID, role))) + }) +} + +func normalizeRole(role string) string { + return strings.ToLower(strings.TrimSpace(role)) +} + +func writeAccessDenied(w http.ResponseWriter) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusForbidden) + _ = json.NewEncoder(w).Encode(map[string]any{ + "error": map[string]any{ + "code": cserrors.CS_AUTH_4001, + "message": cserrors.ErrorMsg(cserrors.CS_AUTH_4001), + }, + }) +} diff --git a/internal/http/middleware/authz_test.go b/internal/http/middleware/authz_test.go new file mode 100644 index 0000000..64cdc55 --- /dev/null +++ b/internal/http/middleware/authz_test.go @@ -0,0 +1,73 @@ +package middleware + +import ( + "net/http" + "net/http/httptest" + "testing" +) + +func TestRequireRoles_RejectsWhenHeadersMissing(t *testing.T) { + called := false + handler := RequireRoles(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + called = true + w.WriteHeader(http.StatusOK) + }), "admin") + + req := httptest.NewRequest(http.MethodPost, "/admin", nil) + resp := httptest.NewRecorder() + handler.ServeHTTP(resp, req) + + if called { + t.Fatal("expected wrapped handler not to be called") + } + if resp.Code != http.StatusForbidden { + t.Fatalf("status = %d, want 403", resp.Code) + } +} + +func 
TestRequireRoles_RejectsWhenRoleNotAllowed(t *testing.T) { + called := false + handler := RequireRoles(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + called = true + w.WriteHeader(http.StatusOK) + }), "admin", "supervisor") + + req := httptest.NewRequest(http.MethodPost, "/admin", nil) + req.Header.Set(HeaderActorID, "agent-1") + req.Header.Set(HeaderActorRole, "agent") + resp := httptest.NewRecorder() + handler.ServeHTTP(resp, req) + + if called { + t.Fatal("expected wrapped handler not to be called") + } + if resp.Code != http.StatusForbidden { + t.Fatalf("status = %d, want 403", resp.Code) + } +} + +func TestRequireRoles_AllowsAndInjectsActor(t *testing.T) { + handler := RequireRoles(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + actor, ok := ActorFromContext(r.Context()) + if !ok { + t.Fatal("expected actor in context") + } + if actor.ID != "admin-1" { + t.Fatalf("actor id = %s, want admin-1", actor.ID) + } + if actor.Role != "admin" { + t.Fatalf("actor role = %s, want admin", actor.Role) + } + w.WriteHeader(http.StatusOK) + }), "admin") + + req := httptest.NewRequest(http.MethodPost, "/admin", nil) + req.Header.Set(HeaderActorID, "admin-1") + req.Header.Set(HeaderActorRole, "ADMIN") + resp := httptest.NewRecorder() + handler.ServeHTTP(resp, req) + + if resp.Code != http.StatusOK { + t.Fatalf("status = %d, want 200", resp.Code) + } +} diff --git a/internal/http/router.go b/internal/http/router.go index a342281..5d96e9b 100644 --- a/internal/http/router.go +++ b/internal/http/router.go @@ -6,6 +6,7 @@ import ( "github.com/bridge/ai-customer-service/internal/domain/error/cserrors" "github.com/bridge/ai-customer-service/internal/http/handlers" + "github.com/bridge/ai-customer-service/internal/http/middleware" "github.com/bridge/ai-customer-service/internal/platform/httpx" ) @@ -57,18 +58,18 @@ func NewRouter(deps RouterDeps) http.Handler { writeMethodNotAllowed(w) return } - deps.Tickets.List(w, r) + 
middleware.RequireRoles(http.HandlerFunc(deps.Tickets.List), "agent", "supervisor", "admin").ServeHTTP(w, r) }) mux.HandleFunc("/api/v1/customer-service/tickets/", func(w http.ResponseWriter, r *http.Request) { if r.Method == http.MethodGet && r.URL.Path == "/api/v1/customer-service/tickets/stats" { if deps.TicketStats != nil { - deps.TicketStats.Get(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.TicketStats.Get), "supervisor", "admin").ServeHTTP(w, r) return } } // P1-3: GET /api/v1/customer-service/tickets/{id} — Phase 1 minimum implementation if r.Method == http.MethodGet { - deps.Tickets.Get(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.Tickets.Get), "agent", "supervisor", "admin").ServeHTTP(w, r) return } if strings.HasSuffix(r.URL.Path, "/assign") { @@ -76,7 +77,7 @@ func NewRouter(deps RouterDeps) http.Handler { writeMethodNotAllowed(w) return } - deps.Tickets.Assign(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.Tickets.Assign), "supervisor", "admin").ServeHTTP(w, r) return } if strings.HasSuffix(r.URL.Path, "/resolve") { @@ -84,7 +85,7 @@ func NewRouter(deps RouterDeps) http.Handler { writeMethodNotAllowed(w) return } - deps.Tickets.Resolve(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.Tickets.Resolve), "agent", "supervisor", "admin").ServeHTTP(w, r) return } if strings.HasSuffix(r.URL.Path, "/close") { @@ -92,7 +93,7 @@ func NewRouter(deps RouterDeps) http.Handler { writeMethodNotAllowed(w) return } - deps.Tickets.Close(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.Tickets.Close), "supervisor", "admin").ServeHTTP(w, r) return } writeMethodNotAllowed(w) @@ -115,7 +116,7 @@ func NewRouter(deps RouterDeps) http.Handler { writeMethodNotAllowed(w) return } - deps.Sessions.Handoff(w, r) + middleware.RequireRoles(http.HandlerFunc(deps.Sessions.Handoff), "agent", "supervisor", "admin").ServeHTTP(w, r) return } writeMethodNotAllowed(w) diff --git a/internal/http/router_test.go b/internal/http/router_test.go index 
f62e1b2..a0bd210 100644 --- a/internal/http/router_test.go +++ b/internal/http/router_test.go @@ -6,6 +6,7 @@ import ( "testing" "github.com/bridge/ai-customer-service/internal/http/handlers" + "github.com/bridge/ai-customer-service/internal/http/middleware" "github.com/bridge/ai-customer-service/internal/platform/health" ) @@ -210,3 +211,50 @@ func TestRouter_UnknownTicketsPath_Returns405(t *testing.T) { t.Errorf("POST /tickets/t1/unknown = %d, want 405", rr.Code) } } + +func TestRouter_TicketAssign_RejectsWhenAuthHeadersMissing(t *testing.T) { + probe := health.NewProbe() + probe.SetReady(true) + h := handlers.NewHealthHandler(probe) + ticketHandler := &handlers.TicketHandler{} + router := NewRouter(RouterDeps{Health: h, Tickets: ticketHandler}) + + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/t1/assign?agent_id=a1", nil) + rr := httptest.NewRecorder() + router.ServeHTTP(rr, req) + if rr.Code != http.StatusForbidden { + t.Fatalf("POST /tickets/t1/assign without auth = %d, want 403", rr.Code) + } +} + +func TestRouter_TicketAssign_RejectsWhenRoleNotAllowed(t *testing.T) { + probe := health.NewProbe() + probe.SetReady(true) + h := handlers.NewHealthHandler(probe) + ticketHandler := &handlers.TicketHandler{} + router := NewRouter(RouterDeps{Health: h, Tickets: ticketHandler}) + + req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/t1/assign?agent_id=a1", nil) + req.Header.Set(middleware.HeaderActorID, "agent-1") + req.Header.Set(middleware.HeaderActorRole, "agent") + rr := httptest.NewRecorder() + router.ServeHTTP(rr, req) + if rr.Code != http.StatusForbidden { + t.Fatalf("POST /tickets/t1/assign with agent role = %d, want 403", rr.Code) + } +} + +func TestRouter_SessionHandoff_RejectsWhenAuthHeadersMissing(t *testing.T) { + probe := health.NewProbe() + probe.SetReady(true) + h := handlers.NewHealthHandler(probe) + sessionHandler := &handlers.SessionHandler{} + router := NewRouter(RouterDeps{Health: h, 
Sessions: sessionHandler})
+
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/s1/handoff", nil)
+	rr := httptest.NewRecorder()
+	router.ServeHTTP(rr, req)
+	if rr.Code != http.StatusForbidden {
+		t.Fatalf("POST /sessions/s1/handoff without auth = %d, want 403", rr.Code)
+	}
+}
diff --git a/internal/store/postgres/ticket_workflow.go b/internal/store/postgres/ticket_workflow.go
index 43371d3..c2b02e3 100644
--- a/internal/store/postgres/ticket_workflow.go
+++ b/internal/store/postgres/ticket_workflow.go
@@ -9,6 +9,7 @@ import (
 	"time"
 
 	"github.com/bridge/ai-customer-service/internal/domain/audit"
+	"github.com/google/uuid"
 	"github.com/bridge/ai-customer-service/internal/domain/ticket"
 )
 
@@ -36,7 +37,7 @@ func (s *TicketWorkflowStore) writeAudit(ctx context.Context, ticketID, action,
 	}
 	now := time.Now()
 	event := audit.Event{
-		ID: fmt.Sprintf("wf-%d", now.UnixNano()),
+		ID: uuid.New().String(),
 		Type: "ticket_state_changed",
 		Action: action,
 		TicketID: ticketID,
diff --git a/prd/IDENTITY_AND_PERMISSION_STRATEGY.md b/prd/IDENTITY_AND_PERMISSION_STRATEGY.md
index f315ea5..7a6b19a 100644
--- a/prd/IDENTITY_AND_PERMISSION_STRATEGY.md
+++ b/prd/IDENTITY_AND_PERMISSION_STRATEGY.md
@@ -68,7 +68,7 @@
 | View operations dashboard | ❌ | ✅ | ✅ |
 | Sensitive operations (refunds) | ❌ | ✅ | ✅ |
 
-> **Note**: The permission model is **not yet implemented** (no RBAC implementation); all endpoints are accessible with equal privileges. The Phase 4 operations console needs to add full permission checks.
+> **Note**: Phase 1 has implemented minimal header-based authentication and role checks (`X-CS-Actor-ID` / `X-CS-Actor-Role`) to protect the ticket/session admin endpoints; full RBAC, per-user data isolation, and a unified identity system are still not implemented and need to be added in later phases.
 
 ### 2.3 Cross-user data isolation
 
diff --git a/prd/PRODUCTION_CHECKLIST.md b/prd/PRODUCTION_CHECKLIST.md
index cfb64b9..7a5842d 100644
--- a/prd/PRODUCTION_CHECKLIST.md
+++ b/prd/PRODUCTION_CHECKLIST.md
@@ -22,7 +22,7 @@
 - Pre-production gate: not passed
 - Production release gate: not passed
 
-Therefore: **the project may currently only enter pre-production remediation and integration-test preparation; it must not be released on the basis that "production is ready to launch".**
+Therefore: **the project may currently only enter pre-production remediation and integration-test preparation; it must not be released on the basis that "production is ready to launch" or that it is "ready for gray release".**
 
 ---
 
diff --git a/test/e2e/full_ticket_flow_test.go b/test/e2e/full_ticket_flow_test.go
index 48f6379..871ded6 100644
--- a/test/e2e/full_ticket_flow_test.go
+++
b/test/e2e/full_ticket_flow_test.go @@ -11,6 +11,7 @@ import ( "github.com/bridge/ai-customer-service/internal/app" "github.com/bridge/ai-customer-service/internal/config" + "github.com/bridge/ai-customer-service/internal/http/middleware" "github.com/bridge/ai-customer-service/internal/platform/logging" ) @@ -59,11 +60,16 @@ func mustReadBody(t *testing.T, resp *http.Response, dest any) { } } +func setActorHeaders(req *http.Request, actorID, role string) { + req.Header.Set(middleware.HeaderActorID, actorID) + req.Header.Set(middleware.HeaderActorRole, role) +} + // TestFullTicketFlow_E2E exercises the complete ticket lifecycle: -// 1. Webhook triggers handoff → ticket created -// 2. Ticket is assigned to an agent -// 3. Ticket is resolved by the agent -// 4. Ticket is retrieved and verified in final resolved state +// 1. Webhook triggers handoff → ticket created +// 2. Ticket is assigned to an agent +// 3. Ticket is resolved by the agent +// 4. Ticket is retrieved and verified in final resolved state func TestFullTicketFlow_E2E(t *testing.T) { application := newTestAppE2E(t) server := httptest.NewServer(application.Server.Handler) @@ -98,11 +104,12 @@ func TestFullTicketFlow_E2E(t *testing.T) { ticketID := whResult.TicketID // ── Step 2: Assign the ticket to an agent ──────────────────────────── - assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-e2e-001&actor_id=admin-e2e", baseURL, ticketID) + assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-e2e-001", baseURL, ticketID) assignReq, err := http.NewRequest(http.MethodPost, assignURL, nil) if err != nil { t.Fatalf("new assign request error = %v", err) } + setActorHeaders(assignReq, "admin-e2e", "admin") assignReq.RemoteAddr = "192.168.1.1:12345" assignResp, err := http.DefaultClient.Do(assignReq) if err != nil { @@ -126,11 +133,12 @@ func TestFullTicketFlow_E2E(t *testing.T) { } // ── Step 3: Resolve the ticket 
────────────────────────────────────────
-	resolveURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/resolve?resolution=refund+processed+and+closed&actor_id=agent-e2e-001", baseURL, ticketID)
+	resolveURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/resolve?resolution=refund+processed+and+closed", baseURL, ticketID)
 	resolveReq, err := http.NewRequest(http.MethodPost, resolveURL, nil)
 	if err != nil {
 		t.Fatalf("new resolve request error = %v", err)
 	}
+	setActorHeaders(resolveReq, "agent-e2e-001", "agent")
 	resolveReq.RemoteAddr = "192.168.1.2:54321"
 	resolveResp, err := http.DefaultClient.Do(resolveReq)
 	if err != nil {
@@ -155,7 +163,12 @@ func TestFullTicketFlow_E2E(t *testing.T) {
 	// ── Step 4: Verify ticket is retrievable in final resolved state ──────
 	getURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s", baseURL, ticketID)
-	getResp, err := http.Get(getURL)
+	getReq, err := http.NewRequest(http.MethodGet, getURL, nil)
+	if err != nil {
+		t.Fatalf("new get request error = %v", err)
+	}
+	setActorHeaders(getReq, "agent-e2e-001", "agent")
+	getResp, err := http.DefaultClient.Do(getReq)
 	if err != nil {
 		t.Fatalf("GET ticket error = %v", err)
 	}
@@ -215,8 +228,9 @@ func TestFullTicketFlow_AuditLogVerification(t *testing.T) {
 	ticketID := whResult.TicketID
 	// ── Step 2: Assign ticket ────────────────────────────────────────────
-	assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-audit-99&actor_id=supervisor-audit", baseURL, ticketID)
+	assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-audit-99", baseURL, ticketID)
 	assignReq, _ := http.NewRequest(http.MethodPost, assignURL, nil)
+	setActorHeaders(assignReq, "supervisor-audit", "supervisor")
 	assignReq.RemoteAddr = "10.0.0.1:11111"
 	assignResp, _ := http.DefaultClient.Do(assignReq)
 	if assignResp.StatusCode != http.StatusOK {
@@ -226,8 +240,9 @@ func TestFullTicketFlow_AuditLogVerification(t *testing.T) {
 	assignResp.Body.Close()
 	// ── Step 3: Resolve ticket ───────────────────────────────────────────
-	resolveURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/resolve?resolution=account+secured&actor_id=agent-audit-99", baseURL, ticketID)
+	resolveURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/resolve?resolution=account+secured", baseURL, ticketID)
 	resolveReq, _ := http.NewRequest(http.MethodPost, resolveURL, nil)
+	setActorHeaders(resolveReq, "agent-audit-99", "agent")
 	resolveReq.RemoteAddr = "10.0.0.2:22222"
 	resolveResp, _ := http.DefaultClient.Do(resolveReq)
 	if resolveResp.StatusCode != http.StatusOK {
@@ -238,7 +253,12 @@ func TestFullTicketFlow_AuditLogVerification(t *testing.T) {
 	// ── Step 4: Verify final ticket state (audit writes were persisted) ──
 	getURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s", baseURL, ticketID)
-	getResp, err := http.Get(getURL)
+	getReq, err := http.NewRequest(http.MethodGet, getURL, nil)
+	if err != nil {
+		t.Fatalf("new get request error = %v", err)
+	}
+	setActorHeaders(getReq, "agent-audit-99", "agent")
+	getResp, err := http.DefaultClient.Do(getReq)
 	if err != nil {
 		t.Fatalf("GET ticket error = %v", err)
 	}
@@ -300,7 +320,12 @@ func TestFullTicketFlow_ListEndpoint_ShowsCreatedTicket(t *testing.T) {
 	ticketID := whResult.TicketID
 	// Verify ticket appears in GET /tickets list
-	listResp, err := http.Get(baseURL + "/api/v1/customer-service/tickets")
+	listReq, err := http.NewRequest(http.MethodGet, baseURL+"/api/v1/customer-service/tickets", nil)
+	if err != nil {
+		t.Fatalf("new tickets list request error = %v", err)
+	}
+	setActorHeaders(listReq, "supervisor-list", "supervisor")
+	listResp, err := http.DefaultClient.Do(listReq)
 	if err != nil {
 		t.Fatalf("GET tickets list error = %v", err)
 	}
@@ -388,7 +413,12 @@ func TestFullTicketFlow_MultipleTickets_MaintainedSeparately(t *testing.T) {
 		// Assign only the first ticket
 		if i == 0 {
 			assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-only-first", baseURL, ticketID)
-			assignResp, err := http.Post(assignURL, "application/octet-stream", nil)
+			assignReq, err := http.NewRequest(http.MethodPost, assignURL, nil)
+			if err != nil {
+				t.Fatalf("new assign request error = %v", err)
+			}
+			setActorHeaders(assignReq, "supervisor-first", "supervisor")
+			assignResp, err := http.DefaultClient.Do(assignReq)
 			if err != nil {
 				t.Fatalf("assign POST error = %v", err)
 			}
@@ -401,7 +431,12 @@ func TestFullTicketFlow_MultipleTickets_MaintainedSeparately(t *testing.T) {
 		// Check state
 		getURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s", baseURL, ticketID)
-		getResp, err := http.Get(getURL)
+		getReq, err := http.NewRequest(http.MethodGet, getURL, nil)
+		if err != nil {
+			t.Fatalf("new get request error = %v", err)
+		}
+		setActorHeaders(getReq, "agent-check", "agent")
+		getResp, err := http.DefaultClient.Do(getReq)
 		if err != nil {
 			t.Fatalf("GET ticket error = %v", err)
 		}
@@ -469,7 +504,12 @@ func TestFullTicketFlow_WebhookAuditEvent(t *testing.T) {
 	// Verify ticket is in open state
 	getURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s", baseURL, whResult.TicketID)
-	getResp, err := http.Get(getURL)
+	getReq, err := http.NewRequest(http.MethodGet, getURL, nil)
+	if err != nil {
+		t.Fatalf("new get request error = %v", err)
+	}
+	setActorHeaders(getReq, "agent-audit-read", "agent")
+	getResp, err := http.DefaultClient.Do(getReq)
 	if err != nil {
 		t.Fatalf("GET ticket error = %v", err)
 	}
@@ -529,7 +569,12 @@ func TestFullTicketFlow_StateTransitionAuditOrder(t *testing.T) {
 	// Assign (audit event: assign)
 	assignURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/assign?agent_id=agent-order-1", baseURL, ticketID)
-	assignResp, err := http.Post(assignURL, "application/octet-stream", nil)
+	assignReq, err := http.NewRequest(http.MethodPost, assignURL, nil)
+	if err != nil {
+		t.Fatalf("new assign request error = %v", err)
+	}
+	setActorHeaders(assignReq, "supervisor-order", "supervisor")
+	assignResp, err := http.DefaultClient.Do(assignReq)
 	if err != nil {
 		t.Fatalf("assign POST error = %v", err)
 	}
@@ -541,7 +586,12 @@ func TestFullTicketFlow_StateTransitionAuditOrder(t *testing.T) {
 	// Resolve (audit event: resolve)
 	resolveURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s/resolve?resolution=handled", baseURL, ticketID)
-	resolveResp, err := http.Post(resolveURL, "application/octet-stream", nil)
+	resolveReq, err := http.NewRequest(http.MethodPost, resolveURL, nil)
+	if err != nil {
+		t.Fatalf("new resolve request error = %v", err)
+	}
+	setActorHeaders(resolveReq, "agent-order-1", "agent")
+	resolveResp, err := http.DefaultClient.Do(resolveReq)
 	if err != nil {
 		t.Fatalf("resolve POST error = %v", err)
 	}
@@ -553,7 +603,12 @@ func TestFullTicketFlow_StateTransitionAuditOrder(t *testing.T) {
 	// Final state check: proves all audit writes succeeded in order
 	getURL := fmt.Sprintf("%s/api/v1/customer-service/tickets/%s", baseURL, ticketID)
-	getResp, err := http.Get(getURL)
+	getReq, err := http.NewRequest(http.MethodGet, getURL, nil)
+	if err != nil {
+		t.Fatalf("new get request error = %v", err)
+	}
+	setActorHeaders(getReq, "agent-order-1", "agent")
+	getResp, err := http.DefaultClient.Do(getReq)
 	if err != nil {
 		t.Fatalf("GET ticket (final) error = %v", err)
 	}
diff --git a/test/integration/auth_test.go b/test/integration/auth_test.go
new file mode 100644
index 0000000..b64b5a7
--- /dev/null
+++ b/test/integration/auth_test.go
@@ -0,0 +1,16 @@
+package integration
+
+import (
+	"net/http"
+
+	"github.com/bridge/ai-customer-service/internal/http/middleware"
+)
+
+func withActor(req *http.Request, actorID, role string) *http.Request {
+	return req.WithContext(middleware.WithActor(req.Context(), actorID, role))
+}
+
+func setActorHeaders(req *http.Request, actorID, role string) {
+	req.Header.Set(middleware.HeaderActorID, actorID)
+	req.Header.Set(middleware.HeaderActorRole, role)
+}
diff --git a/test/integration/session_handler_test.go b/test/integration/session_handler_test.go
index 3de5ca5..5e332c6 100644
--- a/test/integration/session_handler_test.go
+++ b/test/integration/session_handler_test.go
@@ -69,7 +69,10 @@ func newMockSessionService(audits *sessionAuditRecorder) *mockSessionService {
 func (m *mockSessionService) GetSession(ctx context.Context, id string) (*session.Session, error) {
 	m.mu.Lock()
-	m.calls = append(m.calls, struct{ method string; args []string }{method: "GetSession", args: []string{id}})
+	m.calls = append(m.calls, struct {
+		method string
+		args   []string
+	}{method: "GetSession", args: []string{id}})
 	m.mu.Unlock()
 	sessions := m.sessions.List()
 	for _, s := range sessions {
@@ -82,14 +85,20 @@ func (m *mockSessionService) GetSession(ctx context.Context, id string) (*sessio
 func (m *mockSessionService) UpdateSession(ctx context.Context, sess *session.Session) error {
 	m.mu.Lock()
-	m.calls = append(m.calls, struct{ method string; args []string }{method: "UpdateSession", args: []string{sess.ID}})
+	m.calls = append(m.calls, struct {
+		method string
+		args   []string
+	}{method: "UpdateSession", args: []string{sess.ID}})
 	m.mu.Unlock()
 	return m.sessions.Save(ctx, sess)
 }
 func (m *mockSessionService) CreateTicket(ctx context.Context, t *ticket.Ticket) error {
 	m.mu.Lock()
-	m.calls = append(m.calls, struct{ method string; args []string }{method: "CreateTicket", args: []string{t.ID, string(t.Priority), t.SessionID}})
+	m.calls = append(m.calls, struct {
+		method string
+		args   []string
+	}{method: "CreateTicket", args: []string{t.ID, string(t.Priority), t.SessionID}})
 	m.mu.Unlock()
 	return m.tickets.Create(ctx, t)
 }
@@ -159,12 +168,12 @@ func (h *SessionHandler) Feedback(w http.ResponseWriter, r *http.Request) {
 	// Record feedback audit event
 	now := h.now()
 	_ = h.audit.Add(r.Context(), audit.Event{
-		ID: fmt.Sprintf("fb-%d", now.UnixNano()),
-		Type: "session_feedback",
-		Action: "feedback",
+		ID:        fmt.Sprintf("fb-%d", now.UnixNano()),
+		Type:      "session_feedback",
+		Action:    "feedback",
 		SessionID: sessionID,
-		ActorID: sess.OpenID,
-		Payload: map[string]any{"score": reqBody.Score, "note": reqBody.Note},
+		ActorID:   sess.OpenID,
+		Payload:   map[string]any{"score": reqBody.Score, "note": reqBody.Note},
 		CreatedAt: now,
 	})
 	writeJSON(w, http.StatusOK, map[string]any{"received": true})
@@ -199,7 +208,7 @@ func (h *SessionHandler) Handoff(w http.ResponseWriter, r *http.Request) {
 		HandoffReason: reqBody.Reason,
 		ContextSnapshot: map[string]any{
 			"channel": sess.Channel,
-			"open_id": sess.OpenID,
+			"open_id": sess.OpenID,
 		},
 		CreatedAt: now,
 		UpdatedAt: now,
@@ -213,13 +222,13 @@ func (h *SessionHandler) Handoff(w http.ResponseWriter, r *http.Request) {
 	_ = h.service.UpdateSession(r.Context(), sess)
 	_ = h.audit.Add(r.Context(), audit.Event{
-		ID: fmt.Sprintf("ho-%d", now.UnixNano()),
-		Type: "session_handoff",
-		Action: "handoff",
+		ID:        fmt.Sprintf("ho-%d", now.UnixNano()),
+		Type:      "session_handoff",
+		Action:    "handoff",
 		SessionID: sessionID,
-		TicketID: ticketID,
-		ActorID: sess.OpenID,
-		Payload: map[string]any{"reason": reqBody.Reason},
+		TicketID:  ticketID,
+		ActorID:   sess.OpenID,
+		Payload:   map[string]any{"reason": reqBody.Reason},
 		CreatedAt: now,
 	})
 	writeJSON(w, http.StatusOK, map[string]any{"handoff": true, "ticket_id": ticketID})
@@ -374,6 +383,7 @@ func TestSessionHandlerHandoff_Success(t *testing.T) {
 	bodyBytes, _ := json.Marshal(body)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/widget:u_handoff_ok/handoff", bytes.NewReader(bodyBytes))
 	req.Header.Set("Content-Type", "application/json")
+	req = withActor(req, "agent-handoff", "agent")
 	resp := httptest.NewRecorder()
 	h.Handoff(resp, req)
@@ -409,6 +419,7 @@ func TestSessionHandlerHandoff_SessionNotFound(t *testing.T) {
 	bodyBytes, _ := json.Marshal(body)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/nonexistent-session/handoff", bytes.NewReader(bodyBytes))
 	req.Header.Set("Content-Type", "application/json")
+	req = withActor(req, "agent-missing", "agent")
 	resp := httptest.NewRecorder()
 	h.Handoff(resp, req)
@@ -442,6 +453,7 @@ func TestSessionHandlerHandoff_CreatesTicket(t *testing.T) {
 	bodyBytes, _ := json.Marshal(body)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/telegram:u_ticket_create/handoff", bytes.NewReader(bodyBytes))
 	req.Header.Set("Content-Type", "application/json")
+	req = withActor(req, "agent-ticket-create", "agent")
 	resp := httptest.NewRecorder()
 	h.Handoff(resp, req)
diff --git a/test/integration/ticket_assign_resolve_test.go b/test/integration/ticket_assign_resolve_test.go
index b619354..8860c0a 100644
--- a/test/integration/ticket_assign_resolve_test.go
+++ b/test/integration/ticket_assign_resolve_test.go
@@ -151,7 +151,8 @@ func TestAssign_UpdatesStatusToAssigned(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/assign-tkt-1/assign?agent_id=agent-001&actor_id=supervisor-1", nil)
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/assign-tkt-1/assign?agent_id=agent-001", nil)
+	req = withActor(req, "supervisor-1", "supervisor")
 	req.RemoteAddr = "10.0.0.5:12345"
 	resp := httptest.NewRecorder()
 	h.Assign(resp, req)
@@ -200,6 +201,7 @@ func TestAssign_CannotReassignAlreadyAssigned(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/assign-tkt-2/assign?agent_id=agent-second", nil)
+	req = withActor(req, "supervisor-2", "supervisor")
 	resp := httptest.NewRecorder()
 	h.Assign(resp, req)
@@ -257,7 +259,8 @@ func TestResolve_UpdatesStatusToResolved(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
-	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/resolve-tkt-1/resolve?resolution=issue+fixed&actor_id=agent-001", nil)
+	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/resolve-tkt-1/resolve?resolution=issue+fixed", nil)
+	req = withActor(req, "agent-001", "agent")
 	req.RemoteAddr = "10.0.0.6:54321"
 	resp := httptest.NewRecorder()
 	h.Resolve(resp, req)
@@ -309,6 +312,7 @@ func TestResolve_CannotResolveClosedTicket(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/resolve-tkt-closed/resolve?resolution=already+closed", nil)
+	req = withActor(req, "agent-001", "agent")
 	resp := httptest.NewRecorder()
 	h.Resolve(resp, req)
@@ -339,6 +343,7 @@ func TestResolve_TicketNotFound(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
 	req := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/nonexistent/resolve?resolution=not+found", nil)
+	req = withActor(req, "agent-404", "agent")
 	resp := httptest.NewRecorder()
 	h.Resolve(resp, req)
@@ -373,7 +378,8 @@ func TestStateTransition_OpenToAssignedToResolved(t *testing.T) {
 	h := handlers.NewTicketHandler(svc, auditRecorder)
 	// Step 1: Assign
-	assignReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/state-tkt-1/assign?agent_id=agent-alpha&actor_id=admin-1", nil)
+	assignReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/state-tkt-1/assign?agent_id=agent-alpha", nil)
+	assignReq = withActor(assignReq, "admin-1", "admin")
 	assignResp := httptest.NewRecorder()
 	h.Assign(assignResp, assignReq)
 	if assignResp.Code != http.StatusOK {
@@ -389,7 +395,8 @@ func TestStateTransition_OpenToAssignedToResolved(t *testing.T) {
 	}
 	// Step 2: Resolve
-	resolveReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/state-tkt-1/resolve?resolution=refund+processed&actor_id=agent-alpha", nil)
+	resolveReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/state-tkt-1/resolve?resolution=refund+processed", nil)
+	resolveReq = withActor(resolveReq, "agent-alpha", "agent")
 	resolveResp := httptest.NewRecorder()
 	h.Resolve(resolveResp, resolveReq)
 	if resolveResp.Code != http.StatusOK {
@@ -430,6 +437,7 @@ func TestStateTransition_InvalidTransition(t *testing.T) {
 	// Try to resolve an open ticket directly (should fail — must be assigned first)
 	resolveReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/tickets/state-tkt-2/resolve?resolution=skip+assign", nil)
+	resolveReq = withActor(resolveReq, "agent-skip", "agent")
 	resolveResp := httptest.NewRecorder()
 	h.Resolve(resolveResp, resolveReq)
 	if resolveResp.Code != http.StatusConflict {
diff --git a/test/integration/ticket_handler_integration_test.go b/test/integration/ticket_handler_integration_test.go
index 828a68c..e463584 100644
--- a/test/integration/ticket_handler_integration_test.go
+++ b/test/integration/ticket_handler_integration_test.go
@@ -68,14 +68,14 @@ func (m *mockTicketSvcForHandler) Assign(ctx context.Context, ticketID, agentID,
 		return err
 	}
 	m.audit.Add(ctx, audit.Event{
-		ID: "audit-assign-1",
-		Type: "ticket_state_changed",
-		Action: "assign",
-		TicketID: ticketID,
-		ActorID: actorID,
-		SourceIP: sourceIP,
+		ID:         "audit-assign-1",
+		Type:       "ticket_state_changed",
+		Action:     "assign",
+		TicketID:   ticketID,
+		ActorID:    actorID,
+		SourceIP:   sourceIP,
 		AfterState: map[string]any{"assigned_to": agentID, "status": ticket.StatusAssigned},
-		CreatedAt: now,
+		CreatedAt:  now,
 	})
 	return nil
 }
@@ -85,14 +85,14 @@ func (m *mockTicketSvcForHandler) Resolve(ctx context.Context, ticketID, resolut
 		return err
 	}
 	m.audit.Add(ctx, audit.Event{
-		ID: "audit-resolve-1",
-		Type: "ticket_state_changed",
-		Action: "resolve",
-		TicketID: ticketID,
-		ActorID: actorID,
-		SourceIP: sourceIP,
+		ID:         "audit-resolve-1",
+		Type:       "ticket_state_changed",
+		Action:     "resolve",
+		TicketID:   ticketID,
+		ActorID:    actorID,
+		SourceIP:   sourceIP,
 		AfterState: map[string]any{"resolution": resolution, "status": ticket.StatusResolved},
-		CreatedAt: now,
+		CreatedAt:  now,
 	})
 	return nil
 }
@@ -102,14 +102,14 @@ func (m *mockTicketSvcForHandler) Close(ctx context.Context, ticketID, resolutio
 		return err
 	}
 	m.audit.Add(ctx, audit.Event{
-		ID: "audit-close-1",
-		Type: "ticket_state_changed",
-		Action: "close",
-		TicketID: ticketID,
-		ActorID: actorID,
-		SourceIP: sourceIP,
+		ID:         "audit-close-1",
+		Type:       "ticket_state_changed",
+		Action:     "close",
+		TicketID:   ticketID,
+		ActorID:    actorID,
+		SourceIP:   sourceIP,
 		AfterState: map[string]any{"resolution": resolution, "status": ticket.StatusClosed},
-		CreatedAt: now,
+		CreatedAt:  now,
 	})
 	return nil
 }
@@ -163,6 +163,7 @@ func TestTicketCreateAndList_CreateThenFind(t *testing.T) {
 	handoffBodyBytes, _ := json.Marshal(handoffBody)
 	sessionReq := httptest.NewRequest(http.MethodPost, "/api/v1/customer-service/sessions/widget:u_list_test/handoff", bytes.NewReader(handoffBodyBytes))
 	sessionReq.Header.Set("Content-Type", "application/json")
+	sessionReq = withActor(sessionReq, "agent-list", "agent")
 	sessionResp := httptest.NewRecorder()
 	sessionHdlr.Handoff(sessionResp, sessionReq)
@@ -292,7 +293,12 @@ func TestTicketList_PaginationParams(t *testing.T) {
 	for _, tc := range tests {
 		t.Run(tc.name, func(t *testing.T) {
-			resp, err := http.Get(server.URL + tc.query)
+			req, err := http.NewRequest(http.MethodGet, server.URL+tc.query, nil)
+			if err != nil {
+				t.Fatalf("new GET request error = %v", err)
+			}
+			setActorHeaders(req, "supervisor-page", "supervisor")
+			resp, err := http.DefaultClient.Do(req)
 			if err != nil {
 				t.Fatalf("GET error = %v", err)
 			}