1203 lines
36 KiB
Markdown
1203 lines
36 KiB
Markdown
# 详细技术架构设计
|
||
|
||
> 版本:v1.0
|
||
> 日期:2026-03-18
|
||
> 依据:backend skill 最佳实践
|
||
> 状态:历史草稿(已被 `technical_architecture_optimized_v2_2026-03-18.md` 替代,不作为实施基线)
|
||
|
||
---
|
||
|
||
## 1. 系统架构概览
|
||
|
||
### 1.1 整体架构图
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||
│ 客户端层 │
|
||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||
│ │ Web App │ │ Mobile App │ │ SDK (Python)│ │ SDK (Node) │ │
|
||
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||
│ API Gateway (入口层) │
|
||
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
|
||
│ │ • 限流 • 鉴权 • 路由 • 日志 • 监控 │ │
|
||
│ └─────────────────────────────────────────────────────────────────────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||
│ 业务服务层 │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ Router │ │ Auth │ │ Billing │ │ Provider │ │
|
||
│ │ Service │ │ Service │ │ Service │ │ Service │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ Tenant │ │ Risk │ │ Settlement │ │ Webhook │ │
|
||
│ │ Service │ │ Service │ │ Service │ │ Service │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||
│ 基础设施层 │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ PostgreSQL │ │ Redis │ │ Kafka │ │ S3/MinIO │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────────────────────────┐
|
||
│ 外部集成层 │
|
||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||
│ │ subapi │ │ OpenAI API │ │ Anthropic │ │ 国内供应商 │ │
|
||
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||
└─────────────────────────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## 2. 技术选型
|
||
|
||
### 2.1 技术栈
|
||
|
||
| 层级 | 技术 | 版本 | 说明 |
|
||
|------|------|------|------|
|
||
| API Gateway | Kong / Traefik | 3.x | 高性能网关 |
|
||
| 后端服务 | Go | 1.21 | 高并发 |
|
||
| Web框架 | Gin | 1.9 | 高性能 |
|
||
| 数据库 | PostgreSQL | 15 | 主数据库 |
|
||
| 缓存 | Redis | 7.x | 缓存+限流 |
|
||
| 消息队列 | Kafka | 3.x | 异步处理 |
|
||
| 服务网格 | Istio | 1.18 | 微服务治理 |
|
||
| 容器编排 | Kubernetes | 1.28 | 容器编排 |
|
||
| CI/CD | GitHub Actions | - | 持续集成 |
|
||
| 监控 | Prometheus + Grafana | - | 可观测性 |
|
||
| 日志 | ELK Stack | 8.x | 日志收集 |
|
||
|
||
### 2.2 项目结构
|
||
|
||
```
|
||
llm-gateway/
|
||
├── cmd/ # 入口程序
|
||
│ ├── gateway/ # 网关服务
|
||
│ ├── router/ # 路由服务
|
||
│ ├── billing/ # 计费服务
|
||
│ └── admin/ # 管理后台
|
||
├── internal/ # 内部包
|
||
│ ├── config/ # 配置管理
|
||
│ ├── middleware/ # 中间件
|
||
│ ├── handler/ # HTTP处理器
|
||
│ ├── service/ # 业务逻辑
|
||
│ ├── repository/ # 数据访问
|
||
│ └── model/ # 数据模型
|
||
├── pkg/ # 公共包
|
||
│ ├── utils/ # 工具函数
|
||
│ ├── errors/ # 错误定义
|
||
│ └── constants/ # 常量定义
|
||
├── api/ # API定义
|
||
│ ├── openapi/ # OpenAPI规范
|
||
│ └── proto/ # Protobuf定义
|
||
├── configs/ # 配置文件
|
||
├── scripts/ # 脚本
|
||
├── test/ # 测试
|
||
└── docs/ # 文档
|
||
```
|
||
|
||
---
|
||
|
||
## 3. 模块详细设计
|
||
|
||
### 3.1 API Gateway 模块
|
||
|
||
```go
|
||
// cmd/gateway/main.go
|
||
package main
|
||
|
||
import (
|
||
"github.com/gin-gonic/gin"
|
||
"llm-gateway/internal/middleware"
|
||
"llm-gateway/internal/handler"
|
||
)
|
||
|
||
func main() {
|
||
r := gin.Default()
|
||
|
||
// 全局中间件
|
||
r.Use(middleware.Logger())
|
||
r.Use(middleware.Recovery())
|
||
r.Use(middleware.CORS())
|
||
|
||
// 限流
|
||
r.Use(middleware.RateLimiter())
|
||
|
||
// API路由
|
||
v1 := r.Group("/v1")
|
||
{
|
||
// 认证
|
||
v1.POST("/auth/token", handler.AuthToken)
|
||
v1.POST("/auth/refresh", handler.RefreshToken)
|
||
|
||
// 对话
|
||
v1.POST("/chat/completions", middleware.AuthRequired(), handler.ChatCompletions)
|
||
v1.POST("/completions", middleware.AuthRequired(), handler.Completions)
|
||
|
||
// Embeddings
|
||
v1.POST("/embeddings", middleware.AuthRequired(), handler.Embeddings)
|
||
|
||
// 模型
|
||
v1.GET("/models", handler.ListModels)
|
||
|
||
// 用户
|
||
users := v1.Group("/users")
|
||
users.Use(middleware.AuthRequired())
|
||
{
|
||
users.GET("", handler.ListUsers)
|
||
users.GET("/:id", handler.GetUser)
|
||
users.PUT("/:id", handler.UpdateUser)
|
||
}
|
||
|
||
// API Key
|
||
keys := v1.Group("/keys")
|
||
keys.Use(middleware.AuthRequired())
|
||
{
|
||
keys.GET("", handler.ListKeys)
|
||
keys.POST("", handler.CreateKey)
|
||
keys.DELETE("/:id", handler.DeleteKey)
|
||
keys.POST("/:id/rotate", handler.RotateKey)
|
||
}
|
||
|
||
// 计费
|
||
billing := v1.Group("/billing")
|
||
billing.Use(middleware.AuthRequired())
|
||
{
|
||
billing.GET("/balance", handler.GetBalance)
|
||
billing.GET("/usage", handler.GetUsage)
|
||
billing.GET("/invoices", handler.ListInvoices)
|
||
}
|
||
|
||
// 供应方
|
||
supply := v1.Group("/supply")
|
||
supply.Use(middleware.AuthRequired())
|
||
{
|
||
supply.GET("/accounts", handler.ListAccounts)
|
||
supply.POST("/accounts", handler.CreateAccount)
|
||
supply.GET("/packages", handler.ListPackages)
|
||
supply.POST("/packages", handler.CreatePackage)
|
||
supply.GET("/earnings", handler.GetEarnings)
|
||
}
|
||
}
|
||
|
||
// 管理后台
|
||
admin := r.Group("/admin")
|
||
admin.Use(middleware.AdminRequired())
|
||
{
|
||
admin.GET("/stats", handler.AdminStats)
|
||
admin.GET("/users", handler.AdminListUsers)
|
||
admin.POST("/users/:id/disable", handler.DisableUser)
|
||
}
|
||
|
||
r.Run(":8080")
|
||
}
|
||
```
|
||
|
||
### 3.2 路由服务模块
|
||
|
||
```go
|
||
// internal/service/router.go
|
||
package service
|
||
|
||
import (
|
||
"context"
|
||
"time"
|
||
"llm-gateway/internal/model"
|
||
"llm-gateway/internal/adapter"
|
||
)
|
||
|
||
type RouterService struct {
|
||
adapterRegistry *adapter.Registry
|
||
metricsCollector *MetricsCollector
|
||
}
|
||
|
||
type RouteRequest struct {
|
||
Model string `json:"model"`
|
||
Messages []model.Message `json:"messages"`
|
||
Options model.CompletionOptions `json:"options"`
|
||
UserID int64 `json:"user_id"`
|
||
TenantID int64 `json:"tenant_id"`
|
||
}
|
||
|
||
func (s *RouterService) Route(ctx context.Context, req RouteRequest) (*model.CompletionResponse, error) {
|
||
// 1. 获取可用供应商
|
||
providers := s.adapterRegistry.GetAvailableProviders(req.Model)
|
||
if len(providers) == 0 {
|
||
return nil, ErrNoProviderAvailable
|
||
}
|
||
|
||
// 2. 选择最优供应商
|
||
selected := s.selectProvider(providers, req)
|
||
|
||
// 3. 记录路由决策
|
||
s.metricsCollector.RecordRoute(ctx, &RouteMetrics{
|
||
Model: req.Model,
|
||
Provider: selected.Name(),
|
||
TenantID: req.TenantID,
|
||
})
|
||
|
||
// 4. 调用供应商
|
||
resp, err := selected.Call(ctx, req)
|
||
if err != nil {
|
||
// 5. 失败时尝试fallback
|
||
return s.tryFallback(ctx, req, err)
|
||
}
|
||
|
||
return resp, nil
|
||
}
|
||
|
||
func (s *RouterService) selectProvider(providers []*adapter.Provider, req RouteRequest) *adapter.Provider {
|
||
// 多维度选择策略
|
||
var best *adapter.Provider
|
||
bestScore := -1.0
|
||
|
||
for _, p := range providers {
|
||
score := s.calculateScore(p, req)
|
||
if score > bestScore {
|
||
bestScore = score
|
||
best = p
|
||
}
|
||
}
|
||
|
||
return best
|
||
}
|
||
|
||
func (s *RouterService) calculateScore(p *adapter.Provider, req RouteRequest) float64 {
|
||
// 延迟评分 (40%)
|
||
latencyScore := 1.0 / (p.LatencyP99 + 1)
|
||
|
||
// 可用性评分 (30%)
|
||
availabilityScore := p.Availability
|
||
|
||
// 成本评分 (20%)
|
||
costScore := 1.0 / (p.CostPer1K + 1)
|
||
|
||
// 质量评分 (10%)
|
||
qualityScore := p.QualityScore
|
||
|
||
return latencyScore*0.4 + availabilityScore*0.3 + costScore*0.2 + qualityScore*0.1
|
||
}
|
||
```
|
||
|
||
### 3.3 Provider Adapter 模块
|
||
|
||
```go
|
||
// internal/adapter/registry.go
|
||
package adapter
|
||
|
||
import (
|
||
"context"
|
||
"sync"
|
||
)
|
||
|
||
type Registry struct {
|
||
mu sync.RWMutex
|
||
providers map[string]Provider
|
||
fallback map[string]string
|
||
health map[string]*HealthStatus
|
||
}
|
||
|
||
type Provider interface {
|
||
Name() string
|
||
Call(ctx context.Context, req interface{}) (interface{}, error)
|
||
HealthCheck(ctx context.Context) error
|
||
GetCapabilities() Capabilities
|
||
}
|
||
|
||
type Capabilities struct {
|
||
SupportsStreaming bool
|
||
SupportsFunctionCall bool
|
||
SupportsVision bool
|
||
MaxTokens int
|
||
Models []string
|
||
}
|
||
|
||
type HealthStatus struct {
|
||
IsHealthy bool
|
||
Latency time.Duration
|
||
LastCheck time.Time
|
||
}
|
||
|
||
func NewRegistry() *Registry {
|
||
return &Registry{
|
||
providers: make(map[string]Provider),
|
||
fallback: make(map[string]string),
|
||
health: make(map[string]*HealthStatus),
|
||
}
|
||
}
|
||
|
||
func (r *Registry) Register(name string, p Provider, fallback string) {
|
||
r.mu.Lock()
|
||
defer r.mu.Unlock()
|
||
|
||
r.providers[name] = p
|
||
if fallback != "" {
|
||
r.fallback[name] = fallback
|
||
}
|
||
|
||
// 启动健康检查
|
||
go r.healthCheckLoop(name, p)
|
||
}
|
||
|
||
func (r *Registry) Get(name string) (Provider, error) {
|
||
r.mu.RLock()
|
||
defer r.mu.RUnlock()
|
||
|
||
p, ok := r.providers[name]
|
||
if !ok {
|
||
return nil, ErrProviderNotFound
|
||
}
|
||
|
||
// 检查健康状态
|
||
if health, ok := r.health[name]; ok && !health.IsHealthy {
|
||
// 尝试fallback
|
||
if fallback, ok := r.fallback[name]; ok {
|
||
return r.providers[fallback], nil
|
||
}
|
||
}
|
||
|
||
return p, nil
|
||
}
|
||
|
||
func (r *Registry) healthCheckLoop(name string, p Provider) {
|
||
ticker := time.NewTicker(30 * time.Second)
|
||
for range ticker.C {
|
||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||
err := p.HealthCheck(ctx)
|
||
cancel()
|
||
|
||
r.mu.Lock()
|
||
r.health[name] = &HealthStatus{
|
||
IsHealthy: err == nil,
|
||
LastCheck: time.Now(),
|
||
}
|
||
r.mu.Unlock()
|
||
}
|
||
}
|
||
```
|
||
|
||
### 3.4 计费服务模块
|
||
|
||
```go
|
||
// internal/service/billing.go
|
||
package service
|
||
|
||
import (
|
||
"context"
|
||
"decimal"
|
||
"llm-gateway/internal/model"
|
||
)
|
||
|
||
type BillingService struct {
|
||
repo *repository.BillingRepository
|
||
balanceMgr *BalanceManager
|
||
notifier *WebhookNotifier
|
||
}
|
||
|
||
type Money struct {
|
||
Amount decimal.Decimal
|
||
Currency string
|
||
}
|
||
|
||
func (s *BillingService) ProcessRequest(ctx context.Context, req *model.LLMRequest) (*model.BillingRecord, error) {
|
||
// 1. 预扣额度
|
||
estimatedCost := s.EstimateCost(req)
|
||
reserved, err := s.balanceMgr.Reserve(ctx, req.UserID, estimatedCost)
|
||
if err != nil {
|
||
return nil, ErrInsufficientBalance
|
||
}
|
||
|
||
// 2. 处理请求(实际扣费)
|
||
actualCost := s.CalculateActualCost(req.Response)
|
||
|
||
// 3. 补偿差额
|
||
diff := actualCost.Sub(reserved.Amount)
|
||
if diff.IsPositive() {
|
||
err = s.balanceMgr.Charge(ctx, req.UserID, diff)
|
||
} else if diff.IsNegative() {
|
||
err = s.balanceMgr.Refund(ctx, req.UserID, diff.Abs())
|
||
}
|
||
|
||
// 4. 记录账单
|
||
record := &model.BillingRecord{
|
||
UserID: req.UserID,
|
||
RequestID: req.ID,
|
||
Model: req.Model,
|
||
PromptTokens: req.Response.Usage.PromptTokens,
|
||
CompletionTokens: req.Response.Usage.CompletionTokens,
|
||
Amount: actualCost,
|
||
Status: model.BillingStatusSettled,
|
||
}
|
||
|
||
err = s.repo.Create(ctx, record)
|
||
if err != nil {
|
||
// 记录失败,触发补偿
|
||
s.notifier.NotifyBillingAnomaly(ctx, record, err)
|
||
}
|
||
|
||
return record, nil
|
||
}
|
||
|
||
func (s *BillingService) EstimateCost(req *model.LLMRequest) Money {
|
||
// 使用模型定价估算
|
||
price := s.repo.GetModelPrice(req.Model)
|
||
|
||
promptCost := decimal.NewFromInt(int64(req.Messages.Tokens()))
|
||
.Mul(price.InputPer1K)
|
||
.Div(decimal.NewFromInt(1000))
|
||
|
||
// 估算输出
|
||
estimatedOutput := decimal.NewFromInt(int64(req.Options.MaxTokens))
|
||
outputCost := estimatedOutput
|
||
.Mul(price.OutputPer1K)
|
||
.Div(decimal.NewFromInt(1000))
|
||
|
||
total := promptCost.Add(outputCost)
|
||
|
||
return Money{Amount: total.Round(2), Currency: "USD"}
|
||
}
|
||
```
|
||
|
||
### 3.5 风控服务模块
|
||
|
||
```go
|
||
// internal/service/risk.go
|
||
package service
|
||
|
||
import (
|
||
"llm-gateway/internal/model"
|
||
)
|
||
|
||
type RiskService struct {
|
||
rules []RiskRule
|
||
rateLimiter *RateLimiter
|
||
}
|
||
|
||
type RiskRule struct {
|
||
Name string
|
||
Condition func(*model.LLMRequest, *model.User) bool
|
||
Score int
|
||
Action RiskAction
|
||
}
|
||
|
||
type RiskAction string
|
||
|
||
const (
|
||
RiskActionAllow RiskAction = "allow"
|
||
RiskActionBlock RiskAction = "block"
|
||
RiskActionReview RiskAction = "review"
|
||
)
|
||
|
||
func (s *RiskService) Evaluate(ctx context.Context, req *model.LLMRequest) *RiskResult {
|
||
var totalScore int
|
||
var triggers []string
|
||
|
||
user := s.getUser(ctx, req.UserID)
|
||
|
||
for _, rule := range s.rules {
|
||
if rule.Condition(req, user) {
|
||
totalScore += rule.Score
|
||
triggers = append(triggers, rule.Name)
|
||
}
|
||
}
|
||
|
||
// 决策
|
||
if totalScore >= 70 {
|
||
return &RiskResult{
|
||
Action: RiskActionBlock,
|
||
Score: totalScore,
|
||
Triggers: triggers,
|
||
}
|
||
} else if totalScore >= 40 {
|
||
return &RiskResult{
|
||
Action: RiskActionReview,
|
||
Score: totalScore,
|
||
Triggers: triggers,
|
||
}
|
||
}
|
||
|
||
return &RiskResult{
|
||
Action: RiskActionAllow,
|
||
Score: totalScore,
|
||
}
|
||
}
|
||
|
||
// 预定义风控规则
|
||
func DefaultRiskRules() []RiskRule {
|
||
return []RiskRule{
|
||
{
|
||
Name: "high_velocity",
|
||
Condition: func(req *model.LLMRequest, user *model.User) bool {
|
||
return req.TokensPerMinute > 1000
|
||
},
|
||
Score: 30,
|
||
Action: RiskActionBlock,
|
||
},
|
||
{
|
||
Name: "new_account_high_value",
|
||
Condition: func(req *model.LLMRequest, user *model.User) bool {
|
||
return user.AccountAgeDays < 7 && req.EstimatedCost > 100
|
||
},
|
||
Score: 35,
|
||
Action: RiskActionReview,
|
||
},
|
||
{
|
||
Name: "unusual_model",
|
||
Condition: func(req *model.LLMRequest, user *model.User) bool {
|
||
return !user.PreferredModels.Contains(req.Model)
|
||
},
|
||
Score: 15,
|
||
Action: RiskActionReview,
|
||
},
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 4. 数据流设计
|
||
|
||
### 4.1 请求处理流程
|
||
|
||
```
|
||
用户请求
|
||
│
|
||
▼
|
||
┌─────────────────┐
|
||
│ API Gateway │
|
||
│ • 限流 │
|
||
│ • 鉴权 │
|
||
│ • 日志 │
|
||
└────────┬────────┘
|
||
│
|
||
▼
|
||
┌─────────────────┐
|
||
│ 路由决策 │
|
||
│ • 模型映射 │
|
||
│ • 供应商选择 │
|
||
└────────┬────────┘
|
||
│
|
||
┌────┴────┐
|
||
│ │
|
||
▼ ▼
|
||
┌────────┐ ┌────────┐
|
||
│ Provider│ │ Fallback│
|
||
│ A │ │ B │
|
||
└────┬───┘ └────┬───┘
|
||
│ │
|
||
└────┬────┘
|
||
│
|
||
▼
|
||
┌─────────────────┐
|
||
│ 计费处理 │
|
||
│ • 预扣 │
|
||
│ • 实际扣费 │
|
||
│ • 记录 │
|
||
└────────┬────────┘
|
||
│
|
||
▼
|
||
┌─────────────────┐
|
||
│ 响应返回 │
|
||
└─────────────────┘
|
||
```
|
||
|
||
### 4.2 异步处理流程
|
||
|
||
```
|
||
请求处理
|
||
│
|
||
▼
|
||
┌─────────────────┐
|
||
│ 同步:预扣+执行 │
|
||
│ │
|
||
│ • 预扣额度 │
|
||
│ • 调用供应商 │
|
||
│ • 实际扣费 │
|
||
└────────┬────────┘
|
||
│
|
||
┌────┴────┐
|
||
│ │
|
||
▼ ▼
|
||
┌────────┐ ┌────────┐
|
||
│ 同步响应 │ │ 异步队列│
|
||
│ │ │ │
|
||
│ • 返回 │ │ • 记录使用量│
|
||
│ • 更新 │ │ • 统计 │
|
||
│ 余额 │ │ • 对账 │
|
||
└────────┘ └────┬────┘
|
||
│
|
||
▼
|
||
┌─────────────┐
|
||
│ Kafka Topic │
|
||
│ • usage │
|
||
│ • billing │
|
||
│ • audit │
|
||
└──────┬──────┘
|
||
│
|
||
┌─────┴─────┐
|
||
│ │
|
||
▼ ▼
|
||
┌─────────┐ ┌─────────┐
|
||
│ 消费者 │ │ 消费者 │
|
||
│ • 写入DB │ │ • 对账 │
|
||
│ • 监控 │ │ • 告警 │
|
||
└─────────┘ └─────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## 5. API 设计规范
|
||
|
||
### 5.1 RESTful API 设计
|
||
|
||
```yaml
|
||
# openapi.yaml
|
||
openapi: 3.0.3
|
||
info:
|
||
title: LLM Gateway API
|
||
version: 1.0.0
|
||
description: Enterprise LLM Gateway API
|
||
|
||
servers:
|
||
- url: https://api.lgateway.com/v1
|
||
description: Production server
|
||
- url: https://staging-api.lgateway.com/v1
|
||
description: Staging server
|
||
|
||
paths:
|
||
/chat/completions:
|
||
post:
|
||
summary: Create a chat completion
|
||
operationId: createChatCompletion
|
||
tags:
|
||
- Chat
|
||
security:
|
||
- BearerAuth: []
|
||
requestBody:
|
||
required: true
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/ChatCompletionRequest'
|
||
responses:
|
||
'200':
|
||
description: Successful response
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/ChatCompletionResponse'
|
||
'400':
|
||
$ref: '#/components/responses/BadRequest'
|
||
'401':
|
||
$ref: '#/components/responses/Unauthorized'
|
||
'429':
|
||
$ref: '#/components/responses/RateLimited'
|
||
'500':
|
||
$ref: '#/components/responses/InternalServerError'
|
||
|
||
components:
|
||
securitySchemes:
|
||
BearerAuth:
|
||
type: http
|
||
scheme: bearer
|
||
bearerFormat: JWT
|
||
|
||
schemas:
|
||
ChatCompletionRequest:
|
||
type: object
|
||
required:
|
||
- model
|
||
- messages
|
||
properties:
|
||
model:
|
||
type: string
|
||
description: Model identifier
|
||
messages:
|
||
type: array
|
||
items:
|
||
$ref: '#/components/schemas/Message'
|
||
temperature:
|
||
type: number
|
||
minimum: 0
|
||
maximum: 2
|
||
default: 1.0
|
||
max_tokens:
|
||
type: integer
|
||
minimum: 1
|
||
maximum: 32000
|
||
stream:
|
||
type: boolean
|
||
default: false
|
||
|
||
Message:
|
||
type: object
|
||
required:
|
||
- role
|
||
- content
|
||
properties:
|
||
role:
|
||
type: string
|
||
enum: [system, user, assistant]
|
||
content:
|
||
type: string
|
||
|
||
responses:
|
||
BadRequest:
|
||
description: Bad request
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/Error'
|
||
Unauthorized:
|
||
description: Unauthorized
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/Error'
|
||
RateLimited:
|
||
description: Rate limited
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/Error'
|
||
InternalServerError:
|
||
description: Internal server error
|
||
content:
|
||
application/json:
|
||
schema:
|
||
$ref: '#/components/schemas/Error'
|
||
```
|
||
|
||
### 5.2 错误响应格式
|
||
|
||
```json
|
||
{
|
||
"error": {
|
||
"code": "BILLING_001",
|
||
"message": "Insufficient balance",
|
||
"message_i18n": {
|
||
"zh_CN": "余额不足",
|
||
"en_US": "Insufficient balance"
|
||
},
|
||
"details": {
|
||
"required": 100.00,
|
||
"available": 50.00,
|
||
"top_up_url": "/v1/billing/top-up"
|
||
},
|
||
"trace_id": "req_abc123",
|
||
"retryable": false,
|
||
"doc_url": "https://docs.lgateway.com/errors/billing-001"
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 6. 数据库设计
|
||
|
||
### 6.1 核心表结构
|
||
|
||
```sql
|
||
-- 用户表
|
||
CREATE TABLE users (
|
||
id BIGSERIAL PRIMARY KEY,
|
||
email VARCHAR(255) UNIQUE NOT NULL,
|
||
password_hash VARCHAR(255) NOT NULL,
|
||
name VARCHAR(100),
|
||
tenant_id BIGINT REFERENCES tenants(id),
|
||
role VARCHAR(20) DEFAULT 'user',
|
||
status VARCHAR(20) DEFAULT 'active',
|
||
mfa_enabled BOOLEAN DEFAULT FALSE,
|
||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||
);
|
||
|
||
-- API Keys表
|
||
CREATE TABLE api_keys (
|
||
id BIGSERIAL PRIMARY KEY,
|
||
user_id BIGINT NOT NULL REFERENCES users(id),
|
||
key_hash VARCHAR(64) NOT NULL UNIQUE,
|
||
key_prefix VARCHAR(20) NOT NULL,
|
||
description VARCHAR(200),
|
||
permissions JSONB DEFAULT '{}',
|
||
rate_limit_rpm INT DEFAULT 60,
|
||
rate_limit_tpm INT DEFAULT 100000,
|
||
status VARCHAR(20) DEFAULT 'active',
|
||
expires_at TIMESTAMP,
|
||
last_used_at TIMESTAMP,
|
||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||
);
|
||
|
||
-- 租户表
|
||
CREATE TABLE tenants (
|
||
id BIGSERIAL PRIMARY KEY,
|
||
name VARCHAR(100) NOT NULL,
|
||
plan VARCHAR(20) DEFAULT 'free',
|
||
status VARCHAR(20) DEFAULT 'active',
|
||
settings JSONB DEFAULT '{}',
|
||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||
);
|
||
|
||
-- 账单记录表
|
||
CREATE TABLE billing_records (
|
||
id BIGSERIAL PRIMARY KEY,
|
||
user_id BIGINT NOT NULL REFERENCES users(id),
|
||
tenant_id BIGINT REFERENCES tenants(id),
|
||
request_id VARCHAR(64) NOT NULL,
|
||
provider VARCHAR(50) NOT NULL,
|
||
model VARCHAR(50) NOT NULL,
|
||
prompt_tokens INT NOT NULL,
|
||
completion_tokens INT NOT NULL,
|
||
amount DECIMAL(10, 4) NOT NULL,
|
||
currency VARCHAR(3) DEFAULT 'USD',
|
||
status VARCHAR(20) DEFAULT 'settled',
|
||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||
);
|
||
|
||
-- 使用量记录表
|
||
CREATE TABLE usage_records (
|
||
id BIGSERIAL PRIMARY KEY,
|
||
user_id BIGINT NOT NULL REFERENCES users(id),
|
||
tenant_id BIGINT REFERENCES tenants(id),
|
||
api_key_id BIGINT REFERENCES api_keys(id),
|
||
request_id VARCHAR(64) NOT NULL,
|
||
model VARCHAR(50) NOT NULL,
|
||
provider VARCHAR(50) NOT NULL,
|
||
prompt_tokens INT DEFAULT 0,
|
||
completion_tokens INT DEFAULT 0,
|
||
latency_ms INT,
|
||
status_code INT,
|
||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||
);
|
||
|
||
-- 索引
|
||
CREATE INDEX idx_users_email ON users(email);
|
||
CREATE INDEX idx_users_tenant ON users(tenant_id);
|
||
CREATE INDEX idx_api_keys_user ON api_keys(user_id);
|
||
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
|
||
CREATE INDEX idx_billing_user ON billing_records(user_id, created_at);
|
||
CREATE INDEX idx_billing_tenant ON billing_records(tenant_id, created_at);
|
||
CREATE INDEX idx_usage_user ON usage_records(user_id, created_at);
|
||
CREATE INDEX idx_usage_request ON usage_records(request_id);
|
||
```
|
||
|
||
---
|
||
|
||
## 7. 消息队列运维简化
|
||
|
||
### 7.1 Kafka运维挑战分析
|
||
|
||
| 挑战 | 影响 | 简化方案 |
|
||
|------|------|----------|
|
||
| 集群管理复杂 | 运维成本高 | 使用托管服务 |
|
||
| 分区副本同步 | 数据延迟 | 优化配置 |
|
||
| 消费者组管理 | 消费积压 | 简化架构 |
|
||
| 监控告警 | 噪声过多 | 精简指标 |
|
||
| 容量规划 | 扩展困难 | 自动化伸缩 |
|
||
|
||
### 7.2 托管Kafka服务选型
|
||
|
||
```yaml
|
||
# 消息队列服务选型
|
||
recommended:
|
||
# 阿里云Kafka(国内)
|
||
aliyun:
|
||
type: managed
|
||
version: 2.2.0
|
||
features:
|
||
- 自动分区重平衡
|
||
- 死信队列支持
|
||
- 跨可用区容灾
|
||
ops_benefits:
|
||
- 免运维
|
||
- SLA 99.9%
|
||
- 按量计费
|
||
|
||
# AWS MSK(海外)
|
||
aws_msk:
|
||
type: managed
|
||
version: 2.8.0
|
||
features:
|
||
- MSK Serverless免容量规划
|
||
- 精细访问控制
|
||
ops_benefits:
|
||
- 与AWS生态集成
|
||
- 托管升级
|
||
|
||
alternatives:
|
||
# 轻量级替代方案
|
||
redis_streams:
|
||
use_case: "低延迟小消息"
|
||
limitations:
|
||
- 无持久化保证
|
||
- 单线程消费
|
||
|
||
# 业务简单时的选择
|
||
database_queues:
|
||
use_case: "消息量<1000/s"
|
||
limitations:
|
||
- 性能有限
|
||
- 需自行实现重试
|
||
```
|
||
|
||
### 7.3 Topic设计简化
|
||
|
||
```python
|
||
# 简化的Topic设计 - 从原来的10+个精简为4个
|
||
TOPIC_DESIGN = {
|
||
# 核心业务Topic
|
||
"llm.requests": {
|
||
"partitions": 6,
|
||
"retention": "7d",
|
||
"description": "LLM请求流转"
|
||
},
|
||
|
||
# 异步计费Topic
|
||
"llm.billing": {
|
||
"partitions": 3,
|
||
"retention": "30d",
|
||
"description": "计费流水"
|
||
},
|
||
|
||
# 通知事件Topic
|
||
"llm.events": {
|
||
"partitions": 3,
|
||
"retention": "3d",
|
||
"description": "各类事件通知"
|
||
},
|
||
|
||
# 监控数据Topic
|
||
"llm.metrics": {
|
||
"partitions": 1,
|
||
"retention": "1d",
|
||
"description": "原始监控数据"
|
||
}
|
||
}
|
||
```
|
||
|
||
### 7.4 消费者组简化
|
||
|
||
```python
|
||
# 简化的消费者组设计
|
||
class SimplifiedConsumerGroup:
|
||
"""简化消费者组管理"""
|
||
|
||
# 原来:每个服务多个消费者组
|
||
# 优化后:一个服务一个消费者组
|
||
|
||
def __init__(self):
|
||
self.groups = {
|
||
"router-service": {
|
||
"topics": ["llm.requests"],
|
||
"consumers": 3, # 与分区数匹配
|
||
"strategy": "round_robin"
|
||
},
|
||
"billing-service": {
|
||
"topics": ["llm.billing"],
|
||
"consumers": 2,
|
||
"strategy": "failover"
|
||
},
|
||
"notification-service": {
|
||
"topics": ["llm.events"],
|
||
"consumers": 1,
|
||
"strategy": "broadcast"
|
||
}
|
||
}
|
||
|
||
def get_consumer_count(self, group: str) -> int:
|
||
"""自动计算消费者数量"""
|
||
return self.groups[group]["consumers"]
|
||
```
|
||
|
||
### 7.5 自动化运维脚本
|
||
|
||
```bash
|
||
#!/bin/bash
|
||
# scripts/kafka-ops.sh - Kafka运维自动化
|
||
|
||
set -e
|
||
|
||
# 1. 主题健康检查
|
||
check_topics() {
|
||
echo "=== 检查Topic状态 ==="
|
||
kafka-topics.sh --bootstrap-server $KAFKA_BROKER --list | while read topic; do
|
||
partitions=$(kafka-topics.sh --bootstrap-server $KAFKA_BROKER \
|
||
--topic $topic --describe | grep -c "Leader:")
|
||
lag=$(kafka-consumer-groups.sh --bootstrap-server $KAFKA_BROKER \
|
||
--group $(get_group_for_topic $topic) \
|
||
--describe | awk '{sum+=$6} END {print sum}')
|
||
|
||
echo "Topic: $topic, Partitions: $partitions, Lag: $lag"
|
||
|
||
if [ $lag -gt 1000 ]; then
|
||
alert "消费积压告警: $topic 积压 $lag 条"
|
||
fi
|
||
done
|
||
}
|
||
|
||
# 2. 自动创建Topic(幂等)
|
||
ensure_topics() {
|
||
for topic in "${!TOPIC_DESIGN[@]}"; do
|
||
config="${TOPIC_DESIGN[$topic]}"
|
||
kafka-topics.sh --bootstrap-server $KAFKA_BROKER \
|
||
--topic $topic --create \
|
||
--partitions ${config[partitions]} \
|
||
--replication-factor 3 \
|
||
--config retention.ms=${config[retention]} \
|
||
2>/dev/null || echo "Topic $topic already exists"
|
||
done
|
||
}
|
||
|
||
# 3. 消费延迟监控
|
||
monitor_lag() {
|
||
for group in $(kafka-consumer-groups.sh --bootstrap-server $KAFKA_BROKER \
|
||
--list 2>/dev/null); do
|
||
lag=$(kafka-consumer-groups.sh --bootstrap-server $KAFKA_BROKER \
|
||
--group $group --describe | awk '{sum+=$6} END {print sum}')
|
||
|
||
prometheus_pushgateway "kafka_consumer_lag" $lag "group=$group"
|
||
done
|
||
}
|
||
```
|
||
|
||
### 7.6 监控指标精简
|
||
|
||
```yaml
|
||
# 精简的Kafka监控指标 - 避免噪声
|
||
kafka_metrics:
|
||
essential:
|
||
- name: kafka_consumer_group_lag_max
|
||
description: 最大消费延迟
|
||
alert_threshold: 1000
|
||
|
||
- name: kafka_topic_partition_under_replicated
|
||
description: 副本不同步数
|
||
alert_threshold: 0
|
||
|
||
- name: kafka_server_broker_topic_messages_in_total
|
||
description: 消息入站速率
|
||
alert_threshold: rate_change > 50%
|
||
|
||
optional:
|
||
# 以下指标仅在排查问题时启用
|
||
- kafka_network_request_metrics
|
||
- kafka_consumer_fetch_manager_metrics
|
||
- kafka_producer_metrics
|
||
```
|
||
|
||
### 7.7 容量规划自动化
|
||
|
||
```python
|
||
# 自动容量规划
|
||
class KafkaCapacityPlanner:
|
||
"""Kafka容量自动规划"""
|
||
|
||
def calculate_requirements(self, metrics: dict) -> dict:
|
||
"""基于实际流量计算容量"""
|
||
# 峰值QPS
|
||
peak_qps = metrics["peak_qps"]
|
||
|
||
# 平均消息大小
|
||
avg_msg_size = metrics["avg_msg_size_kb"] * 1024
|
||
|
||
# 保留期
|
||
retention_days = 7
|
||
|
||
# 计算所需磁盘
|
||
disk_per_day = peak_qps * avg_msg_size * 86400
|
||
total_disk = disk_per_day * retention_days
|
||
|
||
# 推荐配置
|
||
return {
|
||
"partitions": min(peak_qps // 100, 12), # 最大12分区
|
||
"replication_factor": 3,
|
||
"disk_gb": total_disk / (1024**3),
|
||
"broker_count": 3,
|
||
"scaling_trigger": "disk_usage > 70%"
|
||
}
|
||
```
|
||
|
||
### 7.8 故障自愈机制
|
||
|
||
```python
|
||
# Kafka故障自愈
|
||
class KafkaSelfHealing:
|
||
"""Kafka自愈机制"""
|
||
|
||
def __init__(self):
|
||
self.healing_rules = {
|
||
"under_replicated": {
|
||
"detect": "partition.replicas - in.sync.replicas > 0",
|
||
"action": "trigger_preferred_reelection",
|
||
"cooldown": 300 # 5分钟
|
||
},
|
||
"controller_failover": {
|
||
"detect": "controller_epoch跳跃",
|
||
"action": "等待自动选举",
|
||
"cooldown": 60
|
||
},
|
||
"partition_offline": {
|
||
"detect": "leader == -1",
|
||
"action": "assign_new_leader",
|
||
"cooldown": 60
|
||
}
|
||
}
|
||
|
||
async def check_and_heal(self):
|
||
"""定期检查并自愈"""
|
||
for rule_name, rule in self.healing_rules.items():
|
||
if self.should_heal(rule_name):
|
||
await self.execute_healing(rule_name, rule)
|
||
```
|
||
|
||
---
|
||
|
||
## 7. 一致性验证
|
||
|
||
### 7.1 与现有文档一致性
|
||
|
||
| 设计项 | 对应文档 | 一致性 |
|
||
|--------|----------|--------|
|
||
| Provider Adapter | `architecture_solution_v1.md` | ✅ |
|
||
| 路由策略 | `architecture_solution_v1.md` | ✅ |
|
||
| 计费精度 | `business_solution_v1.md` | ✅ |
|
||
| 安全机制 | `security_solution_v1.md` | ✅ |
|
||
| API版本管理 | `api_solution_v1.md` | ✅ |
|
||
| 错误码体系 | `api_solution_v1.md` | ✅ |
|
||
| 限流机制 | `p1_optimization_solution_v1.md` | ✅ |
|
||
| Webhook | `p1_optimization_solution_v1.md` | ✅ |
|
||
|
||
---
|
||
|
||
## 8. 实施计划
|
||
|
||
### 8.1 开发阶段
|
||
|
||
| 阶段 | 内容 | 周数 |
|
||
|------|------|------|
|
||
| Phase 1 | 基础设施 + API Gateway | 3周 |
|
||
| Phase 2 | 核心服务开发 | 4周 |
|
||
| Phase 3 | 集成测试 | 2周 |
|
||
| Phase 4 | 性能优化 | 2周 |
|
||
|
||
---
|
||
|
||
**文档状态**:详细技术架构设计
|
||
**关联文档**:
|
||
- `architecture_solution_v1_2026-03-18.md`
|
||
- `api_solution_v1_2026-03-18.md`
|
||
- `security_solution_v1_2026-03-18.md`
|