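# Backup and Recovery Runbook

**Purpose**: Back up the database and configuration on a regular schedule, and restore data after a failure.

**Applies to**: Data protection, failure recovery, and migration deployments.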
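
---

## Backup Types

| Type | Frequency | Retention | Purpose |
|------|-----------|-----------|---------|
| Automatic backup | Daily | 30 days | Routine data protection |
| Manual backup | On demand | Custom | Before major changes |
| Disaster recovery backup | Weekly | 90 days | Disaster recovery |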
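
The retention periods in the table can be enforced with a small pruning helper. The following is a minimal sketch, not part of the existing `backup.sh`: the function name `prune_backups` is hypothetical, and it assumes backups are stored as `*.tar.gz` files directly inside one directory and that `find` supports `-maxdepth` and `-delete` (GNU and BSD `find` do).

```bash
#!/usr/bin/env bash
# Sketch: delete backup archives that have outlived their retention period.
# prune_backups DIR DAYS -> removes DIR/*.tar.gz last modified more than DAYS days ago
prune_backups() {
    local dir="$1" days="$2"
    # -maxdepth 1 keeps the search inside DIR itself;
    # -mtime +N matches files whose age exceeds N full days;
    # -print lists each file as it is deleted, which is useful in the cron log
    find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$days" -print -delete
}
```

For example, `prune_backups /mnt/backups 30` would apply the daily retention window, and a separate directory of weekly archives could be pruned with `90`.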

---

## Automatic Backup Configuration

### Scheduled Task (Linux)

```bash
# Edit the crontab
crontab -e

# Add the following line (runs the backup every day at 02:00)
0 2 * * * /path/to/scripts/backup/backup.sh >> /var/log/backup.log 2>&1

# Verify the crontab
crontab -l
```
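
Cron starts the job at 02:00 whether or not the previous run has finished. One way to avoid overlapping runs is a lock wrapper, sketched below. The function name `run_locked` and the lock path are assumptions for illustration; they are not part of `backup.sh`.

```bash
#!/usr/bin/env bash
# Sketch: run a command only if no other run currently holds the lock.
# run_locked LOCK_DIR CMD [ARGS...]
run_locked() {
    local lock_dir="$1" status=0
    shift
    # mkdir is atomic: it fails if the directory already exists
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "[WARN] $(date '+%F %T') previous run still holds $lock_dir, skipping" >&2
        return 75
    fi
    # Run the wrapped command and remember its exit status
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}
```

A wrapper script called from cron could then run `run_locked /tmp/user-management-backup.lock /path/to/scripts/backup/backup.sh`. If the process is killed, the lock directory is left behind and has to be removed by hand.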

### Scheduled Task (Docker)

```bash
# Either run a dedicated cron container or use the host's cron:
# add a cron service to docker-compose.yml, or add an entry to the host crontab
```
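
For the host-crontab option, the entry has to change into the project directory and run the script inside the container. The sketch below only prints such a line so it can be reviewed before being pasted into `crontab -e`. The function name `docker_cron_line`, the service name, and the in-container script path are all hypothetical; it also assumes the container is running at 02:00 and that the image contains the backup script.

```bash
#!/usr/bin/env bash
# Sketch: print a host crontab line that runs the backup inside a Compose service.
# docker_cron_line PROJECT_DIR SERVICE SCRIPT_PATH_IN_CONTAINER
docker_cron_line() {
    local project_dir="$1" service="$2" script="$3"
    # exec -T disables pseudo-TTY allocation, because cron provides no terminal
    printf '0 2 * * * cd %s && docker-compose exec -T %s %s >> /var/log/backup.log 2>&1\n' \
        "$project_dir" "$service" "$script"
}
```

Calling `docker_cron_line /opt/user-management app /app/scripts/backup/backup.sh` prints one crontab line with the same 02:00 schedule and log file as the Linux example.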
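
### Windows Task Scheduler

```powershell
# Create a scheduled task with PowerShell.
# Windows cannot execute a .sh file directly, so the script is passed to bash.exe.
# Assumption: Git Bash (or a similar bash) is installed and is the bash.exe found on PATH.
$action = New-ScheduledTaskAction -Execute "bash.exe" -Argument "C:/path/to/scripts/backup/backup.sh"
$trigger = New-ScheduledTaskTrigger -Daily -At "2:00AM"
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "UserManagementBackup"
```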

---

## Manual Backup

### Run a Backup

```bash
# Basic backup
./scripts/backup/backup.sh

# Specify the backup directory
BACKUP_DIR=/mnt/backups ./scripts/backup/backup.sh

# Specify the database path
DB_PATH=/custom/path/user_management.db ./scripts/backup/backup.sh
```
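
The `BACKUP_DIR` and `DB_PATH` overrides above follow the usual shell pattern of environment variables with fallbacks. The sketch below shows that pattern in isolation. It is not the actual code of `backup.sh`: the function name `resolve_settings` is hypothetical and the default values are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: resolve backup settings from the environment, falling back to defaults.
# The defaults below are assumed for illustration, not read from the real backup.sh.
resolve_settings() {
    # ${VAR:-default} yields the default when VAR is unset or empty
    local backup_dir="${BACKUP_DIR:-./backups}"
    local db_path="${DB_PATH:-./data/user_management.db}"
    printf 'BACKUP_DIR=%s\nDB_PATH=%s\n' "$backup_dir" "$db_path"
}
```

Because the variables are set only on the command line of a single invocation, an override such as `BACKUP_DIR=/mnt/backups` does not leak into later commands in the same shell.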

### Backup Output

```
[INFO] Starting backup...
[INFO] Backing up database: ./data/user_management.db
[SUCCESS] Database backed up to: /backups/user-management_20260411_020000/database.db
[INFO] Backing up config: ./configs/config.yaml
[SUCCESS] Config backed up to: /backups/user-management_20260411_020000/config.yaml
[SUCCESS] Backup completed: /backups/user-management_20260411_020000.tar.gz
[SUCCESS] Checksum: abc123... user-management_20260411_020000.tar.gz
```
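
The checksum on the last line is only useful if it is stored and compared later. The sketch below records and re-checks a checksum next to the archive. It assumes SHA-256 and the `sha256sum` tool; the output above does not say which algorithm `backup.sh` uses, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: record a SHA-256 checksum next to an archive and verify it later.
# Assumption: sha256sum is available; the real backup.sh may use another algorithm.
write_checksum() {
    local archive="$1"
    # Work inside the archive's directory so the .sha256 file stores a bare file name
    ( cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

verify_checksum() {
    local archive="$1"
    # -c recomputes the hash and compares it with the stored one;
    # the exit status is non-zero on any mismatch
    ( cd "$(dirname "$archive")" &&
      sha256sum -c "$(basename "$archive").sha256" > /dev/null )
}
```

Storing the `.sha256` file alongside the archive lets the same check run again after the archive has been copied off-site.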

---

## Restoring from Backup

### 1. Confirm the Need to Restore

> **Warning**: Restoring overwrites the current data!

- [ ] Confirm why a restore is needed
- [ ] Confirm the backup file is intact
- [ ] Notify affected users

### 2. Check Backup Integrity

```bash
# List available backups
./scripts/backup/backup.sh --list

# Verify the backup
./scripts/backup/backup.sh --verify
```
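
As an independent cross-check, an archive can also be inspected with plain `tar`. The sketch below is not what `--verify` does internally (that is not documented here). The function name `archive_ok` is hypothetical, and the member names in the usage note are taken from the sample backup output.

```bash
#!/usr/bin/env bash
# Sketch: check that a backup archive is readable and holds the expected files.
# archive_ok ARCHIVE NAME... -> succeeds if every NAME ends some path in the archive
archive_ok() {
    local archive="$1" listing name path found
    shift
    # tar -tzf has to decompress the whole archive, so corrupt or truncated files fail here
    listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
    for name in "$@"; do
        found=0
        while IFS= read -r path; do
            case "$path" in *"$name") found=1 ;; esac
        done <<< "$listing"
        [ "$found" -eq 1 ] || return 1
    done
    return 0
}
```

For example, `archive_ok /backups/user-management_20260411_020000.tar.gz database.db config.yaml` succeeds only if both files are listed in a readable archive.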

### 3. Run the Restore

```bash
# Stop the service before restoring
docker-compose stop

# Run the restore (prompts for confirmation)
./scripts/backup/backup.sh --restore

# To restore a specific backup instead
LATEST_BACKUP=/path/to/specific/backup.tar.gz ./scripts/backup/backup.sh --restore
```
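
Because a restore overwrites the current data, it can be worth keeping a copy of the live database between stopping the service and running `--restore`. The following is a minimal sketch of such a safety copy. The function name `snapshot_before_restore` and the safety directory are hypothetical, and it assumes the service is already stopped so that a plain file copy is consistent.

```bash
#!/usr/bin/env bash
# Sketch: keep a timestamped copy of the current database before a restore overwrites it.
# snapshot_before_restore DB_FILE SAFETY_DIR -> prints the path of the copy
snapshot_before_restore() {
    local db_file="$1" safety_dir="$2" copy
    if [ ! -f "$db_file" ]; then
        echo "[WARN] no database at $db_file, nothing to save" >&2
        return 0
    fi
    mkdir -p "$safety_dir" || return 1
    copy="$safety_dir/pre_restore_$(date +%Y%m%d_%H%M%S)_$(basename "$db_file")"
    # cp -p preserves the file mode and timestamps of the original
    cp -p "$db_file" "$copy" || return 1
    printf '%s\n' "$copy"
}
```

If the restored data turns out to be wrong, the printed path points at the state from just before the restore.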

### 4. Verify the Restore

```bash
# Start the service
docker-compose up -d

# Verify the database
sqlite3 data/user_management.db "PRAGMA integrity_check;"

# Verify the data through the API health endpoint
curl http://localhost:8080/api/v1/health
```
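
The two checks above print their results for a person to read. For scripted verification they can be turned into pass/fail results, as in the sketch below. The function names are hypothetical. `integrity_is_ok` relies on SQLite printing the single row `ok` when `PRAGMA integrity_check` finds no problems.

```bash
#!/usr/bin/env bash
# Sketch: turn the manual verification commands into pass/fail checks.

# integrity_is_ok reads the output of "PRAGMA integrity_check;" on stdin
integrity_is_ok() {
    local result
    result=$(cat)
    # Anything other than exactly "ok" is a list of detected problems
    [ "$result" = "ok" ]
}

# health_is_ok URL -> succeeds if the endpoint answers without an HTTP error
health_is_ok() {
    # -f fails on HTTP status >= 400, -sS hides progress but keeps error messages
    curl -fsS --max-time 10 "$1" > /dev/null
}
```

A combined check could then read `sqlite3 data/user_management.db "PRAGMA integrity_check;" | integrity_is_ok && health_is_ok http://localhost:8080/api/v1/health`.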
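
---

## Incremental Backup Strategy

For large data volumes, incremental backups can be used:

### Option A: File-Level Incremental

```bash
#!/bin/bash
# Incremental backup script: copy only files changed since the reference copy.
# rsync's --compare-dest expects a directory, not a .tar.gz archive, so this
# assumes the last full backup is also kept unpacked in REFERENCE_DIR.
set -euo pipefail

REFERENCE_DIR="$(pwd)/backups/full_latest"   # assumed location, adjust as needed
BACKUP_DIR="./incremental_backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Files identical to the copy in REFERENCE_DIR are skipped
rsync -av --compare-dest="$REFERENCE_DIR/" data/ "$BACKUP_DIR/incremental_$TIMESTAMP/"
```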
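
### Option B: SQLite Online Backup

```bash
#!/bin/bash
# SQLite online backup (the service does not need to be stopped)
set -euo pipefail

DB_PATH="./data/user_management.db"
BACKUP_PATH="./backups/incremental_$(date +%Y%m%d_%H%M%S).db"

mkdir -p "$(dirname "$BACKUP_PATH")"

# VACUUM INTO (SQLite 3.27+) writes a transactionally consistent copy.
# Note that the result is a complete snapshot of the database, not a delta.
sqlite3 "$DB_PATH" "VACUUM INTO '$BACKUP_PATH';"

echo "Online backup finished: $BACKUP_PATH"
```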

---

## Off-Site Backup

### Option A: SCP to a Remote Server

```bash
#!/bin/bash
# Copy the latest backup to a remote server
set -euo pipefail

BACKUP_FILE=$(ls -t backups/*.tar.gz | head -1)
REMOTE_USER="backup"
REMOTE_HOST="backup-server.example.com"
REMOTE_PATH="/backups/user-management"

scp "$BACKUP_FILE" "$REMOTE_USER@$REMOTE_HOST:$REMOTE_PATH/"
```
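
Off-site copies start by picking the newest archive. With `ls -t ... | head -1`, an empty backup directory yields an empty file name that is then handed to the upload command. The sketch below is one possible replacement that fails loudly instead. The function name `latest_backup` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the most recently modified *.tar.gz in a directory, or fail if none exist.
# latest_backup DIR
latest_backup() {
    local dir="$1" newest="" f
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # -nt is true when $f was modified more recently than $newest
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        echo "[ERROR] no backups found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

In an upload script this could be used as `BACKUP_FILE=$(latest_backup backups) || exit 1`, so that nothing is transferred when no archive exists.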

### Option B: Cloud Storage

```bash
#!/bin/bash
# Upload the latest backup to S3-compatible storage
set -euo pipefail

BACKUP_FILE=$(ls -t backups/*.tar.gz | head -1)

# Using s3cmd
s3cmd put "$BACKUP_FILE" s3://my-bucket/user-management-backups/

# Or, as an alternative, using the AWS CLI (uncomment to use instead of s3cmd)
# aws s3 cp "$BACKUP_FILE" s3://my-bucket/user-management-backups/
```

---

## Disaster Recovery Plan (DRP)

### RTO (Recovery Time Objective): 4 hours
### RPO (Recovery Point Objective): 24 hours
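
A 24-hour RPO only holds if the newest backup is less than 24 hours old. The sketch below checks a backup file's age against that limit and could be run from monitoring. The function name `within_rpo` is hypothetical, and it assumes GNU `stat` (the `-c %Y` format).

```bash
#!/usr/bin/env bash
# Sketch: check that a backup file is recent enough to satisfy the RPO.
# within_rpo FILE [MAX_AGE_HOURS]   (default 24, matching the RPO stated above)
within_rpo() {
    local file="$1" max_hours="${2:-24}" now mtime age_hours
    now=$(date +%s)
    # GNU stat: -c %Y prints the modification time as seconds since the epoch
    mtime=$(stat -c %Y "$file") || return 2
    # Integer division rounds the age down to whole hours
    age_hours=$(( (now - mtime) / 3600 ))
    [ "$age_hours" -lt "$max_hours" ]
}
```

A non-zero exit status means the latest backup is too old (or missing) and the RPO is currently not being met.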
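
### Disaster Recovery Steps

1. **Declare the disaster** - Contact the operations team and the responsible owners
2. **Assess the damage** - Determine the scope of the data loss and the point in time
3. **Start recovery** - Restore in the following order:
   - Infrastructure (servers, network)
   - Latest stable backup
   - Incremental backups (if any)
4. **Verify the service** - Confirm that all core functions work
5. **Notify users** - Announce that recovery is complete and the service is available

### Recovery Checklist

- [ ] Database fully restored
- [ ] Configuration files correct
- [ ] Service starts normally
- [ ] User authentication works
- [ ] Core APIs available
- [ ] Data integrity verified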
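
The first checklist item can be rehearsed without touching live data by unpacking an archive into a scratch directory. The sketch below is one way to do that during a restore drill. The function name `restore_drill` is hypothetical, and the file name `database.db` is taken from the sample backup output; the real archive layout may differ.

```bash
#!/usr/bin/env bash
# Sketch: rehearse a restore in a scratch directory, leaving live data untouched.
# restore_drill ARCHIVE -> succeeds if the archive unpacks and holds a non-empty database.db
restore_drill() {
    local archive="$1" scratch db status=0
    scratch=$(mktemp -d) || return 1
    if tar -xzf "$archive" -C "$scratch" 2>/dev/null; then
        # database.db is the name shown in the sample backup output (assumed layout)
        db=$(find "$scratch" -type f -name 'database.db' | head -n 1)
        # -s is true only for a file that exists and is not empty
        if [ -z "$db" ] || [ ! -s "$db" ]; then
            status=1
        fi
    else
        status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

A fuller drill could also run `PRAGMA integrity_check` on the extracted database before the scratch directory is removed.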
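
---

## Related Documents

- [Service Startup](./01-服务启动.md) - Start the service after a restore
- [Service Shutdown](./02-服务停止.md) - Stop the service before a backup
- [Configuration Update](./03-配置更新.md) - Backing up configuration files

---

**Last maintained**: 2026-04-11
**Next review**: Quarterly
**Test frequency**: Run a restore drill once per quarter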