docs: add runbooks and Kubernetes Helm Chart

Add 6 runbook documents:
- Service Startup
- Service Shutdown
- Configuration Update
- Log Analysis
- Backup & Recovery
- Security Incident

Add Kubernetes Helm Chart:
- Chart.yaml, values.yaml
- Deployment with health checks
- Ingress with TLS support
- PVC for data persistence
- PDB for high availability
- HPA for autoscaling
- ServiceAccount configuration

Add cron-backup.conf for automated backup scheduling.
2026-04-11 22:57:31 +08:00
parent 84d9ed28af
commit 54a73e66f4
18 changed files with 1767 additions and 0 deletions

@@ -0,0 +1,54 @@
# Example cron backup configuration
# Usage: run `crontab -e` and add the lines below
# Environment variables
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
BACKUP_DIR=/opt/user-management/backups
DB_PATH=/opt/user-management/data/user_management.db
CONFIG_PATH=/opt/user-management/configs/config.yaml
RETENTION_DAYS=30
# ============================================
# Backup jobs
# ============================================
# Run a backup every day at 02:00
0 2 * * * /opt/user-management/scripts/backup/backup.sh >> /var/log/backup.log 2>&1
# Run a full backup every Sunday at 03:00 (including upload to remote storage)
# (cron entries must stay on a single line; backslash continuation is not supported)
0 3 * * 0 /opt/user-management/scripts/backup/backup.sh && scp /opt/user-management/backups/latest.tar.gz backup@remote-server:/backups/
# Verify the backup every day at 18:00 and email an alert if verification fails
0 18 * * * /opt/user-management/scripts/backup/backup.sh --verify || echo "Backup verification failed" | mail -s "Backup Alert" admin@example.com
# ============================================
# Cleanup jobs
# ============================================
# At 04:00 on the 1st of each month, delete backups older than 90 days
0 4 1 * * find /opt/user-management/backups -name "*.tar.gz" -mtime +90 -delete
# ============================================
# Monitoring jobs
# ============================================
# Check service health every 15 minutes
*/15 * * * * curl -sf http://localhost:8080/api/v1/health || echo "Service down at $(date)" | mail -s "Service Alert" admin@example.com
# ============================================
# Log rotation configuration
# Not a crontab entry: save the block below (uncommented) as
# /etc/logrotate.d/user-management
# ============================================
# /var/log/backup.log {
#     daily
#     rotate 7
#     compress
#     delaycompress
#     missingok
#     notifempty
#     create 644 root root
# }

@@ -0,0 +1,13 @@
apiVersion: v2
name: user-management
description: A Helm chart for User Management System
type: application
version: 1.0.0
appVersion: "1.0.0"
keywords:
  - user-management
  - authentication
  - rbac
maintainers:
  - name: DevOps Team
    email: devops@example.com

@@ -0,0 +1,172 @@
# User Management System - Helm Chart
Kubernetes Helm Chart for deploying the User Management System.
## Prerequisites
- Kubernetes 1.19+
- Helm 3.2.0+
- ingress-nginx controller (for Ingress)
- cert-manager (for TLS, optional)
## Installation
```bash
# Add the repository
helm repo add user-management https://charts.example.com
helm repo update
# Install the chart
helm install user-management user-management/user-management \
--set config.jwtSecret="your-secret-key" \
--set config.adminEmail="admin@example.com"
```
## Using with Custom Values
```bash
# Create a values file
cat > values.yaml << EOF
replicaCount: 2
config:
  jwtSecret: "your-production-secret-key"
  adminEmail: "admin@example.com"
  logLevel: "warn"
ingress:
  enabled: true
  hosts:
    - host: ums.example.com
      paths:
        - path: /
  tls:
    - secretName: ums-tls
      hosts:
        - ums.example.com
resources:
  limits:
    cpu: 1000m
    memory: 1Gi
EOF
# Install with custom values
helm install user-management user-management/user-management -f values.yaml
```
## Configuration
| Parameter | Description | Default |
|-----------|-------------|---------|
| `replicaCount` | Number of replicas | `1` |
| `image.repository` | Docker image repository | `user-management` |
| `image.tag` | Docker image tag | `latest` |
| `service.type` | Service type | `ClusterIP` |
| `service.port` | Service port | `8080` |
| `ingress.enabled` | Enable Ingress | `true` |
| `ingress.className` | Ingress class | `nginx` |
| `config.jwtSecret` | JWT signing secret (required) | `""` |
| `config.adminEmail` | Admin email | `admin@example.com` |
| `config.logLevel` | Log level | `info` |
| `resources.limits.cpu` | CPU limit | `500m` |
| `resources.limits.memory` | Memory limit | `512Mi` |
| `persistence.enabled` | Enable PVC | `true` |
| `persistence.size` | PVC size | `5Gi` |
| `autoscaling.enabled` | Enable HPA | `false` |
| `autoscaling.minReplicas` | Min replicas | `1` |
| `autoscaling.maxReplicas` | Max replicas | `3` |
## Production Best Practices
### 1. Use TLS
```bash
helm install user-management user-management/user-management \
--set config.jwtSecret="$(openssl rand -base64 32)" \
--set ingress.enabled=true \
--set 'ingress.tls[0].secretName=ums-tls' \
--set 'ingress.tls[0].hosts[0]=ums.example.com'
```
### 2. Set Resource Limits
```bash
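# config.jwtSecret is required on every fresh install
helm install user-management user-management/user-management \
--set config.jwtSecret="$(openssl rand -base64 32)" \
--set resources.limits.cpu="1000m" \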
--set resources.limits.memory="1Gi" \
--set resources.requests.cpu="250m" \
--set resources.requests.memory="512Mi"
```
### 3. Enable Autoscaling
```bash
helm install user-management user-management/user-management \
--set config.jwtSecret="$(openssl rand -base64 32)" \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=10 \
--set autoscaling.targetCPUUtilizationPercentage=70
```
### 4. Use a Strong JWT Secret
```bash
# Generate a secure random secret
JWT_SECRET=$(openssl rand -base64 32 | tr -d '\n')
helm install user-management user-management/user-management \
--set config.jwtSecret="$JWT_SECRET"
```
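The generated secret can also be sanity-checked before it is handed to Helm. The following is only a minimal sketch, not part of the chart: `generate_jwt_secret` and `secret_is_strong` are hypothetical helper names, the 32-character minimum is an assumed policy, and `/dev/urandom` piped through `base64` stands in for `openssl rand` so the snippet does not depend on OpenSSL.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a base64 secret built from N random bytes (default 32).
generate_jwt_secret() {
  local bytes="${1:-32}"
  head -c "$bytes" /dev/urandom | base64 | tr -d '\n'
}

# Hypothetical policy check: succeed only if the secret has at least 32 characters.
secret_is_strong() {
  local secret="$1"
  [ "${#secret}" -ge 32 ]
}

JWT_SECRET="$(generate_jwt_secret 32)"
if ! secret_is_strong "$JWT_SECRET"; then
  echo "generated secret is too short" >&2
  exit 1
fi
echo "secret length: ${#JWT_SECRET}"
```

The resulting `$JWT_SECRET` can then be passed with `--set config.jwtSecret="$JWT_SECRET"` exactly as in the command above.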
## Upgrading
```bash
# Upgrade to a new version
helm upgrade user-management user-management/user-management
# Upgrade with new values (--reuse-values keeps earlier settings such as config.jwtSecret)
helm upgrade user-management user-management/user-management \
--reuse-values \
--set config.logLevel="debug"
```
## Uninstall
```bash
helm uninstall user-management
# Note: PVC data persists by default. To delete all data:
kubectl delete pvc -l app.kubernetes.io/name=user-management
```
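Because deleting a PVC is irreversible, a small confirmation prompt in front of the `kubectl delete pvc` command can guard against accidents. This is only a sketch: `confirm` is a hypothetical helper, not something shipped with the chart.

```bash
#!/usr/bin/env bash

# confirm PROMPT: succeed only if the user types exactly "yes" on stdin.
confirm() {
  local reply
  printf '%s [yes/no]: ' "$1" >&2
  # Treat end-of-input as an empty (negative) answer.
  read -r reply || reply=""
  [ "$reply" = "yes" ]
}

# Usage (commented out so that sourcing this file has no side effects):
#   if confirm "Delete all user-management PVC data?"; then
#     kubectl delete pvc -l app.kubernetes.io/name=user-management
#   fi
```

Requiring the full word `yes` rather than a single keystroke is a deliberate choice to make an accidental confirmation less likely.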
## Troubleshooting
### Pod not starting
```bash
# Check pod status
kubectl get pods -l app.kubernetes.io/name=user-management
# View pod logs
kubectl logs -l app.kubernetes.io/name=user-management
# Describe pod for events
kubectl describe pod -l app.kubernetes.io/name=user-management
```
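When a pod is slow to become ready, polling the health endpoint in a loop can be more convenient than re-running the commands above by hand. A minimal sketch under stated assumptions: `wait_for` is a hypothetical helper, the port-forward target `svc/user-management` assumes the release is named `user-management` as elsewhere in this README, and `/api/v1/health` is the chart's default liveness path.

```bash
#!/usr/bin/env bash

# wait_for ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS is used up.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (assumes kubectl access to the cluster and curl on the workstation):
#   kubectl port-forward svc/user-management 8080:8080 &
#   wait_for 30 2 curl -sf http://localhost:8080/api/v1/health \
#     && echo "service is healthy" \
#     || echo "service did not become healthy in time" >&2
```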
### Ingress not working
```bash
# Check ingress controller
kubectl get pods -n ingress-nginx
# Check ingress resource
kubectl get ingress -l app.kubernetes.io/name=user-management
# Check certificate
kubectl get certificate -l app.kubernetes.io/name=user-management
```
## License
Internal use only.

@@ -0,0 +1,60 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "user-management.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
*/}}
{{- define "user-management.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "user-management.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "_" "-" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "user-management.labels" -}}
helm.sh/chart: {{ include "user-management.chart" . }}
{{ include "user-management.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "user-management.selectorLabels" -}}
app.kubernetes.io/name: {{ include "user-management.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "user-management.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "user-management.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}

@@ -0,0 +1,27 @@
{{- /*
ConfigMap template - stores non-sensitive configuration
*/ -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "user-management.fullname" . }}-config
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
data:
  GIN_MODE: "release"
  TZ: "Asia/Shanghai"
  LOG_LEVEL: {{ .Values.config.logLevel | quote }}
  ADMIN_EMAIL: {{ .Values.config.adminEmail | quote }}
---
{{/*
Secret template - stores sensitive configuration.
stringData takes plain-text values; Kubernetes base64-encodes them on write,
so the value must not be passed through b64enc here.
*/}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "user-management.fullname" . }}-config
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
type: Opaque
stringData:
  JWT_SECRET: {{ required "config.jwtSecret is required" .Values.config.jwtSecret | quote }}

@@ -0,0 +1,112 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "user-management.fullname" . }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "user-management.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "user-management.selectorLabels" . | nindent 8 }}
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "user-management.serviceAccountName" . }}
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      {{- if .Values.podAntiAffinity.enabled }}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  {{- include "user-management.selectorLabels" . | nindent 18 }}
              topologyKey: {{ .Values.podAntiAffinity.topologyKey }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "user-management.fullname" . }}-config
          {{- if .Values.livenessProbe.enabled }}
          livenessProbe:
            httpGet:
              path: {{ .Values.livenessProbe.path }}
              port: http
            initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
            periodSeconds: {{ .Values.livenessProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
            failureThreshold: {{ .Values.livenessProbe.failureThreshold }}
          {{- end }}
          {{- if .Values.readinessProbe.enabled }}
          readinessProbe:
            httpGet:
              path: {{ .Values.readinessProbe.path }}
              port: http
            initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }}
            periodSeconds: {{ .Values.readinessProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.readinessProbe.timeoutSeconds }}
            failureThreshold: {{ .Values.readinessProbe.failureThreshold }}
          {{- end }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: data
              mountPath: /app/data
            - name: config
              mountPath: /app/configs
              readOnly: true
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: data
          {{- if .Values.persistence.enabled }}
          persistentVolumeClaim:
            claimName: {{ include "user-management.fullname" . }}-data
          {{- else }}
          emptyDir: {}
          {{- end }}
        - name: config
          secret:
            secretName: {{ include "user-management.fullname" . }}-config
        - name: tmp
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ include "user-management.fullname" . }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{- include "user-management.selectorLabels" . | nindent 4 }}

@@ -0,0 +1,32 @@
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "user-management.fullname" . }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "user-management.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}

@@ -0,0 +1,46 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "user-management.fullname" . -}}
{{- $svcPort := .Values.service.port -}}
{{- if and .Values.ingress.className (ne .Values.ingress.className "nginx") }}
{{- fail "ingress.className must be 'nginx' for this chart" }}
{{- end }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    {{- with .Values.ingress.annotations }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType | default "Prefix" }}
            backend:
              service:
                name: {{ $fullName }}
                port:
                  number: {{ $svcPort }}
          {{- end }}
    {{- end }}
{{- end }}

@@ -0,0 +1,17 @@
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "user-management.fullname" . }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
spec:
  {{- if .Values.podDisruptionBudget.minAvailable }}
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  {{- else }}
  maxUnavailable: 1
  {{- end }}
  selector:
    matchLabels:
      {{- include "user-management.selectorLabels" . | nindent 6 }}
{{- end }}

@@ -0,0 +1,15 @@
{{- if .Values.persistence.enabled -}}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ include "user-management.fullname" . }}-data
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
  annotations:
    # Keep the claim on `helm uninstall` so data persists, as the README states.
    helm.sh/resource-policy: keep
spec:
  accessModes:
    - {{ .Values.persistence.accessMode | quote }}
  resources:
    requests:
      storage: {{ .Values.persistence.size | quote }}
  storageClassName: {{ .Values.persistence.storageClass | quote }}
{{- end }}

@@ -0,0 +1,6 @@
{{- if .Values.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "user-management.serviceAccountName" . }}
  labels:
    {{- include "user-management.labels" . | nindent 4 }}
{{- end }}

@@ -0,0 +1,90 @@
# Default values for user-management.
replicaCount: 1
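image:
  repository: user-management
  tag: latest
  pullPolicy: IfNotPresent
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
# ServiceAccount settings (read by the user-management.serviceAccountName helper)
serviceAccount:
  create: true
  name: ""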
service:
  type: ClusterIP
  port: 8080
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: ums.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: ums-tls
      hosts:
        - ums.example.com
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi
persistence:
  enabled: true
  storageClass: standard
  accessMode: ReadWriteOnce
  size: 5Gi
# Pod Anti-Affinity settings
podAntiAffinity:
  enabled: true
  topologyKey: kubernetes.io/hostname
# Readiness and Liveness probes
readinessProbe:
  enabled: true
  path: /api/v1/health/ready
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
livenessProbe:
  enabled: true
  path: /api/v1/health
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
# Pod Disruption Budget
podDisruptionBudget:
  enabled: true
  minAvailable: 1
# Horizontal Pod Autoscaler
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
# Config
config:
  jwtSecret: ""
  adminEmail: "admin@example.com"
  logLevel: "info"
# Ingress controller version (for annotation compatibility)
ingressControllerVersion: "1.0"