
OpenClaw3 K8s Deployment Best Practices


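Summary: This article documents the complete process of deploying the OpenClaw3 intelligent assistant to a K8s cluster. It covers five hours of hands-on experience, the hard-won lessons from it, and a verified working setup.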

📊 Deployment Overview

Architecture

┌───────────────────────────────────────────────────────┐
│ K8s Cluster                                           │
│  ┌─────────────────────────────────────────────────┐  │
│  │ openclaw namespace                              │  │
│  │  ┌───────────────────────────────────────────┐  │  │
│  │  │ openclaw-gateway Deployment               │  │  │
│  │  │  ┌─────────────────────────────────────┐  │  │  │
│  │  │  │ openclaw Container (4Gi)            │  │  │  │
│  │  │  │  - Gateway: 18789                   │  │  │  │
│  │  │  │  - Feishu WebSocket                 │  │  │  │
│  │  │  │  - Aliyun Bailian API               │  │  │  │
│  │  │  └─────────────────────────────────────┘  │  │  │
│  │  │  ▼ PVC (200Gi)                            │  │  │
│  │  │    - config/                              │  │  │
│  │  │    - workspace/                           │  │  │
│  │  └───────────────────────────────────────────┘  │  │
│  └─────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────┘

┌─────────────────────┐
│ External Services   │
├─────────────────────┤
│ - Aliyun Bailian    │
│ - Feishu Bot        │
│ - Harbor Registry   │
└─────────────────────┘

Requirements

Item             Requirement
K8s version      1.20+
Storage          CephFS (ReadWriteMany)
Image registry   Harbor (private)
Memory           4Gi (Pod limit)
CPU              2 cores (Pod limit)
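The version and storage requirements can be checked before deploying anything. This is a minimal sketch in POSIX shell. It assumes `kubectl` already points at the target cluster and that the StorageClass is `csi-cephfs-sc`, as in step 4. The helpers `k8s_version_ok` and `preflight` are hypothetical and not part of OpenClaw. `kubectl get --raw /version` returns the API server's version JSON, and some managed clusters report minors like `28+`, so non-digits are stripped before comparing.

```shell
# Hypothetical pre-flight helpers for the requirements table.

# k8s_version_ok MAJOR MINOR: succeed if the version is at least 1.20.
# Non-digit suffixes such as the "+" in "28+" are stripped first.
k8s_version_ok() {
  ver_major=$(printf '%s' "$1" | tr -cd '0-9')
  ver_minor=$(printf '%s' "$2" | tr -cd '0-9')
  [ -n "$ver_major" ] && [ -n "$ver_minor" ] || return 1
  if [ "$ver_major" -gt 1 ]; then
    return 0
  fi
  [ "$ver_major" -eq 1 ] && [ "$ver_minor" -ge 20 ]
}

# preflight: check the API server version and the CephFS StorageClass.
preflight() {
  raw=$(kubectl get --raw /version) || return 1
  major=$(printf '%s\n' "$raw" | sed -n 's/.*"major": *"\([^"]*\)".*/\1/p' | head -n 1)
  minor=$(printf '%s\n' "$raw" | sed -n 's/.*"minor": *"\([^"]*\)".*/\1/p' | head -n 1)
  if k8s_version_ok "$major" "$minor"; then
    echo "OK: Kubernetes $major.$minor"
  else
    echo "FAIL: need Kubernetes 1.20+, got ${major:-?}.${minor:-?}" >&2
    return 1
  fi
  if kubectl get storageclass csi-cephfs-sc >/dev/null 2>&1; then
    echo "OK: StorageClass csi-cephfs-sc exists"
  else
    echo "FAIL: StorageClass csi-cephfs-sc not found" >&2
    return 1
  fi
}
```

After sourcing the functions, run `preflight`. It prints one OK or FAIL line per check and stops at the first failure.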

🔧 Deployment Steps

Step 1: Create the namespace and RBAC

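# Create the namespace
kubectl create namespace openclaw

# Create the service account
kubectl create serviceaccount openclaw-sa -n openclaw

# Create the ClusterRole
# Note: bound through a ClusterRoleBinding, these rights (Secrets included) apply in every namespace;
# a Role + RoleBinding in openclaw is the narrower option if the gateway only manages its own namespace
kubectl create clusterrole openclaw-clusterrole \
  --verb=get,list,watch,create,update,patch,delete \
  --resource=pods,pods/log,pods/exec,deployments,services,configmaps,secrets,persistentvolumeclaims

# Create the ClusterRoleBinding
kubectl create clusterrolebinding openclaw-clusterrolebinding \
  --clusterrole=openclaw-clusterrole \
  --serviceaccount=openclaw:openclaw-sa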
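To confirm that the binding took effect, `kubectl auth can-i` can impersonate the service account. It exits 0 for "yes" and non-zero for "no". This sketch assumes the caller is allowed to impersonate service accounts (cluster-admin usually is). The verb/resource pairs are only a sample of what the ClusterRole grants, and `sa_subject` and `check_sa_permissions` are hypothetical names.

```shell
# sa_subject NAMESPACE NAME: the user name Kubernetes assigns to a ServiceAccount.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# check_sa_permissions: probe a sample of the rules granted in step 1.
check_sa_permissions() {
  subject=$(sa_subject openclaw openclaw-sa)
  failed=0
  for pair in "get pods" "list deployments" "patch configmaps" "get secrets" "delete persistentvolumeclaims"; do
    # Intentionally unquoted: split "verb resource" into $1 and $2.
    set -- $pair
    if kubectl auth can-i "$1" "$2" -n openclaw --as="$subject" >/dev/null 2>&1; then
      echo "allowed: $pair"
    else
      echo "DENIED:  $pair" >&2
      failed=1
    fi
  done
  return "$failed"
}
```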

Step 2: Create the image pull Secret

kubectl create secret docker-registry harbor-registry-secret \
  --docker-server=<harbor-address> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n openclaw

Step 3: Create the application Secret

kubectl create secret generic openclaw-secrets \
  --from-literal=FEISHU_APP_ID='<your-app-id>' \
  --from-literal=FEISHU_APP_SECRET='<your-app-secret>' \
  --from-literal=DASHSCOPE_API_KEY='<your-api-key>' \
  -n openclaw
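The following variant refuses to run with empty values and can be rerun to update the Secret in place. It is a sketch that assumes the three values are exported as environment variables with the same names. It pipes `kubectl create --dry-run=client -o yaml` into `kubectl apply -f -`, so a second run updates the Secret instead of failing with "already exists". `missing_vars` and `apply_app_secret` are hypothetical helpers.

```shell
# missing_vars NAME...: print each listed variable that is unset or empty.
missing_vars() {
  for var_name in "$@"; do
    eval "var_value=\${$var_name:-}"
    [ -n "$var_value" ] || printf '%s\n' "$var_name"
  done
}

# apply_app_secret: create or update openclaw-secrets from the environment.
apply_app_secret() {
  missing=$(missing_vars FEISHU_APP_ID FEISHU_APP_SECRET DASHSCOPE_API_KEY)
  if [ -n "$missing" ]; then
    echo "not set:" $missing >&2
    return 1
  fi
  kubectl create secret generic openclaw-secrets \
    --from-literal=FEISHU_APP_ID="$FEISHU_APP_ID" \
    --from-literal=FEISHU_APP_SECRET="$FEISHU_APP_SECRET" \
    --from-literal=DASHSCOPE_API_KEY="$DASHSCOPE_API_KEY" \
    -n openclaw --dry-run=client -o yaml | kubectl apply -f -
}
```

Environment variables sourced from a Secret are read only when the container starts. If the gateway is already running, restart it afterwards with `kubectl rollout restart deployment/openclaw-gateway -n openclaw`.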

Step 4: Create the PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-data-pvc
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: csi-cephfs-sc
  resources:
    requests:
      storage: 200Gi
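After applying the PVC, wait until it is `Bound` before deploying. A PVC that never binds is a common cause of the Pending Pod covered under Troubleshooting. The sketch below polls the PVC phase using two hypothetical helpers, `wait_until` and `pvc_is_bound`. Newer kubectl versions can do the same with `kubectl wait --for=jsonpath=...`, but a plain loop also works with older clients. If the StorageClass uses `volumeBindingMode: WaitForFirstConsumer`, the PVC stays Pending until a Pod uses it, so this check would time out by design.

```shell
# wait_until TIMEOUT INTERVAL CMD...: rerun CMD until it succeeds,
# giving up after roughly TIMEOUT seconds.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}

# pvc_is_bound: succeed once openclaw-data-pvc reports phase Bound.
pvc_is_bound() {
  phase=$(kubectl get pvc openclaw-data-pvc -n openclaw -o jsonpath='{.status.phase}' 2>/dev/null)
  [ "$phase" = "Bound" ]
}

# Example, assuming the manifest above is saved as pvc.yaml (hypothetical name):
#   kubectl apply -f pvc.yaml && wait_until 120 5 pvc_is_bound && echo "PVC bound"
```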

Step 5: Deploy the application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 1
  strategy:
    # Recreate: the old Pod is stopped before the new one starts (no overlap during updates)
    type: Recreate
  selector:
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      serviceAccountName: openclaw-sa
      imagePullSecrets:
        - name: harbor-registry-secret
      containers:
        - name: openclaw
          image: <harbor-address>/crystalforge/openclaw-cn-base:1.0.3-feishu
          imagePullPolicy: Always
          command: ['/bin/bash', '-c', 'openclaw gateway --allow-unconfigured']
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 18789
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          ports:
            - containerPort: 18789
              name: gateway
          env:
            - name: TZ
              value: Asia/Shanghai
            # Credentials come from the Secret created in step 3
            - name: FEISHU_APP_ID
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: FEISHU_APP_ID
            - name: FEISHU_APP_SECRET
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: FEISHU_APP_SECRET
            - name: DASHSCOPE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: DASHSCOPE_API_KEY
          # One PVC, two directories via subPath (see Lesson 2)
          volumeMounts:
            - name: data-volume
              mountPath: /root/.openclaw
              subPath: config
            - name: data-volume
              mountPath: /openclaw/workspace
              subPath: workspace
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: openclaw-data-pvc
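You can script the apply-and-wait sequence so that a failed start shows up immediately, together with the last log lines. This is a sketch that assumes the manifest is saved as `deployment.yaml` (a hypothetical file name); `ready_ok` and `deploy_gateway` are hypothetical helpers. The Deployment defines a livenessProbe but no readinessProbe, so "ready" here only means the container started, not that `/health` answered.

```shell
# ready_ok "READY/DESIRED": succeed when all desired replicas are ready.
# An empty READY (readyReplicas is omitted while it is 0) counts as not ready.
ready_ok() {
  ready=${1%%/*}
  desired=${1#*/}
  [ -n "$ready" ] && [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

# deploy_gateway [MANIFEST]: apply, wait for the rollout, show logs on failure.
deploy_gateway() {
  manifest=${1:-deployment.yaml}  # hypothetical file holding the manifest above
  kubectl apply -f "$manifest" || return 1
  if ! kubectl rollout status deployment/openclaw-gateway -n openclaw --timeout=300s; then
    kubectl logs -n openclaw deployment/openclaw-gateway --tail=50 >&2
    return 1
  fi
  counts=$(kubectl get deployment openclaw-gateway -n openclaw \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')
  if ready_ok "$counts"; then
    echo "openclaw-gateway ready ($counts)"
  else
    echo "openclaw-gateway not ready ($counts)" >&2
    return 1
  fi
}
```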

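🚨 Hard-Won Lessons (5 Hours of Hands-On Experience)

Lesson 1: Config file format

Problem: Pod in CrashLoopBackOff
Error: JSON5: invalid character '}' at 1:503
Cause: writing the JSON through a heredoc inside kubectl exec corrupted the file (the nested quoting in bash -c '...' and shell expansion inside an unquoted heredoc can both change the text)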

Wrong approach

kubectl exec ... -- bash -c 'cat > config.json << EOF
{ ... }
EOF'

Correct approach

# 1. Edit and validate locally
vim config.json
jq . config.json > /dev/null && echo "✅ Valid JSON"

# 2. Copy into the Pod
kubectl cp config.json namespace/pod:/path/config.json

# 3. Verify and restart
kubectl exec ... -- cat /path/config.json | jq .
kubectl rollout restart deployment/xxx
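You can combine the three steps above with the "back up before modifying" rule from the Core Principles in a single function. This sketch assumes the config path used in the Verification Checklist and a single Pod labelled `app=openclaw`; `update_config` and `backup_name` are hypothetical. `kubectl cp` needs `tar` inside the container. Also, `jq` accepts only strict JSON. The error message above suggests the gateway parses JSON5, so JSON5-only syntax such as comments or trailing commas will fail this check.

```shell
# Config path inside the container, as used in this article's verification steps.
CONFIG_PATH=/root/.openclaw/.openclaw/openclaw.json

# backup_name FILE STAMP: name of the in-pod backup copy.
backup_name() {
  printf '%s.bak.%s\n' "$1" "$2"
}

# update_config [LOCAL_FILE]: validate, back up, copy, verify, restart.
update_config() {
  local_file=${1:-config.json}
  # 1. Validate locally first (strict JSON; -e also rejects an empty file)
  jq -e . "$local_file" >/dev/null || { echo "invalid JSON: $local_file" >&2; return 1; }
  pod=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].metadata.name}')
  [ -n "$pod" ] || { echo "no pod with label app=openclaw" >&2; return 1; }
  # 2. Back up the live file on the PVC before overwriting it
  backup=$(backup_name "$CONFIG_PATH" "$(date +%Y%m%d%H%M%S)")
  kubectl exec -n openclaw "$pod" -- cp "$CONFIG_PATH" "$backup" || return 1
  # 3. Copy the new file in and re-validate what actually landed
  kubectl cp "$local_file" "openclaw/$pod:$CONFIG_PATH" || return 1
  kubectl exec -n openclaw "$pod" -- cat "$CONFIG_PATH" | jq -e . >/dev/null \
    || { echo "copied file failed validation; backup is $backup" >&2; return 1; }
  # 4. Restart so the gateway rereads the file
  kubectl rollout restart deployment/openclaw-gateway -n openclaw
}
```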

Lesson 2: Choosing a storage layout

Problem: Pod stuck in Pending
Cause: the multi-PVC layout exceeded the cluster's quota

Wrong approach

volumes:
  - name: config-volume
    persistentVolumeClaim:
      claimName: openclaw-config-pvc
  - name: workspace-volume
    persistentVolumeClaim:
      claimName: openclaw-workspace-pvc
  # ... 5 PVCs in total

Correct approach

# Pod spec: a single PVC for everything
volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: openclaw-data-pvc
# Container spec: separate directories on the same PVC via subPath
volumeMounts:
  - name: data-volume
    mountPath: /root/.openclaw
    subPath: config
  - name: data-volume
    mountPath: /openclaw/workspace
    subPath: workspace
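If the cap was a namespace ResourceQuota on the number of PVCs (an assumption, since the article only says "cluster quota"), `kubectl describe resourcequota -n openclaw` shows it directly. The sketch below reports the remaining PVC headroom for each quota. `pvc_headroom` and `show_pvc_quota` are hypothetical helpers, and quotas on `requests.storage` are not covered.

```shell
# pvc_headroom USED HARD: how many more PVCs fit under a count quota (0 if none).
pvc_headroom() {
  if [ "$2" -gt "$1" ]; then
    echo $(( $2 - $1 ))
  else
    echo 0
  fi
}

# show_pvc_quota: report PVC-count quotas in the openclaw namespace.
show_pvc_quota() {
  for rq in $(kubectl get resourcequota -n openclaw -o jsonpath='{.items[*].metadata.name}'); do
    hard=$(kubectl get resourcequota "$rq" -n openclaw -o jsonpath='{.status.hard.persistentvolumeclaims}')
    used=$(kubectl get resourcequota "$rq" -n openclaw -o jsonpath='{.status.used.persistentvolumeclaims}')
    [ -n "$hard" ] || continue  # this quota does not cap the PVC count
    echo "$rq: ${used:-0}/$hard PVCs, room for $(pvc_headroom "${used:-0}" "$hard") more"
  done
}
```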

Lesson 3: How to update the config

Problem: config changes could not be saved
Cause: ConfigMap volumes are mounted read-only

Wrong approach

volumes:
  - name: config-volume
    configMap:
      name: openclaw-config

Correct approach

# Pod spec
volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: openclaw-data-pvc
# Container spec: the config file lives on the writable PVC
volumeMounts:
  - name: data-volume
    mountPath: /root/.openclaw/.openclaw/openclaw.json
    subPath: openclaw.json
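Kubernetes mounts ConfigMap volumes read-only, which is why the writes failed. To see which mounts under the config directory are writable in the running Pod, you can check the mount options in `/proc/mounts`. This is a sketch; `opts_are_ro` and `show_config_mounts` are hypothetical helpers, and the image is assumed to provide `cat`.

```shell
# opts_are_ro OPTIONS: succeed if a comma-separated mount option list contains "ro".
opts_are_ro() {
  case ",$1," in
    *,ro,*) return 0 ;;
    *) return 1 ;;
  esac
}

# show_config_mounts: list mounts under /root/.openclaw in the running Pod with rw/ro state.
show_config_mounts() {
  kubectl exec -n openclaw deployment/openclaw-gateway -- cat /proc/mounts |
    while read -r _dev mnt _fstype opts _rest; do
      case "$mnt" in
        /root/.openclaw*)
          if opts_are_ro "$opts"; then
            echo "read-only:  $mnt"
          else
            echo "read-write: $mnt"
          fi
          ;;
      esac
    done
}
```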

✅ Verification Checklist

Deployment checks

# 1. Pod status
kubectl get pods -n openclaw
# Expected: Running (1/1)

# 2. Check the logs
kubectl logs -n openclaw deployment/openclaw-gateway --tail=50
# Expected: no errors, and a "listening on ws://..." line

# 3. Verify the config
kubectl exec -n openclaw deployment/openclaw-gateway -- \
  cat /root/.openclaw/.openclaw/openclaw.json | jq .channels.feishu.enabled
# Expected: true
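The three deployment checks above can be run together. This sketch assumes `jq` is installed locally and that there is a single Pod labelled `app=openclaw`; `log_has_listener` and `verify_deployment` are hypothetical names. It looks for the same "listening on ws://" line, which may have scrolled out of the last 50 lines on a busy gateway.

```shell
# log_has_listener TEXT: succeed if TEXT contains the listener line.
log_has_listener() {
  printf '%s\n' "$1" | grep -q 'listening on ws://'
}

# verify_deployment: Pod phase, listener line in recent logs, Feishu flag in config.
verify_deployment() {
  phase=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].status.phase}')
  [ "$phase" = "Running" ] || { echo "Pod phase: ${phase:-<none>}" >&2; return 1; }
  logs=$(kubectl logs -n openclaw deployment/openclaw-gateway --tail=50) || return 1
  log_has_listener "$logs" || { echo "no 'listening on ws://' in the last 50 lines" >&2; return 1; }
  enabled=$(kubectl exec -n openclaw deployment/openclaw-gateway -- \
    cat /root/.openclaw/.openclaw/openclaw.json | jq -r '.channels.feishu.enabled')
  [ "$enabled" = "true" ] || { echo "channels.feishu.enabled = ${enabled:-<missing>}" >&2; return 1; }
  echo "deployment checks passed"
}
```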

Functional checks

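# 1. Feishu pairing
# Send a message to the bot to get a pairing code, then approve it inside the container:
# kubectl exec -n openclaw deployment/openclaw-gateway -- openclaw pairing approve feishu <CODE>

# 2. Model calls
# Send a question to the Feishu bot
# Check the logs for the model's response

# 3. Service access
# The Deployment in step 5 defines no Service or hostPort, so <node-ip>:18789 is not
# reachable as-is; forward the port first, then query it locally
kubectl port-forward -n openclaw deployment/openclaw-gateway 18789:18789 &
curl http://127.0.0.1:18789/health
# Expected: {"status":"ok"}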
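This scripted form of the health check validates the response body, not just the HTTP status, and stops the background port-forward afterwards. It assumes `curl` is installed locally and that the endpoint returns JSON like the expected output above. `health_ok` and `check_health` are hypothetical, and the fixed three-second wait for the port-forward is a simplification.

```shell
# health_ok BODY: succeed if BODY contains "status": "ok" (whitespace tolerant).
health_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# check_health: port-forward, call /health once, then stop the port-forward.
check_health() {
  kubectl port-forward -n openclaw deployment/openclaw-gateway 18789:18789 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 3  # crude: give the port-forward time to start listening
  body=$(curl -fsS --max-time 10 http://127.0.0.1:18789/health)
  curl_rc=$?
  kill "$pf_pid" 2>/dev/null
  if [ "$curl_rc" -eq 0 ] && health_ok "$body"; then
    echo "health OK: $body"
  else
    echo "health check failed (curl exit $curl_rc): $body" >&2
    return 1
  fi
}
```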

📞 Troubleshooting

Pod CrashLoopBackOff

# 1. View the logs (add --previous to see output from the instance that crashed)
kubectl logs -n openclaw deployment/openclaw-gateway --tail=100

# 2. Check the config
kubectl exec -n openclaw deployment/openclaw-gateway -- \
  cat /root/.openclaw/.openclaw/openclaw.json | jq .

# 3. Fix the config (kubectl cp needs the Pod name, not deployment/...)
kubectl cp config.json openclaw/<pod-name>:/root/.openclaw/.openclaw/openclaw.json
kubectl delete pod -n openclaw -l app=openclaw
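For CrashLoopBackOff, the logs of the instance that crashed and its last termination reason usually tell you more than the logs of the container that is currently restarting. This sketch uses the hypothetical helpers `hint_for_reason` and `triage_crashloop`. The hints are illustrative and keyed on Kubernetes termination reasons such as `OOMKilled`, which matters here given the 4Gi memory limit.

```shell
# hint_for_reason REASON: first thing to check for a last-termination reason.
hint_for_reason() {
  case "$1" in
    OOMKilled) echo "hit the 4Gi memory limit: check usage or raise limits.memory" ;;
    Error)     echo "process exited non-zero: read the --previous logs and openclaw.json" ;;
    "")        echo "no previous termination recorded" ;;
    *)         echo "see kubectl describe pod for details" ;;
  esac
}

# triage_crashloop: restart count, last termination reason, crashed instance logs.
triage_crashloop() {
  pod=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].metadata.name}')
  [ -n "$pod" ] || { echo "no pod with label app=openclaw" >&2; return 1; }
  restarts=$(kubectl get pod "$pod" -n openclaw -o jsonpath='{.status.containerStatuses[0].restartCount}')
  reason=$(kubectl get pod "$pod" -n openclaw -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')
  echo "pod=$pod restarts=$restarts lastReason=${reason:-<none>}"
  echo "hint: $(hint_for_reason "$reason")"
  # --previous: logs of the container instance that crashed, not the one restarting now
  kubectl logs -n openclaw "$pod" --previous --tail=100
}
```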

Pod Pending

# 1. View the events
kubectl describe pod -n openclaw -l app=openclaw

# 2. Check the PVCs
kubectl get pvc -n openclaw

# 3. Simplify to the single-PVC layout (see Lesson 2)
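`kubectl describe` shows the scheduler's reason among the Pod events. The sketch below pulls the newest FailedScheduling event directly. It then maps a few message fragments that the default scheduler commonly emits to a likely cause; the exact wording can vary between Kubernetes versions. `pending_hint` and `diagnose_pending` are hypothetical helpers.

```shell
# pending_hint MESSAGE: map a FailedScheduling message to a likely cause.
pending_hint() {
  case "$1" in
    *"unbound immediate PersistentVolumeClaims"*)
      echo "PVC not bound: check kubectl get pvc and StorageClass csi-cephfs-sc" ;;
    *"Insufficient memory"*)
      echo "no node can fit requests.memory (1Gi)" ;;
    *"Insufficient cpu"*)
      echo "no node can fit requests.cpu (500m)" ;;
    "")
      echo "no FailedScheduling event found" ;;
    *)
      echo "read the full scheduler message above" ;;
  esac
}

# diagnose_pending: show the newest FailedScheduling message and PVC states.
diagnose_pending() {
  msg=$(kubectl get events -n openclaw --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp -o custom-columns=MSG:.message --no-headers | tail -n 1)
  echo "scheduler: ${msg:-<none>}"
  echo "hint: $(pending_hint "$msg")"
  kubectl get pvc -n openclaw
}
```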

Image pull failures

# 1. Inspect the Secret
kubectl get secret harbor-registry-secret -n openclaw -o yaml

# 2. Verify the Harbor credentials
docker login <harbor-address>

# 3. Recreate the Secret (same flags as in step 2)
kubectl delete secret harbor-registry-secret -n openclaw
kubectl create secret docker-registry ...
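A frequent cause of pull failures is a pull Secret whose registry entry does not match the registry in the image reference. The sketch below compares the two. `image_registry` follows Docker's convention: the first path component is a registry host only if it contains a `.` or `:` or is `localhost`. The sketch assumes `jq` and a `base64` that accepts `-d` are available locally. An entry written with an `https://` prefix is reported as a mismatch even if the kubelet might still match it. `image_registry` and `check_pull_config` are hypothetical names.

```shell
# image_registry IMAGE: registry host part of an image reference.
image_registry() {
  case "$1" in
    */*)
      first=${1%%/*}
      case "$first" in
        *.*|*:*|localhost) echo "$first" ;;
        *) echo "docker.io" ;;
      esac
      ;;
    *) echo "docker.io" ;;
  esac
}

# check_pull_config: compare the Deployment's registry with the pull Secret's entries.
check_pull_config() {
  image=$(kubectl get deployment openclaw-gateway -n openclaw \
    -o jsonpath='{.spec.template.spec.containers[0].image}') || return 1
  want=$(image_registry "$image")
  have=$(kubectl get secret harbor-registry-secret -n openclaw \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq -r '.auths | keys[]')
  echo "image registry:    $want"
  echo "secret registries: $have"
  if printf '%s\n' "$have" | grep -Fxq "$want"; then
    echo "OK: pull Secret has an entry for $want"
  else
    echo "MISMATCH: no entry for $want in harbor-registry-secret" >&2
    return 1
  fi
}
```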

🎯 Core Principles

Simple > complex, verify > assume, back up > modify

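Deployment principles

  1. Single PVC - avoid multi-PVC scheduling problems
  2. No initContainer - start the Pod first, configure it afterwards
  3. Verify step by step - confirm each step succeeded before continuing
  4. Validate locally - check JSON config with jq before deploying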

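Configuration principles

  1. Back up first - back up before any change
  2. Validate the format - check JSON with jq
  3. Use kubectl cp - not heredoc or echo pipes
  4. Restart and verify - restart the Pod after a config update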

Author: John
Date: 2026-03-10
Version: v1.0
Category: DevOps/K8s