etcd Backup and Restore

ETCD stores all Kubernetes cluster data

ETCD is a critically important service in a Kubernetes cluster: it stores all of the cluster's data. If a disaster strikes or the etcd data is lost, recovering the cluster depends entirely on it; within Kubernetes, only kube-apiserver talks to etcd directly. Versions used here: etcdctl 3.3.18, Kubernetes v1.17.3, both installed from binaries.

Common ETCD query operations

  • Check cluster health

    ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 endpoint health

    https://172.21.16.204:2379 is healthy: successfully committed proposal: took = 15.614765ms
    https://172.21.17.32:2379 is healthy: successfully committed proposal: took = 40.200694ms
    https://172.21.17.18:2379 is healthy: successfully committed proposal: took = 230.022141ms
  • Get the value of a specific key

    ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 get /registry/apiregistration.k8s.io/apiservices/v1.apps

    {"kind":"APIService","apiVersion":"apiregistration.k8s.io/v1beta1","metadata":{"name":"v1.apps","uid":"5790ef34-84c1-432e-8ed8-4fe842c43dfe","creationTimestamp":"2020-03-26T07:46:37Z","labels":{"kube-aggregator.kubernetes.io/automanaged":"onstart"}},"spec":{"service":null,"group":"apps","version":"v1","groupPriorityMinimum":17800,"versionPriority":15},"status":{"conditions":[{"type":"Available","status":"True","lastTransitionTime":"2020-03-26T07:46:37Z","reason":"Local","message":"Local APIServices are always available"}]}}
  • Get the etcd version

    ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 version

    etcdctl version: 3.3.18
    API version: 3.3
  • List all keys in ETCD

    ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 get / --prefix --keys-only

    # a large amount of output follows
    ………………
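The long `--cacert/--cert/--key/--endpoints` flag list repeats in every command above. etcdctl (API v3) also reads these settings from `ETCDCTL_*` environment variables, so a shell session can export them once; the paths and endpoints below are the same ones used throughout this post:

```shell
# Export the connection settings once to shorten every etcdctl call.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/cert/ca.pem
export ETCDCTL_CERT=/etc/etcd/cert/etcd.pem
export ETCDCTL_KEY=/etc/etcd/cert/etcd-key.pem
export ETCDCTL_ENDPOINTS=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379

# With these set, the earlier queries reduce to e.g.:
#   etcdctl endpoint health
#   etcdctl get / --prefix --keys-only
```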

etcd backup

Run this on any one of the servers.

One-off backup command

export ETCDCTL_API=3
ENDPOINTS="https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379"
CACERT="/etc/kubernetes/cert/ca.pem"
CERT="/etc/etcd/cert/etcd.pem"
KEY="/etc/etcd/cert/etcd-key.pem"
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/opt/etcd_backup"
/usr/bin/etcdctl --cacert=${CACERT} --cert=${CERT} --key=${KEY} --endpoints="${ENDPOINTS}" snapshot save ${BACKUP_DIR}/mysnapshot-${DATE}.db
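After taking a snapshot it is worth verifying it before relying on it. `etcdctl snapshot status` reports the hash, revision, total key count and size of a snapshot file; this sketch reuses the `BACKUP_DIR` and `DATE` variables set by the backup command above:

```shell
# Verify the snapshot file just written. "snapshot status" only reads the
# local file, so no cluster connection is needed.
ETCDCTL_API=3 etcdctl snapshot status ${BACKUP_DIR}/mysnapshot-${DATE}.db -w table
```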

Backup via script

cat > etcd-back.sh <<'EOF'
#!/bin/bash
# Back up using the v3 API and keep the 10 most recent snapshots.
# Add this script to cron so it runs every half hour.

# etcd vars
ENDPOINTS="https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379"
CACERT="/etc/kubernetes/cert/ca.pem"
CERT="/etc/etcd/cert/etcd.pem"
KEY="/etc/etcd/cert/etcd-key.pem"
export ETCDCTL_API=3

# time
DATE=$(date +%Y%m%d-%H%M%S)

# backup dir
BACKUP_DIR="/opt/etcd_backup"

[ ! -d ${BACKUP_DIR} ] && mkdir -p ${BACKUP_DIR}

# exec backup
/usr/bin/etcdctl --cacert=${CACERT} --cert=${CERT} --key=${KEY} --endpoints="${ENDPOINTS}" snapshot save ${BACKUP_DIR}/mysnapshot-${DATE}.db

# keep only the 10 newest snapshots
cd ${BACKUP_DIR}
ls -lt ${BACKUP_DIR}/*.db | awk '{if(NR>10){print "rm -f "$9}}' | sh
EOF
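A matching crontab entry for the half-hourly schedule mentioned in the script would be `*/30 * * * * /bin/bash /root/etcd-back.sh` (the script path is an assumption). The retention pipeline at the end of the script can be sanity-checked without a cluster by running the same `ls -lt | awk | sh` pipeline against dummy files; it should keep only the 10 newest `.db` files:

```shell
# Demonstrate the retention logic on dummy snapshot files in a temp dir.
tmpdir=$(mktemp -d)
for i in $(seq -w 1 15); do
    touch "${tmpdir}/mysnapshot-202004${i}.db"
done

# Same pipeline as in the backup script: list by mtime (newest first)
# and delete every entry after the 10th ($9 is the file name field).
ls -lt "${tmpdir}"/*.db | awk '{if (NR > 10) print "rm -f " $9}' | sh

remaining=$(ls "${tmpdir}"/*.db | wc -l)
echo "snapshots kept: ${remaining}"

rm -rf "${tmpdir}"
```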

Restore

Preparation

  • Stop the kube-apiserver service on all master nodes

    systemctl stop kube-apiserver

    ps -ef | grep kube-apiserver
  • Stop the ETCD service on every node in the cluster

    systemctl stop etcd

    systemctl status etcd

Move aside the data under every ETCD storage directory

# run on every etcd node
mkdir -p /tmp/etcd_backup
mv /var/lib/etcd/data /tmp/etcd_backup/
mv /var/lib/etcd/wal /tmp/etcd_backup/

Copy the ETCD backup snapshot

# copy from the server where the backup was taken
scp -r /opt/etcd_backup/mysnapshot-20200413-090001.db 172.21.17.32:/root/
scp -r /opt/etcd_backup/mysnapshot-20200413-090001.db 172.21.17.18:/root/

Restore from the backup

Here etcd's data and WAL files are stored separately, so the corresponding directories must each be specified during the restore. Once the data and wal directories have been moved out of /var/lib/etcd, the restore recreates them automatically; there is no need to create them by hand.

# run on k8s-master01
ETCDCTL_API=3 etcdctl snapshot restore /root/mysnapshot-20200413-090001.db --name etcd01 --initial-cluster "etcd01=https://172.21.17.32:2380,etcd02=https://172.21.17.18:2380,etcd03=https://172.21.16.204:2380" --initial-cluster-token k8s-etcd-cluster --initial-advertise-peer-urls https://172.21.17.32:2380 --data-dir=/var/lib/etcd/data --wal-dir=/var/lib/etcd/wal

# run on k8s-master02
ETCDCTL_API=3 etcdctl snapshot restore /root/mysnapshot-20200413-090001.db --name etcd02 --initial-cluster "etcd01=https://172.21.17.32:2380,etcd02=https://172.21.17.18:2380,etcd03=https://172.21.16.204:2380" --initial-cluster-token k8s-etcd-cluster --initial-advertise-peer-urls https://172.21.17.18:2380 --data-dir=/var/lib/etcd/data --wal-dir=/var/lib/etcd/wal

# run on k8s-master03
ETCDCTL_API=3 etcdctl snapshot restore /root/mysnapshot-20200413-090001.db --name etcd03 --initial-cluster "etcd01=https://172.21.17.32:2380,etcd02=https://172.21.17.18:2380,etcd03=https://172.21.16.204:2380" --initial-cluster-token k8s-etcd-cluster --initial-advertise-peer-urls https://172.21.16.204:2380 --data-dir=/var/lib/etcd/data --wal-dir=/var/lib/etcd/wal

Start and check etcd

# after all three ETCD members are restored, log in to each machine in turn and start ETCD

systemctl start etcd

# once all three ETCD members are up, check the ETCD cluster health
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 endpoint health
https://172.21.16.204:2379 is healthy: successfully committed proposal: took = 34.77035ms
https://172.21.17.32:2379 is healthy: successfully committed proposal: took = 35.320698ms
https://172.21.17.18:2379 is healthy: successfully committed proposal: took = 77.081728ms
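Beyond `endpoint health`, `endpoint status` is a useful post-restore sanity check: it shows which member is the leader and whether all members report the same raft term and index (flags and endpoints as elsewhere in this post):

```shell
# Print per-member status (leader flag, raft term/index, db size) as a table.
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/cert/ca.pem --cert=/etc/etcd/cert/etcd.pem --key=/etc/etcd/cert/etcd-key.pem --endpoints=https://172.21.17.32:2379,https://172.21.17.18:2379,https://172.21.16.204:2379 endpoint status -w table
```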

Start kube-apiserver

# with ETCD fully healthy, start kube-apiserver on each master
systemctl start kube-apiserver

# check that the Kubernetes cluster has recovered
kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}

Summary

Backing up a Kubernetes cluster is mainly a matter of backing up the ETCD cluster. When restoring, the key point is the order of operations:
stop kube-apiserver --> stop ETCD --> restore the data --> start ETCD --> start kube-apiserver

  • Note: when backing up the ETCD cluster, backing up a single member is sufficient; when restoring, restore every member from that same backup file.