metrics-server Installation Notes

        metrics-server discovers all nodes through kube-apiserver, then calls the kubelet APIs (over HTTPS) to collect CPU, memory, and other resource usage for each Node and Pod. Starting with Kubernetes 1.12 the installation scripts dropped Heapster, and from 1.13 support for Heapster was removed entirely; Heapster is no longer maintained.
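
For reference, metrics-server 0.3.x reads each node's data from the kubelet Summary API on port 10250. As a rough sketch, the same data can be probed manually, assuming the kubelet-api-admin client certificate used by the apiserver in section 5 is also authorized against the kubelet, and borrowing a node IP from the outputs later in this post:

# curl -sk --cert /etc/kubernetes/ssl/kubelet-api-admin.pem \
    --key /etc/kubernetes/ssl/kubelet-api-admin-key.pem \
    https://172.21.16.204:10250/stats/summary | jq '.node.cpu'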

  • Replacement solutions:
    • CPU/memory HPA metrics for autoscaling: metrics-server
    • General-purpose monitoring: a third-party monitoring system that can consume Prometheus-format metrics, such as the Prometheus Operator
    • Event shipping: third-party tools to forward and archive Kubernetes events

        After replacing Heapster with metrics-server, the dashboard can no longer graph Pod memory and CPU usage; a monitoring stack such as Prometheus plus Grafana is needed to fill that gap.

1. Monitoring architecture

(figure: metrics-server monitoring architecture)

2. Installing metrics-server

  • Clone the source from GitHub:

    # git clone https://github.com/xxlaila/kubernetes-yaml.git
    # cd kubernetes-yaml/metrics-server
    # ls
    aggregated-metrics-reader.yaml  auth-delegator.yaml  auth-reader.yaml  metrics-apiservice.yaml  metrics-server-deployment.yaml  metrics-server-service.yaml  resource-reader.yaml
  • Note: I ran into a lot of pitfalls during this installation, and most of the tutorials online simply did not work. I worked through each error one by one, and after a day and a half it finally came together.

3. Modifying the metrics-server files

The metrics-server YAML files below have already been modified and can be used as-is (see the references).

3.1. metrics-server-deployment.yaml
# cat metrics-server-deployment.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      hostNetwork: true  # added line
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      - name: ca-ssl  # merged into this list: a second volumes: key at the end of the spec would shadow the first
        hostPath:
          path: /etc/kubernetes/ssl
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.3
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /etc/ssl/kubernetes/
          name: ca-ssl
        command:  # the whole command section is added
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        - --requestheader-client-ca-file=/etc/ssl/kubernetes/front-proxy-ca.pem
        - --kubelet-insecure-tls=true
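
Not part of the original steps, but the edited manifest can be sanity-checked without touching the cluster via a client-side dry run (kubectl of this era accepts --dry-run as a boolean flag):

# kubectl apply --dry-run -f metrics-server-deployment.yaml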
3.2. resource-reader.yaml
# cat resource-reader.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:  # added
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
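
Once these RBAC objects are applied (section 7), a quick sanity check via impersonation confirms the service account really has the intended permissions; both commands should answer yes:

# kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:metrics-server
# kubectl auth can-i get nodes/stats --as=system:serviceaccount:kube-system:metrics-server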

4. Preparing the certificates

These certificate files are mainly used by the Metrics API aggregator (see the references).

  • front-proxy-ca-csr.json

    # cat front-proxy-ca-csr.json
    {
      "CN": "kubernetes",
      "key": {
        "algo": "rsa",
        "size": 2048
      }
    }
  • front-proxy-client-csr.json

    {
      "CN": "front-proxy-client",
      "key": {
        "algo": "rsa",
        "size": 2048
      }
    }
4.1. Generating the certificates
# cfssl gencert -initca front-proxy-ca-csr.json | cfssljson -bare front-proxy-ca
# cfssl gencert \
    -ca=front-proxy-ca.pem \
    -ca-key=front-proxy-ca-key.pem \
    -config=/root/ssl/kubernetes-gencert.json \
    -profile=kubernetes \
    front-proxy-client-csr.json | cfssljson -bare front-proxy-client
# ls *.pem
front-proxy-ca-key.pem  front-proxy-ca.pem  front-proxy-client-key.pem  front-proxy-client.pem
  • After the certificates are generated, copy them to all master and node machines, as sketched below.
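
A sketch of that copy step, with hypothetical hostnames, assuming /etc/kubernetes/ssl already exists on every machine:

# for host in master1 master2 node1 node2; do \
    scp front-proxy-ca.pem front-proxy-client.pem front-proxy-client-key.pem ${host}:/etc/kubernetes/ssl/; \
  done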

5. Modifying the master configuration files

5.1. apiserver

Add the following flags to the apiserver configuration file. Note that --requestheader-allowed-names, when non-empty, must contain the CN of the proxy client certificate; the client certificate generated above uses CN front-proxy-client, so make sure the name listed here matches your environment (or leave the flag empty to accept any certificate signed by the front-proxy CA).

--runtime-config=api/all=true \
--enable-aggregator-routing=true \
--requestheader-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.pem \
--requestheader-allowed-names=aggregator \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User \
--proxy-client-cert-file=/etc/kubernetes/ssl/front-proxy-client.pem \
--proxy-client-key-file=/etc/kubernetes/ssl/front-proxy-client-key.pem \
  • The full KUBE_API_ARGS in the apiserver configuration file:
    KUBE_API_ARGS=" --allow-privileged=true \
    --anonymous-auth=false \
    --alsologtostderr \
    --apiserver-count=3 \
    --audit-log-maxage=30 \
    --audit-log-maxbackup=3 \
    --audit-log-maxsize=100 \
    --enable-aggregator-routing=true \
    --audit-log-path=/var/log/kube-audit/audit.log \
    --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
    --authorization-mode=Node,RBAC \
    --client-ca-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --enable-bootstrap-token-auth \
    --enable-garbage-collector \
    --enable-logs-handler \
    --endpoint-reconciler-type=lease \
    --etcd-cafile=/etc/etcd/ssl/etcd-ca.pem \
    --etcd-certfile=/etc/etcd/ssl/etcd.pem \
    --etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \
    --etcd-compaction-interval=0s \
    --event-ttl=168h0m0s \
    --kubelet-https=true \
    --kubelet-certificate-authority=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --kubelet-client-certificate=/etc/kubernetes/ssl/kubelet-api-admin.pem \
    --kubelet-client-key=/etc/kubernetes/ssl/kubelet-api-admin-key.pem \
    --kubelet-timeout=3s \
    --runtime-config=api/all=true \
    --service-node-port-range=30000-50000 \
    --service-account-key-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --tls-cert-file=/etc/kubernetes/ssl/kube-apiserver.pem \
    --tls-private-key-file=/etc/kubernetes/ssl/kube-apiserver-key.pem \
    --requestheader-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.pem \
    --requestheader-allowed-names=aggregator \
    --requestheader-extra-headers-prefix=X-Remote-Extra- \
    --requestheader-group-headers=X-Remote-Group \
    --requestheader-username-headers=X-Remote-User \
    --proxy-client-cert-file=/etc/kubernetes/ssl/front-proxy-client.pem \
    --proxy-client-key-file=/etc/kubernetes/ssl/front-proxy-client-key.pem \
    --v=2"
5.2. kube-controller-manager

Add the following flag to the kube-controller-manager configuration file:

--horizontal-pod-autoscaler-use-rest-clients=true \
  • The full KUBE_CONTROLLER_MANAGER_ARGS in the kube-controller-manager configuration file:
    KUBE_CONTROLLER_MANAGER_ARGS="  --address=0.0.0.0 \
    --authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
    --authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
    --bind-address=0.0.0.0 \
    --cluster-name=kubernetes \
    --cluster-signing-cert-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --cluster-signing-key-file=/etc/kubernetes/ssl/kubernetes-ca-key.pem \
    --client-ca-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --controllers=*,bootstrapsigner,tokencleaner \
    --deployment-controller-sync-period=10s \
    --experimental-cluster-signing-duration=87600h0m0s \
    --enable-garbage-collector=true \
    --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
    --leader-elect=true \
    --node-monitor-grace-period=20s \
    --node-monitor-period=5s \
    --port=10252 \
    --pod-eviction-timeout=2m0s \
    --requestheader-client-ca-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --terminated-pod-gc-threshold=50 \
    --tls-cert-file=/etc/kubernetes/ssl/kube-controller-manager.pem \
    --tls-private-key-file=/etc/kubernetes/ssl/kube-controller-manager-key.pem \
    --root-ca-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --secure-port=10257 \
    --service-cluster-ip-range=10.254.0.0/16 \
    --service-account-private-key-file=/etc/kubernetes/ssl/kubernetes-ca-key.pem \
    --use-service-account-credentials=true \
    --horizontal-pod-autoscaler-use-rest-clients=true \
    --v=2"
5.3. Restarting the services
# systemctl restart kube-apiserver.service && systemctl restart kube-controller-manager
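
To confirm the new flags took effect, one generic check is to dump the running apiserver's arguments:

# ps -ef | grep [k]ube-apiserver | tr ' ' '\n' | grep -E 'requestheader|proxy-client'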

6. Modifying the node configuration files

Modify the kubelet configuration file on each node.

  • The complete KUBELET_ARGS for the kubelet configuration file:

    KUBELET_ARGS="  --address=0.0.0.0 \
    --allow-privileged \
    --anonymous-auth=false \
    --authorization-mode=Webhook \
    --authentication-token-webhook=true \
    --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
    --client-ca-file=/etc/kubernetes/ssl/kubernetes-ca.pem \
    --cgroup-driver=cgroupfs \
    --cert-dir=/etc/kubernetes/ssl \
    --cluster-dns=10.254.0.2 \
    --cluster-domain=cluster.local \
    --eviction-soft=imagefs.available<15%,memory.available<512Mi,nodefs.available<15%,nodefs.inodesFree<10% \
    --eviction-soft-grace-period=imagefs.available=3m,memory.available=1m,nodefs.available=3m,nodefs.inodesFree=1m \
    --eviction-hard=imagefs.available<10%,memory.available<256Mi,nodefs.available<10%,nodefs.inodesFree<5% \
    --eviction-max-pod-grace-period=30 \
    --image-gc-high-threshold=80 \
    --image-gc-low-threshold=70 \
    --image-pull-progress-deadline=30s \
    --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
    --max-pods=100 \
    --minimum-image-ttl-duration=720h0m0s \
    --node-labels=node.kubernetes.io/k8s-node=true \
    --pod-infra-container-image=docker.io/kubernetes/pause:latest \
    --port=10250 \
    --read-only-port=0 \
    --rotate-certificates \
    --rotate-server-certificates \
    --fail-swap-on=false \
    --v=2"
  • Restart kubelet:

    # systemctl restart kubelet

7. Creating the metrics-server resources

Create the corresponding resources from the YAML files:

# kubectl create -f ./
7.1. Checking that it is running
# kubectl -n kube-system get pods -l k8s-app=metrics-server
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-84b786c9bb-7trdr   1/1     Running   0          62m

# kubectl get svc -n kube-system metrics-server
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
metrics-server   ClusterIP   10.254.45.238   <none>        443/TCP   3h6m
7.2. Verifying v1beta1.metrics.k8s.io

        Previously the v1beta1.metrics.k8s.io entry below had been stuck at False (FailedDiscoveryCheck); only after the fixes above did it finally report True:

# kubectl get apiservice
NAME                                    SERVICE                      AVAILABLE   AGE
v1.                                     Local                        True        4d
v1.apps                                 Local                        True        4d
v1.authentication.k8s.io                Local                        True        4d
v1.authorization.k8s.io                 Local                        True        4d
v1.autoscaling                          Local                        True        4d
v1.batch                                Local                        True        4d
v1.networking.k8s.io                    Local                        True        4d
v1.rbac.authorization.k8s.io            Local                        True        4d
v1.storage.k8s.io                       Local                        True        4d
v1alpha1.admissionregistration.k8s.io   Local                        True        4d
v1alpha1.auditregistration.k8s.io       Local                        True        4d
v1alpha1.rbac.authorization.k8s.io      Local                        True        4d
v1alpha1.scheduling.k8s.io              Local                        True        4d
v1alpha1.settings.k8s.io                Local                        True        4d
v1alpha1.storage.k8s.io                 Local                        True        4d
v1beta1.admissionregistration.k8s.io    Local                        True        4d
v1beta1.apiextensions.k8s.io            Local                        True        4d
v1beta1.apps                            Local                        True        4d
v1beta1.authentication.k8s.io           Local                        True        4d
v1beta1.authorization.k8s.io            Local                        True        4d
v1beta1.batch                           Local                        True        4d
v1beta1.certificates.k8s.io             Local                        True        4d
v1beta1.coordination.k8s.io             Local                        True        4d
v1beta1.events.k8s.io                   Local                        True        4d
v1beta1.extensions                      Local                        True        4d
v1beta1.metrics.k8s.io                  kube-system/metrics-server   True        3h
v1beta1.policy                          Local                        True        4d
v1beta1.rbac.authorization.k8s.io       Local                        True        4d
v1beta1.scheduling.k8s.io               Local                        True        4d
v1beta1.storage.k8s.io                  Local                        True        4d
v1beta2.apps                            Local                        True        4d
v2alpha1.batch                          Local                        True        4d
v2beta1.autoscaling                     Local                        True        4d
v2beta2.autoscaling                     Local                        True        4d
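
If the APIService sticks at False (FailedDiscoveryCheck) as described above, these two generic checks usually reveal the cause (the label selector matches the deployment from section 3):

# kubectl describe apiservice v1beta1.metrics.k8s.io
# kubectl -n kube-system logs -l k8s-app=metrics-server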

8. Viewing the metrics exposed by metrics-server

8.1. Access via kube-apiserver or kubectl proxy
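
For example, with kubectl proxy running, the metrics API can be queried over plain HTTP on the proxy's default local port 8001:

# kubectl proxy &
# curl -s http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
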
8.2. Direct access with kubectl
  • kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
  • kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
  • kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node-name>
  • kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod-name>
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeMetrics",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "PodMetrics",
      "verbs": [
        "get",
        "list"
      ]
    }
  ]
}

# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "172.21.16.204",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/172.21.16.204",
        "creationTimestamp": "2019-09-04T07:00:44Z"
      },
      "timestamp": "2019-09-04T07:00:40Z",
      "window": "30s",
      "usage": {
        "cpu": "63788460n",
        "memory": "1033152Ki"
      }
    },
    {
      "metadata": {
        "name": "172.21.16.240",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/172.21.16.240",
        "creationTimestamp": "2019-09-04T07:00:44Z"
      },
      "timestamp": "2019-09-04T07:00:40Z",
      "window": "30s",
      "usage": {
        "cpu": "41797865n",
        "memory": "837420Ki"
      }
    },
    {
      "metadata": {
        "name": "172.21.16.87",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/172.21.16.87",
        "creationTimestamp": "2019-09-04T07:00:44Z"
      },
      "timestamp": "2019-09-04T07:00:34Z",
      "window": "30s",
      "usage": {
        "cpu": "37347688n",
        "memory": "851232Ki"
      }
    }
  ]
}
  • /apis/metrics.k8s.io/v1beta1/nodes and /apis/metrics.k8s.io/v1beta1/pods both return usage containing CPU and memory. CPU is reported in nanocores (the n suffix, so 63788460n ≈ 64m, about 6.4% of one core) and memory in kibibytes (Ki).

9. Using kubectl top

Use the kubectl top command to view cluster node resource usage; kubectl top fetches the basic node metrics from metrics-server.

# kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
172.21.16.204   69m          1%     1008Mi          13%
172.21.16.240   41m          2%     817Mi           23%
172.21.16.87    39m          1%     831Mi           23%
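
Pod metrics work the same way, and with the metrics API available the HPA can consume them. A minimal smoke test (the nginx deployment name here is hypothetical):

# kubectl top pod -n kube-system
# kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=5
# kubectl get hpa nginx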

metrics-server is now deployed successfully. The individual parameters have not all been explained here; I will list them later when I have time.

There are many more reference documents that are not listed here, mainly because my browser got closed. Thanks to all of those references. 😊😊😊
