Deploying Prometheus and Grafana on k8s v1.14

Introduction

        kube-prometheus scrapes metrics data from sources such as etcd, the apiserver, and the Metrics API.

View the etcd metrics output

# curl --cacert /etc/kubernetes/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem https://172.21.17.30:2379/metrics

View the kube-apiserver metrics

# kubectl get --raw /metrics
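The dump runs to thousands of lines; to spot-check it, pipe it through head or grep. The apiserver_ prefix used here is just one of the metric families the apiserver exposes:

# kubectl get --raw /metrics | head -n 20
# kubectl get --raw /metrics | grep '^apiserver_' | head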

Download the official YAML files

# git clone https://github.com/coreos/kube-prometheus

# cd kube-prometheus/manifests
# mkdir -p operator node-exporter alertmanager grafana kube-state-metrics prometheus serviceMonitor adapter

# mv *-serviceMonitor* serviceMonitor/
# mv grafana-* grafana/
# mv kube-state-metrics-* kube-state-metrics/
# mv alertmanager-* alertmanager/
# mv node-exporter-* node-exporter/
# mv prometheus-adapter* adapter/
# mv prometheus-* prometheus/

# cd setup/
# mv prometheus-operator* ../operator/
# mv 0namespace-namespace.yaml ../

# cd ..
# ls -ltrh
-rw-r--r-- 1 root root 60 Dec 3 17:45 0namespace-namespace.yaml
drwxr-xr-x 2 root root 219 Dec 3 17:46 grafana
drwxr-xr-x 2 root root 305 Dec 3 17:46 kube-state-metrics
drwxr-xr-x 2 root root 200 Dec 3 17:46 node-exporter
drwxr-xr-x 2 root root 4.0K Dec 3 17:47 operator
drwxr-xr-x 2 root root 149 Dec 4 13:40 alertmanager
drwxr-xr-x 2 root root 4.0K Dec 5 09:56 prometheus
drwxr-xr-x 2 root root 4.0K Dec 5 10:01 adapter
drwxr-xr-x 2 root root 4.0K Dec 5 11:55 serviceMonitor

Deployment

        A few files need to be modified before deploying.

Create a secret for etcd monitoring

        Monitoring etcd requires the etcd certificates, and prometheus-prometheus.yaml must be modified accordingly.

# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/ssl/ca.pem --from-file=/etc/etcd/ssl/etcd-key.pem --from-file=/etc/etcd/ssl/etcd.pem

# kubectl get secret -n monitoring
NAME         TYPE     DATA   AGE
etcd-certs   Opaque   3      5d15h
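To confirm that all three files landed in the secret, describe it and check that the data keys (ca.pem, etcd.pem, etcd-key.pem) are listed with non-zero sizes:

# kubectl -n monitoring describe secret etcd-certs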

Modify prometheus-prometheus.yaml

# cd prometheus/
# vim prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  # Add the etcd certificates
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  # Data retention period
  retention: 7d
  # PVC on external storage
  storage:
    volumeClaimTemplate:
      metadata:
        annotations:
          storageclass.kubernetes.io/is-default-class: "true"
        labels:
          prometheus: prometheus-data-pvc
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0

        storageclass.kubernetes.io/is-default-class: "true" refers to the default dynamic StorageClass; see the kube-nfs dynamic storage post for how it was set up.
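If no StorageClass has been marked as default yet, kubectl patch can set the annotation; a sketch, assuming the NFS class from that post is named xxlaila-nfs-storage (the name used for the Grafana PVC later):

# kubectl patch storageclass xxlaila-nfs-storage \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'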

Deploy the components

        Before deploying, rename prometheus-adapter-apiService.yaml under the adapter directory: metrics-server was installed earlier, and applying this file on top of it would break the metrics.k8s.io API. One way to do it is shown below.
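A minimal sketch of the rename; the .bak suffix is arbitrary, it just has to be something kubectl apply -f adapter/ will not read (only .yaml, .yml, and .json files are picked up):

# mv adapter/prometheus-adapter-apiService.yaml adapter/prometheus-adapter-apiService.yaml.bak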

# kubectl apply -f 0namespace-namespace.yaml
namespace/monitoring created

# kubectl apply -f operator/
# kubectl -n monitoring get pod|grep operator
prometheus-operator-548c6dc45c-vz6l6 1/1 Running 0 40h

# kubectl apply -f adapter/
# kubectl apply -f alertmanager/
# kubectl apply -f node-exporter/
# kubectl apply -f kube-state-metrics/
# kubectl apply -f grafana/
# kubectl apply -f prometheus/
# kubectl apply -f serviceMonitor/

Check the deployment status

# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 2/2 Running 0 40h
pod/alertmanager-main-1 2/2 Running 0 40h
pod/alertmanager-main-2 2/2 Running 0 40h
pod/grafana-5db74b88f4-7mt8c 1/1 Running 0 40h
pod/kube-state-metrics-54f98c4687-mz5lq 3/3 Running 0 18h
pod/node-exporter-hb66c 2/2 Running 0 40h
pod/node-exporter-l2s8g 2/2 Running 0 40h
pod/node-exporter-sjbmg 2/2 Running 0 40h
pod/node-exporter-vw87m 2/2 Running 0 40h
pod/node-exporter-zr8fk 2/2 Running 0 40h
pod/node-exporter-zxcwl 2/2 Running 0 40h
pod/prometheus-adapter-8667948d79-tcz47 1/1 Running 0 18h
pod/prometheus-k8s-0 3/3 Running 1 20h
pod/prometheus-k8s-1 3/3 Running 1 20h
pod/prometheus-operator-548c6dc45c-vz6l6 1/1 Running 0 40h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.254.101.249 <none> 9093/TCP 40h
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 40h
service/etcd ClusterIP None <none> 2379/TCP 17h
service/grafana ClusterIP 10.254.214.6 <none> 3000/TCP 40h
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 18h
service/node-exporter ClusterIP None <none> 9100/TCP 40h
service/prometheus-adapter ClusterIP 10.254.60.49 <none> 443/TCP 18h
service/prometheus-k8s ClusterIP 10.254.41.152 <none> 9090/TCP 40h
service/prometheus-operated ClusterIP None <none> 9090/TCP 20h
service/prometheus-operator ClusterIP None <none> 8080/TCP 40h

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 6 6 6 6 6 kubernetes.io/os=linux 40h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/grafana 1/1 1 1 40h
deployment.apps/kube-state-metrics 1/1 1 1 18h
deployment.apps/prometheus-adapter 1/1 1 1 18h
deployment.apps/prometheus-operator 1/1 1 1 40h

NAME DESIRED CURRENT READY AGE
replicaset.apps/grafana-5db74b88f4 1 1 1 40h
replicaset.apps/kube-state-metrics-54f98c4687 1 1 1 18h
replicaset.apps/prometheus-adapter-8667948d79 1 1 1 18h
replicaset.apps/prometheus-operator-548c6dc45c 1 1 1 40h

NAME READY AGE
statefulset.apps/alertmanager-main 3/3 40h
statefulset.apps/prometheus-k8s 2/2 20h

Configure Ingress

# cat > ingress-monitor.yaml <<EOF
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-web-ui
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: prometheus.xxlaila.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s
          servicePort: 9090
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-web-ui
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: grafana.xxlaila.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alertmanager-web-ui
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: alertmanager.xxlaila.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager-main
          servicePort: 9093
EOF

# kubectl apply -f ingress-monitor.yaml

         Open the domain names in a browser to access the web UIs.
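If the three hostnames are not in DNS yet, a quick /etc/hosts entry on the workstation works for testing; <node-ip> is a placeholder for any node running traefik:

# cat >> /etc/hosts <<EOF
<node-ip> prometheus.xxlaila.cn grafana.xxlaila.cn alertmanager.xxlaila.cn
EOF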

Monitoring common components

         The Kubernetes components commonly monitored are kube-apiserver, kube-scheduler, kube-controller-manager, and etcd; on worker nodes, kubelet and kube-proxy. The default files in the serviceMonitor directory only cover kube-apiserver and kubelet; the other components each need their files modified separately before they can be monitored.

         Note that the steps above assume the cluster was installed from binaries, not run as pods.

kube-scheduler monitoring

        kube-scheduler has no Service or Endpoints inside the Kubernetes cluster (verify with kubectl get ep -n kube-system). Modify prometheus-serviceMonitorKubeScheduler.yaml by appending the following, or put it in a separate file:

# cat >> prometheus-serviceMonitorKubeScheduler.yaml <<EOF
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
    targetPort: 10251
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 172.21.17.30
  - ip: 172.21.17.31
  - ip: 172.21.16.110
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
EOF

# kubectl apply -f prometheus-serviceMonitorKubeScheduler.yaml
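Before checking the Prometheus Targets page, the endpoint can be probed directly, since port 10251 serves plain HTTP (master IP taken from the Endpoints above):

# curl -s http://172.21.17.30:10251/metrics | head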

kube-controller-manager monitoring

        The kube-controller-manager ServiceMonitor needs changes: the cluster was installed with SSL certificates, but the default ServiceMonitor scrapes without SSL, so it has to be switched to SSL, i.e. https; otherwise scraping fails with 403, x509, or 400 errors.

  • Modify prometheus-serviceMonitorKubeControllerManager.yaml

    # cat prometheus-serviceMonitorKubeControllerManager.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: kube-controller-manager
      name: kube-controller-manager
      namespace: monitoring
    spec:
      endpoints:
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        interval: 30s
        port: https-metrics
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
        metricRelabelings:
        - action: drop
          regex: etcd_(debugging|disk|request|server).*
          sourceLabels:
          - __name__
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: kube-controller-manager
  • Create kube-controller-manager-service.yaml

    # cat > kube-controller-manager-service.yaml <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: kube-controller-manager
      name: kube-controller-manager
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: https-metrics
        port: 10252
        protocol: TCP
        targetPort: 10252
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: kube-controller-manager
      name: kube-controller-manager
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 172.21.17.30
      - ip: 172.21.17.31
      - ip: 172.21.16.110
      ports:
      - name: https-metrics
        port: 10252
        protocol: TCP
    EOF
  • Apply the files

    # kubectl apply -f prometheus-serviceMonitorKubeControllerManager.yaml
    # kubectl apply -f kube-controller-manager-service.yaml
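A quick sanity check that the headless Service and the hand-written Endpoints are wired up before Prometheus scrapes them:

# kubectl -n kube-system get svc,ep kube-controller-manager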

etcd monitoring

         etcd runs outside the Kubernetes cluster, so a Service and Endpoints must be created for it.

  • prometheus-serviceMonitoretcd.yaml
    # cat > prometheus-serviceMonitoretcd.yaml <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: https-metrics
        port: 2379
        protocol: TCP
        targetPort: 2379
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 172.21.17.30
      - ip: 172.21.17.31
      - ip: 172.21.16.110
      ports:
      - name: https-metrics
        port: 2379
        protocol: TCP
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: etcd
      name: etcd
      namespace: monitoring
    spec:
      endpoints:
      - interval: 10s
        port: https-metrics
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
          certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
          keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
          insecureSkipVerify: true
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd
    EOF

    # kubectl apply -f prometheus-serviceMonitoretcd.yaml
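To verify that the etcd-certs secret declared in prometheus-prometheus.yaml is mounted where the tlsConfig paths point, list the directory inside a Prometheus pod (the operator names the container prometheus):

# kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- ls /etc/prometheus/secrets/etcd-certs/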

kube-proxy monitoring

         kube-proxy exposes its metrics on port 10249 (see the kube-proxy installation doc). It serves plain HTTP, so no SSL is needed.
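A quick probe against one of the node IPs used in the Endpoints below confirms the port and scheme:

# curl -s http://172.21.16.204:10249/metrics | head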

  • Create kube-proxy.yaml

    # cat > kube-proxy.yaml <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: kube-proxy
      name: kube-proxy
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: http-metrics
        port: 10249
        protocol: TCP
        targetPort: 10249
      sessionAffinity: None
      type: ClusterIP
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: kube-proxy
      name: kube-proxy
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 172.21.16.204
      - ip: 172.21.16.231
      - ip: ……
      ports:
      - name: http-metrics
        port: 10249
        protocol: TCP
    EOF
  • Create prometheus-serviceMonitorProxy.yaml

    # cat > prometheus-serviceMonitorProxy.yaml <<EOF
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: kube-proxy
      name: kube-proxy
      namespace: monitoring
    spec:
      endpoints:
      - interval: 30s
        port: http-metrics
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: kube-proxy
    EOF
  • Apply the files

    # kubectl apply -f prometheus-serviceMonitorProxy.yaml
    # kubectl apply -f kube-proxy.yaml

traefik monitoring

  • Create prometheus-serviceMonitorTraefix.yaml
    # cat > prometheus-serviceMonitorTraefix.yaml <<EOF
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: traefik-ingress
      name: traefik-ingress
      namespace: monitoring
    spec:
      jobLabel: k8s-app
      endpoints:
      - port: admin  # the name of traefik's 8080 port ("admin")
        interval: 30s
      selector:
        matchLabels:
          k8s-app: traefik-ingress
      namespaceSelector:
        matchNames:
        - kube-system
    EOF

    # kubectl apply -f prometheus-serviceMonitorTraefix.yaml

        This assumes traefik's metrics page is reachable; if you installed traefik following my earlier doc, it is enabled by default.
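To check it by hand, find a traefik pod via the same label the ServiceMonitor selects on and curl its admin port; <pod-ip> is a placeholder:

# kubectl -n kube-system get pod -l k8s-app=traefik-ingress -o wide
# curl -s http://<pod-ip>:8080/metrics | head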

Grafana changes

        After the default install, Grafana needs an extra plugin, otherwise pie charts will not render, and some official dashboard templates still have to be imported. Also, with the default install everything is lost whenever the pod is rebuilt, so we create a PVC and persist the data to disk; Grafana then survives rebuilds with its data intact.

Create grafana-pvc.yaml

  • grafana-pvc.yaml
    # cat > grafana-pvc.yaml <<EOF
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: grafana-pvc
      namespace: monitoring
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: xxlaila-nfs-storage
      resources:
        requests:
          storage: 5Gi
    EOF

    # kubectl apply -f grafana-pvc.yaml
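The claim should show STATUS Bound against the NFS StorageClass before grafana-deployment.yaml is changed:

# kubectl -n monitoring get pvc grafana-pvc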

Modify grafana-deployment.yaml

  • grafana-deployment.yaml
    # Modify: replace the emptyDir volume with the PVC
    volumes:
    #- emptyDir: {}
    - name: grafana-storage
      persistentVolumeClaim:
        claimName: grafana-pvc
    - name: grafana-datasources
    # Add under the container's volumeMounts:
    - mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-traefik-ingress
      name: grafana-dashboard-k8s-traefik-ingress
      readOnly: false
    - mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-clusters-as-service
      name: grafana-dashboard-k8s-etcd-clusters-as-service
      readOnly: false
    - mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-cluster-as-pod
      name: grafana-dashboard-k8s-etcd-cluster-as-pod
      readOnly: false
    - mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-k8s-etcd-server
      name: grafana-dashboard-k8s-etcd-server
      readOnly: false

    # Add under volumes:
    - configMap:
        name: grafana-dashboard-k8s-etcd-clusters-as-service
      name: grafana-dashboard-k8s-etcd-clusters-as-service
    - configMap:
        name: grafana-dashboard-k8s-etcd-cluster-as-pod
      name: grafana-dashboard-k8s-etcd-cluster-as-pod
    - configMap:
        name: grafana-dashboard-k8s-etcd-server
      name: grafana-dashboard-k8s-etcd-server
    - configMap:
        name: grafana-dashboard-k8s-traefik-ingress
      name: grafana-dashboard-k8s-traefik-ingress

        The dashboard templates referenced above must be imported into grafana-dashboardDefinitions.yaml, following the format of the entries already in that file. Remember to adjust the data source in each dashboard, otherwise it cannot connect and the dashboard will not display.
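A minimal sketch of the shape of one such ConfigMap in grafana-dashboardDefinitions.yaml; the stub JSON body stands in for a real dashboard exported from Grafana or grafana.com, and the file-name key is an assumption following the pattern of the existing entries:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-k8s-etcd-server
  namespace: monitoring
data:
  k8s-etcd-server.json: |-
    {
      "title": "etcd server",
      "schemaVersion": 16,
      "panels": []
    }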

Check the Services and Endpoints

# kubectl get svc,endpoints -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/etcd ClusterIP None <none> 2379/TCP 3m41s
service/kube-controller-manager ClusterIP None <none> 10252/TCP 16h
service/kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP,9153/TCP 7d16h
service/kube-proxy ClusterIP None <none> 10249/TCP 37m
service/kube-scheduler ClusterIP None <none> 10251/TCP 18h
service/kubelet ClusterIP None <none> 10250/TCP 40h
service/kubernetes-dashboard NodePort 10.254.139.196 <none> 443:31417/TCP 6d18h
service/metrics-server ClusterIP 10.254.196.151 <none> 443/TCP 2d23h

NAME ENDPOINTS AGE
endpoints/etcd 172.21.16.110:2379,172.21.17.30:2379,172.21.17.31:2379 3m41s
endpoints/kube-controller-manager 172.21.16.110:10252,172.21.17.30:10252,172.21.17.31:10252 16h
endpoints/kube-dns 10.244.1.46:53,10.244.4.36:53,10.244.1.46:53 + 3 more... 7d16h
endpoints/kube-proxy 172.21.16.204:10249,172.21.16.231:10249,172.21.17.34:10249 + 3 more... 37m
endpoints/kube-scheduler 172.21.16.110:10251,172.21.17.30:10251,172.21.17.31:10251 7d16h
endpoints/kubelet 172.21.16.204:10255,172.21.16.231:10255,172.21.17.34:10255 + 15 more... 40h
endpoints/kubernetes-dashboard 10.244.6.27:8443 6d18h
endpoints/metrics-server 172.21.17.34:4443 2d23h

Check the monitoring API

# kubectl api-versions| grep monitoring
monitoring.coreos.com/v1

# kubectl get --raw "/apis/monitoring.coreos.com/v1"|jq .

# kubectl get --raw "/apis/monitoring.coreos.com/v1/servicemonitors"|jq .

Verification

Prometheus Targets

[Screenshots: Prometheus Targets page]

Grafana dashboards

[Screenshots: Grafana dashboards]
