Monitoring a Kubernetes Cluster with Prometheus

Prometheus official site: https://prometheus.io/

1.1 Features of Prometheus

  • A multi-dimensional data model, backed by its own time-series database (TSDB) rather than a relational database such as MySQL.

  • A flexible query language, PromQL (a short example follows this list).

  • No dependence on distributed storage; each server node is autonomous.

  • Time-series data is collected mainly by actively pulling it over HTTP.

  • Data pushed by clients can also be collected via the Pushgateway.

  • Targets are discovered through service discovery or static configuration.

  • A wide range of charting and dashboard options, for example Grafana.
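As a rough illustration of PromQL (the metric names below come from node-exporter and kube-state-metrics, both deployed later in this article; treat the queries as a sketch rather than a ready-made dashboard):

# Per-node CPU usage (%) over the last 5 minutes, from node-exporter
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# Pods that are not in the Running phase, from kube-state-metrics
kube_pod_status_phase{phase!="Running"} > 0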

1.2 How It Works

1.2.1 Overview

Prometheus works by periodically scraping the state of monitored components over the HTTP interfaces exposed by various exporters.

Any component can be monitored as long as it provides a suitable HTTP endpoint.

No SDK or other integration work is needed, which makes it a good fit for monitoring virtualized environments such as VMs, Docker, and Kubernetes.

Most components commonly used at internet companies already have ready-made exporters, for example Nginx, MySQL, and Linux system metrics.
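Concretely, scraping an exporter is just an HTTP GET against its /metrics endpoint, which returns plain text in the Prometheus exposition format. A hedged example against node-exporter (deployed later in this article; the host is a placeholder and the sample value is illustrative):

$ curl -s http://<node-ip>:9100/metrics | head -n 4
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78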

1.2.2 Architecture diagram:

(architecture diagram omitted)

1.2.3 The Three Main Components

  • Prometheus Server: responsible for scraping and storing data, and for serving PromQL queries.

  • Alertmanager: the alert manager, responsible for handling and dispatching alerts.

  • Pushgateway: an intermediate gateway that allows short-lived jobs to push their metrics.

1.2.4 How the Pieces Work Together

The Prometheus daemon periodically scrapes metrics from its targets;
each target must expose an HTTP endpoint for it to scrape on a schedule.
Targets can be specified via configuration files, text files, Zookeeper, DNS SRV lookups, and other mechanisms.

The Pushgateway is for clients that push their metrics to it actively;
Prometheus simply scrapes the Pushgateway on its regular schedule.
This suits one-off, short-lived jobs.
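A sketch of how such a job would hand off a metric (the Pushgateway itself is not deployed in this article; the host and metric name are placeholders):

# Push one metric under job "backup_job"; Prometheus later scrapes it from the Pushgateway
echo "backup_last_success_timestamp $(date +%s)" | \
  curl --data-binary @- http://<pushgateway-host>:9091/metrics/job/backup_job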

Prometheus stores everything it scrapes in its TSDB,
applies rules to clean up and aggregate the data, and writes the results to new time series.

Prometheus exposes the collected data for visualization through PromQL and other APIs.
Charting is supported via Grafana, Promdash, and similar tools.
Prometheus also provides an HTTP query API so you can build whatever output you need.
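For example, an instant query through the HTTP API returns JSON (the address is a placeholder; the up metric exists on every Prometheus server):

curl 'http://<prometheus-host>:9090/api/v1/query?query=up'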

Alertmanager is an alerting component that runs independently of Prometheus.
It works with Prometheus query expressions and provides very flexible alerting and notification options.

1.2.5 Commonly Used Exporters

Unlike Zabbix, Prometheus has no agent; it relies on exporters tailored to each service.

To monitor a Kubernetes cluster, its nodes, and its Pods, four exporters are commonly used:

1. Deploy kube-state-metrics
How many replicas have I scheduled? How many are currently available?
How many Pods are running/stopped/terminated?
How many times has a Pod restarted?
How many Jobs are running?
This is exactly what kube-state-metrics provides: built on client-go, it polls the Kubernetes API and converts the structured Kubernetes object state into metrics.
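A few of the metrics it exposes, answering the questions above (names taken from the standard kube-state-metrics metric set; not an exhaustive list):

kube_deployment_spec_replicas                # desired replicas per Deployment
kube_deployment_status_replicas_available    # replicas currently available
kube_pod_status_phase                        # Pods per phase (Running/Pending/Failed/...)
kube_pod_container_status_restarts_total     # container restart counts
kube_job_status_active                       # Jobs currently running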

#1. Upstream: our Kubernetes cluster is version 1.22.x, so we use kube-state-metrics v2.2.4; check the compatibility matrix upstream for the exact mapping
https://github.com/kubernetes/kube-state-metrics
https://github.com/kubernetes/kube-state-metrics/tree/v2.2.4/examples/standard

#Note: this directory holds the YAML manifests; download them yourself

#2. Download the image and push it to our private Harbor registry
[root@k8s-node01 ~]# mkdir /k8s-yaml/kube-state-metrics -p
[root@k8s-node01 ~]# cd /k8s-yaml/kube-state-metrics
[root@k8s-master01 kube-state-metrics]# ls
cluster-role-binding.yaml  cluster-role.yaml  deployment.yaml  service-account.yaml  service.yaml
#The required image must be uploaded to or pulled on every node
#3. Create the resources
[root@k8s-master01 kube-state-metrics]# kubectl create -f .

#4. Verify
[root@k8s-master01 ~]# kubectl get pod -n kube-system -o wide |grep kube-state-metrics
[root@k8s-master01 kube-state-metrics]# curl 172.161.125.55:8080/healthz
OK
2. Deploy node-exporter
node-exporter monitors the nodes themselves; one instance must run on every node, so a DaemonSet controller is used.
# Upstream: https://github.com/prometheus/node_exporter
Purpose: mount the host's /proc and /sys directories into the container so that it can read host-level information about the node.
[root@k8s-master01 ~]# docker pull prom/node-exporter:latest
[root@k8s-master01 ~]# mkdir -p /k8s-yaml/node-exporter/
[root@k8s-master01 ~]# cd /k8s-yaml/node-exporter/
[root@k8s-master01 node-exporter]# vim node-exporter-ds.yaml 
[root@k8s-master01 node-exporter]# cat node-exporter-ds.yaml 
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    daemon: "node-exporter"
    grafanak8sapp: "true"
spec:
  selector:
    matchLabels:
      daemon: "node-exporter"
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        imagePullPolicy: IfNotPresent
        args:
        - --path.procfs=/host_proc
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /host_proc
      imagePullSecrets:
      - name: harbor
      restartPolicy: Always
      hostNetwork: true
      volumes:
      - name: proc
        hostPath: 
          path: /proc
          type: ""
      - name: sys
        hostPath:
          path: /sys
          type: "
[root@k8s-master01 node-exporter]# kubectl apply -f node-exporter-ds.yaml 
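Because the DaemonSet uses hostNetwork with hostPort 9100, a quick sanity check is to hit the exporter directly on a node (a suggested check; the IP below is k8s-node01 from this environment):

[root@k8s-master01 ~]# kubectl get pod -n kube-system -o wide | grep node-exporter
[root@k8s-master01 ~]# curl -s http://192.168.1.114:9100/metrics | grep -c '^node_'
# a non-zero count means node metrics are being exposed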
3. Deploy cadvisor

cadvisor collects resource-usage information from inside the cluster's Docker containers;
blackbox-exporter (section 4 below) checks whether containerized services are alive.

This exporter talks to the kubelet to obtain the runtime resource consumption of Pods and exposes an endpoint for Prometheus.

  • Because cadvisor needs Pod information from every node, it also runs as a DaemonSet.

  • cadvisor runs as a DaemonSet on the nodes; a toleration for the master taint allows it to run on the control-plane nodes as well.

  • Several host directories are mounted into the container, such as the Docker data directory.

#Upstream: https://github.com/google/cadvisor
https://github.com/google/cadvisor/tree/v0.42.0/deploy/kubernetes/base
[root@k8s-master01 ~]# mkdir /k8s-yaml/cadvisor && cd /k8s-yaml/cadvisor
[root@k8s-master01 cadvisor]# cat ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      hostNetwork: true
      #------Pod tolerations work together with node taints to control scheduling;
      #------this one lets the Pod run on the master nodes as well----
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      #-------------------------------------
      containers:
      - name: cadvisor
        image: gcr.io/cadvisor/cadvisor:v0.39.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
        - name: http
          containerPort: 4194
          protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 4194
          initialDelaySeconds: 5
          periodSeconds: 10
        args:
        - --housekeeping_interval=10s
        - --port=4194
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker

#Adjust the cgroup symlinks on the worker nodes (all node hosts)
#1. Changes on node01
[root@k8s-node01 ~]# mount -o remount,rw /sys/fs/cgroup/
[root@k8s-node01 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct/ /sys/fs/cgroup/cpuacct,cpu
[root@k8s-node01 ~]# ll /sys/fs/cgroup/ | grep cpu

#2. Changes on node02
[root@k8s-node02 ~]# mount -o remount,rw /sys/fs/cgroup/
[root@k8s-node02 ~]# ln -s /sys/fs/cgroup/cpu,cpuacct/ /sys/fs/cgroup/cpuacct,cpu
[root@k8s-node02 ~]# ll /sys/fs/cgroup/ | grep cpu
lrwxrwxrwx 1 root root 11 Nov 17 13:41 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Nov 17 13:41 cpuacct -> cpu,cpuacct
lrwxrwxrwx 1 root root 27 Dec 19 01:52 cpuacct,cpu -> /sys/fs/cgroup/cpu,cpuacct/
dr-xr-xr-x 5 root root  0 Nov 17 13:41 cpu,cpuacct
dr-xr-xr-x 3 root root  0 Nov 17 13:41 cpuset

#Note: the remount above only changes the cgroup filesystem from read-only to read-write
[root@k8s-master01 cadvisor]# kubectl apply -f ds.yaml 
daemonset.apps/cadvisor created
#The image must be available on the nodes in advance; the registry address configured for this course can be used to pull it directly
[root@k8s-master01 cadvisor]# kubectl -n kube-system get pod -o wide|grep cadvisor
cadvisor-29nv5  1/1     Running   0  2m17s   192.168.1.112    k8s-master02   
cadvisor-lnpwj  1/1     Running   0  2m17s   192.168.1.114    k8s-node01     
cadvisor-wmr57  1/1     Running   0  2m17s   192.168.1.111    k8s-master01   
cadvisor-zcz78  1/1     Running   0  2m17s   192.168.1.115    k8s-node02

[root@k8s-master01 ~]# netstat -luntp|grep 4194
tcp6       0      0 :::4194 
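cadvisor serves the standard exposition format on port 4194, so a further spot check (a suggested verification; container_cpu_usage_seconds_total is a standard cadvisor metric) could be:

[root@k8s-master01 ~]# curl -s http://192.168.1.114:4194/metrics | grep -m 3 '^container_cpu_usage_seconds_total'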
4. Deploy blackbox-exporter (black-box monitoring)

One of the exporters provided by the Prometheus project; it can probe targets over HTTP, DNS, TCP, and ICMP.

Upstream GitHub: https://github.com/prometheus/blackbox_exporter
https://github.com/prometheus/blackbox_exporter/tree/v0.18.0/config/testdata
  • Prepare the working directory


[root@k8s-master01 ~]# docker pull prom/blackbox-exporter:v0.18.0
[root@k8s-master01 ~]# mkdir -p /k8s-yaml/blackbox-exporter && cd /k8s-yaml/blackbox-exporter

  • Create the resource manifests

[root@k8s-master01 blackbox-exporter]# vim blackbox-exporter-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 2s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: [200,301,302,404]
          method: GET
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 2s

[root@k8s-master01 blackbox-exporter]# vim blackbox-exporter-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
  labels:
    app: blackbox-exporter
  annotations:
    kubernetes.io/replicationcontroller: Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.18.0
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=info
        - --web.listen-address=:9115
        ports:
        - name: blackbox-port
          containerPort: 9115
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3

[root@k8s-master01 blackbox-exporter]# vim blackbox-exporter-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  type: NodePort
  selector:
    app: blackbox-exporter
  ports:
    - name: blackbox-port
      port: 9115
      targetPort: 9115
      nodePort: 10015
      protocol: TCP

#Note: you could also expose this through ingress-nginx instead
[root@k8s-master01 blackbox-exporter]# kubectl create -f .
configmap/blackbox-exporter created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
[root@k8s-master01 blackbox-exporter]# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS        AGE
blackbox-exporter-59fd868bfc-j4nfv         1/1     Running   0               2m37s
[root@k8s-master01 blackbox-exporter]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                 
blackbox-exporter    NodePort    10.100.192.71   <none>        9115:10015/TCP          

#If creation fails with a NodePort range error, either extend the cluster's NodePort range or pick a port above 30000, e.g. 30015
#Then open IP:10015 in a browser
[root@k8s-master01 blackbox-exporter]# curl 192.168.1.114:10015
..........
    <h1>Blackbox Exporter</h1>
..........
[root@k8s-master01 blackbox-exporter]#
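The exporter is driven through its /probe endpoint: pass a module defined in the ConfigMap plus a target, and it returns probe_* metrics. A hedged example against the NodePort just exposed (the target URL is a placeholder):

# Probe an HTTP target using the http_2xx module from the ConfigMap
curl -s 'http://192.168.1.114:10015/probe?module=http_2xx&target=http://example.com' | grep probe_success
# probe_success 1 means the probe passed; 0 means it failed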

2. Deploy the Prometheus server

2.1 Prepare the Prometheus server environment
Official Docker Hub: https://hub.docker.com/r/prom/prometheus
Official GitHub: https://github.com/prometheus/prometheus
https://github.com/prometheus/prometheus/tree/v2.32.0/config/testdata
Current latest version: 2.32.0
#1. Prepare the directory
[root@k8s-master01 ~]# mkdir /k8s-yaml/prometheus-server && cd /k8s-yaml/prometheus-server
  • 1. RBAC resource manifest

[root@k8s-master01 prometheus-server]# vim rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
2.2 Prepare the Deployment resource manifest

--web.enable-lifecycle enables hot-reloading of the configuration remotely, so Prometheus does not need a restart after the config file changes:

curl -X POST http://localhost:9090/-/reload

--storage.tsdb.min-block-duration=10m sets the minimum duration of a TSDB block before it is persisted, so only about 10 minutes of data are kept in memory

--storage.tsdb.retention=72h keeps 72 hours of data

[root@k8s-master01 prometheus-server]# vim depoyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  labels:
    name: prometheus
  name: prometheus
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 7
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      hostAliases:
      - ip: "192.168.1.111"
        hostnames:
        - "k8s-master01"
      - ip: "192.168.1.112"
        hostnames:
        - "k8s-master02"
      - ip: "192.168.1.114"
        hostnames:
        - "k8s-node01"
      - ip: "192.168.1.115"
        hostnames:
        - "k8s-node02"
      containers:
      - name: prometheus
        image: prom/prometheus:v2.32.0
        imagePullPolicy: IfNotPresent
        command:
        - /bin/prometheus
        args:
        - --config.file=/data/etc/prometheus.yaml
        #- --storage.tsdb.path=/data/prom-db
        - --storage.tsdb.min-block-duration=10m
        - --storage.tsdb.retention=72h
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /data/
          name: data
        resources:
          requests:
            cpu: "1000m"
            memory: "1.5Gi"
          limits:
            cpu: "2000m"
            memory: "3Gi"
      imagePullSecrets:
      - name: harbor
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus
      volumes:
      - name: data
        nfs:
          path: /data/nfs-volume/prometheus/
          server: 192.168.1.115

#Note: NFS storage must be created on one of the machines; here I use 192.168.1.115 (k8s-node02). In production, other storage such as GlusterFS would also work.
[root@k8s-node02 ~]# yum install nfs-utils -y
[root@k8s-node02 ~]# mkdir -p /data/nfs-volume/prometheus/
[root@k8s-node02 ~]# mkdir -p /data/nfs-volume/prometheus/etc
[root@k8s-node02 ~]# mkdir -p /data/nfs-volume/prometheus/prom-db
[root@k8s-node02 ~]# cat /etc/exports
/data/nfs-volume/prometheus/ *(rw,fsid=0,sync)
[root@k8s-node02 ~]# systemctl start nfs-server
[root@k8s-node02 ~]# systemctl enable nfs-server
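Before creating the Deployment, it is worth confirming that the export is visible from the other nodes (a simple suggested check, assuming nfs-utils is installed there as well):

[root@k8s-master01 ~]# showmount -e 192.168.1.115
# the export list should include /data/nfs-volume/prometheus/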
  • Prepare the Service resource manifest (once the Service exists you could also configure domain-based access through ingress-nginx; I have not done that here)

[root@k8s-master01 prometheus-server]# vim service.yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
    nodePort: 10090
  selector:
    app: prometheus

#Create the resources on the master
[root@k8s-master01 prometheus-server]# kubectl create -f .
  • Create the Prometheus configuration file. Note: because the data directory is shared over NFS, the file is created on the NFS host, 192.168.1.115.

Configuration file:

[root@k8s-node02 ~]# cd /data/nfs-volume/prometheus/etc/
[root@k8s-node02 etc]# cat prometheus.yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  # Discover every Pod in the cluster (other roles such as node, service, etc. are also available).
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
    # Keep only Pods annotated with prometheus.io/scrape: "true":
    # the value of the prometheus.io/scrape annotation is checked,
    # and the target is kept if it is true, otherwise dropped.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
    # Use the value of the prometheus.io/path annotation as the metrics path.
    # With prometheus.io/path: "/metrics", the target is scraped at http://<pod_ip>:<port>/metrics.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
    # Combine the Pod IP with the port from the prometheus.io/port annotation to build the scrape address.
    # __address__ defaults to pod_ip:container_port, which may not be the real metrics port;
    # the regex ([^:]+)(?::\d+)?;(\d+) extracts the Pod IP and the annotated port and rewrites it as pod_ip:port.
    # Example: __address__ 10.1.2.3 with annotation prometheus.io/port 8080 becomes 10.1.2.3:8080.
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
    # Copy all Pod labels onto the target:
    # every __meta_kubernetes_pod_label_<label_name> meta label becomes <label_name>: <label_value>,
    # e.g. the Pod label app: nginx becomes the target label app: nginx.
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
    # Explicitly add kubernetes_namespace and kubernetes_pod_name labels for easier filtering,
    # by mapping the __meta_kubernetes_namespace and __meta_kubernetes_pod_name meta labels.

- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10015

- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:4194

- job_name: 'kubernetes-kube-state'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
    regex: .*true.*
    action: keep
  - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
    regex: 'node-exporter;(.*)'
    action: replace
    target_label: nodename

- job_name: 'blackbox_http_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: http
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port, __meta_kubernetes_pod_annotation_blackbox_path]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+);(.+)
    replacement: $1:$2$3
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

- job_name: 'blackbox_tcp_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: tcp
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

- job_name: 'traefik'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: keep
    regex: traefik
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  • Start Prometheus

[root@k8s-master01 prometheus-server]# ls
depoyment.yaml rbac.yaml service.yaml
[root@k8s-master01 prometheus-server]# kubectl apply -f .

2.3 Verify in the Browser

Open 192.168.1.114:10090; if the page loads, the server started successfully.

Click Status -> Configuration to see the loaded configuration file.

(screenshot omitted)

2.4 Getting services monitored by Prometheus automatically

Click Status -> Targets; what you see are the job_names configured in prometheus.yaml, and these targets largely cover our data-collection needs.

The jobs for the four numbered exporters have already been discovered and are collecting data.

Question: how do we bring application Pods under monitoring as well? See the annotation example below.
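With the kubernetes-pods and blackbox jobs above, an application only needs the right Pod annotations to be picked up automatically. A minimal sketch (the Deployment name, image, port, and path are placeholders; the annotation keys are the ones the relabel rules in prometheus.yaml look for):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                         # placeholder application
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
      annotations:
        prometheus.io/scrape: "true"     # picked up by the kubernetes-pods job
        prometheus.io/port: "8080"       # port the app exposes metrics on
        prometheus.io/path: "/metrics"   # metrics path
        blackbox_scheme: "tcp"           # picked up by blackbox_tcp_pod_probe
        blackbox_port: "8080"            # port the TCP liveness probe should check
    spec:
      containers:
      - name: demo-app
        image: demo-app:latest           # placeholder image
        ports:
        - containerPort: 8080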

3. Deploy Grafana

Official Docker Hub: https://hub.docker.com/r/grafana/grafana
Official GitHub: https://github.com/grafana/grafana
Grafana official site: https://grafana.com/


#First create the shared storage directory on 192.168.1.115
[root@k8s-node02 grafana]# cat /etc/exports
/data/nfs-volume/prometheus/ *(rw,fsid=0,sync)
/data/nfs/grafana/ *(rw,sync)
[root@k8s-node02 grafana]# mkdir /data/nfs/grafana/ -p
[root@k8s-node02 grafana]# systemctl restart nfs

#Create the YAML manifests Grafana needs
[root@k8s-master01 ~]# mkdir /k8s-yaml/grafana && cd /k8s-yaml/grafana
[root@k8s-master01 grafana]# vim grafana.yaml
[root@k8s-master01 grafana]# cat grafana.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        volumeMounts:
        - name: grafana-data
          mountPath: /var/lib/grafana    # Grafana's default data directory
      volumes:
      - name: grafana-data
        nfs:
          path: /data/nfs/grafana/       # the NFS share prepared above
          server: 192.168.1.115
---

apiVersion: v1
kind: Service
metadata:
  name: grafana-svc
  namespace: kube-system
spec:
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 3303
  type: NodePort
  selector:
    app: grafana
    
[root@k8s-master01 grafana]# kubectl create -f grafana.yaml 
deployment.apps/grafana created
service/grafana-svc created
[root@k8s-master01 grafana]# kubectl get pod -n kube-system |grep grafana
grafana-6dc6566c6b-44wsw                   1/1     Running       0               22s
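Once Grafana is up (default login admin/admin), add a Prometheus data source pointing at the Service created earlier. A hedged, provisioning-style sketch of the equivalent settings (the data source name is arbitrary; the same values can be entered in the UI instead):

apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://prometheus.kube-system:9090   # the prometheus Service in kube-system
  isDefault: true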

Grafana dashboard IDs worth importing: 11074, 11670, 8588.
