
Building a Monitoring and Alerting System with Prometheus + Grafana + K8s

news | Source: original | 2025/8/24 18:18:22

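I. Technology Overview

Prometheus, Grafana and K8s Service Discovery Explained
Introduction to Prometheus

Prometheus is an open-source monitoring system and time-series database. It was originally developed at SoundCloud and is now a graduated project of the CNCF (Cloud Native Computing Foundation). It focuses on real-time monitoring and alerting, and it is particularly well suited to cloud-native and distributed systems.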

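Core features of Prometheus
Data collection: pulls metrics from targets over HTTP at a fixed interval (the pull model). Short-lived jobs can push their metrics to a Pushgateway, which Prometheus then scrapes
Data storage: keeps metrics in an efficient, compressed time-series database (TSDB) that is persisted on disk
Query language: PromQL, a powerful language for analyzing and aggregating time-series data
Alerting: alerting rules are written in PromQL. Firing alerts are sent to Alertmanager, which groups, deduplicates and routes them
Multi-dimensional data model: every time series is identified by a metric name plus a set of key-value labels, so data can be filtered and aggregated along any label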
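PromQL queries can also be sent to Prometheus's HTTP API. The sketch below builds an instant-query URL for the documented /api/v1/query endpoint. It then turns a vector result into a dict keyed by one label, which shows how labels distinguish series of the same metric. The base URL (a node IP plus the NodePort 31197 that appears later in this article), the instance names and the sample values are assumptions for illustration only.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the instant-query endpoint /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_by_label(response: dict, label: str) -> dict:
    """Map one label's value to the sample value of an instant-vector result."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each element looks like {"metric": {...labels...}, "value": [unix_ts, "value as string"]}
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


# Hypothetical response body with made-up instances, shaped like a /api/v1/query reply
SAMPLE_RESPONSE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"instance": "node-a:9100"}, "value": [1700000000, "0.13"]},
            {"metric": {"instance": "node-b:9100"}, "value": [1700000000, "0.21"]}]}}
""")

if __name__ == "__main__":
    print(build_query_url("http://192.168.40.180:31197", "sum by (instance) (node_load1)"))
    print(vector_by_label(SAMPLE_RESPONSE, "instance"))
```

Fetching the URL (for example with urllib.request.urlopen) returns a JSON body of the same shape as SAMPLE_RESPONSE.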
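Architecture of Prometheus
Periodically scrapes the state of monitored components over HTTP, so any component that exposes an HTTP metrics endpoint can be integrated
Does not depend on distributed storage; a single server node can operate on its own
Finds targets through service discovery or static configuration
Suits both machine-centric monitoring and monitoring of highly dynamic, service-oriented architectures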
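In practice, "any component that exposes an HTTP endpoint" means serving plain text in the Prometheus exposition format. Below is a minimal sketch that uses only Python's standard library. The metric names demo_requests_total and demo_uptime_seconds and the default port 8000 are hypothetical. Real applications would normally use an official client library such as prometheus_client.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()
REQUESTS = 0  # hypothetical counter: /metrics requests served so far
LOCK = threading.Lock()


def render_metrics() -> str:
    """Render the current values in the Prometheus text exposition format."""
    with LOCK:
        count = REQUESTS
    lines = [
        "# HELP demo_requests_total Requests handled by the demo app.",
        "# TYPE demo_requests_total counter",
        f'demo_requests_total{{path="/metrics"}} {count}',
        "# HELP demo_uptime_seconds Seconds since the demo app started.",
        "# TYPE demo_uptime_seconds gauge",
        f"demo_uptime_seconds {time.time() - START:.3f}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUESTS
        if self.path != "/metrics":
            self.send_error(404)
            return
        with LOCK:
            REQUESTS += 1
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given (arbitrary) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

After calling serve(), `curl localhost:8000/metrics` returns the same HELP/TYPE/sample layout as the node-exporter output later in this article. A static_configs entry or a discovered target pointing at that port is all Prometheus needs to scrape it.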
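Introduction to Grafana

Grafana is an open-source analytics and visualization platform that lets users visualize data from many backends, including Prometheus. It provides dynamic, interactive dashboards for presenting monitoring data.

Core features of Grafana
Customizable dashboards: build visually rich, highly interactive dashboards
Flexible data sources: supports a wide range of sources, including Prometheus, Elasticsearch and InfluxDB
Alerts and notifications: define and trigger alerts based on the visualized metrics
Query builder: simplifies writing queries against the supported backends
Versatility: useful not only for Prometheus data but for other visualization needs as well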
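How Prometheus and Grafana work together in K8s

In Kubernetes (K8s) environments, Prometheus and Grafana are a very popular open-source combination:

Prometheus collects metrics from the K8s cluster and from containerized applications
Grafana visualizes this data, presenting the health of the system in dashboards
Together they give developers and operators a powerful, flexible monitoring solution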
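The role of K8s-based service discovery

In Kubernetes, service discovery is essential for Prometheus monitoring:

Automatic target discovery: Prometheus discovers cluster resources such as Nodes, Pods, Services and Endpoints as scrape targets
Adapting to change: when services are scaled or updated, service discovery updates the target list automatically
Simpler configuration: no hand-maintained target lists, which reduces configuration work
Multi-cluster monitoring: a kubernetes_sd_configs entry can point at another cluster's API server, so one Prometheus can discover targets in several K8s clusters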
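Conceptually, service discovery keeps the active target set equal to what the Kubernetes API currently reports. The toy sketch below shows that reconciliation step as a set difference. It is an illustration, not Prometheus's actual implementation, and the Pod IPs are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    """A scrape target: host:port plus the labels discovery attached to it."""
    address: str
    labels: frozenset = field(default_factory=frozenset)


def reconcile(active: set, discovered: set) -> tuple:
    """Return (to_start, to_stop) so that the active set becomes `discovered`."""
    return discovered - active, active - discovered


if __name__ == "__main__":
    # Hypothetical Pod IPs: a Deployment scales from 2 to 3 replicas and one Pod is replaced.
    before = {Target("10.244.1.5:8080"), Target("10.244.2.7:8080")}
    after = {Target("10.244.2.7:8080"), Target("10.244.1.9:8080"), Target("10.244.2.8:8080")}
    start, stop = reconcile(before, after)
    print("start scraping:", sorted(t.address for t in start))
    print("stop scraping:", sorted(t.address for t in stop))
```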

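Prometheus pulls data from targets that are either statically configured or generated by service discovery. Newly scraped samples are first held in an in-memory head block and are also written to a write-ahead log on disk for crash recovery. The head block is then periodically compacted into persistent on-disk blocks, each of which covers two hours by default.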

II. Hands-on: Deploying node-exporter

[root@node1 ~]# ctr -n k8s.io images import node-exporter.tar.gz
unpacking docker.io/prom/node-exporter:v0.16.0 (sha256:efc8140e40b5c940d67056cb56d720ed66965eabe03865ab1595705f4f847009)...done

[root@master ~]# kubectl create ns monitor-sa
namespace/monitor-sa created
[root@master ~]# kubectl get ns
NAME              STATUS   AGE
default           Active   27m
kube-node-lease   Active   27m
kube-public       Active   27m
kube-system       Active   27m
monitor-sa        Active   4s

[root@master prometheus]# cat node-export.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

[root@master prometheus]# kubectl get pods -n monitor-sa
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-gwvcf   1/1     Running   0          3m9s
node-exporter-wzck7   1/1     Running   0          4m24s

[root@master prometheus]# curl 192.168.40.180:9100/metrics | grep node_load
% Total    % Received % Xferd Average Speed   Time    Time     Time Current
                                 Dload Upload   Total   Spent    Left Speed
100 81708 100 81708    0     0 10.3M      0 --:--:-- --:--:-- --:--:-- 11.1M
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.13
# HELP node_load15 15m load average.
# TYPE node_load15 gauge
node_load15 0.26
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 0.14
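The curl output above is the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, followed by one sample per line. Below is a simplified parser sketch, fed with the lines shown above. It ignores timestamps and does not handle label values that contain a closing brace. For anything beyond experiments, a real parser such as prometheus_client.parser is the safer choice.

```python
import re

# Sample lines copied from the curl output above
EXPOSITION = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.13
# HELP node_load15 15m load average.
# TYPE node_load15 gauge
node_load15 0.26
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 0.14
"""

# metric name, optional {label="value",...} block, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text: str) -> dict:
    """Map 'name{labels}' to its float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.groups()
            samples[name + (labels or "")] = float(value)  # float() also accepts NaN/+Inf
    return samples


if __name__ == "__main__":
    for series, value in sorted(parse_samples(EXPOSITION).items()):
        print(series, value)
```

To parse live data instead, pass it the decoded body of `urllib.request.urlopen("http://192.168.40.180:9100/metrics")`.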

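Deploying Node Exporter in Kubernetes: Detailed Explanation
Overview

The steps above deploy Prometheus Node Exporter to a Kubernetes cluster. The process consists of:

Importing the Node Exporter image into the container runtime
Creating a dedicated monitoring namespace
Deploying Node Exporter as a DaemonSet
Verifying that the Pods are running
Detailed explanation
1. Import the Node Exporter image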
ctr -n k8s.io images import node-exporter.tar.gz
unpacking docker.io/prom/node-exporter:v0.16.0...done

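ctr is the command-line client of the containerd container runtime
-n k8s.io selects containerd's k8s.io namespace, the one Kubernetes uses for its images
Imports the image from the node-exporter.tar.gz archive
Unpacks and loads the prom/node-exporter:v0.16.0 image
The import only affects the node it runs on; other nodes need the same import or must be able to pull the image from a registry
2. Create the monitoring namespace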
kubectl create ns monitor-sa
kubectl get ns

Creates a namespace named monitor-sa
Lists all namespaces to confirm it was created
The namespace isolates the monitoring-related resources
3. Node Exporter DaemonSet configuration
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter

Defines a DaemonSet resource, which ensures that every eligible node runs one copy of the Pod
Deployed in the monitor-sa namespace
Carries the label name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter

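The label selector identifies the Pods that belong to the DaemonSet
The Pod template carries the same label, so the DaemonSet manages these Pods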
spec:
  hostPID: true
  hostIPC: true
  hostNetwork: true

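hostPID: true - uses the host's PID namespace, so host processes are visible
hostIPC: true - uses the host's IPC namespace
hostNetwork: true - uses the host's network stack, so port 9100 is served directly on the node IP and host network statistics are visible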
containers:
- name: node-exporter
  image: prom/node-exporter:v0.16.0
  imagePullPolicy: IfNotPresent
  ports:
  - containerPort: 9100

Uses the prom/node-exporter:v0.16.0 image
The IfNotPresent pull policy skips pulling from the registry when the image already exists locally
Exposes port 9100, Node Exporter's default port
resources:
  requests:
    cpu: 0.15
securityContext:
  privileged: true

Requests 0.15 CPU cores
Runs in privileged mode (for access to host system information)
args:
- --path.procfs
- /host/proc
- --path.sysfs
- /host/sys
- --collector.filesystem.ignored-mount-points
- '^/(sys|proc|dev|host|etc)($|/)'

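Point --path.procfs and --path.sysfs at the host's /proc and /sys, which are mounted into the container under /host
Exclude filesystems whose mount point matches the regex (anything under /sys, /proc, /dev, /host or /etc) from the filesystem metrics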
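The ignored-mount-points value is an ordinary regular expression. The sketch below checks which mount points it would exclude. It uses Python's re module as a stand-in for Go's regexp, on the assumption that both behave identically for this simple pattern and that node_exporter applies it as an unanchored match. It also shows why the value must not carry an extra pair of double quotes. If the argument is written as '"^/(...)($|/)"', YAML strips only the single quotes, so the double quotes become part of the pattern and no mount point is ever excluded.

```python
import re

# The intended pattern: mount points under /sys, /proc, /dev, /host or /etc
INTENDED = r"^/(sys|proc|dev|host|etc)($|/)"
# What the regex becomes if the YAML item wraps it in both single and double quotes:
# YAML removes only the outer single quotes, so the double quotes stay in the pattern.
QUOTED = '"' + INTENDED + '"'


def ignored(pattern: str, mount_point: str) -> bool:
    """True if the mount point would be excluded (unanchored search, like Go's MatchString)."""
    return re.search(pattern, mount_point) is not None


if __name__ == "__main__":
    for mp in ["/", "/proc", "/proc/sys/fs/binfmt_misc", "/etc/hosts", "/hostfs", "/var/lib/kubelet"]:
        print(f"{mp:26} intended={ignored(INTENDED, mp)!s:5}  quoted={ignored(QUOTED, mp)}")
```

The `($|/)` group is what keeps a mount point such as /hostfs from matching the `host` alternative.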
volumeMounts:
- name: dev
  mountPath: /host/dev
- name: proc
  mountPath: /host/proc
- name: sys
  mountPath: /host/sys
- name: rootfs
  mountPath: /rootfs

Mounts the host's /dev, /proc, /sys and / into the container
This lets Node Exporter read host system information
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"

Tolerates the control-plane node's taint
This lets Node Exporter run on the master/control-plane node as well
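Whether a Pod may land on a tainted node comes down to a simple matching rule between each taint and the Pod's tolerations. The sketch below is a simplified version of that rule, written from the Kubernetes documentation's description. It ignores tolerationSeconds and the scheduler's other checks, and the Taint and Toleration classes are hypothetical stand-ins for the API objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taint:
    key: str
    effect: str                   # NoSchedule, PreferNoSchedule or NoExecute
    value: str = ""


@dataclass
class Toleration:
    key: Optional[str] = None
    operator: str = "Equal"       # Kubernetes' default when the field is omitted
    value: str = ""
    effect: Optional[str] = None  # empty/None matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration-matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:               # an empty key is only valid with Exists and matches all keys
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":  # Exists ignores the taint's value
        return True
    return tol.value == taint.value  # operator Equal
```

Some older clusters taint the control plane with node-role.kubernetes.io/master instead. The toleration in this DaemonSet does not match that key, so such clusters need an additional toleration for it.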
volumes:
- name: proc
  hostPath:
    path: /proc
- name: dev
  hostPath:
    path: /dev
- name: sys
  hostPath:
    path: /sys
- name: rootfs
  hostPath:
    path: /

Defines hostPath volumes that map host system directories into the container
4. Verify the Pod status
kubectl get pods -n monitor-sa
NAME                    READY   STATUS    RESTARTS   AGE
node-exporter-gwvcf     1/1     Running   0          3m9s
node-exporter-wzck7     1/1     Running   0          4m24s

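Lists the Pods in the monitor-sa namespace
Two Node Exporter Pods are Running (the cluster here has two nodes, master and node1)
One Pod per node, which is what a DaemonSet guarantees
Summary

This configuration:

Runs one Node Exporter on every Kubernetes node, including the control-plane node
Collects host-level metrics (CPU, memory, disk, network, etc.)
Exposes the metrics on port 9100 for Prometheus to scrape
Uses the privileges and host mounts needed to read host system information

This is a common deployment pattern in Kubernetes monitoring and gives the cluster basic host-level monitoring.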

III. Hands-on: Deploying Prometheus

[root@master prometheus]# kubectl create sa monitor -n monitor-sa

[root@master prometheus]# kubectl create clusterrolebinding monitor-binding --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor -n monitor-sa

[root@master prometheus]# kubectl create clusterrolebinding monitor-binding-2 --clusterrole=cluster-admin --user=system:serviceaccount:monitor-sa:monitor -n monitor-sa

[root@node1 ~]# mkdir -p /data
[root@node1 ~]# chmod -R 777 /data/
[root@node1 ~]# ls -ld /data/
drwxrwxrwx 2 root root 6 Mar 13 03:44 /data/

[root@master prometheus]# kubectl apply -f prometheus-cfg.yaml
configmap/prometheus-config created
[root@master prometheus]# cat prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor-sa
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

[root@master prometheus]# cat prometheus-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node1
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.5
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --storage.tsdb.retention=720h
          - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
           path: /data
           type: Directory

[root@master prometheus]# cat prometheus-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      protocol: TCP
  selector:
    app: prometheus
    component: server

[root@master prometheus]# kubectl get svc -n monitor-sa
NAME         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.106.100.212   <none>        9090:31197/TCP   2m28s

Deploying the Prometheus Monitoring System in Kubernetes: Detailed Explanation
1. Create the service account and permission bindings
# Create the service account
kubectl create sa monitor -n monitor-sa

# Bind the cluster-admin role to the service account
kubectl create clusterrolebinding monitor-binding --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor -n monitor-sa

# Optional second binding for the same account by user name (format: system:serviceaccount:<namespace>:<name>); redundant with the binding above
kubectl create clusterrolebinding monitor-binding-2 --clusterrole=cluster-admin --user=system:serviceaccount:monitor-sa:monitor -n monitor-sa

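These commands create a ServiceAccount and grant it cluster administrator privileges:

The monitor ServiceAccount is what Prometheus uses to access the Kubernetes API
It is granted the cluster-admin role, so Prometheus can access all resources (in production, prefer finer-grained permissions)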
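As a finer-grained alternative to cluster-admin, a read-only ClusterRole could cover what this article's scrape jobs need. That means discovering nodes, services and endpoints, reading pods, proxying to the kubelet's cAdvisor endpoint and reading the API server's /metrics. The sketch below generates such a role and its binding. The names (prometheus-read) are hypothetical, and the rule list is an assumption modeled on commonly used Prometheus RBAC examples, so check it against your own scrape config. The sa_username helper shows the user-name format Kubernetes uses for service accounts.

```python
import json


def sa_username(namespace: str, name: str) -> str:
    """The user name Kubernetes assigns to a ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{name}"


def prometheus_rbac(namespace: str = "monitor-sa", sa: str = "monitor") -> dict:
    """A v1 List holding a read-only ClusterRole and its binding (hypothetical names)."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "prometheus-read"},
        "rules": [
            {"apiGroups": [""],
             "resources": ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"],
             "verbs": ["get", "list", "watch"]},
            {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": "prometheus-read"},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole",
                    "name": "prometheus-read"},
        "subjects": [{"kind": "ServiceAccount", "name": sa, "namespace": namespace}],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # e.g. python rbac.py > rbac.json && kubectl apply -f rbac.json
    print(json.dumps(prometheus_rbac(), indent=2))
```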
2. Prepare the storage directory
mkdir -p /data
chmod -R 777 /data/

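This creates the /data directory on node1 and opens up its permissions; the directory serves as Prometheus's persistent storage. The permissive 777 mode lets the Prometheus container, which does not run as root, write to this hostPath directory.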
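How much space /data needs can be estimated with the rule of thumb from the Prometheus storage documentation: retention time in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample after compression. The series count below is a made-up figure. The real ingestion rate can be measured with a query such as rate(prometheus_tsdb_head_samples_appended_total[5m]).

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series yields one sample per scrape."""
    return active_series / scrape_interval_s


def needed_disk_bytes(retention_hours: float, samples_per_s: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb estimate; excludes the WAL and any safety headroom."""
    return retention_hours * 3600 * samples_per_s * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical: 150,000 active series scraped every 15 s (this article's scrape_interval),
    # kept for 720 h (this article's retention flag), at the pessimistic 2 bytes/sample.
    rate = samples_per_second(150_000, 15)
    size = needed_disk_bytes(720, rate)
    print(f"{rate:.0f} samples/s -> about {size / 1e9:.1f} GB")
```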

3. Prometheus configuration (ConfigMap)

prometheus-cfg.yaml defines a ConfigMap that contains Prometheus's main configuration file:

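global:
  scrape_interval: 15s      # scrape interval
  scrape_timeout: 10s       # scrape timeout
  evaluation_interval: 1m   # rule evaluation interval

scrape_configs:
# Node metrics from node-exporter
- job_name: 'kubernetes-node'
  kubernetes_sd_configs: [{role: node}]
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'   # swap the kubelet port (10250) for the node-exporter port (9100)
    target_label: __address__
    action: replace
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)   # copy the node's labels onto its series

# Container metrics (cAdvisor), fetched through the API server proxy
- job_name: 'kubernetes-node-cadvisor'
  kubernetes_sd_configs: [{role: node}]
  scheme: https   # use HTTPS
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443                  # scrape via the API server
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor   # rewrite the path to the node's cAdvisor endpoint

# API server
- job_name: 'kubernetes-apiserver'
  kubernetes_sd_configs: [{role: endpoints}]
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https   # keep only the API server's endpoint

# Service endpoints
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs: [{role: endpoints}]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true   # only scrape Services annotated prometheus.io/scrape: "true"
  # remaining relabel rules (scheme, path, port, labels) as in prometheus-cfg.yaml above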
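To see what these relabel rules do to a discovered target, the sketch below re-implements the keep, drop, replace and labelmap actions in simplified form. It omits hashmod, labeldrop/labelkeep and several edge cases. The rules in the tests are copied from prometheus-cfg.yaml and applied to hand-written label sets whose IPs, node name and port annotation value are made up. This is an illustration, not Prometheus's code.

```python
import re


def _expand(replacement: str, match: re.Match) -> str:
    # Prometheus writes group references as $1 or ${1}; Python's expand() wants \g<1>.
    return match.expand(re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement))


def relabel(labels: dict, rule: dict):
    """Apply one rule; return the new label dict, or None if the target is dropped."""
    action = rule.get("action", "replace")
    regex = re.compile(rule.get("regex", "(.*)"))  # Prometheus anchors regexes at both ends
    if action == "labelmap":
        out = dict(labels)
        for name, value in labels.items():
            m = regex.fullmatch(name)
            if m:
                out[_expand(rule.get("replacement", "$1"), m)] = value
        return out
    joined = rule.get("separator", ";").join(labels.get(n, "") for n in rule.get("source_labels", []))
    m = regex.fullmatch(joined)
    if action == "keep":
        return labels if m else None
    if action == "drop":
        return None if m else labels
    if action == "replace":
        out = dict(labels)
        if m:
            value = _expand(rule.get("replacement", "$1"), m)
            if value:
                out[rule["target_label"]] = value
            else:
                out.pop(rule["target_label"], None)  # an empty result removes the label
        return out
    raise ValueError(f"action not covered by this sketch: {action}")


if __name__ == "__main__":
    node = {"__address__": "192.168.40.180:10250"}
    rule = {"source_labels": ["__address__"], "regex": "(.*):10250",
            "replacement": "${1}:9100", "target_label": "__address__", "action": "replace"}
    print(relabel(node, rule))
```

Running each rule of a job in order on the discovered labels reproduces the target address, metrics path and labels shown on the Prometheus Targets page.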

4. Prometheus Deployment

prometheus-deploy.yaml defines the Prometheus Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'   # opt out of annotation-based scraping (the service-endpoints job checks Service, not Pod, annotations)
    spec:
      nodeName: node1                   # pin to node1, where /data was created
      serviceAccountName: monitor       # the service account created earlier
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.5
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml   # config file path
          - --storage.tsdb.path=/prometheus                # data directory
          - --storage.tsdb.retention=720h                  # keep 30 days of data (deprecated alias of --storage.tsdb.retention.time)
          - --web.enable-lifecycle                         # enable config hot reload via POST /-/reload
        ports:
        - containerPort: 9090                              # Prometheus web port
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config                          # mount the config
        - mountPath: /prometheus/
          name: prometheus-storage-volume                  # mount the data volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config                          # the ConfigMap created earlier
      - name: prometheus-storage-volume
        hostPath:
          path: /data                                      # the /data directory on the node
          type: Directory
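Because the Deployment passes --web.enable-lifecycle, a changed ConfigMap can be applied without restarting the Pod. Once the kubelet has synced the new ConfigMap contents into the mounted volume, which can take up to a minute or so, an HTTP POST to /-/reload makes Prometheus re-read its configuration. The standard-library sketch below builds and sends that request. The node IP is an assumption, and 31197 is the NodePort shown by kubectl get svc.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """POST <base_url>/-/reload; only honored when Prometheus runs with --web.enable-lifecycle."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_prometheus(base_url: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status (200 on success)."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Assumed node IP; 31197 is the NodePort from `kubectl get svc -n monitor-sa`.
    req = build_reload_request("http://192.168.40.180:31197")
    print(req.get_method(), req.full_url)  # reload_prometheus(...) would actually send it
```

The shell equivalent is `curl -X POST http://<NodeIP>:31197/-/reload`.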

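5. Prometheus Service

prometheus-svc.yaml creates a NodePort Service that exposes the Prometheus web UI on port 9090. In the kubectl get svc output above, port 9090 is mapped to node port 31197, so the UI is reachable at http://<NodeIP>:31197.

Summary

This configuration:

Uses Kubernetes-native service discovery to monitor cluster components automatically
Monitors node metrics, container metrics (cAdvisor) and the API server
Scrapes Services selectively based on the prometheus.io/scrape annotation
Persists data with a hostPath volume
Manages configuration through a ConfigMap, with hot reload enabled by --web.enable-lifecycle

This setup gives a Kubernetes cluster comprehensive monitoring and is a typical cloud-native monitoring deployment.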

IV. Deploying Grafana

[root@master prometheus]# kubectl apply -f grafana.yaml
deployment.apps/monitoring-grafana configured
service/monitoring-grafana unchanged
[root@master prometheus]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      nodeName:
      containers:
      - name: grafana
        image: grafana/grafana:8.4.5
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        - mountPath: /var/lib/grafana/
          name: lib
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        hostPath:
          path: /var/lib/grafana-storage
          type: DirectoryOrCreate
      - name: lib
        hostPath:
         path: /var/lib/grafana/
         type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  # type: NodePort
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort

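grafana.yaml explained

grafana.yaml is a Kubernetes manifest that deploys the Grafana dashboard to the cluster. It contains two resources: a Deployment and a Service.

Deployment
apiVersion: apps/v1, the Kubernetes API version used.
kind: Deployment, marking this as a Deployment resource.
metadata:
  name: the Deployment's name, here monitoring-grafana.
  namespace: the namespace it is created in, here kube-system.
spec:
  replicas: 1, so exactly one Grafana instance is deployed.
  selector: determines which Pods belong to this Deployment.
  template: the Pod template.
    metadata: the Pod's metadata, including its labels.
    spec: the Pod spec.
      nodeName: the node to pin the Pod to. It is left empty in the file above, so the scheduler picks the node. Because Grafana's data lives in hostPath volumes, setting it (for example to node1) keeps the same data directory across restarts.
      containers:
        name: the container name, here grafana.
        image: the container image, here grafana/grafana:8.4.5.
        imagePullPolicy: IfNotPresent, so the image is pulled only if it is not already present locally.
        ports: the container exposes TCP port 3000.
        volumeMounts: the volumes mounted into the container.
        env: environment variables that configure Grafana.
      volumes: the volume definitions.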
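Service
apiVersion: v1, the Kubernetes API version used.
kind: Service, marking this as a Service resource.
metadata:
  labels: the Service's labels.
  name: the Service's name, here monitoring-grafana.
  namespace: the namespace it is created in, here kube-system.
spec:
  ports: the Service listens on port 80 and forwards to port 3000 of the Grafana container.
  selector: selects the Pods labeled k8s-app: grafana as the Service's backends.
  type: NodePort, which opens the same port on every node (allocated from the node-port range, 30000-32767 by default), so Grafana is reachable at <NodeIP>:<NodePort>.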
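Summary

This file defines a Grafana Deployment based on the grafana/grafana:8.4.5 image, plus a NodePort Service, so the Grafana web UI is reachable at any cluster node's IP and the allocated port. The environment variables configure Grafana: the HTTP port, disabled basic auth, and anonymous access with the Admin role. INFLUXDB_HOST appears to be left over from the Heapster/InfluxDB add-on template this file derives from; Grafana does not read it, and it has no effect when Prometheus is the data source.

Note: in production, expose Grafana through a LoadBalancer or a public IP and set up proper authentication and authorization. In particular, turn off anonymous access with the Admin role.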
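Once Grafana is running, Prometheus still has to be added as a data source. This can be done in the UI or through Grafana's HTTP data source API (POST /api/datasources). The sketch below only builds the request. It assumes Prometheus is reachable from the Grafana Pod at http://prometheus.monitor-sa.svc:9090, that is, the prometheus Service in monitor-sa resolved through cluster DNS, and it uses a placeholder Grafana URL. The optional credentials matter only after anonymous Admin access has been disabled.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             user: str = "", password: str = "") -> urllib.request.Request:
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # Grafana's backend queries Prometheus, not the browser
        "isDefault": True,
    }
    req = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if user:  # basic auth, once anonymous Admin access is switched off
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    # Placeholder Grafana address: use <NodeIP>:<NodePort> of the monitoring-grafana Service.
    req = build_datasource_request("http://grafana.example:3000",
                                   "http://prometheus.monitor-sa.svc:9090")
    print(req.get_method(), req.full_url, req.data.decode())
    # urllib.request.urlopen(req) would send it
```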
