In the previous article, Prometheus-Operator hands-on tutorial, I installed Prometheus-Operator by hand together with the Prometheus, Alertmanager, ServiceMonitor and PrometheusRule resources, which made clear how everything fits together. In real production, however, there is a lot to monitor, both the Kubernetes cluster itself and the applications running on top of it, and adding all of that by hand is tedious. This article therefore records how to install prometheus-operator with Helm 3 to cut down on the manual steps.
Environment and versions used in this article:
Kubernetes 1.15.5 + Calico v3.6.5
Helm v3.0.0
InfluxDB chart version: 3.0.1, which ships InfluxDB 1.7.6
Prometheus-Operator chart version: 8.2.2, which ships Prometheus-Operator v0.34.0
Everything is installed into the monitoring namespace.
Prerequisites
In production you have to think about persistent storage for Prometheus. You can configure storage as described in the earlier Prometheus-Operator hands-on tutorial, but once you run several Prometheus servers, backing up their data gets cumbersome, so here InfluxDB is used as the remote-read / remote-write backend for the Prometheus servers.
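For reference, this is roughly what the integration looks like in a plain prometheus.yml; it is only a sketch, since in this article the same endpoints are configured later through the chart's prometheusSpec.remoteRead and prometheusSpec.remoteWrite values:

# prometheus.yml fragment (sketch): write samples to and read samples back from InfluxDB 1.x
remote_write:
  - url: "http://influxdb:8086/api/v1/prom/write?db=prometheus&u=prometheus&p=Pr0m123"
remote_read:
  - url: "http://influxdb:8086/api/v1/prom/read?db=prometheus&u=prometheus&p=Pr0m123"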
InfluxDB is also a fairly popular time-series database; see the official site for details.
Quoted from the official docs: https://prometheus.io/docs/operating/integrations
File Service Discovery For service discovery mechanisms not natively supported by Prometheus, file-based service discovery provides an interface for integrating.
Remote Endpoints and Storage The remote write and remote read features of Prometheus allow transparently sending and receiving samples. This is primarily intended for long term storage. It is recommended that you perform careful evaluation of any solution in this space to confirm it can handle your data volumes.
Alertmanager Webhook Receiver For notification mechanisms not natively supported by the Alertmanager, the webhook receiver allows for integration.
Management Prometheus does not include configuration management functionality, allowing you to integrate it with your existing systems or build on top of it.
Prometheus Operator : Manages Prometheus on top of Kubernetes
Promgen : Web UI and configuration generator for Prometheus and Alertmanager
Other
karma : alert dashboard
PushProx : Proxy to transverse NAT and similar network setups
Promregator : discovery and scraping for Cloud Foundry applications
Installing InfluxDB with Helm 3
Always remember to update the Helm repo before downloading or installing a chart:
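If the stable repo has not been added yet, add it first; the URL below is the one the stable charts used at the time of writing, so adjust it if your mirror differs:

$ helm repo add stable https://kubernetes-charts.storage.googleapis.com
$ helm repo update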
Search for the influxdb chart:
$ helm search repo influxdb
NAME              CHART VERSION  APP VERSION  DESCRIPTION
stable/influxdb   3.0.1          1.7.6        Scalable datastore for metrics, events, and rea...
stable/kapacitor  1.1.3          1.5.2        InfluxDB's native data processing engine. It ca...
Since there are quite a few parameters to customize, download the chart first and then edit values.yaml:
$ helm pull stable/influxdb
$ tar xf influxdb-3.0.1.tgz
Back up the default values so it is easy to roll back:
$ cd influxdb
$ cp values.yaml{,.ori}
Edit values.yaml as follows; only the changed parts are shown:
persistence:
  enabled: true
  storageClass: rook-ceph-rbd
  annotations:
  accessMode: ReadWriteOnce
  size: 8Gi
resources:
  requests:
    memory: 2Gi
    cpu: 0.5
  limits:
    memory: 3Gi
    cpu: 1
ingress:
  enabled: true
  tls: false
  hostname: influxdb.test.aws.test.com
  annotations:
    kubernetes.io/ingress.class: "nginx"
env:
  - name: INFLUXDB_ADMIN_ENABLED
    value: "true"
  - name: INFLUXDB_ADMIN_USER
    value: "admin"
  - name: INFLUXDB_ADMIN_PASSWORD
    value: "Adm1n123"
  - name: INFLUXDB_DB
    value: "prometheus"
  - name: INFLUXDB_USER
    value: "prometheus"
  - name: INFLUXDB_USER_PASSWORD
    value: "Pr0m123"
config:
  data:
    max_series_per_database: 0
    max_values_per_tag: 0
  admin:
    enabled: true
  http:
    auth_enabled: true
initScripts:
  enabled: true
  scripts:
    retention.iql: |+
      CREATE RETENTION POLICY "prometheus_retention_policy" on "prometheus" DURATION 180d REPLICATION 1 DEFAULT
Note:
The configuration above does the following:
Enables InfluxDB authentication
Creates an admin user with password Adm1n123
Creates the prometheus database
Creates a prometheus user with password Pr0m123; this user then has read/write access to the prometheus database
Sets a retention policy on the prometheus database: keep data for 180 days
Warning:
In my tests, if auth is enabled and the admin username/password are set through setDefaultUser (the prometheus job), the admin credentials do get set, but the database is never created and the extra user is never set up. So do not rely on setDefaultUser here; use the env approach instead. Keep in mind, though, that with env anyone who can exec into the container can read these secrets with the env command, so lock down the pods/exec permission accordingly.
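RBAC permissions are additive, so restricting pods/exec simply means never granting that subresource to ordinary users. As an illustration only (the Role name is made up for this example), a read-only Role that lets users inspect pods but not exec into them could look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader-no-exec    # hypothetical example name
  namespace: monitoring
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
  # "pods/exec" is deliberately not granted, so kubectl exec is denied to subjects bound only to this Role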
Reference: InfluxDB helm chart
Reference: InfluxDB docker
Reference: InfluxDB dockerfile
Install InfluxDB:
$ kubectl create ns monitoring $ cd influxdb $ helm install influxdb ./ --namespace monitoring NAME: influxdb LAST DEPLOYED: Thu Nov 21 18:04:11 2019 NAMESPACE: monitoring STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: InfluxDB can be accessed via port 8086 on the following DNS name from within your cluster: - http://influxdb.monitoring:8086 You can easily connect to the remote instance with your local influx cli. To forward the API port to localhost:8086 run the following: - kubectl port-forward --namespace monitoring $(kubectl get pods --namespace monitoring -l app=influxdb -o jsonpath='{ .items[0].metadata.name }' ) 8086:8086 You can also connect to the influx cli from inside the container. To open a shell session in the InfluxDB pod run the following: - kubectl exec -i -t --namespace monitoring $(kubectl get pods --namespace monitoring -l app=influxdb -o jsonpath='{.items[0].metadata.name}' ) /bin/sh To tail the logs for the InfluxDB pod run the following: - kubectl logs -f --namespace monitoring $(kubectl get pods --namespace monitoring -l app=influxdb -o jsonpath='{ .items[0].metadata.name }' ) $ kubectl -n monitoring get all NAME READY STATUS RESTARTS AGE pod/influxdb-0 1/1 Running 0 31m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/influxdb ClusterIP 10.100.50.188 <none> 8086/TCP,8083/TCP,8088/TCP 31m NAME READY AGE statefulset.apps/influxdb 1/1 31m
Log in to the influxdb-0 pod and verify that everything defined above is in place:
$ kubectl -n monitoring exec -it influxdb-0 -- bash
bash-4.4# influx
Connected to http://localhost:8086 version 1.7.6
InfluxDB shell version: 1.7.6
Enter an InfluxQL query
> show databases
ERR: unable to parse authentication credentials
Warning: It is possible this error is due to not setting a database. Please set a database with the command "use <database>".
> auth admin Adm1n123
> show databases
name: databases
name
----
prometheus
_internal
> show users
user       admin
----       -----
admin      true
prometheus false
> show grants for prometheus
database   privilege
--------   ---------
prometheus ALL PRIVILEGE
> use prometheus
Using database prometheus
> show retention policies
name                         duration   shardGroupDuration  replicaN  default
----                         --------   ------------------  --------  -------
autogen                      0s         168h0m0s            1         false
prometheus_retention_policy  4320h0m0s  168h0m0s            1         true
Everything is configured as expected. For more InfluxDB operations, refer to the InfluxDB documentation.
Then check InfluxDB's configuration file:
bash-4.4# cat /etc/influxdb/influxdb.conf
reporting-disabled = false
bind-address = ":8088"

[meta]
  dir = "/var/lib/influxdb/meta"
  retention-autocreate = true
  logging-enabled = true

[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  query-log-enabled = true
  cache-max-memory-size = 1073741824
  cache-snapshot-memory-size = 26214400
  cache-snapshot-write-cold-duration = "10m0s"
  compact-full-write-cold-duration = "4h0m0s"
  max-series-per-database = 0
  max-values-per-tag = 0
  trace-logging-enabled = false

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"
  log-queries-after = "0s"
  max-select-point = 0
  max-select-series = 0
  max-select-buckets = 0

[retention]
  enabled = true
  check-interval = "30m0s"

[shard-precreation]
  enabled = true
  check-interval = "10m0s"
  advance-period = "30m0s"

[admin]
  enabled = true
  bind-address = ":8083"
  https-enabled = false
  https-certificate = "/etc/ssl/influxdb.pem"

[monitor]
  store-enabled = true
  store-database = "_internal"
  store-interval = "10s"

[subscriber]
  enabled = true
  http-timeout = "30s"
  insecure-skip-verify = false
  ca-certs = ""
  write-concurrency = 40
  write-buffer-size = 1000

[http]
  enabled = true
  bind-address = ":8086"
  flux-enabled = true
  auth-enabled = true          # auth is enabled, so every operation must be authenticated
  log-enabled = true
  write-tracing = false
  pprof-enabled = true
  https-enabled = false
  https-certificate = "/etc/ssl/influxdb.pem"
  https-private-key = ""
  max-row-limit = 10000
  max-connection-limit = 0
  shared-secret = "beetlejuicebeetlejuicebeetlejuice"
  realm = "InfluxDB"
  unix-socket-enabled = false
  bind-socket = "/var/run/influxdb.sock"

[[graphite]]
  enabled = false
  bind-address = ":2003"
  database = "graphite"
  retention-policy = "autogen"
  protocol = "tcp"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "1s"
  consistency-level = "one"
  separator = "."
  udp-read-buffer = 0

[[collectd]]
  enabled = false
  bind-address = ":25826"
  database = "collectd"
  retention-policy = "autogen"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "10s"
  read-buffer = 0
  typesdb = "/usr/share/collectd/types.db"
  security-level = "none"
  auth-file = "/etc/collectd/auth_file"

[[opentsdb]]
  enabled = false
  bind-address = ":4242"
  database = "opentsdb"
  retention-policy = "autogen"
  consistency-level = "one"
  tls-enabled = false
  certificate = "/etc/ssl/influxdb.pem"
  batch-size = 1000
  batch-pending = 5
  batch-timeout = "1s"
  log-point-errors = true

[[udp]]
  enabled = false
  bind-address = ":8089"
  database = "udp"
  retention-policy = "autogen"
  batch-size = 5000
  batch-pending = 10
  read-buffer = 0
  batch-timeout = "1s"
  precision = "ns"

[continuous_queries]
  log-enabled = true
  enabled = true
  run-interval = "1s"

[logging]
  format = "auto"
  level = "info"
  supress-logo = false
Note: the open-source edition (InfluxDB OSS) does not support clustering, so remember to back up in production; see the official backup documentation.
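As a rough sketch of what a backup of this deployment could look like (the paths are examples and the exact influxd backup flags should be checked against the InfluxDB 1.7 documentation):

# take a portable backup inside the pod, then copy it off the cluster
$ kubectl -n monitoring exec influxdb-0 -- influxd backup -portable /tmp/influxdb-backup
$ kubectl cp monitoring/influxdb-0:/tmp/influxdb-backup ./influxdb-backup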
InfluxDB is installed; next up is prometheus-operator.
Installing prometheus-operator with Helm 3
Since there are many custom parameters to change, download the prometheus-operator chart first as well:
Update the Helm repo:
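As before, that is simply:

$ helm repo update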
Then search for prometheus-operator:
$ helm search repo prometheus-operator
NAME                         CHART VERSION  APP VERSION  DESCRIPTION
stable/prometheus-operator   8.2.2          0.34.0       Provides easy monitoring definitions for Kubern...
Download the prometheus-operator chart:
$ helm pull stable/prometheus-operator
$ tar xf prometheus-operator-8.2.2.tgz
$ ls
prometheus-operator  prometheus-operator-8.2.2.tgz
First, back up the default values.yaml:
$ cd prometheus-operator
$ cp values.yaml{,.ori}
$ ls
charts  Chart.yaml  CONTRIBUTING.md  crds  README.md  requirements.lock  requirements.yaml  templates  values.yaml  values.yaml.ori
Here is the directory layout of the whole chart:
$ tree ./ ./ ├── charts │ ├── grafana │ │ ├── Chart.yaml │ │ ├── ci │ │ │ ├── default-values.yaml │ │ │ ├── with-dashboard-json-values.yaml │ │ │ └── with-dashboard-values.yaml │ │ ├── dashboards │ │ │ └── custom-dashboard.json │ │ ├── README.md │ │ ├── templates │ │ │ ├── clusterrolebinding.yaml │ │ │ ├── clusterrole.yaml │ │ │ ├── configmap-dashboard-provider.yaml │ │ │ ├── configmap.yaml │ │ │ ├── dashboards-json-configmap.yaml │ │ │ ├── deployment.yaml │ │ │ ├── headless-service.yaml │ │ │ ├── _helpers.tpl │ │ │ ├── ingress.yaml │ │ │ ├── NOTES.txt │ │ │ ├── poddisruptionbudget.yaml │ │ │ ├── podsecuritypolicy.yaml │ │ │ ├── _pod.tpl │ │ │ ├── pvc.yaml │ │ │ ├── rolebinding.yaml │ │ │ ├── role.yaml │ │ │ ├── secret-env.yaml │ │ │ ├── secret.yaml │ │ │ ├── serviceaccount.yaml │ │ │ ├── service.yaml │ │ │ ├── statefulset.yaml │ │ │ └── tests │ │ │ ├── test -configmap.yaml │ │ │ ├── test -podsecuritypolicy.yaml │ │ │ ├── test -rolebinding.yaml │ │ │ ├── test -role.yaml │ │ │ ├── test -serviceaccount.yaml │ │ │ └── test.yaml │ │ └── values.yaml │ ├── kube-state-metrics │ │ ├── Chart.yaml │ │ ├── OWNERS │ │ ├── README.md │ │ ├── templates │ │ │ ├── clusterrolebinding.yaml │ │ │ ├── clusterrole.yaml │ │ │ ├── deployment.yaml │ │ │ ├── _helpers.tpl │ │ │ ├── NOTES.txt │ │ │ ├── podsecuritypolicy.yaml │ │ │ ├── psp-clusterrolebinding.yaml │ │ │ ├── psp-clusterrole.yaml │ │ │ ├── serviceaccount.yaml │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ └── values.yaml │ └── prometheus-node-exporter │ ├── Chart.yaml │ ├── OWNERS │ ├── README.md │ ├── templates │ │ ├── daemonset.yaml │ │ ├── endpoints.yaml │ │ ├── _helpers.tpl │ │ ├── monitor.yaml │ │ ├── NOTES.txt │ │ ├── psp-clusterrolebinding.yaml │ │ ├── psp-clusterrole.yaml │ │ ├── psp.yaml │ │ ├── serviceaccount.yaml │ │ └── service.yaml │ └── values.yaml ├── Chart.yaml ├── CONTRIBUTING.md ├── crds │ ├── crd-alertmanager.yaml │ ├── crd-podmonitor.yaml │ ├── crd-prometheusrules.yaml │ ├── crd-prometheus.yaml │ └── crd-servicemonitor.yaml ├── README.md ├── requirements.lock ├── requirements.yaml ├── templates │ ├── alertmanager │ │ ├── alertmanager.yaml │ │ ├── ingress.yaml │ │ ├── podDisruptionBudget.yaml │ │ ├── psp-clusterrolebinding.yaml │ │ ├── psp-clusterrole.yaml │ │ ├── psp.yaml │ │ ├── secret.yaml │ │ ├── serviceaccount.yaml │ │ ├── servicemonitor.yaml │ │ └── service.yaml │ ├── exporters │ │ ├── core-dns │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kube-api-server │ │ │ └── servicemonitor.yaml │ │ ├── kube-controller-manager │ │ │ ├── endpoints.yaml │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kube-dns │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kube-etcd │ │ │ ├── endpoints.yaml │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kubelet │ │ │ └── servicemonitor.yaml │ │ ├── kube-proxy │ │ │ ├── endpoints.yaml │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kube-scheduler │ │ │ ├── endpoints.yaml │ │ │ ├── servicemonitor.yaml │ │ │ └── service.yaml │ │ ├── kube-state-metrics │ │ │ └── serviceMonitor.yaml │ │ └── node-exporter │ │ └── servicemonitor.yaml │ ├── grafana │ │ ├── configmap-dashboards.yaml │ │ ├── configmaps-datasources.yaml │ │ ├── dashboards │ │ │ ├── etcd.yaml │ │ │ ├── k8s-cluster-rsrc-use.yaml │ │ │ ├── k8s-node-rsrc-use.yaml │ │ │ ├── k8s-resources-cluster.yaml │ │ │ ├── k8s-resources-namespace.yaml │ │ │ ├── k8s-resources-pod.yaml │ │ │ ├── k8s-resources-workloads-namespace.yaml │ │ │ ├── k8s-resources-workload.yaml │ │ │ ├── nodes.yaml │ │ 
│ ├── persistentvolumesusage.yaml │ │ │ ├── pods.yaml │ │ │ └── statefulset.yaml │ │ ├── dashboards-1.14 │ │ │ ├── apiserver.yaml │ │ │ ├── cluster-total.yaml │ │ │ ├── controller-manager.yaml │ │ │ ├── etcd.yaml │ │ │ ├── k8s-coredns.yaml │ │ │ ├── k8s-resources-cluster.yaml │ │ │ ├── k8s-resources-namespace.yaml │ │ │ ├── k8s-resources-node.yaml │ │ │ ├── k8s-resources-pod.yaml │ │ │ ├── k8s-resources-workloads-namespace.yaml │ │ │ ├── k8s-resources-workload.yaml │ │ │ ├── kubelet.yaml │ │ │ ├── namespace-by-pod.yaml │ │ │ ├── namespace-by-workload.yaml │ │ │ ├── node-cluster-rsrc-use.yaml │ │ │ ├── node-rsrc-use.yaml │ │ │ ├── nodes.yaml │ │ │ ├── persistentvolumesusage.yaml │ │ │ ├── pods.yaml │ │ │ ├── pod-total.yaml │ │ │ ├── prometheus-remote-write.yaml │ │ │ ├── prometheus.yaml │ │ │ ├── proxy.yaml │ │ │ ├── scheduler.yaml │ │ │ ├── statefulset.yaml │ │ │ └── workload-total.yaml │ │ └── servicemonitor.yaml │ ├── _helpers.tpl │ ├── NOTES.txt │ ├── prometheus │ │ ├── additionalAlertmanagerConfigs.yaml │ │ ├── additionalAlertRelabelConfigs.yaml │ │ ├── additionalPrometheusRules.yaml │ │ ├── additionalScrapeConfigs.yaml │ │ ├── clusterrolebinding.yaml │ │ ├── clusterrole.yaml │ │ ├── ingressperreplica.yaml │ │ ├── ingress.yaml │ │ ├── podDisruptionBudget.yaml │ │ ├── podmonitors.yaml │ │ ├── prometheus.yaml │ │ ├── psp-clusterrolebinding.yaml │ │ ├── psp-clusterrole.yaml │ │ ├── psp.yaml │ │ ├── rules │ │ │ ├── alertmanager.rules.yaml │ │ │ ├── etcd.yaml │ │ │ ├── general.rules.yaml │ │ │ ├── k8s.rules.yaml │ │ │ ├── kube-apiserver.rules.yaml │ │ │ ├── kube-prometheus-node-alerting.rules.yaml │ │ │ ├── kube-prometheus-node-recording.rules.yaml │ │ │ ├── kubernetes-absent.yaml │ │ │ ├── kubernetes-apps.yaml │ │ │ ├── kubernetes-resources.yaml │ │ │ ├── kubernetes-storage.yaml │ │ │ ├── kubernetes-system.yaml │ │ │ ├── kube-scheduler.rules.yaml │ │ │ ├── node-network.yaml │ │ │ ├── node.rules.yaml │ │ │ ├── node-time.yaml │ │ │ ├── prometheus-operator.yaml │ │ │ └── prometheus.rules.yaml │ │ ├── rules-1.14 │ │ │ ├── alertmanager.rules.yaml │ │ │ ├── etcd.yaml │ │ │ ├── general.rules.yaml │ │ │ ├── k8s.rules.yaml │ │ │ ├── kube-apiserver.rules.yaml │ │ │ ├── kube-prometheus-node-recording.rules.yaml │ │ │ ├── kubernetes-absent.yaml │ │ │ ├── kubernetes-apps.yaml │ │ │ ├── kubernetes-resources.yaml │ │ │ ├── kubernetes-storage.yaml │ │ │ ├── kubernetes-system-apiserver.yaml │ │ │ ├── kubernetes-system-controller-manager.yaml │ │ │ ├── kubernetes-system-kubelet.yaml │ │ │ ├── kubernetes-system-scheduler.yaml │ │ │ ├── kubernetes-system.yaml │ │ │ ├── kube-scheduler.rules.yaml │ │ │ ├── node-exporter.rules.yaml │ │ │ ├── node-exporter.yaml │ │ │ ├── node-network.yaml │ │ │ ├── node.rules.yaml │ │ │ ├── node-time.yaml │ │ │ ├── prometheus-operator.yaml │ │ │ └── prometheus.yaml │ │ ├── serviceaccount.yaml │ │ ├── servicemonitors.yaml │ │ ├── servicemonitor.yaml │ │ ├── serviceperreplica.yaml │ │ └── service.yaml │ └── prometheus-operator │ ├── admission-webhooks │ │ ├── job-patch │ │ │ ├── clusterrolebinding.yaml │ │ │ ├── clusterrole.yaml │ │ │ ├── job-createSecret.yaml │ │ │ ├── job-patchWebhook.yaml │ │ │ ├── psp.yaml │ │ │ ├── rolebinding.yaml │ │ │ ├── role.yaml │ │ │ └── serviceaccount.yaml │ │ ├── mutatingWebhookConfiguration.yaml │ │ └── validatingWebhookConfiguration.yaml │ ├── cleanup-crds.yaml │ ├── clusterrolebinding.yaml │ ├── clusterrole.yaml │ ├── crds.yaml │ ├── deployment.yaml │ ├── psp-clusterrolebinding.yaml │ ├── psp-clusterrole.yaml │ ├── psp.yaml │ ├── 
serviceaccount.yaml │ ├── servicemonitor.yaml │ └── service.yaml ├── values.yaml └── values.yaml.ori 33 directories, 229 files
Edit values.yaml directly; only the changed parts are shown below:
alertmanager:
  config:
    global:
      resolve_timeout: 5m
      smtp_from: alert@test.com
      smtp_smarthost: smtphm.qiye.163.com:465
      smtp_hello: alert@test.com
      smtp_auth_username: alert@test.com
      smtp_auth_password: xxxxxxx
      smtp_require_tls: false
      wechat_api_secret: Pxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxc
      wechat_api_corp_id: wxxxxxxxxx7
    route:
      group_by: ['job','alertname','instance']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'email-receiver'
      routes:
      - match_re:
          job: pushgw|grafana
        receiver: 'wechat-receiver'
    receivers:
    - name: 'email-receiver'
      email_configs:
      - to: xxxxxxx@test.com
        send_resolved: true
    - name: 'wechat-receiver'
      wechat_configs:
      - send_resolved: true
        agent_id: 1xxxxxx3
        to_user: '@all'
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "nginx"
    hosts:
    - prometheus.test.aws.test.com
    paths:
    - /alertmanager
    tls: []
  alertmanagerSpec:
    image:
      repository: quay.azk8s.cn/prometheus/alertmanager
    logFormat: json
    replicas: 3
    retention: 24h
    externalUrl: http://prometheus.test.aws.test.com/alertmanager
    routePrefix: /alertmanager
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 256Mi
        cpu: 100m

grafana:
  adminPassword: xxxxxxx
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
    hosts:
    - grafana.test.aws.microoak.cn
    path: /
    tls: []

kubeEtcd:
  enabled: true
  service:
    port: 2381
    targetPort: 2381

prometheusOperator:
  image:
    repository: quay.azk8s.cn/coreos/prometheus-operator
  configmapReloadImage:
    repository: quay.azk8s.cn/coreos/configmap-reload
  prometheusConfigReloaderImage:
    repository: quay.azk8s.cn/coreos/prometheus-config-reloader
  hyperkubeImage:
    repository: gcr.azk8s.cn/google-containers/hyperkube
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 100Mi

prometheus:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "nginx"
    hosts:
    - prometheus.test.aws.test.com
    paths:
    - /
    tls: []
  prometheusSpec:
    image:
      repository: quay.azk8s.cn/prometheus/prometheus
    retention: 1d
    logFormat: json
    remoteRead:
    - url: "http://influxdb:8086/api/v1/prom/read?db=prometheus&u=prometheus&p=Pr0m123"
    remoteWrite:
    - url: "http://influxdb:8086/api/v1/prom/write?db=prometheus&u=prometheus&p=Pr0m123"
    remoteWriteDashboards: true
    resources:
      requests:
        memory: 400Mi
        cpu: 0.5
      limits:
        memory: 800Mi
        cpu: 0.8
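Before installing, you can optionally render the chart locally to sanity-check what these values produce; helm template is a standard Helm 3 command and this step is not part of the original walkthrough:

$ helm template prometheus ./ --namespace monitoring | less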
First install the CRDs, so that installing prometheus-operator does not fail:
$ cd prometheus-operator
$ kubectl apply -f crds/
Then install prometheus-operator itself with CRD creation disabled (see the chart reference), again into the monitoring namespace:
$ cd prometheus-operator
$ helm install prometheus --namespace=monitoring ./ --set prometheusOperator.createCustomResource=false
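If you tweak values.yaml again later, the same release can be updated in place with a standard Helm 3 upgrade (a general usage note, not a step from the original walkthrough):

$ helm upgrade prometheus ./ --namespace monitoring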
Check the status:
$ kubectl -n monitoring get all NAME READY STATUS RESTARTS AGE pod/alertmanager-prometheus-prometheus-oper-alertmanager-0 2/2 Running 0 8m18s pod/alertmanager-prometheus-prometheus-oper-alertmanager-1 2/2 Running 0 8m18s pod/alertmanager-prometheus-prometheus-oper-alertmanager-2 2/2 Running 0 8m18s pod/influxdb-0 1/1 Running 0 15h pod/prometheus-grafana-c89877b8c-k87pb 2/2 Running 0 13m pod/prometheus-kube-state-metrics-57d6c55b56-qjpdx 1/1 Running 0 13m pod/prometheus-prometheus-node-exporter-frs6j 1/1 Running 0 13m pod/prometheus-prometheus-node-exporter-ktpzj 1/1 Running 0 13m pod/prometheus-prometheus-node-exporter-r2ngs 1/1 Running 0 13m pod/prometheus-prometheus-oper-admission-patch-9l55v 0/1 Completed 0 13m pod/prometheus-prometheus-oper-operator-9568b7df6-4nrhw 2/2 Running 0 13m pod/prometheus-prometheus-prometheus-oper-prometheus-0 3/3 Running 1 8m8s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 8m18s service/influxdb ClusterIP 10.100.50.188 <none> 8086/TCP,8083/TCP,8088/TCP 15h service/prometheus-grafana ClusterIP 10.100.76.213 <none> 80/TCP 13m service/prometheus-kube-state-metrics ClusterIP 10.100.229.32 <none> 8080/TCP 13m service/prometheus-operated ClusterIP None <none> 9090/TCP 8m8s service/prometheus-prometheus-node-exporter ClusterIP 10.100.59.29 <none> 9100/TCP 13m service/prometheus-prometheus-oper-alertmanager ClusterIP 10.100.81.192 <none> 9093/TCP 13m service/prometheus-prometheus-oper-operator ClusterIP 10.100.225.101 <none> 8080/TCP,443/TCP 13m service/prometheus-prometheus-oper-prometheus ClusterIP 10.100.200.80 <none> 9090/TCP 13m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/prometheus-prometheus-node-exporter 3 3 3 3 3 <none> 13m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/prometheus-grafana 1/1 1 1 13m deployment.apps/prometheus-kube-state-metrics 1/1 1 1 13m deployment.apps/prometheus-prometheus-oper-operator 1/1 1 1 13m NAME DESIRED CURRENT READY AGE replicaset.apps/prometheus-grafana-c89877b8c 1 1 1 13m replicaset.apps/prometheus-kube-state-metrics-57d6c55b56 1 1 1 13m replicaset.apps/prometheus-prometheus-oper-operator-9568b7df6 1 1 1 13m NAME READY AGE statefulset.apps/alertmanager-prometheus-prometheus-oper-alertmanager 3/3 8m18s statefulset.apps/influxdb 1/1 15h statefulset.apps/prometheus-prometheus-prometheus-oper-prometheus 1/1 8m8s NAME COMPLETIONS DURATION AGE job.batch/prometheus-prometheus-oper-admission-patch 1/1 5m32s 13m
Open the Prometheus web UI via the ingress host configured earlier and check whether all targets are up:
Two of them are down:
For etcd:
The etcd static pod manifest lives in /etc/kubernetes/manifests/ by default:
$ cd /etc/kubernetes/manifests/
$ ll
total 16K
-rw------- 1 root root 1.9K Nov 12 13:30 etcd.yaml
-rw------- 1 root root 2.6K Nov 14 10:35 kube-apiserver.yaml
-rw------- 1 root root 2.7K Nov 14 10:44 kube-controller-manager.yaml
-rw------- 1 root root 1012 Nov 12 13:30 kube-scheduler.yaml
This Kubernetes version (1.15.5) does not yet add --listen-metrics-urls=http://127.0.0.1:2381 to the etcd manifest, although Kubernetes 1.16.3 already does. Since values.yaml sets etcd's metrics port to 2381, add the flag here as follows:
$ sudo vim etcd.yaml
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.7:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://172.17.0.7:2380
    - --initial-cluster=k8s01.test.awsbj.cn=https://172.17.0.7:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.7:2379
    - --listen-peer-urls=https://172.17.0.7:2380
    - --listen-metrics-urls=http://0.0.0.0:2381
    - --name=k8s01.test.awsbj.cn
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
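After kubelet restarts the etcd static pod, you can verify the metrics endpoint directly from the node (assuming curl is available on the host):

$ curl -s http://127.0.0.1:2381/metrics | head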
Check the etcd target again; it is now OK.
For kube-proxy:
Check the listening ports on this node:
$ sudo netstat -lnutp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
tcp   0      0      127.0.0.1:10249   0.0.0.0:*         LISTEN   31604/kube-proxy
...
By default it listens on 127.0.0.1. Enter the kube-proxy pod and look at the relevant command-line flags:
--bind-address 0.0.0.0            The IP address for the proxy server to serve on (set to 0.0.0.0 for all IPv4 interfaces and `::` for all IPv6 interfaces) (default 0.0.0.0)
--healthz-bind-address 0.0.0.0    The IP address for the health check server to serve on (set to 0.0.0.0 for all IPv4 interfaces and `::` for all IPv6 interfaces) (default 0.0.0.0:10256)
--healthz-port int32              The port to bind the health check server. Use 0 to disable. (default 10256)
--metrics-bind-address 0.0.0.0    The IP address for the metrics server to serve on (set to 0.0.0.0 for all IPv4 interfaces and `::` for all IPv6 interfaces) (default 127.0.0.1:10249)
--metrics-port int32              The port to bind the metrics server. Use 0 to disable. (default 10249)
So --metrics-bind-address defaults to 127.0.0.1. With the cause identified, change kube-proxy's YAML:
Its YAML is defined by daemonset.apps/kube-proxy in the kube-system namespace:
spec:
  containers:
  - command:
    - /usr/local/bin/kube-proxy
    - --config=/var/lib/kube-proxy/config.conf
    - --hostname-override=$(NODE_NAME)
    - --metrics-bind-address=0.0.0.0
Save and apply, wait a moment, and check the kube-proxy target in Prometheus again; it is still down.
Note that kube-proxy is started with a config file, --config=/var/lib/kube-proxy/config.conf, which is mounted from a ConfigMap by default. Look at the kube-proxy ConfigMap:
$ kubectl -n kube-system get cm kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.101.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ipvs
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
$ kubectl -n kube-system edit cm kube-proxy
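In the editor, the only change needed is the metricsBindAddress line inside config.conf:

    # before
    metricsBindAddress: 127.0.0.1:10249
    # after
    metricsBindAddress: 0.0.0.0:10249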
Then delete the kube-proxy pods so they restart with the new configuration:
$ kubectl -n kube-system delete po -l k8s-app=kube-proxy
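Once the new pods are up, a quick check on any node should show the metrics port bound to all interfaces (0.0.0.0:10249 or :::10249) instead of 127.0.0.1:

$ sudo netstat -lnutp | grep 10249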
Check again; now it is OK:
For the unreachable kube-proxy target in Prometheus, editing kube-proxy's ConfigMap directly is what fixes it; changing the container start-up arguments alone has no effect.
Reference
All targets are now healthy:
The list of Grafana dashboards:
InfluxDB resource usage: this is useful for sizing InfluxDB's resource requests and limits so it does not run short on capacity.
With that, installing InfluxDB + Prometheus-Operator via Helm is done. Next, let's add scraping of InfluxDB's own metrics. Remember how that is defined? See the earlier Prometheus-Operator hands-on tutorial:
First, inspect the Prometheus resource that Helm installed to see which labels it uses to match ServiceMonitors and PodMonitors:
$ kubectl -n monitoring get prometheuses.monitoring.coreos.com prometheus-prometheus-oper-prometheus -o yaml --export
Flag --export has been deprecated, This flag is deprecated and will be removed in future.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  generation: 1
  labels:
    app: prometheus-operator-prometheus
    chart: prometheus-operator-8.2.2
    heritage: Helm
    release: prometheus
  name: prometheus-prometheus-oper-prometheus
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/prometheus-prometheus-oper-prometheus
spec:
  alerting:
    alertmanagers:
    - name: prometheus-prometheus-oper-alertmanager
      namespace: monitoring
      pathPrefix: /alertmanager
      port: web
  baseImage: quay.azk8s.cn/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prometheus.test.aws.microoak.cn/
  listenLocal: false
  logFormat: json
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus
  portName: web
  remoteRead:
  - url: http://influxdb:8086/api/v1/prom/read?db=prometheus&u=prometheus&p=Pr0m123
  remoteWrite:
  - url: http://influxdb:8086/api/v1/prom/write?db=prometheus&u=prometheus&p=Pr0m123
  replicas: 1
  resources:
    limits:
      cpu: 0.5
      memory: 1500Mi
    requests:
      cpu: 0.5
      memory: 1500Mi
  retention: 1d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      app: prometheus-operator
      release: prometheus
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-prometheus-oper-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  version: v2.13.1
Now let's see which of these two resource types already exist:
$ kubectl -n monitoring get servicemonitors.monitoring.coreos.com
NAME                                                  AGE
prometheus-prometheus-oper-alertmanager               144m
prometheus-prometheus-oper-apiserver                  144m
prometheus-prometheus-oper-coredns                    144m
prometheus-prometheus-oper-grafana                    144m
prometheus-prometheus-oper-kube-controller-manager    144m
prometheus-prometheus-oper-kube-etcd                  144m
prometheus-prometheus-oper-kube-proxy                 144m
prometheus-prometheus-oper-kube-scheduler             144m
prometheus-prometheus-oper-kube-state-metrics         144m
prometheus-prometheus-oper-kubelet                    144m
prometheus-prometheus-oper-node-exporter              144m
prometheus-prometheus-oper-operator                   144m
prometheus-prometheus-oper-prometheus                 144m

$ kubectl -n monitoring get podmonitors.monitoring.coreos.com
No resources found.
Check the influxdb Service:
$ kubectl -n monitoring get svc --show-labels
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   LABELS
influxdb   ClusterIP   10.100.50.188   <none>        8086/TCP,8083/TCP,8088/TCP   17h   app=influxdb,chart=influxdb-3.0.1,heritage=Helm,release=influxdb

$ kubectl -n monitoring get svc influxdb -o yaml --export
Flag --export has been deprecated, This flag is deprecated and will be removed in future.
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: influxdb
    chart: influxdb-3.0.1
    heritage: Helm
    release: influxdb
  name: influxdb
  selfLink: /api/v1/namespaces/monitoring/services/influxdb
spec:
  ports:
  - name: api
    port: 8086
    protocol: TCP
    targetPort: 8086
  - name: admin
    port: 8083
    protocol: TCP
    targetPort: 8083
  - name: rpc
    port: 8088
    protocol: TCP
    targetPort: 8088
  selector:
    app: influxdb
  sessionAffinity: None
  type: ClusterIP
Test InfluxDB's metrics endpoint:
$ curl 10.100.50.188:8086/metrics
go_gc_duration_seconds{quantile="0"} 7.6328e-05
go_gc_duration_seconds{quantile="0.25"} 0.000107326
go_gc_duration_seconds{quantile="0.5"} 0.000118199
go_gc_duration_seconds{quantile="0.75"} 0.000155008
go_gc_duration_seconds{quantile="1"} 0.07417821
go_gc_duration_seconds_sum 3.874605216
go_gc_duration_seconds_count 2198
go_goroutines 31
go_info{version="go1.11"} 1
go_memstats_alloc_bytes 4.37425856e+08
go_memstats_alloc_bytes_total 3.84833321384e+11
All the information we need is in place, so define a ServiceMonitor to let Prometheus scrape InfluxDB's metrics:
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: prometheus-prometheus-oper-influxdb
  namespace: monitoring
  labels:
    app: prometheus-influxdb
    release: prometheus
spec:
  endpoints:
  - path: /metrics
    interval: 30s
    port: api
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      release: influxdb
      app: influxdb
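Save the manifest to a file and apply it; the filename below is just an example:

$ kubectl apply -f influxdb-servicemonitor.yaml
$ kubectl -n monitoring get servicemonitors.monitoring.coreos.com | grep influxdb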
After a short while the operator regenerates the Prometheus configuration, and once Prometheus starts scraping you will see the new target:
Now let's look at the Prometheus data stored in InfluxDB:
$ kubectl -n monitoring exec -it influxdb-0 -- sh / Connected to http://localhost:8086 version 1.7.6 InfluxDB shell version: 1.7.6 Enter an InfluxQL query > auth prometheus Pr0m123 > use prometheus Using database prometheus > show measurements name: measurements name ---- :kube_pod_info_node_count: :node_memory_MemAvailable_bytes:sum ALERTS ALERTS_FOR_STATE APIServiceOpenAPIAggregationControllerQueue1_adds APIServiceOpenAPIAggregationControllerQueue1_depth APIServiceOpenAPIAggregationControllerQueue1_longest_running_processor_microseconds up ...... > select * from up limit 10 name: up time __name__ endpoint instance job namespace node pod prometheus prometheus_replica service value ---- -------- -------- -------- --- --------- ---- --- ---------- ------------------ ------- ----- 1574386880730000000 up https-metrics 172.17.0.7:10250 kubelet kube-system k8s01.test.awsbj.cn monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-kubelet 1 1574386883246000000 up http 10.101.253.87:8080 prometheus-prometheus-oper-operator monitoring prometheus-prometheus-oper-operator-9568b7df6-4nrhw monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-operator 1 1574386884733000000 up https-metrics 172.17.0.213:10250 kubelet kube-system k8s02.test.awsbj.cn monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-kubelet 1 1574386884871000000 up http-metrics 172.17.0.213:10249 kube-proxy kube-system kube-proxy-9gtz5 monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-kube-proxy 0 1574386885147000000 up https-metrics 172.17.0.7:10250 kubelet kube-system k8s01.test.awsbj.cn monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 kubelet 1 1574386886447000000 up http-metrics 172.17.0.230:10249 kube-proxy kube-system kube-proxy-pwb7l monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-kube-proxy 0 1574386887508000000 up https-metrics 172.17.0.230:10250 kubelet kube-system k8s03.test.awsbj.cn monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 kubelet 1 1574386888264000000 up https-metrics 172.17.0.230:10250 kubelet kube-system k8s03.test.awsbj.cn monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 kubelet 1 1574386889313000000 up http-metrics 10.101.253.65:9153 coredns kube-system coredns-5c98db65d4-xn6fg monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-oper-coredns 1 1574386889485000000 up metrics 172.17.0.213:9100 node-exporter monitoring prometheus-prometheus-node-exporter-frs6j monitoring/prometheus-prometheus-oper-prometheus prometheus-prometheus-prometheus-oper-prometheus-0 prometheus-prometheus-node-exporter 1 >
And compare this with the up metric inside Prometheus:
What do we see:
Prometheus stores every metric, keyed by metric name, as an InfluxDB measurement; a measurement is roughly equivalent to a table.
A Prometheus sample (the value) becomes an InfluxDB field with the field key value, and it is always a float.
Prometheus labels become InfluxDB tags.
All # HELP and # TYPE lines are ignored by InfluxDB.
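As a quick sanity check of this mapping, you can inspect the tag and field keys of a measurement from the influx shell; this is plain InfluxQL, shown here for the up measurement, and you should see a single field key named value of type float plus one tag per Prometheus label:

> use prometheus
> SHOW TAG KEYS FROM "up"
> SHOW FIELD KEYS FROM "up"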
References:
https://zhuanlan.zhihu.com/p/79561704
https://github.com/helm/charts/tree/master/stable/prometheus-operator#configuration # Prometheus-Operator Helm chart configuration reference
https://docs.influxdata.com/influxdb/v1.7/supported_protocols/prometheus/ # how to configure Prometheus against an auth-enabled InfluxDB
That is all for this article; stay tuned for the next one.