Installing Prometheus
Official site: https://prometheus.io/download/
GitHub: https://github.com/prometheus/prometheus/releases
cd /tmp
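The download and unpack steps are not shown in this post, so the following is only a sketch of how they might look. It assumes the linux-amd64 build and the usual GitHub release URL layout; the `release_url` helper and the version number are illustrative, not values taken from this post. The install directory /usr/local/prometheus matches the path used by the startup scripts later on.

```shell
# Build the download URL for a release tarball of a Prometheus project.
# $1 = project (prometheus, node_exporter, alertmanager)
# $2 = version without the leading "v"
# Assumption: linux-amd64 build, standard GitHub release layout.
release_url() {
    echo "https://github.com/prometheus/$1/releases/download/v$2/$1-$2.linux-amd64.tar.gz"
}

VERSION="2.45.0"   # example only: pick the version you want from the releases page
URL="$(release_url prometheus "$VERSION")"
echo "$URL"

# The real download and unpack steps, commented out so the sketch has no side effects:
# cd /tmp
# wget "$URL"
# tar -xzf "prometheus-$VERSION.linux-amd64.tar.gz"
# mv "prometheus-$VERSION.linux-amd64" /usr/local/prometheus
```

The same helper can be reused for the other two components by changing the first argument, for example `release_url node_exporter <version>`.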
Default port: 9090
nohup ./prometheus --config.file=prometheus.yml > /var/log/prometheus.log 2>&1 &
Open the firewall port
firewall-cmd --zone=public --add-port=9090/tcp --permanent
firewall-cmd --reload
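Once the process is started and the port is open, it can help to confirm that the server actually answers. The sketch below polls an HTTP endpoint with curl until it responds; `wait_for_ready` is a hypothetical helper name, and it assumes curl is installed on the host.

```shell
# Poll an HTTP endpoint until it returns a successful status, or give up.
# Usage: wait_for_ready URL [ATTEMPTS]   (ATTEMPTS defaults to 10)
wait_for_ready() {
    url="$1"
    attempts="${2:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: quiet, --max-time: per-request timeout
        if curl -fs --max-time 2 -o /dev/null "$url"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep 1
        fi
        i=$((i + 1))
    done
    return 1
}
```

For Prometheus this could be called as `wait_for_ready http://localhost:9090/-/ready && echo "Prometheus is ready"`. Alertmanager exposes the same `/-/ready` path on its own port; for node_exporter, `/metrics` is the endpoint to poll.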
Installing node_exporter
GitHub: https://github.com/prometheus/node_exporter/releases
cd /tmp
Default port: 9100
nohup ./node_exporter > /var/log/node_exporter.log 2>&1 &
Open the firewall port
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload
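A quick way to check that node_exporter is serving data is to read one metric from its /metrics page. The helper below is a small sketch that extracts the value of an unlabelled metric from the text exposition format; `metric_value` is a hypothetical name, and the sample input is made up for the demonstration.

```shell
# Print the value of an unlabelled metric from Prometheus text-format input on stdin.
# $1 = metric name, e.g. node_load1
metric_value() {
    # Comment lines start with "#", so their first field never equals the metric name.
    awk -v m="$1" '$1 == m { print $2 }'
}

# Demonstration with canned input instead of a live exporter:
printf '# HELP node_load1 1m load average.\nnode_load1 0.42\n' | metric_value node_load1
```

Against a running exporter the same helper would be used as `curl -s http://localhost:9100/metrics | metric_value node_load1`.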
Edit the Prometheus configuration file prometheus.yml so that it scrapes both Prometheus itself and node_exporter, for example:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheusHostIp:9090']
  - job_name: 'server'
    static_configs:
      - targets: ['node_exporterHostIp:9100']
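YAML indentation mistakes are easy to make in this file, so it is worth validating it before restarting. The sketch below wraps `promtool check config`, which ships in the same tarball as the prometheus binary. The `PROMTOOL` default path and the `validate_config` helper are assumptions for illustration.

```shell
# Location of promtool; assumed to sit next to the prometheus binary.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"

# Validate a Prometheus config file.
# Returns 2 if promtool is missing, otherwise promtool's own exit status.
validate_config() {
    if [ ! -x "$PROMTOOL" ]; then
        echo "promtool not found at $PROMTOOL" >&2
        return 2
    fi
    "$PROMTOOL" check config "$1"
}
```

A typical call would be `validate_config /usr/local/prometheus/prometheus.yml`. If the check passes, a running Prometheus can pick up the new file without a restart by sending it SIGHUP, for example `kill -HUP "$(pgrep -x prometheus)"`.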
Installing AlertManager
GitHub: https://github.com/prometheus/alertmanager/releases
cd /tmp
Default port: 9093
nohup ./alertmanager >> /var/log/alertmanager.log 2>&1 &
Open the firewall port
firewall-cmd --zone=public --add-port=9093/tcp --permanent
firewall-cmd --reload
Startup scripts
# Boot-time startup scripts and how to set them up
# Place each script below in its own file under /etc/rc.d/init.d
#================nodeexporter==================
#!/bin/bash
# chkconfig: - 85 15
# description: boot-time startup script for node_exporter
nohup /usr/local/node_exporter/node_exporter > /var/log/node_exporter.log 2>&1 &
#===================END=======================
#================prometheus==================
#!/bin/bash
# chkconfig: - 85 15
# description: boot-time startup script for Prometheus
nohup /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml > /var/log/prometheus.log 2>&1 &
#===================END=======================
#================AlertManager==================
#!/bin/bash
# chkconfig: - 85 15
# description: boot-time startup script for AlertManager
nohup /usr/local/alertmanager/alertmanager >> /var/log/alertmanager.log 2>&1 &
#===================END=======================
#==========shell commands (run once per script)=================
chmod +x XXX.sh
chkconfig --add XXX.sh
chkconfig XXX.sh on
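Since the three scripts differ only in the command they run, they could also be generated from one template. This is a sketch under that assumption; `make_init_script` is a hypothetical helper, and it always truncates the log with `>`, whereas the AlertManager script above appends with `>>`.

```shell
# Print a minimal chkconfig-style init script for one service.
# $1 = service name (used for the description and log file name)
# remaining arguments = the command line to run at boot
make_init_script() {
    name="$1"
    shift
    printf '%s\n' \
        '#!/bin/bash' \
        '# chkconfig: - 85 15' \
        "# description: boot-time startup script for $name" \
        "nohup $* > /var/log/$name.log 2>&1 &"
}

# Print the node_exporter script to stdout.
make_init_script node_exporter /usr/local/node_exporter/node_exporter
```

Redirecting the output into a file under /etc/rc.d/init.d, then running the chmod and chkconfig commands on that file, would register the service in the same way as the hand-written scripts.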
Basic Prometheus configuration template
#=============Basic Prometheus configuration template===================
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "localhost:9093"
rule_files:
  - '/usr/local/prometheus/rules/hostBase.yml'
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'server_ansible & Jenkins'
    static_configs:
      - targets: ['hostIp:9100']
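The template above points `rule_files` at hostBase.yml, but the post does not show that file. The following is a minimal sketch of what such a rule file might contain; it is not the author's actual rule set. The `InstanceDown` rule, the group name, and the `RULES_FILE` default are illustrative assumptions. The `severity` label and the `summary` and `description` annotations are the fields the WeChat notification template in this post reads.

```shell
# Where to write the rules. Defaults to the current directory for this sketch;
# the config above expects /usr/local/prometheus/rules/hostBase.yml.
RULES_FILE="${RULES_FILE:-./hostBase.yml}"

# The quoted 'EOF' keeps the shell from expanding $labels inside the heredoc.
cat > "$RULES_FILE" <<'EOF'
groups:
  - name: hostBase
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 1 minute."
EOF
echo "wrote $RULES_FILE"
```

`up` is the metric Prometheus records for every scrape target, so this rule fires whenever any configured target stops responding for a minute.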
AlertManager configuration
AlertManager.yml configuration
global:
  resolve_timeout: 5m
  wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/" # no need to change this for now; copy it as-is
  wechat_api_secret: "your api_secret"
  wechat_api_corp_id: "your corp_id"
templates:
  - '/usr/local/alertmanager/template/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 20m
  receiver: wechat
receivers:
  - name: "wechat"
    wechat_configs:
      - send_resolved: true
        to_user: "@all"
        corp_id: "same corp_id as above"
        # to_party: "1"
        agent_id: "1000002"
        api_secret: "same api_secret as above"
WeChat alert notification template
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
Alerts Firing:
{{ range .Alerts.Firing }}========start==========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
=========end==========
{{ end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
Alerts Resolved:
{{ range .Alerts.Resolved }}========start==========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Affected host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
=========end==========
{{ end }}
{{- end }}
{{- end }}
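To see how the template renders without waiting for a real outage, a test alert can be posted straight to Alertmanager's v2 API. The sketch below only builds the JSON payload; `test_alert_payload` is a hypothetical helper and the label values are made up. The curl call is left commented out and assumes Alertmanager listens on localhost:9093.

```shell
# Build a JSON payload containing one test alert.
# $1 = alertname, $2 = instance, $3 = severity
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"%s"},"annotations":{"summary":"test alert","description":"manually fired test alert"}}]\n' "$1" "$2" "$3"
}

# Print the payload for inspection.
test_alert_payload TestAlert localhost:9100 warning

# To actually fire it (assumes Alertmanager on localhost:9093):
# test_alert_payload TestAlert localhost:9100 warning |
#     curl -s -X POST -H 'Content-Type: application/json' --data @- http://localhost:9093/api/v2/alerts
```

With the route settings shown earlier, the notification should arrive after `group_wait` (10s). Because the payload carries no end time, Alertmanager treats the alert as resolved once `resolve_timeout` passes without the alert being re-sent, which also exercises the resolved branch of the template.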