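Alertmanager is responsible for handling the alerts generated by Prometheus in one place, so an Alertmanager configuration usually contains the following main parts:
● Global configuration (global): defines shared parameters, such as the global SMTP and Slack settings;
● Templates (templates): defines the templates used for alert notifications, such as HTML and e-mail templates;
● Alert routing (route): decides how each alert should be handled, based on label matching;
● Receivers (receivers): a receiver is an abstract concept; it can be an e-mail address, WeChat, Slack, a webhook, and so on. Receivers are normally used together with alert routing;
● Inhibition rules (inhibit_rules): well-chosen inhibition rules reduce the number of noisy alerts.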
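As a rough illustration of how these five parts fit together, here is a minimal alertmanager.yml sketch. The SMTP host, e-mail addresses, template path, webhook URL, and receiver names are placeholders invented for this example and are not taken from the setup described in this post.

```yaml
# Minimal alertmanager.yml sketch; every concrete value below is a placeholder.
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'    # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'     # assumed sender address

templates:
  - '/etc/alertmanager/templates/*.tmpl'    # assumed template location

route:
  receiver: default-email                   # fallback receiver
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Alerts labelled severity=critical go to the webhook receiver instead.
    - matchers:
        - severity="critical"
      receiver: ops-webhook

receivers:
  - name: default-email
    email_configs:
      - to: 'ops@example.com'               # assumed recipient
  - name: ops-webhook
    webhook_configs:
      - url: 'http://127.0.0.1:5001/alert'  # assumed webhook endpoint

inhibit_rules:
  # While a critical alert is firing, mute warning alerts that share
  # the same alertname and instance.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'instance']
```

The route tree is evaluated top-down: an alert that matches no child route falls back to the receiver named on the root route.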
Extract the release archive and change into its directory:
tar -zxvf alertmanager-0.29.0-rc.1.linux-amd64.tar.gz
cd alertmanager-0.29.0-rc.1.linux-amd64
Start Alertmanager:
./alertmanager

Next, edit the Prometheus configuration so that it sends alerts to Alertmanager and loads the rule files:
# Global settings
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Alerting: the Alertmanager instance that receives alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.201.110:9093']
# Rule files
rule_files:
  - '/server/prometheus/prometheus-3.5.0.linux-amd64/alert_rules/*.rules'
# Scrape targets
scrape_configs:
  # Prometheus server job (targets loaded via file-based service discovery)
  - job_name: "prometheus-server"
    file_sd_configs:
      - files:
          - /server/prometheus/node_exporter.yml
        refresh_interval: 5s
  # Nginx
  - job_name: nginx
    static_configs:
      - targets: ['192.168.201.104:9113']
        labels:
          name: "nginx"
  # Kubernetes
  - job_name: K8S
    static_configs:
      - targets: ['192.168.201.100:31666']
        labels:
          name: "k8s"
  # Redis
  - job_name: redis
    static_configs:
      - targets: ['192.168.201.104:9121']
        labels:
          name: "redis"
  # Docker
  - job_name: docker
    static_configs:
      - targets: ['192.168.201.104:8080']
        labels:
          name: "docker"
  # MySQL
  - job_name: mysql
    static_configs:
      - targets: ['192.168.201.104:9104']
        labels:
          name: "mysql"
  # Blackbox Exporter itself
  - job_name: 'blackbox-exporter'
    static_configs:
      - targets: ['117.72.79.70:9115']
        labels:
          name: "blackbox-exporter"
  # External websites, probed through Blackbox Exporter
  # HTTP probe
  - job_name: "blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://www.baidu.com
          - https://songxuan.vip
    relabel_configs:
      # Pass the original target URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the Blackbox Exporter
      - target_label: __address__
        replacement: 117.72.79.70:9115
  # Alertmanager itself, on port 9093
  - job_name: alertmanager
    static_configs:
      - targets: ['192.168.201.110:9093']
        labels:
          name: "alertmanager"
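The prometheus-server job above reads its targets from /server/prometheus/node_exporter.yml through file-based service discovery, but the post does not show that file. A minimal sketch of what it might contain follows; the target address, the port 9100 (node_exporter's default), and the label value are assumptions made for illustration only.

```yaml
# /server/prometheus/node_exporter.yml (hypothetical contents)
# A file_sd file is a list of target groups, each with optional labels.
- targets:
    - '192.168.201.104:9100'   # assumed node_exporter address
  labels:
    name: "node"               # assumed label, mirroring the other jobs
```

Because refresh_interval is set to 5s, Prometheus re-reads this file periodically, so targets can be added or removed without restarting the server.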
Then create an alert_rules directory under the Prometheus installation directory:
mkdir -p /server/prometheus/prometheus-3.5.0.linux-amd64/alert_rules
Then create a Prometheus alerting rule:
vim /server/prometheus/prometheus-3.5.0.linux-amd64/alert_rules/nginx.rules
groups:
  - name: nginx_alerts
    rules:
      - alert: NginxServiceDown
        # Revised expression: fires both when the metric is 0 and when the metric is missing
        expr: nginx_up == 0 or absent(nginx_up)
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nginx service unavailable (instance {{ $labels.instance }})"
          description: "The Nginx service has been stopped or unreachable for more than 1 minute."
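The same pattern can be applied to the other jobs. As a hedged sketch, a rule for the blackbox_http job could alert when a probed site stops responding. The group name, alert name, `for` duration, and severity below are choices made for this example; `probe_success` is the metric Blackbox Exporter exposes for each probe. To be picked up by the `rule_files` glob, the file would need to live in the alert_rules directory with a `.rules` extension, for example a hypothetical blackbox.rules.

```yaml
groups:
  - name: blackbox_alerts                # hypothetical group name
    rules:
      - alert: WebsiteProbeFailed        # hypothetical alert name
        # probe_success is 1 when the probe succeeded and 0 when it failed
        expr: probe_success{job="blackbox_http"} == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
          description: "The http_2xx probe has been failing for more than 2 minutes."
```

Here the instance label holds the probed URL, because the relabel_configs of the blackbox_http job copy the target into it.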
Finally, make sure the Prometheus configuration shown earlier is in place,
then restart Prometheus.


Author: 松轩(^U^)
Copyright notice: unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!