Alertmanager Guide for the Monitoring Stack¶
Alertmanager receives alerts from Prometheus and decides how they should be grouped, routed, silenced, and repeated.
What It Handles¶
- Grouping related alerts
- Routing alerts to the right destination
- Silencing planned maintenance noise
- Preventing duplicate or unnecessary notifications
A Practical Routing Model¶
- Group alerts by service, team, or severity
- Send critical alerts immediately
- Batch warning alerts to reduce noise
- Silence known maintenance windows
Good Routing Ideas¶
- Infrastructure alerts to the platform team
- Probe failures to the owning service team
- Low-severity alerts to email or chat summaries
- High-severity alerts to paging tools
Useful Concepts to Add Over Time¶
group_bygroup_waitgroup_intervalrepeat_interval- Inhibition rules
- Receiver-specific routes
Practical Advice¶
- Avoid paging for every single alert
- Keep severity labels consistent in Prometheus rules
- Test routes before relying on them in incidents