CatOps | Continuing with our AI week. AI in SRE: What's Actually Coming in 2026 ...

Views/reach: 1,410 · 2026-06-18 11:23 · Message №2925

Continuing with our AI week. "AI in SRE: What's Actually Coming in 2026" tells a story of AI coming to help with incident response.

The article suggests trying an AI tool for a real investigation or for collecting data for postmortems. To clarify: in my experience, you don't need a dedicated tool; a general-purpose AI agent with some harness (skills and scripts) will do. You should try it! AI does the data-gathering part incredibly well. Still, the results are indeed not perfect.

Another good point in the article is data quality: AI results are only as good as the context you provide. So far I have witnessed two prominent failure modes:

1. Inference on incomplete data: a person with limited access (typically a developer) asks their agent to investigate an alert, and the agent reaches some conclusion. Meanwhile, a person with elevated access (typically a systems engineer) asks their agent to investigate the same alert and gets a different result, likely because some data is only available via kubectl events and similar sources. The fix is not to let everyone do everything; the fix is to revisit your observability pipelines and make sure you ship all the relevant data, which is easier said than done.
2. An agent that cries wolf: if you have a pollutant in your logs, or simply an event that happens very often, agents like to correlate it with everything. If your clusters are elastic, an agent could blame node-count fluctuations for every error. The problem is that when a node-count fluctuation actually does cause a problem, you will be the one ignoring the agent's hint, because it suggests it every single time.

If you'd like to share more SRE-specific AI failure modes (in Ukrainian), you're welcome in our chat.

#ai #sre
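One way to sanity-check a "cries wolf" correlation before trusting it is to compare base rates: how often does the suspected cause (say, a node-count change) show up in time windows with errors, versus in all windows? Below is a minimal Python sketch of that check using lift, P(change | error) / P(change). It assumes you have already bucketed time into fixed windows and flagged each one with two booleans; the `Window` type, its fields, and the `make_windows` helper are hypothetical and not part of any particular tool. A lift near 1.0 means the event is about as common in error windows as anywhere else, so it says little on its own; it does not prove the event is harmless in a given incident, which is exactly why the hint is easy to dismiss when it finally matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One fixed-size time bucket (hypothetical schema)."""
    had_error: bool           # did an error/alert fire in this bucket?
    node_count_changed: bool  # did the autoscaler add or remove nodes?


def lift(windows: list[Window]) -> float:
    """Return P(change | error) / P(change) over the given windows.

    ~1.0: the change is background noise relative to errors.
    >1.0: the change is over-represented in error windows.
    """
    errors = [w for w in windows if w.had_error]
    if not windows or not errors:
        raise ValueError("need at least one window with an error")
    p_change = sum(w.node_count_changed for w in windows) / len(windows)
    if p_change == 0:
        # The event never happened, so it cannot explain any error.
        return 0.0
    p_change_given_error = sum(w.node_count_changed for w in errors) / len(errors)
    return p_change_given_error / p_change


def make_windows(err_chg: int, err_no: int, ok_chg: int, ok_no: int) -> list[Window]:
    """Hypothetical helper: build windows from four counts."""
    return ([Window(True, True)] * err_chg + [Window(True, False)] * err_no
            + [Window(False, True)] * ok_chg + [Window(False, False)] * ok_no)


if __name__ == "__main__":
    # Elastic cluster: nodes change in 80% of all windows, errors or not.
    print(f"{lift(make_windows(8, 2, 72, 18)):.2f}")  # prints 1.00
    # Node changes are rare overall but common when errors fire.
    print(f"{lift(make_windows(9, 1, 9, 81)):.2f}")   # prints 5.00
```

In the first case an agent that blames node-count changes is pointing at noise; in the second the same hint is worth a closer look. The windowing and the choice of lift are illustrative assumptions; any base-rate comparison would make the same point.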
