Site reliability engineering (SRE) is an engineering approach applied to products and services to help find an appropriate level of reliability. Rather than focusing on component monitoring and 'five nines' targets, SRE looks at how products and services are actually being used and what good looks like from a business perspective.
Jason shared insights into service level indicators (SLIs), service level objectives (SLOs) and service level agreements (SLAs), and talked about how error budgets can be used to make space for innovation.
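To make the error budget idea a little more concrete, here is a minimal sketch of the arithmetic. It assumes a request-based SLO (a target fraction of requests that must succeed over some window); the function names and the example figures are illustrative only and do not come from the episode.

```python
def error_budget_fraction(slo_target: float) -> float:
    """Fraction of events allowed to fail under the SLO (1 - target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    return 1.0 - slo_target


def allowed_bad_events(slo_target: float, total_events: int) -> float:
    """Number of failed events the budget permits for this window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return error_budget_fraction(slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Share of the error budget still unspent (negative means overspent)."""
    allowed = allowed_bad_events(slo_target, total_events)
    return 1.0 - bad_events / allowed


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% SLO, one million requests, 400 failures.
    remaining = budget_remaining(0.999, 1_000_000, 400)
    print(f"Error budget remaining: {remaining:.0%}")
```

With these made-up numbers the SLO permits about 1,000 failed requests, so 400 failures leave roughly 60% of the budget. A team might treat that remaining budget as room to ship changes, and a negative result as a signal to prioritise reliability work.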
SRE sits within a wider organizational context that changes how we structure our product teams and how we think of 'done'. This episode will challenge you to look at your products and services in new ways.
You can learn more about SRE via these links:
Site Reliability Engineering: How Google Runs Production Systems (via O'Reilly)
Free SRE online training (via Microsoft)
SRE Weekly (email service)
You can watch the episode below and ask our panel follow-up questions using the hashtag "ITSMCrowd" on Twitter.
Got a burning topic you want to discuss on the ITSM Crowd? Contact us and we’ll be in touch.