Site Reliability Engineering
Hope is not a strategy
A site reliability engineer (SRE) is a software engineer who focuses on solving operational problems with software: defining service-level objectives (SLOs), minimizing the toil of manual tasks, reducing the cost of failure, and sharing ownership with developers.
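A service-level objective is a target for a measured indicator such as availability. A minimal sketch of checking one, assuming a request-based availability measure; the function names, the 99.9% target, and the "error budget" framing (the share of failures the target still permits) are illustrative assumptions, not something these notes define:

```python
def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (the measured indicator)."""
    if total_requests == 0:
        return 1.0  # no traffic: treat as fully available (assumption)
    return good_requests / total_requests


def error_budget_remaining(
    good_requests: int, total_requests: int, slo_target: float
) -> float:
    """Fraction of the allowed failures not yet used; negative if overspent."""
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no room for failure.
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 500 failed, 99.9% target.
    print(availability(999_500, 1_000_000))
    print(error_budget_remaining(999_500, 1_000_000, slo_target=0.999))
```

With these hypothetical numbers, about half of the allowed failures are still unspent. A negative result would mean the target has been missed for the period.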
Tenets
- Availability
- Latency
- Performance
- Efficiency
- [[change-management]]
- [[monitoring]]
- [[capacity-planning]]
Principles
- Embracing-risk
- Eliminating-toil
- Monitoring-distributed-systems
- Automation
- Release-engineering
- Simplicity