03.06.21
In the current IT market, one of the hottest job roles is Site Reliability Engineer (SRE). In January 2019, LinkedIn ranked SRE as the second most promising job in the USA. These statistics were cited:
In this post, we look at what an SRE does in their daily work, a little of the history of Site Reliability Engineering, its foundations, and how you can become an SRE.
DevOps and Site Reliability Engineering are different disciplines, but they are not competitors; they complement each other. A previous blog post explained the differences between Site Reliability Engineering and DevOps. Here, we focus strictly on the characteristics of the SRE role.
Site Reliability Engineering is the application of software engineering to operational problems. The word ‘Reliability’ signals the particular role an SRE plays in an organisation and in the Software Development Life Cycle. SREs teach application developers how to build reliable services. In addition, they ensure that the organisation's computer systems run correctly, 24/7. Security, stability and scalability are key concerns, because the business depends on reliable services.
Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics.
An SRE is, therefore, a vital role within an organization. Typical SRE activities include:
Have a look at “The Ultimate Guide to SRE Acronyms” if you want to learn how to “talk SRE.”
The term ‘Site Reliability Engineer’ was coined at Google in 2003 by Ben Treynor Sloss, VP of Engineering. He was hired by Google to manage a team of software developers running a production environment. Continuous development, integration and operations demanded a new way of thinking, and that is how Site Reliability Engineering came to be.
Ben Treynor Sloss explained the core of the SRE role in this interview:
“SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor. In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.”
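Availability heads the list of responsibilities in the quote. As a rough illustration that is not part of the original post, an availability target can be turned into a concrete downtime allowance. The sketch below assumes a 30-day window and uses a hypothetical helper named `allowed_downtime`; real teams choose their own targets and measurement windows.

```python
from datetime import timedelta


def allowed_downtime(availability_target: float,
                     window: timedelta = timedelta(days=30)) -> timedelta:
    """Return how much downtime an availability target permits in a window.

    Hypothetical helper for illustration; the 30-day default window is an
    assumption, not a standard.
    """
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    # The unavailable fraction of the window is (1 - target).
    return window * (1.0 - availability_target)


if __name__ == "__main__":
    # Compare a few common-looking targets over the assumed 30-day window.
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} over 30 days allows {allowed_downtime(target)} of downtime")
```

Each extra "nine" cuts the allowance by a factor of ten, which is one reason availability goals drive so much automation work.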
Now that we know the origin of SRE, we can ask: what is this role built on?
Site Reliability Engineering is based on the following:
An SRE balances these fundamental elements efficiently in their daily work across the organisation. To do this, an SRE needs a toolbox.
A Site Reliability Engineer works with the following software, languages, and tools:
As you can see, an SRE must have both development and operations skills to automate the manual work of a development team.
SREs are currently in high demand, but it is not an easy job. As stated earlier, an SRE needs both development and operations skills: a Pi-shaped skill set. This means being proficient in both trades, not just one or the other, which would be a T-shaped skill set. This makes SRE a very demanding and practical career. A solid knowledge base helps; to build one, check out the Top 10 SRE Books to Read in 2021. However, the role can also be learned on the job with the right motivation and endurance. Most SREs have a background or education in software development, or in systems and network engineering.
At Google, SREs spend at least 50% of their time on development work. An SRE is still a software developer: an engineer who does operations.
Do you want to become an SRE? Big tech companies, Google included, want you because they know SREs are very hard to find. Is this because a good SRE ultimately ‘automates their way out of a job’?
We hope this article showed you what a Site Reliability Engineer does, why the role is in high demand, and how you can become one. For more information, take a look at Google’s take on SRE, as well as the excellent series of videos Google has posted on YouTube.
Originally posted by StackPulse.