Why is SRE Becoming 2021’s Hottest Hire?

The Site Reliability Engineer (SRE) is one of the hottest job roles in the current IT market. In January 2019, LinkedIn ranked SRE as the second most promising job in the USA, citing these statistics:

  • Median Base Salary: $200,000
  • Job Openings (YoY Growth): 1,400+ (72%)
  • Career Advancement Score (out of 10): 9

In this post we look at what an SRE does in their daily work, a little history of Site Reliability Engineering, its foundations, and how you can become an SRE.

What Does an SRE Do?

DevOps and Site Reliability Engineering are different disciplines, but they are not competitors; they complement each other. The differences between the two are covered in a separate blog post. Here we focus strictly on the characteristics of the SRE role.

Site Reliability Engineering is the application of software engineering to operational problems. The word ‘Reliability’ points to the particular role an SRE has in an organization and in the Software Development Life Cycle (SDLC): SREs teach application developers how to build reliable services, and they ensure that the organization’s computer systems run correctly, 24/7. Security, stability and scalability are very important here, because the business wants reliable services.

Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics.

An SRE is, therefore, a vital role within an organization. Typical SRE activities include:

  • Develop and manage scalable, secure and stable systems
  • Conduct incident analysis
  • Analyze performance and create improvement plans
  • Monitor the efficiency of systems
  • Manage risks
  • Automate manual tasks within the SDLC
  • Build automated service tools, logs and test environments to ease the engineers’ workload
  • Implement new features
  • Select infrastructure tools
  • Adapt environments to increasing or decreasing numbers of users
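
As an illustration of the last activity in the list, here is a minimal Python sketch of how an environment could be sized to the number of users. The function name, the per-instance capacity, and the minimum and maximum limits are all hypothetical values chosen for this example; in practice this logic usually lives in an autoscaler rather than in hand-written code.

```python
import math


def desired_instances(active_users: int,
                      users_per_instance: int,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many instances are needed for the current user count.

    All parameters are illustrative assumptions, not values from a real system.
    """
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    if active_users < 0:
        raise ValueError("active_users must not be negative")

    # Round up so that a partially filled instance still gets provisioned.
    needed = math.ceil(active_users / users_per_instance)

    # Clamp to the allowed range: never scale to zero, never exceed the cap.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for users in (0, 2500, 1_000_000):
        print(users, "users ->", desired_instances(users, 1000), "instances")
```

The lower bound keeps the service reachable when traffic drops, and the upper bound protects against runaway cost when traffic spikes.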

Have a look at “The Ultimate Guide to SRE Acronyms” if you want to learn how to “talk SRE.”

A Little History About SREs

The term ‘Site Reliability Engineer’ was coined at Google in 2003 by Ben Treynor Sloss, VP of engineering. He was hired by Google to manage a team of software developers running a production environment. Continuous development, integration and operations demanded a new way of thinking. That’s how Site Reliability Engineering came to be.

Ben Treynor Sloss explained the core of the SRE role in an interview:

“SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor. In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.”

Now that we know the origin of SRE, we can ask: what is this role built on?

What are the Foundations of SRE?

Site Reliability Engineering is based on the following:

  • Scalability – The system can handle a growing amount of work when resources are added to it
  • Availability – The system works as required
  • Incident Response – Managing the handling of incidents that occur in the system
  • Automation – Automating the Software Development Life Cycle workflow
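
To make the availability item above more concrete, here is a minimal Python sketch. It assumes a request-based definition of availability (successful requests divided by total requests) and uses a 99.9% target purely as an example; neither the definition nor the target comes from this article, and a real team would choose its own.

```python
def availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests that were served successfully.

    This request-based definition is an assumption made for illustration.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def allowed_downtime_minutes(target: float, period_days: int = 30) -> float:
    """Minutes of downtime that a target such as 0.999 permits per period."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in the range (0, 1]")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - target) * minutes_in_period


if __name__ == "__main__":
    measured = availability(successful_requests=999_500, total_requests=1_000_000)
    print(f"measured availability: {measured:.4%}")
    print(f"downtime allowed at 99.9% over 30 days: "
          f"{allowed_downtime_minutes(0.999):.1f} minutes")
```

Expressing availability as a number lets a team compare what was measured against what was promised, instead of arguing about whether the system "works as required".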

An SRE balances these fundamental elements in the daily work of the organization. To do this, an SRE needs a toolbox.

What is in a typical SRE Toolbox?

A Site Reliability Engineer works with the following software, languages, and tools:

  • Programming languages: Ruby, Python, C++, Bash, Java
  • JavaScript ecosystem: Node.js, React, TypeScript
  • Cloud computing services: AWS, Azure
  • Infrastructure tooling: Terraform, CloudFormation, Ansible
  • Container tooling: Kubernetes, Docker, Mesos

As you can see, an SRE must have both development and operations skills to automate the manual tasks of a development team.

How to Become an SRE

Currently, SREs are in high demand, but it is not an easy job. As stated earlier, an SRE needs both development and operations skills: a Pi-shaped skill set. With this skill set, an SRE is proficient in both trades, whereas a T-shaped skill set means deep proficiency in just one or the other. This makes SRE a very demanding and practical career. It can be beneficial to have a solid knowledge base to start from; check out the Top 10 SRE Books to Read in 2021. However, it can also be learned on the job with the right motivation and endurance. Most SREs have a background or education in software development or in systems and network engineering.

At Google, SREs spend at least 50% of their daily work on development. An SRE is still a software developer: an engineer doing operations.

Do you want to become an SRE? Big tech companies, Google included, want you because they know SREs are very hard to find. Is this because a good SRE ultimately ‘automates their way out of a job’?

We hope that this article showed you what a Site Reliability Engineer does, why the role is in high demand and how you can become one. For more information, you can take a look at Google’s take on SRE as well as this excellent series of videos that they posted on YouTube.

To learn more about the SRE toolkit, visit our solution page and grab the solution brief.

Originally posted by StackPulse.
