Manuel Reischl, Head of Customer IT Support @ GlobalDots
09.06.2021
5 Min read

In this article I hope to give the reader a short history lesson, as well as some advice on how to build a useful monitoring system for your platform. First, it’s key to understand where we came from. Before cloud computing, every company owned its own infrastructure and therefore needed to monitor it. This kind of monitoring checked for things like hard disk failures, memory usage and CPU overload. It was very important: if you had a faulty hard disk within a disk array, you really needed to know about it before the whole array collapsed. These systems were typically not online, and everything was managed by the system administrators in the building.

Nagios was the preferred tool then and is still heavily used by many organisations today, but as the world has migrated to the cloud, this style of hardware monitoring has become less necessary. If you use Azure, AWS or GCP you don’t have to worry about faulty disks, memory, network switches and so on. The underlying hardware is looked after by the provider; it is no longer your responsibility. Yet one of the biggest mistakes we currently see in monitoring is exactly this: using old approaches to monitor a new world. Let’s take a closer look.

How to monitor in the ‘new’ world

Now that the underlying hardware is monitored by someone else, organisations can focus on what matters to them. In my experience (as the head of support for a global NOC), this is the number one area where businesses fail. Over and over, companies ask us to monitor their platform, and when we ask what needs monitoring, they cannot tell us. Even worse, they expect us to tell them what needs monitoring! This is total madness. Every company has a unique system and different metrics that are important to it. If you are in this position, I urge you to go back and think about what is important to you.

Say, for example, that the shopping cart on your website is the feature that generates the most income. In that situation you should map all of the services that make up the ‘shopping cart’ feature. Once you know all the subsystems that support it, you can monitor each of them, and by doing so you will have built a way of monitoring the shopping cart more fully.

As cloud computing scales up and down with demand, you should not focus on monitoring CPU and RAM, but instead run ‘synthetic tests’ which attempt to run transactions through your system. These synthetic tests mimic the behaviour of users more realistically and are more likely to tell you whether your service is up or down; they are also much more useful for alerting. For example, you could have a synthetic test that runs a purchase on your site every minute to make sure the shopping cart is working correctly.
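To make the idea of a synthetic test concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a finished tool: the step names and the two placeholder step functions are hypothetical, and in a real setup each step would call your own site or API and the whole check would be scheduled (for example once a minute) by your monitoring platform.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    ok: bool                    # True if every step of the journey passed
    failed_step: Optional[str]  # name of the first failing step, if any
    duration_s: float           # time spent running the journey


def run_synthetic_check(steps: List[Tuple[str, Callable[[], bool]]]) -> CheckResult:
    """Run the steps of a user journey in order and stop at the first failure."""
    start = time.monotonic()
    for name, step in steps:
        try:
            passed = step()
        except Exception:
            passed = False  # a crashing step counts as a failed transaction
        if not passed:
            return CheckResult(False, name, time.monotonic() - start)
    return CheckResult(True, None, time.monotonic() - start)


# Hypothetical placeholder steps; replace the bodies with real requests.
def add_item_to_cart() -> bool:
    return True


def submit_test_payment() -> bool:
    return True


if __name__ == "__main__":
    result = run_synthetic_check([
        ("add_to_cart", add_item_to_cart),
        ("checkout", submit_test_payment),
    ])
    print(result)
```

Reporting the first failing step, rather than a bare up/down flag, tells the on-call engineer which subsystem of the shopping cart to look at first.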

The Benefits of Cloud Monitoring Tools

In the cloud you have lots of tools that come with built-in integrations, DataDog being one example. This is important because you no longer need to build the monitoring yourself, as you did in the days of Nagios. DataDog has 500+ integrations ready to use. Trying to build monitoring solutions on your own makes no sense: do not try to reinvent the wheel! The fun fact here is that by choosing cloud (or hosted) solutions you get many added benefits. Companies like DataDog use Machine Learning and Artificial Intelligence to help you find the root cause of issues more quickly, and the more data you push to them, the smarter they become (beware of the additional costs though). Cloud tools also benefit from constant upgrades and feature releases: with Nagios you could have been on the same version for months or even years, whereas DataDog ships new features frequently. Hosted monitoring solutions also scale with your data volume. You don’t have to worry that they will run out of space or have service interruptions, as all of these problems are pushed onto the vendor.

Do’s & Don’ts for Smart Monitoring

Do reduce the noise.

Think about what really counts as an alert for you. It is tempting to add alerts for all kinds of things, but if you add too many alerts everyone will learn to ignore them and you will miss the important one. Only send alerts to engineers when there is a real problem. It is also key to work out what the business sees as important. An engineer’s view of the world is not the same as a product manager’s: a system may appear totally healthy from a monitoring point of view, but be 100% broken from the user’s perspective.
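One common way to cut alert noise is to notify only after several checks in a row have failed, and to notify once per incident rather than on every failed check. The sketch below is one possible approach, not the only one; the class name and the default threshold of three are assumptions chosen for illustration.

```python
from collections import deque


class ConsecutiveFailureAlert:
    """Notify once when the last `threshold` checks have all failed."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # sliding window of recent results
        self.firing = False                    # True while an incident is ongoing

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True only when a new alert should be sent."""
        self.recent.append(check_passed)
        all_failed = len(self.recent) == self.threshold and not any(self.recent)
        should_notify = all_failed and not self.firing
        self.firing = all_failed  # resets as soon as one check in the window passes
        return should_notify


if __name__ == "__main__":
    alert = ConsecutiveFailureAlert(threshold=3)
    for passed in [True, False, False, False, False, True]:
        if alert.record(passed):
            print("page the on-call engineer")
```

A single failed check is ignored, and a long outage produces one notification instead of one per minute.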

Do look for existing integrations.

If you need to monitor a system, always look for existing integrations as it is most likely you are not the first person to do this. Take the lessons other people have learned and save yourself time and effort.

Do try to build a ‘single pane of glass’

Modern monitoring systems will allow you to build up time-based data sets from multiple sources, which means that you can view CDN logs next to application logs very easily. The more sources you can pull together in one place, the better your monitoring will be.
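Hosted platforms do this merging for you, but the underlying idea is simple: events from different sources are placed on one shared timeline. The following sketch shows that idea with Python’s standard `heapq.merge`; the `Event` type and the sample sources are hypothetical, and each input stream is assumed to be already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since the epoch
    source: str       # e.g. "cdn" or "app"
    message: str


def merge_timelines(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge several time-sorted event streams into one time-sorted stream."""
    return heapq.merge(*sources, key=lambda event: event.timestamp)


if __name__ == "__main__":
    cdn_logs = [Event(1.0, "cdn", "cache miss"), Event(4.0, "cdn", "502 from origin")]
    app_logs = [Event(2.0, "app", "slow query"), Event(3.0, "app", "worker restarted")]
    for event in merge_timelines(cdn_logs, app_logs):
        print(event.timestamp, event.source, event.message)
```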

Don’t monitor everything

Whilst it is tempting to monitor everything, this typically means that you have too much data to sort through. If you have a staging system creating millions of lines of logs in an hour, ask yourself ‘is this really necessary?’ Also, be very aware that many hosted monitoring systems are priced by data volume, so if you push unnecessary data you will end up paying for it.
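One way to keep volume under control is to filter logs before they leave your environment. The sketch below assumes log records are dictionaries with hypothetical `env` and `level` fields, and the policy shown (staging ships warnings and above, production ships info and above) is only an example to adapt to your own needs.

```python
from typing import Dict, Iterable, Iterator

LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}

# Assumed example policy: the minimum level worth paying to ship, per environment.
MIN_LEVEL_BY_ENV = {"production": "INFO", "staging": "WARNING"}


def filter_logs(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the records that meet the minimum level for their environment."""
    for record in records:
        # Unknown environments fall back to INFO; unknown levels are treated as INFO.
        min_level = MIN_LEVEL_BY_ENV.get(record.get("env", ""), "INFO")
        level_value = LEVELS.get(record.get("level", "INFO"), LEVELS["INFO"])
        if level_value >= LEVELS[min_level]:
            yield record


if __name__ == "__main__":
    sample = [
        {"env": "staging", "level": "DEBUG", "msg": "cache warmed"},
        {"env": "staging", "level": "ERROR", "msg": "payment stub failed"},
        {"env": "production", "level": "INFO", "msg": "order placed"},
    ]
    for record in filter_logs(sample):
        print(record)
```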

Don’t rely on the out-of-box settings

I have already mentioned that it is a good idea to find existing integrations for your systems. This remains true, but you should be aware of what your system comprises and what makes it unique. The existing integrations are usually a very good start, but don’t expect them to be perfect: you will need to do some fine-tuning to get the best experience.

Don’t make assumptions

Even though monitoring systems are much more advanced with Machine Learning and Artificial Intelligence, do not expect that the system will know what is important to you. Perhaps you have some system folders that should only hold files temporarily before they are deleted. There is no way for a monitoring platform to know such things; you must teach it. Understand your system before you look to monitor it.
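The temporary-folder case is a good example of knowledge you have to encode yourself. As a rough sketch, a custom check could list files that have stayed in the folder longer than expected; the function name, the one-hour limit and the demo file name are all assumptions for illustration, and the result would normally be reported to your monitoring platform as a custom metric or event.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def find_stale_files(folder: Path, max_age_s: float, now: Optional[float] = None) -> List[Path]:
    """Return files in `folder` whose last modification is older than `max_age_s` seconds."""
    now = time.time() if now is None else now
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and now - path.stat().st_mtime > max_age_s
    )


if __name__ == "__main__":
    # Demo: create a file, backdate it by two hours, then check with a one-hour limit.
    with tempfile.TemporaryDirectory() as tmp:
        leftover = Path(tmp) / "upload.part"
        leftover.write_text("leftover")
        two_hours_ago = time.time() - 7200
        os.utime(leftover, (two_hours_ago, two_hours_ago))
        print(find_stale_files(Path(tmp), max_age_s=3600))
```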

Hopefully, this introduction to monitoring will help you on your journey. I shall leave you with a quote:

“If you can’t measure it, you can’t manage it”.

Keep this in mind when building your systems and you will have an easier life monitoring them.
