Always Watching: The Power of Real-time Server Monitoring

Nicholas Iglehart | October 17, 2023

Imagine a world where the servers and services that power your business are silently monitored around the clock. Where issues are identified even before they manifest to customers. Where timely intervention and rapid troubleshooting become the norm. This isn’t science fiction; it’s the power of effective monitoring.

Why Monitoring Matters

Stepping back, let’s first grasp the essence of why monitoring is essential. Along with patching and backups, monitoring is one of the pillars of server care. Unlike the other pillars, monitoring also extends to your serverless and third-party services.

So what are the consequences of not monitoring your systems? There are many, but here are some of the most pressing.

Downtime: Time truly is money for your business. Unexpected downtime can halt operations, causing significant losses. An outage can disrupt revenue streams and damage the trust customers place in your company. Monitoring can give you advance warning of looming problems and shorten the time you spend troubleshooting when something does break.

Wasted Money: If you’re not monitoring your systems, you might not realize that you are spending on resources that aren’t being used. Poorly sized instances, orphaned resources, and underutilized services are all money down the drain.

Loss of Compliance: For industries with strict compliance requirements (e.g., healthcare, finance), not monitoring systems could lead to breaches of regulations, resulting in fines and legal action.

Lost Business Opportunities: If potential clients or partners see that your services are frequently down or underperforming, they might choose to go with a competitor instead. Proper monitoring helps you get ahead of problem services and address them before they become outages.

High-Profile Monitoring Oversights

To drive home the importance of monitoring:

Several major airlines (JetBlue and British Airways, for example) have recently grounded thousands of flights due to server issues. Real-time monitoring could have provided early warnings. Worse, the FAA had an outage that could have been found and corrected much more quickly with proper monitoring.

Multiple financial institutions have faced trading halts because of unanticipated system glitches. The cost isn’t just financial loss but also reputational damage.

Smaller businesses aren’t exempt. A lack of resources sometimes pushes monitoring to the back burner, leaving them susceptible to unpredictable server downtime. It’s hard to grow your business when your technology doesn’t live up to your needs.

What Should You Monitor?

Okay, you get it. You need monitoring. But what exactly does that mean? What do you need to monitor? The short answer is everything!

Monitoring is the first line of defense against issues, and the most valuable tool during troubleshooting. The more you monitor, the better your insights will be.

Some common types of monitoring checks are:

Server Checks: Servers are the bedrock of applications, and keeping an eye on their CPU usage, memory pressure, storage space, and processes is critical to a healthy application.

Service Checks: Your environment may or may not include actual servers, but regardless, all your services need to be checked as well. This includes jobs (AWS Lambda functions, Azure Functions, etc.), notifications and messaging (SNS, SQS, email, etc.), storage (S3, Azure Storage, etc.), and more.

Website Checks: Websites are a type of service, but they usually have a host of special checks like domain name expiration, SSL/TLS certificate validation, response times, and special interactions like custom headers or redirects.

Database Checks: Databases are also a type of service, and like websites, they have some very specific things you should watch like query times, deadlocks, throughput, and connection limits.

Business Rules: That’s right, your monitoring system can also be leveraged to ensure that your business needs are being met. This is particularly important if you are in a regulated environment like PCI or HITRUST and need to ensure that secure configurations stay in place.

Costs: Getting at least a basic view of your IT costs and alerting when they exceed the expected threshold can help you identify and tackle problem areas before they run amok.
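As one illustration of the website checks described above, here is a minimal Python sketch of a certificate expiry check. The function names and the 30-day warning window are assumptions chosen for the example, not part of any particular monitoring product. The sketch takes the certificate's notAfter timestamp as a string so it can run without a network connection.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter time.

    `not_after` uses the format found in certificate details,
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def certificate_status(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'expiring soon', or 'ok'.

    The 30-day default is an illustrative assumption; pick a window
    that matches how long renewals take in your environment.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining < warn_days:
        return "expiring soon"
    return "ok"


if __name__ == "__main__":
    sample = "Jan  5 09:34:43 2018 GMT"
    checked_at = datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc)
    print(certificate_status(sample, checked_at))
```

In a live check, the notAfter string would typically come from the dictionary returned by getpeercert() on a TLS connection opened with Python's ssl module.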

If you choose the right monitoring system you should be able to monitor essentially anything that can be boiled down to a value. That value might be:

a threshold (is CPU usage over 90%?)

a true/false (is the firewall turned on?)

a timeframe (has it been more than an hour since that lambda ran?)

a count (are there more than 10 files in the outbound queue?)

or anything else you can think of!
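To make the idea of "anything that can be boiled down to a value" concrete, here is a small Python sketch that expresses the four kinds of checks listed above as simple functions. The Check class, the helper names, and the sample readings are all hypothetical; a real monitoring tool would collect these values from your systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Check:
    """A named check whose callable returns True when the check is failing."""
    name: str
    is_failing: Callable[[], bool]


def over_limit(value: float, limit: float) -> bool:
    """Threshold or count check: fail when the value exceeds the limit."""
    return value > limit


def is_stale(last_run: datetime, now: datetime, max_age: timedelta) -> bool:
    """Timeframe check: fail when the last run is older than max_age."""
    return now - last_run > max_age


def failing_checks(checks: List[Check]) -> List[str]:
    """Return the names of all checks that are currently failing."""
    return [check.name for check in checks if check.is_failing()]


# Hypothetical readings standing in for values a monitoring agent would collect.
now = datetime(2023, 10, 17, 12, 0)
last_lambda_run = datetime(2023, 10, 17, 10, 30)
firewall_enabled = True

checks = [
    Check("CPU usage over 90%", lambda: over_limit(95.0, 90.0)),
    Check("firewall turned off", lambda: not firewall_enabled),
    Check("lambda idle for over an hour",
          lambda: is_stale(last_lambda_run, now, timedelta(hours=1))),
    Check("more than 10 files in outbound queue", lambda: over_limit(4, 10)),
]

print(failing_checks(checks))
```

With these sample readings, only the CPU and lambda checks fail. The design point is that every check, whatever it measures, reduces to a pass or fail answer that an alerting system can act on.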

Monitoring Done Right

Implementing effective server monitoring involves a few simple steps that you should be taking anyway.

Inventory Your Assets: Understand the hardware and software components that require monitoring.

Set Clear Benchmarks: Determine what ‘normal’ looks like for your services, so deviations can be easily spotted.

Automate Alerts: For anomalies, ensure that relevant stakeholders are immediately notified.

Regular Review: Periodically assess your monitoring strategy and refine it based on evolving business needs.
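The benchmark and alert steps above can be sketched in a few lines of Python. This is one simple approach, assuming that "normal" can be summarized by the mean and standard deviation of recent samples and that a reading more than three standard deviations away counts as a deviation. Both the function names and the tolerance are illustrative assumptions, and many workloads will need something more nuanced, such as separate baselines for peak and off-peak hours.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def build_baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Summarize 'normal' as (average, standard deviation).

    Requires at least two samples, because stdev is undefined for one.
    """
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: Tuple[float, float],
                 tolerance: float = 3.0) -> bool:
    """Flag a reading that falls more than `tolerance` deviations from average."""
    average, spread = baseline
    if spread == 0:
        # Every historical sample was identical, so any change is a deviation.
        return value != average
    return abs(value - average) > tolerance * spread


# Hypothetical response times in milliseconds gathered during normal operation.
baseline = build_baseline([100, 102, 98, 101, 99])

for reading in (104, 150):
    print(reading, "anomalous" if is_anomalous(reading, baseline) else "normal")
```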

Choosing a Monitoring Tool

Many believe that top-tier monitoring is a luxury for large enterprises. That’s a myth. Look for a tool that fits your budget and use these guidelines to ensure you get your money’s worth.

Simple, Powerful Checks: You should be able to easily add and customize checks.

Rich Ecosystem: Someone is already monitoring almost anything you want to monitor. The tool you choose should allow you to easily tap into these existing checks.

Integration Capabilities: Seamless integration with other server care pillars, like patching, can amplify the value of all your tools and provide holistic care.

Alert Customization: Noise is the enemy of monitoring. Once people get used to seeing something, they start to ignore it. Tailoring the alert mechanisms based on the severity, type of issue, and audience is critical.

User-friendly Interface: A monitoring tool should be intuitive and straightforward, ensuring that users can quickly understand and respond to alerts.
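The alert customization guideline above is easier to evaluate with a concrete picture in mind. The following Python sketch routes alerts by severity and suppresses repeats of the same alert during a quiet period, which is one common way to cut noise. The channel names, the routing table, and the AlertRouter class are hypothetical and do not reflect any specific tool's API.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical mapping of severity to notification channels.
ROUTES: Dict[str, List[str]] = {
    "critical": ["pager", "chat"],
    "warning": ["chat"],
    "info": ["email-digest"],
}


class AlertRouter:
    """Route alerts by severity and mute repeats within a quiet period."""

    def __init__(self, quiet_period: timedelta) -> None:
        self.quiet_period = quiet_period
        self._last_sent: Dict[Tuple[str, str], datetime] = {}

    def route(self, check_name: str, severity: str, now: datetime) -> List[str]:
        """Return the channels to notify, or an empty list if suppressed."""
        # Keying on (check, severity) lets an escalation through even when
        # a lower-severity alert for the same check was sent recently.
        key = (check_name, severity)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_period:
            return []
        self._last_sent[key] = now
        # Unknown severities fall back to the lowest-urgency channel.
        return list(ROUTES.get(severity, ["email-digest"]))


router = AlertRouter(quiet_period=timedelta(minutes=30))
start = datetime(2023, 10, 17, 9, 0)

print(router.route("disk space", "warning", start))
print(router.route("disk space", "warning", start + timedelta(minutes=10)))
print(router.route("disk space", "critical", start + timedelta(minutes=10)))
```

Here the second warning is muted because it repeats within 30 minutes, while the critical alert for the same check still reaches the pager.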

In Conclusion

As we venture deeper into the digital realm, the call for proactive server care becomes louder. Monitoring isn’t just an accessory, it’s an imperative. It’s the lens through which you can preemptively identify and address server issues. Invest in it, and ensure your business runs like a well-oiled machine.

Nicholas Iglehart

Nicholas is a senior architect and engineer with Absolute Ops and has more than 20 years of experience designing, building, and owning customer technology solutions.