Explaining and Expanding the SLA Conversation

Service Level Agreements (SLAs) come in many forms throughout life, each promising a basic level of acceptable experience. Typically, an SLA has some measurable component, i.e., a metric or performance indicator.

Take, for example, the minimum speed limit on interstates. Usually, the sign reads “Minimum Speed 45 mph”. I always thought the signs existed to keep those who got confused by the posted maximum of 70 mph (mistaking it for the minimum) from running over those who got confused thinking 45 mph to be the maximum.

It turns out the “minimum speed” concept is enforced in some U.S. states to prevent anyone from impeding traffic flow. For those who recall the very old “Beverly Hillbillies” TV show, I’ve often wondered if Granny sitting in a rocking chair atop a pile of junk in an open-bed truck driving across the country might be a good example of “impeding the flow of traffic” at any speed. Although, from the looks of the old truck, it probably couldn’t manage the 45 mph minimum either.

In the world of IT, there are all sorts of things that can “impede the flow” of data transfer, data processing, and/or data storage. While there’s nothing as obvious as Granny atop an old truck, there are frequently Key Performance Indicators (KPIs) that could indicate when things aren’t going according to plan.

Historically, IT SLAs have focused on a Reliability, Availability, and Serviceability (RAS) model. While not directly related to specific events/obstacles to optimum IT performance, RAS has become the norm:

  • Reliability – This “thing of interest” shouldn’t break, but it will. Let’s put a number on it.
  • Availability – This “thing of interest” needs to be available for use all the time. That’s not really practical. Let’s put a number on it.
  • Serviceability – When this “thing of interest” breaks or is not available, it must be put back into service instantly. In the real world, that’s not going to happen. Let’s put a number on it.

In the IT industry, there exist many creative variations on the basic theme described above, but RAS is at the heart of this thing called SLA performance. The problem with this approach from an end-user standpoint is that it misses the intent of the SLA, which is to ensure the productivity/usefulness of “the thing of interest”.  In the case of a desktop, that means ensuring that the desktop is performing properly to support the needs of the end user. Thus, the end user’s productivity/usefulness is optimized if the desktop is reliable, available, and serviceable… but is it really?

Consider the following commonplace scenarios:

  • The desktop is available 100% of the time, but 50% of the time it doesn’t meet the needs of the end user, e.g. it has insufficient memory to run an Excel spreadsheet with its huge, memory-eating macros.
  • A critical application keeps crashing, but every call to the service desk results in, “Is it doing it now?” After the inevitable “No” is heard, the service desk responds, “Please call back when the application is failing.” This kind of behavior frequently results in end users becoming discouraged and simply continuing to use a faulty application by frequently restarting it. It also results in a false sense of “reliability” because the user simply quits calling the service desk, resulting in fewer incidents being recorded.
  • A system’s performance slows to a crawl for no apparent reason at various times of the day. When the caller gets through to the service desk, the system may or may not be behaving poorly. Regardless, the caller can only report, “My system has been running slowly.” The service desk may ask, “Is it doing it now?” If the answer is “Yes”, they may be able to log into the system and have a look around using basic tools, only to find that none of the system KPIs are being challenged (i.e. CPU, memory, IOPs, and storage are all fine). In this scenario, the underlying problem may have nothing to do with the desktop or application. Let’s assume it to be network latency to the user’s home drive, further complicated by the high latency only being prevalent during peak network traffic periods. Clearly, this is outside the scope of the traditional RAS approach to SLA management. The result: once again, a caller who probably learns to simply tolerate poor performance and do the best they can with a sub-optimized workplace experience.

So, how does one improve on the traditional RAS approach to SLA management? Why not monitor the metrics known to be strong indicators of a healthy/not so healthy workstation? In this SLA world, objective, observable system performance metrics are the basis for the measurement of a system’s health. For example, if the CPU is insufficient, track that metric and determine to what extent it is impacting the end user’s productivity. Then do the same for multiple KPIs. The result is a very meaningful number that indicates how much of a user’s time is encumbered by poor system performance.
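The idea above can be sketched in a few lines. This is a hypothetical illustration only: the KPI names, thresholds, and data shape are assumptions for the example, not any vendor's actual scoring model.

```python
# Hypothetical sketch: estimate how much of a user's time is impacted by
# resource bottlenecks, given sampled KPI readings. Thresholds and the
# data shape are illustrative assumptions.

# Each sample is one observation of the workstation's KPIs.
SAMPLES = [
    {"cpu_pct": 35, "mem_pct": 60, "disk_latency_ms": 8},
    {"cpu_pct": 97, "mem_pct": 92, "disk_latency_ms": 45},
    {"cpu_pct": 40, "mem_pct": 65, "disk_latency_ms": 10},
    {"cpu_pct": 99, "mem_pct": 95, "disk_latency_ms": 60},
]

# KPI name -> threshold above which the sample counts as "impacted".
THRESHOLDS = {"cpu_pct": 90, "mem_pct": 85, "disk_latency_ms": 30}

def impacted_fraction(samples, thresholds):
    """Fraction of samples in which at least one KPI breached its threshold."""
    impacted = sum(
        1 for s in samples
        if any(s[kpi] > limit for kpi, limit in thresholds.items())
    )
    return impacted / len(samples)

print(impacted_fraction(SAMPLES, THRESHOLDS))  # 0.5 -> half the user's time impacted
```

With real telemetry, each KPI would be tracked separately as well, so the single number can be decomposed into which resource is doing the impeding.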

In the case of SLAs based on observable system KPIs, once a baseline is established, variations from the baseline are easily observable. Simply counting system outages and breakage doesn’t get to the heart of what an IT department wants to achieve. Namely, we all want the end user to have an excellent workspace experience, unencumbered by “impeded flow” of any type. The ultimate outcome of this proposed KPI-based (vs. RAS-based) SLA approach will be more productive end users. In future blogs, I will expand on how various industries are putting into practice a KPI-based SLA governance model.

Learn More About Maximizing User Productivity

How Can IT Teams Catch Incompatibilities Before Systems Are at Risk?

Millions of PCs currently running Windows 10 will lose feature support in 2018 due to incompatible drivers, according to ZDNet. The issue affects systems with certain Intel Atom Clover Trail processors that were designed to run Windows 8 or 8.1, but were offered free OS upgrades as part of Microsoft’s Windows 10 push. The support loss is due to incompatibility with the Windows 10 Creators Update and the devices in question will not receive further Windows 10 updates from Microsoft as of the time of this writing. [Update: As reported by The Verge, Microsoft has said that the devices will continue to receive security patches through 2023, but will not be included in feature updates.]

The affected PCs are consumer-level devices and enterprises are therefore unlikely to be impacted by this loss. However, there is no guarantee that other devices won’t lose support due to similar circumstances in the future. The ZDNet article cites Microsoft’s device support policy for Windows 10 that contains a footnote stating, in part, “Not all features in an update will work on all devices. A device may not be able to receive updates if the device hardware is incompatible, lacking current drivers, or otherwise outside of the Original Equipment Manufacturer’s (‘OEM’) support period.”

Determining the hardware specifications of a system is easy on the individual level, but how can companies ensure that their employees aren’t at risk of continuing to run unsupported hardware now or with future Windows 10 updates?

A workspace analytics solution, such as Lakeside Software’s SysTrack, collects and analyzes data about systems and user behavior that fast-tracks the discovery process. With this functionality, IT can easily answer questions such as whether any of the unsupported Intel Atom processors are running on their Windows 10 systems.
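As a rough illustration of that discovery step, the sketch below scans a hardware inventory for the affected processor family. The inventory shape is invented for the example, and the model-number list is an assumption about which Clover Trail Atom parts are in scope, so verify it against current vendor guidance before acting on it.

```python
# Illustrative sketch: given an inventory export (system name -> CPU string),
# flag systems running Clover Trail-era Atom parts. The model list and the
# data shape are assumptions for this example.

AFFECTED_MODELS = ("Z2760", "Z2520", "Z2560", "Z2580")  # assumed Clover Trail SKUs

inventory = {
    "LAPTOP-01": "Intel(R) Core(TM) i5-6300U",
    "TABLET-07": "Intel(R) Atom(TM) CPU Z2760",
    "DESK-12": "Intel(R) Core(TM) i7-6700",
}

# Keep any system whose CPU string mentions an affected model number.
at_risk = [
    name for name, cpu in inventory.items()
    if any(model in cpu for model in AFFECTED_MODELS)
]
print(at_risk)  # ['TABLET-07']
```

In practice the inventory would come from the analytics platform's collected hardware data rather than a hand-built dictionary, but the filtering logic is the same.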

SysTrack view of Intel processors on different systems in an environment

As we’ve seen with recent ransomware outbreaks, running an unsupported version of an OS puts systems at greater risk of attack. Upgrading incompatible hardware in your environment before it loses support will likely be a critical part of Windows 10 management strategies moving forward.

Assess Your Environment’s Compatibility with Windows 10

End Users Are People Too

Companies are finding that the traditional approach of a four-year, one-size-fits-all technology refresh cycle no longer works for today’s tech-charged workforce. For some employees, that cycle is too long and limits their ability to be productive by keeping them from the latest hardware and applications that they’re accustomed to in their personal lives. Other workers are less demanding, and a refresh may arrive years too early for them, resulting in unnecessary system downtime and wasteful spending.

In theory, surveying employees about what technology they use and need to be most productive would result in harmonious unions between people and technologies. However, this ideal scenario breaks down pretty quickly when you consider the time it would take to process that feedback at the enterprise level. And, even if you could, does the user really know best? The average user isn’t going to be able to name every application they’ve interacted with, provide an unbiased portrayal of their system performance, or be willing to disclose their use of Shadow IT. Not to mention that people change job roles and leave companies frequently, which immediately undermines the effort of matching resources to those individuals.

Thankfully, there is a better approach that will allow you to make purchasing and provisioning decisions based on facts rather than user perception. While the basic concept behind this approach may sound familiar to you, the addition of collection and analysis of real user data makes all the difference between a time-intensive effort with minimal returns and an ongoing way of tailoring end-user experience improvements to employee workstyles.

A Personalized Approach to IT

Continuous user segmentation, also known as personas, is a way of grouping users based on their job roles, patterns, behaviors, and technology. Personas provide a meaningful lens for IT to understand what different types of users need to be productive, allowing IT to optimize assets accordingly.

Workspace analytics software for IT automates the segmentation process and continues to assess user characteristics and experiences to update groupings based on quantitative metrics. As a result, once persona groupings are defined, IT can focus on addressing the needs of different groups and let the software do the work of updating the populations within each persona. This functionality is key to any Digital Experience Monitoring strategy.
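A minimal sketch of what automated segmentation can look like is below. The persona names, metric names, and thresholds are all illustrative assumptions for the example, not any vendor's actual taxonomy; real tooling would derive groupings from much richer data.

```python
# Minimal sketch of rule-based segmentation of users into personas from
# quantitative usage metrics. Personas, thresholds, and metric names are
# illustrative assumptions.

def assign_persona(user):
    """Map a user's observed metrics to a persona bucket."""
    if user["mobility_pct"] > 50:        # works away from a desk most of the time
        return "mobile worker"
    if user["cpu_avg_pct"] > 70 or user["app_count"] > 25:
        return "power user"
    if user["app_count"] <= 5:           # small, fixed application set
        return "task worker"
    return "knowledge worker"

users = [
    {"name": "ana", "mobility_pct": 80, "cpu_avg_pct": 30, "app_count": 12},
    {"name": "ben", "mobility_pct": 10, "cpu_avg_pct": 85, "app_count": 30},
    {"name": "cal", "mobility_pct": 5,  "cpu_avg_pct": 20, "app_count": 3},
]

personas = {u["name"]: assign_persona(u) for u in users}
print(personas)
# {'ana': 'mobile worker', 'ben': 'power user', 'cal': 'task worker'}
```

Because the rules run against continuously collected metrics, re-running the assignment keeps persona membership current as people's behavior changes, which is the "continuous" part of continuous segmentation.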

It Pays to Segment Users Right

Overlooking personas can lead to over- or under-provisioning assets to a job role. This can be costly to a company in several ways. Over-provisioning licenses wastes a company’s money, while under-provisioning can become a nightmare for IT administrators. Under-provisioning encourages users to install their own applications and personalize their profiles, and the resulting miscellany of applications burdens IT administrators with a multitude of unique problems for each user and application. User-installed applications may also be incompatible with one another, or with the rest of the workspace, making it harder to share files.

Optimizing assets for a company with the aid of personas can enable an increase in productivity. With personas, each job role can be catered to uniquely while provisioning remains consistent within the role. Each job role, based on real user data, can be provisioned the unique licenses and applications that cater to its needs. This prevents users from feeling the need to install their own versions of missing applications, ultimately allowing IT administrators to limit potential application or license errors.

Segmenting Users in Practice

Using common persona categories, a company may have deskbound users who are provisioned with expensive laptops when a desktop would do, or they may have knowledge workers with expensive i7 CPUs when a PC with an i5 or i3 makes more sense. We have also had customers report that they found that their power users needed to be refreshed every year because of the productivity improvement, while their task workers didn’t need a refresh for as long as five years.

Using personas to segment the end-user environment for a targeted refresh allows an enterprise to provide the right end-user device for a given end user based on their CPU consumption, critical application usage, network usage, and other key metrics. The benefits are numerous and include reduced cost, higher end-user productivity, better security, and a device custom-fit to the end user’s needs.

Learn more about Enterprise Personas

Introducing the SQL Server Administration Kit

Database administration is an important role within IT. Ensuring the backend infrastructure is available and functioning well is critical to the end users. If a SQL Server that’s hosting the backend of a Citrix XenApp farm has no available disk space, then users won’t be able to launch any new sessions, which would cause massive problems for any company. It’s a simple example with big consequences, but database administration is often about avoiding the big consequences by watching the little things. SysTrack natively collects a lot of great SQL Server-related data, and we’ve launched a SQL Server Administration Kit to provide some focused dashboards and reports to visualize that data. Here’s an overview of the content you’ll find in the Kit.

SQL Server Inventory and Overview

This dashboard provides a concise overview of the observed SQL Servers. Having this kind of information readily available and nicely summarized makes it easy to keep track of your SQL assets. The dashboard includes basic resource allocation, database configuration details, and system health observations and trends. A great use case for this dashboard is checking resource allocation against the system health. If, for example, memory is the leading health impact, you can instantly see the allocated memory and decide if adding more makes sense. Also included is a drill-down to SysTrack Resolve, making the jump from basic observations like that to diving into more detailed data as easy as double-clicking the system name.

SQL Server Performance 

Another useful dashboard, this one focuses on what has impacted the system health over the past week, alongside trending and aggregated data for key SQL performance metrics over the same period. The SQL performance metrics provide another level of detail beyond standard performance metrics, and add context around why certain aspects of the system health are trending the direction they are. Available metrics are errors/sec, buffer cache hit ratio, logical connections, user connections, full scans/sec, index searches/sec, page life expectancy, and batch requests/sec.
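A wrinkle worth knowing when working with these metrics directly: SQL Server exposes ratio counters such as buffer cache hit ratio as a raw value/base pair (e.g. in `sys.dm_os_performance_counters`), so the readable percentage has to be computed. The sketch below shows that calculation with made-up sample values; the 300-second page life expectancy floor is a common rule of thumb, not a hard limit.

```python
# Sketch: computing readable values from raw SQL Server counter samples.
# The counter values here are invented for illustration.

raw_counters = {
    "Buffer cache hit ratio": 9_850,        # raw counter value
    "Buffer cache hit ratio base": 10_000,  # its paired "base" counter
    "Page life expectancy": 4_200,          # already in seconds, no base pair
}

def buffer_cache_hit_pct(counters):
    """Buffer cache hit ratio as a percentage: value / base * 100."""
    return 100.0 * (counters["Buffer cache hit ratio"]
                    / counters["Buffer cache hit ratio base"])

print(buffer_cache_hit_pct(raw_counters))  # 98.5

# Common rule of thumb: flag page life expectancy under ~300 seconds.
print(raw_counters["Page life expectancy"] >= 300)  # True
```

A dashboard does this math for you, which is exactly the convenience the Kit is aiming for.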

SQL Server Overview

This SSRS report has similar data to the SQL Server Inventory and Overview dashboard. It includes the system health, resource allocation, and operating system. Having this data available as a formatted, static report makes it very easy to quickly view the data as well as export to Excel, PDF, or Word for offline use.

Summarizing data and providing use-case-driven content packs is the whole idea behind our Kits. We’re happy to have expanded the available Kits to include SQL administration. Making IT easier and more data-driven is our goal, and we’ll keep improving and expanding our Kits toward that goal!

Foundations of Success: Digital Experience Monitoring

We’ve all seen the rapid evolution of the workplace: the introduction of personal consumer devices, the massive explosion of SaaS providers, and the gradual blurring of the lines of responsibility for IT have introduced new complications to a role that once had a very clearly defined purview. In a previous post, we discussed quantification of user experience as a key metric for success in IT, and, in turn, we introduced a key piece of workspace analytics: Digital Experience Monitoring (DEM). This raises the question, though: what exactly is DEM about?

At its very heart, DEM is a method of understanding end users’ computing experience and how well IT is enabling them to be productive. This begins with establishing a concept of a user experience score as an underlying KPI for IT. With this score, it’s possible to proactively spot degradation as it occurs, and – perhaps even more importantly – it introduces a method for IT to quantifiably track its impact on the business. With this as a mechanism of accountability, the results of changes and new strategies can be trended and monitored as a benchmark for success.
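To make "proactively spot degradation" concrete, here is a hedged sketch: once an experience score exists as a time series, even a simple rolling baseline makes deviations observable. The window size and tolerance below are arbitrary illustrative choices, not a recommended configuration.

```python
# Sketch: flag points where an experience score drops well below its own
# recent baseline. Window and tolerance are illustrative assumptions.

def degraded_points(scores, window=3, tolerance=0.9):
    """Indices where a score falls below `tolerance` times the mean of the
    preceding `window` scores."""
    flagged = []
    for i in range(window, len(scores)):
        baseline = sum(scores[i - window:i]) / window
        if scores[i] < tolerance * baseline:
            flagged.append(i)
    return flagged

# Weekly experience scores (0-100) for one group of users, invented data.
weekly_scores = [88, 90, 89, 87, 62, 85, 88]
print(degraded_points(weekly_scores))  # [4] -> week 4 dipped below baseline
```

The same series, viewed in the other direction, is the accountability mechanism described above: a sustained rise after a change is quantifiable evidence that the change helped.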

That measurable success criterion is then a baseline for comparison that threads its way through every aspect of DEM. It also provides a more informed basis for key analytical components that stem from observation of real user behavior, like continuous segmentation of users into personas. By starting with an analysis of how well the current computing environment meets the needs of users, it opens the door to exploring each aspect of their usage: application behaviors, mobility requirements, system resource consumption, and so on. From there users can be assigned into Gartner-defined workstyles and roles, creating a mapping of what behaviors can be expected for certain types of users. This leads to more data-driven procurement practices, easier budget rationalization, and overall a more successful and satisfied user base.

Pie chart showing the number of critical applications segmented by persona

Taking an active example from a sample analysis, there are only a handful of critical applications per persona. Those applications represent what users spend most of their productive time working on, and therefore have a much larger business impact. Discovery and planning around these critical applications can also dictate how to best provision resources for net new employees with a similar job function. This prioritization of business-critical applications based on usage makes proactive management much more clear-cut. Automated analysis and resolution of problems can then focus on the systems where users are most active, which will have the maximum overall impact on improving user experience. In fact, that user experience can then be trended over time to show the real business impact of IT problem solving:

Chart showing the user experience trend for an enterprise

Various voices within Lakeside will go through pieces of workspace analytics over the coming months, and we’ll be starting with a more in-depth discussion of DEM. This will touch on several aspects of monitoring and managing the Digital Experience of a user, including the definition of Personas, management of SLAs and IT service quality measurements, and budget rationalization. Throughout, we’ll be exploring the role of IT as an enabler of business productivity, and how the concept of a single user experience score can provide an organization a level of insight into their business-critical resources that wouldn’t otherwise be possible.

Learn more about End User Analytics