

Discover more from Alex Ewerlöf Notes
Definition of good in SLI
How to use the definition of "good" in the service level formula to focus the optimization?
In a recent article, we discussed the service level indicator formula:

SLI = good / valid
Another article discussed the definition of valid. This article is about the definition of good.
Depending on the type of SLI:
Time-based SLI: good specifies a good time slot
Event-based SLI: good specifies a good event
There are four basic ways to declare good.
1. Upper bound
This is by far the most common way to declare good: the value of a metric is considered good if it is below an upper threshold that denotes “good enough”.
For example, if our SLI is meant to improve latency, a request may count as sufficiently fast if its latency is below 200ms, or below 1000ms. The threshold needs to be tied to something the consumer cares about and to how the consumer perceives reliability.
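A minimal sketch of an event-based latency SLI with an upper bound. The `Request` type, the `is_good` and `latency_sli` names, the 200ms threshold, and the sample latencies are all hypothetical, and "below" is read as strictly less than the threshold.

```python
from dataclasses import dataclass

# Hypothetical threshold; pick one that matches what consumers perceive as "fast enough"
LATENCY_THRESHOLD_MS = 200


@dataclass
class Request:
    latency_ms: float


def is_good(request: Request, threshold_ms: float = LATENCY_THRESHOLD_MS) -> bool:
    # Upper bound: the event is good if the metric is below the threshold
    return request.latency_ms < threshold_ms


def latency_sli(requests: list[Request]) -> float:
    # SLI = good / valid; in this sketch every request is assumed to be valid
    if not requests:
        raise ValueError("no valid events")
    good = sum(1 for r in requests if is_good(r))
    return good / len(requests)


requests = [Request(120), Request(180), Request(250), Request(90)]
print(latency_sli(requests))  # 0.75
```

Three of the four requests are below 200ms, so the SLI is 0.75.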
2. Lower bound
Conceptually this is the mirror image of the upper bound: the value of a metric is considered good if it is above a lower threshold that denotes “good enough”.
For example, for an expensive worker that consumes a queue (think Midjourney prompts running on expensive GPUs), the utilization of those machines should be high.
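The same idea with a lower bound, as a minimal time-based sketch. It assumes (hypothetically) that average GPU utilization is sampled once per minute and that a minute is good when utilization is above 80%. The bound applies to the per-minute metric; the SLI itself is still good / valid, so a higher ratio is still better.

```python
# Hypothetical lower bound: a minute is good if average GPU utilization is above 80%
MIN_UTILIZATION = 0.80


def is_good_minute(utilization: float, minimum: float = MIN_UTILIZATION) -> bool:
    # Lower bound: the time slot is good if the metric is above the threshold
    return utilization > minimum


def utilization_sli(per_minute: list[float]) -> float:
    # SLI = good minutes / valid minutes; every sampled minute is treated as valid here
    if not per_minute:
        raise ValueError("no valid time slots")
    good = sum(1 for u in per_minute if is_good_minute(u))
    return good / len(per_minute)


per_minute = [0.95, 0.91, 0.42, 0.88, 0.10]
print(utilization_sli(per_minute))  # 0.6
```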
3. Range bound
A combination of the upper bound and lower bound. If the metric value is within a range, it’s considered good.
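A range bound simply combines both checks. In this hypothetical sketch, a minute is good when GPU utilization is above 60% (the worker is not idling) and below 95% (assumed here to mean the worker is not saturated); both thresholds are illustrative assumptions, not recommendations.

```python
# Hypothetical bounds for per-minute GPU utilization
LOWER, UPPER = 0.60, 0.95


def is_good(value: float, lower: float, upper: float) -> bool:
    # Range bound: good only if the value is above the lower bound AND below the upper bound
    return lower < value < upper


def range_sli(samples: list[float]) -> float:
    # SLI = good / valid; every sample is treated as a valid time slot
    if not samples:
        raise ValueError("no valid time slots")
    good = sum(1 for v in samples if is_good(v, LOWER, UPPER))
    return good / len(samples)


samples = [0.55, 0.70, 0.85, 0.99]
print(range_sli(samples))  # 0.5
```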
4. No bound
In this case, good is defined as a subset of:
Time (for time-based SLIs). For example, if our goal is to improve the website uptime:
good time: minutes where the site can be pinged
valid time: all the minutes in the compliance period (e.g. a month)
Events (for event-based SLIs). For example, if our goal is to improve the product purchase flow:
good events: orders processed with a settled payment
valid events: orders placed via the website and apps
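With no bound there is no threshold to compare against: good is simply the subset of valid time or valid events that meets a condition. A minimal sketch of both examples above, assuming one ping result per minute and a hypothetical `Order` record with a `payment_settled` flag:

```python
from dataclasses import dataclass


def uptime_sli(ping_ok_per_minute: list[bool]) -> float:
    # Time-based: valid time is every minute in the compliance period
    valid = len(ping_ok_per_minute)
    if valid == 0:
        raise ValueError("no valid time")
    # good time: minutes where the site could be pinged (True counts as 1)
    good = sum(ping_ok_per_minute)
    return good / valid


@dataclass
class Order:
    payment_settled: bool  # hypothetical field


def purchase_sli(orders: list[Order]) -> float:
    # Event-based: valid events are orders placed via the website and apps
    valid = len(orders)
    if valid == 0:
        raise ValueError("no valid events")
    # good events: the subset of those orders with a settled payment
    good = sum(1 for o in orders if o.payment_settled)
    return good / valid


# A 30-day month has 43,200 minutes; assume the ping failed in 43 of them
minutes = [True] * 43157 + [False] * 43
print(f"{uptime_sli(minutes):.4%}")  # 99.9005%

orders = [Order(True), Order(True), Order(False), Order(True), Order(True)]
print(purchase_sli(orders))  # 0.8
```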
Conclusion
Depending on the type of SLI, good either specifies good events or good time periods. See this other article for more information:
The definition of good is also related to valid, so make sure to check that article as well:
> For example, an expensive worker that consumes some queue (think Midjourney prompts on an expensive GPU), the utilization on those machines should be high.
Hm, I'm not entirely sure this is a good example of an SLI; I don't think the end-customer cares about "GPU utilization"... :-P
Actually, I don't understand the "2. lower bound". If the definition of SLI is good/total, then a lower bound means that you want _few_ good events. That doesn't make sense to me. I have always worked with upper thresholds for SLIs since I want to be above a certain ratio of "good".