Industry has clearly embraced the need to provide programmatic Web API access to services alongside (when appropriate) web interfaces. Few systems today don’t use at least one Web API to outsource complex functionality like payments, mapping or notifications. However, there is a large gap in how we monitor these inherently failure-prone additions to our business applications.
API monitoring exists to satisfy both the technical and the business aspects of a system, and it needs to be seen from both the provider and the consumer perspectives. We previously took a look at some API monitoring services in How do you monitor your APIs?, but let’s dig a little deeper to see what other opportunities we have to improve the state of API monitoring.
We’ll look to providers first, as it is their responsibility to make the API as enjoyable and reliable as possible. As a provider, what types of things should be monitored and why?
- Internal errors - First and foremost, monitoring must be able to quickly detect APIs generating errors (the bad kind, HTTP 500-series) and notify the respective ops teams that something is going wrong. These internal server errors leave clients helpless, even when it was their own payload that triggered the failure.
- Response time - Errors aside, response time is very important as the API (despite what has been recommended) is likely running inline with a transaction flow in more systems than expected. Keeping things fast is not only good for the service in terms of throughput but also helps consumers more easily use the APIs. If possible, drive the response time down and save consumers from complex async batch job systems.
- Rates and limits - This falls into the API Management realm, but it is important to know which users are performing what volume of activity with the APIs. This protects the service, and it also reveals opportunities to improve the exposed endpoints and to optimize those called more often than expected.
- Client technology - Knowing how consumers are calling the API is important because it helps you understand where to allocate resources for development, documentation and support. For instance, if consumers are hand-crafting most of their API calls, is that acceptable for the type of API offered? Why aren’t they using Client Libraries or SDKs?
- External errors - Internal errors are one thing, but external errors offer a tremendous opportunity to understand where the API can be improved. Expanding on knowing the client technology used (and the possibility of bugs existing), external errors represent how many times consumers are shooting themselves in the foot. If there are trends in particularly error-prone endpoints then rally the troops and streamline the information to help avoid the errors.
- Call chains - This one is definitely bonus points, but the idea is to track the sequence of calls that consumers are making. If there is a consistent sequence of endpoints called, it might indicate an opportunity to consolidate or improve the API structure. This could save the service some load and the consumers some effort.
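As a rough illustration of the provider-side list above, here is a minimal Python sketch that summarizes a request log. Everything in it is an assumption made for the example: the `LogEntry` fields, the `summarize` function, the choice of a 95th-percentile latency figure, and the mapping of "internal" errors to 5xx status codes and "external" errors to 4xx. A real service would more likely feed these signals into whatever metrics system it already runs.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    # All field names are hypothetical; adapt them to your own access log.
    client_id: str      # which consumer made the call
    user_agent: str     # SDK, client library, or hand-rolled HTTP client
    endpoint: str
    status: int         # HTTP status code returned
    duration_ms: float  # time spent serving the request


def classify(status):
    """Bucket a status code: 5xx is internal, 4xx is external, the rest ok."""
    if 500 <= status <= 599:
        return "internal_error"
    if 400 <= status <= 499:
        return "external_error"
    return "ok"


def percentile(values, pct):
    """Nearest-rank percentile; returns None when there is no data."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(entries):
    """Aggregate the provider-side signals from a list of LogEntry."""
    # Call chains: count consecutive endpoint pairs per client, in log order.
    last_endpoint = {}
    chains = Counter()
    for e in entries:
        prev = last_endpoint.get(e.client_id)
        if prev is not None:
            chains[(prev, e.endpoint)] += 1
        last_endpoint[e.client_id] = e.endpoint

    return {
        "outcomes": Counter(classify(e.status) for e in entries),
        "p95_ms": percentile([e.duration_ms for e in entries], 95),
        "calls_per_client": Counter(e.client_id for e in entries),
        "user_agents": Counter(e.user_agent for e in entries),
        "external_errors_by_endpoint": Counter(
            e.endpoint for e in entries if classify(e.status) == "external_error"
        ),
        "call_chains": chains,
    }
```

Calling `summarize(entries)["call_chains"].most_common(5)` would surface the endpoint pairs most often called back to back, which is one simple way to spot candidates for consolidation.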
The other lens to view APIs through is the point of view of the consumer. This is important because the consumer’s business likely has a hard dependency on the Web API functioning as expected without consuming an inordinate amount of effort to support. Here are a few key things that the consumer should monitor:
- Errors - Errors in general are a problem, as the service’s users will be impacted. Errors on the calling side turn into bugs, and errors on the provider side lead to shopping around. Keeping track of the overall breakdown of errors is important because it helps quantify development cost / effectiveness (is the API too hard to use properly?) as well as the motivation to look for another provider with more stability.
- Performance - This is an auditing role to ensure that the provider is behaving and returning responses within an acceptable time limit. The system should be designed to gracefully handle the occasional timeout, but if the provider is frequently delivering slow responses the end users will suffer.
- Cost - This can be a little tricky, but it is up to the consumer to make the right number of calls to satisfy the business need. If cost is running high, work out which endpoints are being called in what volume and where there is an opportunity to cache or just not make the call. Users love the refresh button; don’t let it drive up your bill!
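On the consumer side, the three items above can be sketched as a thin wrapper around outbound calls. This is only an illustration under stated assumptions: `MonitoredClient`, the injected `fetch` callable (which returns a `(status, body)` tuple and raises `TimeoutError` on a timeout), the flat `cost_per_call`, and the rule that every attempted call is billed are all hypothetical and will differ from provider to provider.

```python
import time
from collections import Counter


class MonitoredClient:
    """Wraps a fetch callable to record errors, latency, and call volume."""

    def __init__(self, fetch, cost_per_call=0.0, cache_ttl_s=60.0,
                 clock=time.monotonic):
        self._fetch = fetch            # hypothetical: endpoint -> (status, body)
        self._cost_per_call = cost_per_call
        self._cache_ttl_s = cache_ttl_s
        self._clock = clock
        self._cache = {}               # endpoint -> (stored_at, result)
        self.outcomes = Counter()      # ok / caller_error / provider_error / ...
        self.calls_made = Counter()    # attempted calls per endpoint
        self.latencies_ms = []

    def get(self, endpoint):
        now = self._clock()
        cached = self._cache.get(endpoint)
        if cached is not None and now - cached[0] < self._cache_ttl_s:
            # Served locally: no provider call, so nothing added to the bill.
            self.outcomes["cache_hit"] += 1
            return cached[1]

        # Assumption: every attempt is billed, even if it later times out.
        self.calls_made[endpoint] += 1
        try:
            result = self._fetch(endpoint)
        except TimeoutError:
            # Handle the occasional timeout gracefully, but keep count of it.
            self.outcomes["timeout"] += 1
            return None
        finally:
            self.latencies_ms.append((self._clock() - now) * 1000.0)

        status = result[0]
        if 500 <= status <= 599:
            self.outcomes["provider_error"] += 1
        elif 400 <= status <= 499:
            self.outcomes["caller_error"] += 1
        else:
            self.outcomes["ok"] += 1
            self._cache[endpoint] = (now, result)  # only cache good responses
        return result

    def estimated_cost(self):
        """Attempted calls multiplied by the assumed flat per-call price."""
        return sum(self.calls_made.values()) * self._cost_per_call
```

With a wrapper like this, a user hammering the refresh button within the cache window shows up as `cache_hit` outcomes rather than billable calls, and `calls_made.most_common()` shows which endpoints are driving the bill.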
There are surely other aspects to consider when monitoring an API. As either a provider or consumer, how do you monitor your APIs, and how has it provided value to your service?