Recently @annie_ying demonstrated the idea of learning from actual code invocations in Learning How to Invoke IBM Alchemy Data News Through API Harmony. The goal is to learn how developers commonly use APIs and then summarize that information for future developers.
This insight is made possible by the 28M+ open source projects hosted on GitHub. For example, the figure to the right presents real-world API invocation practices, grouped by endpoint, for a particular API. Specifically, 25 usage examples were found for the Instagram API.
Despite the openness and abundance of source code it is surprisingly difficult to obtain these insights. The first challenge in collecting this information is in locating and extracting the usage information from the source code. This extraction procedure is divided into two steps:
Identify the code snippets making API requests to a particular URL domain. We use GitHub Code Search to look for requests sent to specific APIs and collect a set of code snippets for the next step.
Understand how the request is made in terms of HTTP method, headers, parameters and body as well as what data is retrieved from the response. More details about the static code analysis that enables this can be found in Understanding Real API Practices via Big Code Analysis.
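The two steps above can be sketched roughly as follows. This is a minimal illustration, not the analysis used for this post: the query string passed to the GitHub code search endpoint is an assumption, the function names are hypothetical, and a simple regular expression stands in for the static code analysis, so it only recognizes jQuery shorthand calls with a literal URL as the first argument.

```python
import re
from urllib.parse import urlencode, urlparse

GITHUB_CODE_SEARCH = "https://api.github.com/search/code"


def build_search_url(api_domain, page=1, per_page=100):
    """Step 1: build a code search URL for snippets that mention api_domain.

    The query text is an assumed example; the real query may differ.
    Sending the request also requires an authenticated GitHub client.
    """
    query = f'"{api_domain}" language:javascript'
    params = {"q": query, "per_page": per_page, "page": page}
    return GITHUB_CODE_SEARCH + "?" + urlencode(params)


# Step 2 (crude stand-in for static analysis): match $.get, $.post and
# $.getJSON calls whose first argument is a quoted URL literal.
SHORTHAND_CALL = re.compile(r"""\$\.(getJSON|get|post)\(\s*['"]([^'"]+)['"]""")


def extract_requests(source, api_domain):
    """Return the HTTP method and URL of each shorthand call to api_domain."""
    found = []
    for func, url in SHORTHAND_CALL.findall(source):
        if urlparse(url).hostname == api_domain:
            method = "POST" if func == "post" else "GET"
            found.append({"method": method, "url": url})
    return found


if __name__ == "__main__":
    # api.example.com is a placeholder domain, not one of the studied APIs.
    snippet = '$.getJSON("https://api.example.com/v1/items", render);'
    print(build_search_url("api.example.com"))
    print(extract_requests(snippet, "api.example.com"))
```

A regular expression misses URLs built from variables or string concatenation, as well as headers, parameters and response handling, which is why real static analysis is needed for the second step.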
The treasure hunt starts from a corpus of over 1000 known APIs. The initial code query returns the snippets that appear to include requests to a given API using jQuery-specific HTTP request methods. GitHub imposes a hard upper limit of 1000 on code search results, so no API can show more than 1000 snippets. The following plot shows the number of snippets making requests using jQuery functions for each API.
The next figure shows the 50 APIs with the most snippets making requests using jQuery.
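The effect of the 1000-result limit on collection can be sketched as below. The page size of 100 is an assumption about how results are fetched, and the helper names are hypothetical; only the cap of 1000 comes from the text above.

```python
import math

MAX_RESULTS = 1000  # hard upper limit on code search results
PER_PAGE = 100      # assumed page size for each search request


def pages_to_fetch(total_count, per_page=PER_PAGE, cap=MAX_RESULTS):
    """Number of result pages that can actually be retrieved for one API."""
    reachable = min(total_count, cap)
    return math.ceil(reachable / per_page)


def is_truncated(total_count, cap=MAX_RESULTS):
    """True when the search reports more matches than can be retrieved."""
    return total_count > cap


if __name__ == "__main__":
    for reported in (25, 1000, 4321):
        print(reported, pages_to_fetch(reported), is_truncated(reported))
```

Any API flagged as truncated sits at the ceiling in the plot, so its true snippet count may be higher than what is shown.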
Similarly, we count the number of invocation instances for each API and present the results in the following two figures.
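The two kinds of counts, snippets per API and invocation instances per API, could be derived from the extracted data along these lines. The input shape (one pair of API name and snippet identifier per invocation instance) and the function name are assumptions made for illustration.

```python
from collections import Counter


def rank_apis(invocations, top_n=50):
    """Rank APIs by distinct snippets and by invocation instances.

    invocations: iterable of (api_name, snippet_id) pairs, one pair per
    invocation instance found in a snippet.
    Returns two lists of (api_name, count), each sorted in descending order.
    """
    snippets_per_api = {}
    instance_counts = Counter()
    for api, snippet_id in invocations:
        snippets_per_api.setdefault(api, set()).add(snippet_id)
        instance_counts[api] += 1
    snippet_counts = Counter(
        {api: len(ids) for api, ids in snippets_per_api.items()}
    )
    return snippet_counts.most_common(top_n), instance_counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample data: snippet s1 calls api-a twice.
    sample = [("api-a", "s1"), ("api-a", "s1"), ("api-a", "s2"), ("api-b", "s3")]
    by_snippets, by_instances = rank_apis(sample)
    print(by_snippets)
    print(by_instances)
```

A single snippet can contain several requests to the same API, which is why the two rankings can differ.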
Are you surprised by the curves? Did you expect that the tail would be so long? jQuery is a very popular framework for client-side web development; however, it clearly lends itself to some APIs more than others. Do you think the existence or quality of SDKs would radically change an API’s position on the curve? How do you think the curve would look for another programming language or request library?
Do you see something else interesting in the data? We’re interested in seeing what you make of it and what other questions we could be asking about web API usage in public projects.
This post has been written in collaboration with Christopher Young @harledhes.