Read JSON from API using requests and Pandas read_json
In this tutorial, you will learn how to use the requests library to fetch JSON data from an API and load it into a DataFrame with the Pandas read_json() function.
Along the way, you'll see how to check the response status and handle authentication and pagination.
Making a GET Request
To retrieve JSON data from an API, you first make an HTTP GET request with the requests library.
The library contacts the API and returns its response, which is often JSON.
Here’s a snippet of code demonstrating how to make a GET request:
    import requests

    url = 'your_api_endpoint_url'
    response = requests.get(url)

    if response.status_code == 200:
        print("Data retrieved successfully!")
        data = response.json()
        print(data)
    else:
        print("Failed to retrieve data. Status code:", response.status_code)
Output:
    Data retrieved successfully!
    {'id': 1, 'plan': 'Unlimited', 'status': 'active', 'usage': {'calls': 300, 'data': 15.5}}
In the code above, requests.get() contacts the API at the given URL.
The API responds with a status code of 200, which indicates success, along with the JSON data, which response.json() parses into a Python dictionary.
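Comparing status_code by hand works, but requests can also raise an exception for error responses. The sketch below wraps that in a small helper. The name fetch_json, its session parameter, and the 10-second timeout are illustrative choices, not part of the original example.

```python
import requests


def fetch_json(url, session=None, timeout=10):
    """Return the decoded JSON body of a GET request, or raise on failure.

    `session` is a hypothetical hook: pass a requests.Session to reuse
    connections, or leave it as None to use the module-level requests.get.
    """
    http = session if session is not None else requests
    # A timeout keeps the call from waiting indefinitely on a slow server.
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    # Raises a ValueError subclass if the body is not valid JSON.
    return response.json()
```

A caller would typically wrap fetch_json(url) in a try block and catch requests.RequestException, which covers connection errors, timeouts, and HTTP error statuses.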
Reading JSON Data into a DataFrame
To convert JSON data to a DataFrame, you can use the Pandas read_json() function.
Here is how you can use read_json():
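    from io import StringIO

    import pandas as pd
    import requests

    url = 'your_api_endpoint_url'
    response = requests.get(url)

    if response.ok:
        # Wrap the text in StringIO: recent pandas versions deprecate
        # passing a literal JSON string to read_json().
        df = pd.read_json(StringIO(response.text))
        print(df)
    else:
        print("Failed to retrieve data. Status code:", response.status_code)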
Output:
       id       plan  status  calls  data
    0   1  Unlimited  active    300  15.5
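The code above passes the response text to pd.read_json(), which parses the JSON and returns a DataFrame that is then printed to the console.
The output shown assumes the endpoint returns flat records, such as a JSON array of objects, so that each key becomes a column in the DataFrame.
read_json() does not expand nested objects, such as the usage field in the earlier response, into their own columns; pd.json_normalize() is the usual tool for flattening those.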
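The sample response shown earlier nests calls and data under a usage key. One way to get a flat table from such a payload is pd.json_normalize(). The sketch below uses a hard-coded copy of that sample payload in place of a live API response, and the underscore separator is just one possible choice.

```python
import pandas as pd

# Hard-coded stand-in for response.json(), copied from the earlier sample output.
payload = {
    "id": 1,
    "plan": "Unlimited",
    "status": "active",
    "usage": {"calls": 300, "data": 15.5},
}

# json_normalize expands nested dictionaries into separate columns.
# sep="_" joins parent and child keys, so usage.calls becomes usage_calls.
df = pd.json_normalize(payload, sep="_")
print(df.columns.tolist())
# ['id', 'plan', 'status', 'usage_calls', 'usage_data']
```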
Handling Authentication
Many APIs require a key, token, or username and password combination for authentication.
In the case of an API key, it’s usually included in the request’s headers or as a query parameter in the URL.
It’s essential to handle these credentials securely and never hard-code them into your scripts, especially if you’re sharing the code or using version control systems like Git.
Here’s how you can pass an API key using headers:
    import os
    from io import StringIO

    import pandas as pd
    import requests

    # Read the key from an environment variable instead of hard-coding it.
    # The variable name API_KEY is a placeholder.
    api_key = os.environ['API_KEY']
    url = 'your_api_endpoint_url'
    headers = {'Authorization': f'Bearer {api_key}'}

    response = requests.get(url, headers=headers)

    if response.ok:
        df = pd.read_json(StringIO(response.text))
        print(df)
    else:
        print("Failed to retrieve data. Status code:", response.status_code)
Output:
       id       plan  status  calls  data
    0   1  Unlimited  active    300  15.5
In the example above, the API key is sent in the request headers as a bearer token.
The request is made with the additional headers argument, and upon success, the JSON data is loaded into a DataFrame.
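The other credential styles mentioned above, a key in the query string and a username and password pair, can be sketched without contacting a server by preparing the requests and inspecting them. The URL, the api_key parameter name, and the credentials are placeholders; each API documents its own names.

```python
import requests

# Placeholder endpoint and credentials, for illustration only.
url = "https://api.example.com/v1/subscriptions"

# API key as a query parameter: requests appends it to the URL.
# The parameter name "api_key" is an assumption; check the API's docs.
key_request = requests.Request("GET", url, params={"api_key": "demo-key"}).prepare()
print(key_request.url)

# Username and password: a (user, password) tuple enables HTTP Basic auth,
# which requests encodes into the Authorization header.
basic_request = requests.Request("GET", url, auth=("demo-user", "demo-pass")).prepare()
print(basic_request.headers["Authorization"].split()[0])
```

The same params and auth arguments can be passed directly to requests.get().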
Handling Pagination
If the API you're working with implements pagination, the data is split across multiple pages or endpoints.
This is a common technique for limiting the amount of data returned in a single request.
To retrieve all the data, you need to iterate through the paginated endpoints.
Here's an example of how to handle pagination with the requests library:
    import os
    from io import StringIO

    import pandas as pd
    import requests

    # The variable name API_KEY is a placeholder.
    api_key = os.environ['API_KEY']
    base_url = 'your_api_endpoint_url'
    headers = {'Authorization': f'Bearer {api_key}'}

    pages = []  # one DataFrame per page
    page = 1    # initial page number

    while True:
        # Pass the current page number as a query parameter.
        response = requests.get(base_url, params={'page': page}, headers=headers)
        if not response.ok:
            print("Failed to retrieve data. Status code:", response.status_code)
            break

        pages.append(pd.read_json(StringIO(response.text)))

        # Stop when the Link header has no 'next' entry.
        if 'next' not in response.links:
            break
        page += 1

    # Concatenate once at the end instead of on every iteration.
    full_data = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
    print(full_data)
Output:
       id       plan  status  calls  data
    0   1  Unlimited  active    300  15.5
    1   2  Unlimited  active    150  10.0
    ...
In the above code, we loop until all pages have been fetched. The page variable keeps track of the current page number.
We check for a 'next' link in the response's links attribute, which requests parses from the HTTP Link header, to decide whether to keep fetching. The data from every page ends up in the full_data DataFrame.
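Not every API numbers its pages. When the server sends a Link header, the 'next' entry in response.links already holds the full URL of the following page, so the loop can follow that URL instead of incrementing a counter. The following is a minimal sketch under that assumption. The name fetch_all_pages and its session parameter are hypothetical, and each page is assumed to be a JSON array of flat records.

```python
import pandas as pd
import requests


def fetch_all_pages(start_url, headers=None, session=None, timeout=10):
    """Follow 'next' links from start_url and return one combined DataFrame."""
    http = session if session is not None else requests
    frames = []
    url = start_url
    while url:
        response = http.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()
        # Assumes each page body is a JSON array of flat records.
        frames.append(pd.DataFrame(response.json()))
        # response.links is parsed from the Link header; no 'next' ends the loop.
        url = response.links.get("next", {}).get("url")
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Collecting the per-page frames in a list and concatenating once avoids copying the accumulated data on every iteration.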
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.