Read JSON from API using requests and Pandas read_json

In this tutorial, you will learn how to use the requests library to fetch JSON data from an API and then use the read_json() function from Pandas to load this data into a DataFrame.

You’ll learn how to retrieve JSON data from APIs, check the response status, handle authentication and pagination, and more.

 

 

Making a GET Request

To start retrieving JSON data from an API, you’ll first need to make an HTTP GET request. This is done using the requests library.

When you make a GET request, the requests library contacts the API and returns its response, which for most web APIs carries a JSON body.

Here’s a snippet of code demonstrating how to make a GET request:

import requests
url = 'your_api_endpoint_url'
response = requests.get(url)
if response.status_code == 200:
    print("Data retrieved successfully!")
    data = response.json()
    print(data)
else:
    print("Failed to retrieve data. Status code:", response.status_code)

Output:

Data retrieved successfully!
{ "id": 1, "plan": "Unlimited", "status": "active", "usage": {"calls": 300, "data": 15.5} }

In the code above, the requests.get() function is used to contact the API at the given URL.

The API responds with a status code of 200, which indicates success, along with the JSON data.
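The status check above can also be wrapped in a small helper that covers network failures and non-JSON bodies. The following is only a sketch: the function name fetch_json and the 10-second timeout are illustrative choices, not part of the original example.

```python
import requests


def fetch_json(url, timeout=10):
    """Return the decoded JSON body, or None if the request fails.

    fetch_json is a hypothetical helper name; the timeout value is arbitrary.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # raise_for_status() raises requests.exceptions.HTTPError on 4xx/5xx codes
        response.raise_for_status()
        return response.json()
    except (requests.exceptions.RequestException, ValueError) as exc:
        # RequestException covers connection errors, timeouts, and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        print("Request failed:", exc)
        return None
```

Calling fetch_json('your_api_endpoint_url') then returns either the parsed data or None, so the caller needs only one check before using the result.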

 

Reading JSON Data into a DataFrame

To convert JSON data to a DataFrame, you’ll use the read_json() function.

Here is how you can use read_json():

from io import StringIO
import requests
import pandas as pd
url = 'your_api_endpoint_url'
response = requests.get(url)
if response.ok:
    # Wrap the text in StringIO: pandas 2.1+ deprecates passing a raw JSON string
    df = pd.read_json(StringIO(response.text))
    print(df)
else:
    print("Failed to retrieve data. Status code:", response.status_code)

Output:

   id       plan  status  calls  data
0   1  Unlimited  active    300  15.5

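The code above passes the text of the JSON response to pd.read_json(), which parses it and maps it into a DataFrame that is then printed to the console.

The DataFrame df displays the data in tabular form, with each key of the JSON objects becoming a column.

The output shown assumes the endpoint returns a JSON array of flat objects. read_json() does not flatten nested objects such as the usage field in the earlier response; pd.json_normalize() is the usual tool for those.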
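read_json() leaves nested objects such as usage unflattened, so a response shaped like the sample in the first section needs one more step. As a rough illustration, the sketch below applies pd.json_normalize() to that sample record; the underscore separator is an arbitrary choice, and the record is typed in by hand rather than fetched.

```python
import pandas as pd

# The sample record from the first section, as returned by response.json()
record = {
    "id": 1,
    "plan": "Unlimited",
    "status": "active",
    "usage": {"calls": 300, "data": 15.5},
}

# json_normalize flattens nested keys into columns joined by `sep`
df = pd.json_normalize(record, sep="_")
print(df.columns.tolist())
```

The nested fields end up in columns named usage_calls and usage_data, which can be renamed if shorter names are preferred.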

 

Handling Authentication

Many APIs require a key, token, or username and password combination for authentication.

In the case of an API key, it’s usually included in the request’s headers or as a query parameter in the URL.

It’s essential to handle these credentials securely and never hard-code them into your scripts, especially if you’re sharing the code or using version control systems like Git.

Here’s how you can pass an API key using headers:

import os
from io import StringIO
import requests
import pandas as pd
# Read the key from an environment variable (the name API_KEY is only an example)
api_key = os.environ['API_KEY']
url = 'your_api_endpoint_url'
headers = {
    'Authorization': f'Bearer {api_key}'
}
response = requests.get(url, headers=headers)
if response.ok:
    df = pd.read_json(StringIO(response.text))
    print(df)
else:
    print("Failed to retrieve data. Status code:", response.status_code)

Output:

   id       plan  status  calls  data
0   1  Unlimited  active    300  15.5

In the example above, the API key is sent in the Authorization header as a bearer token. The exact header name and scheme vary between APIs, so check the provider’s documentation.

The request is made with the additional headers argument, and upon success, the JSON data is loaded into a DataFrame.
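For APIs that expect the key as a query parameter instead, the params argument lets requests build and encode the query string. The sketch below prepares a request without sending it so the resulting URL can be inspected; the parameter name api_key, the helper name build_keyed_request, and the example.com URL are all assumptions for illustration.

```python
import requests


def build_keyed_request(url, api_key, key_param="api_key"):
    """Prepare (but do not send) a GET request carrying the key as a query parameter.

    key_param is a hypothetical name; real APIs document their own.
    """
    request = requests.Request("GET", url, params={key_param: api_key})
    return request.prepare()


# "demo-key" is a stand-in; in real code, load the key from the environment
prepared = build_keyed_request("https://api.example.com/v1/subscriptions", "demo-key")
print(prepared.url)
```

In everyday use, requests.get(url, params={...}) does the same thing in one call. Keys sent in the URL tend to appear in server logs and browser history, so the header approach is usually preferable when the API supports both.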

 

Handling Pagination

If the API you’re working with implements pagination, the data is split across multiple pages, and each page has to be fetched with a separate request.

This is a common technique used to limit the amount of data returned in a single request.

To retrieve all the data, you need to iterate through the paginated endpoints.

Here’s an example of how to handle pagination with the requests library:

import os
from io import StringIO
import requests
import pandas as pd
# Read the key from an environment variable (the name API_KEY is only an example)
api_key = os.environ['API_KEY']
base_url = 'your_api_endpoint_url'
headers = {'Authorization': f'Bearer {api_key}'}
pages = []  # one DataFrame per page

# Initial page number
page = 1
while True:
    # Let requests build the query string for the current page
    response = requests.get(base_url, headers=headers, params={'page': page})
    if response.ok:
        pages.append(pd.read_json(StringIO(response.text)))

        # Check if the Link response header advertises a 'next' page.
        if 'next' in response.links:
            page += 1
        else:
            break
    else:
        print("Failed to retrieve data. Status code:", response.status_code)
        break
# Concatenate once at the end instead of growing a DataFrame inside the loop
full_data = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
print(full_data)

Output:

    id       plan  status  calls  data
0    1  Unlimited  active    300  15.5
1    2  Unlimited  active    150  10.0
...

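In the above code, we loop until all pages have been fetched. The page variable keeps track of the current page number.

To decide whether to continue, we check for a 'next' entry in response.links, which requests fills in from the Link header of the HTTP response. APIs that do not send this header signal the last page in other ways, such as a field in the JSON body or an empty page, so the stopping condition has to match the API. The data from each page is combined into the full_data DataFrame.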
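When an API does send a Link header, another option is to follow the URL it provides rather than incrementing a page counter. The sketch below assumes exactly that, and also assumes each page is a JSON array of flat records; the function name fetch_all_pages and the max_pages safety cap are illustrative additions.

```python
from io import StringIO

import pandas as pd
import requests


def fetch_all_pages(session, start_url, headers=None, max_pages=100):
    """Follow rel="next" links from the Link header and combine every page.

    Hypothetical helper: assumes each page is a JSON array of flat records.
    max_pages guards against an API that never stops advertising a next page.
    """
    frames = []
    url = start_url
    for _ in range(max_pages):
        response = session.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        frames.append(pd.read_json(StringIO(response.text)))
        next_link = response.links.get("next")
        if next_link is None:
            break  # no further pages advertised
        url = next_link["url"]
    # Concatenate once; return an empty frame if nothing was fetched
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

A typical call would be `with requests.Session() as session: df = fetch_all_pages(session, 'your_api_endpoint_url')`. Reusing one session keeps the underlying connection open across pages.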
