Python Requests: A Comprehensive Guide – Including get, post, put, delete, Exception Handling, and Basic Authentication –

Hello, this is Zero-Cheese.

In this article, I’ve compiled the usage of the requests package, the de facto standard HTTP communication library in Python. (It is a third-party package, not part of the standard library.)

By using the requests package,

  • You can collect images and information from websites,
  • You can easily implement operations that require authentication (e.g., automatic stock trading), and more.

This article is aimed at

  • those who are at least at the beginner’s level in Python.

In this article, I tried to cover the most frequently used ways to use the requests package.

You can jump to the necessary sections via the links in the table of contents below. (Each section is designed to be understood even when viewed independently.)

Please note that the ‘URLs, parameter values, and execution results’ shown in the code examples in this article are just examples.

Feel free to modify them according to your own use case.

Also, the following import is required for the code in this article.

import requests

Installation Method

You can install it via pip.

pip install requests

Basic Usage

Here, we will look at the basic usage of the requests package, using an HTTP GET request as an example.

Code:
import requests

url = 'https://www.google.co.jp/'
response = requests.get(url)
print(response.text)

In the case of the HTTP GET method,

  • you execute the requests.get method with the URL you want to get as an argument,
  • and the information you want to get is assigned to the text attribute of the returned value (object).

The execution result of the above code:

<!doctype html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="ja">
<head><meta content="世界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能を活用して、お探しの情報を見つけてください。" name="description">
<meta content="noodp" name="robots">
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>

-- omitted --

</body>
</html>

How to Use the Returned Value (Response Object)

This section explains the returned value (Response object) that results from calling the requests.get method or the post method (to be described later).

The explanations below refer to the object assigned to the variable ‘response’ by the following call.

response = requests.get('https://www.google.co.jp/')

Obtaining HTTP Status Code

The HTTP status code is a numerical representation of the meaning of the response, such as:

  • 100 series → Informational
  • 200 series → Success
  • 300 series → Redirection
  • 400 series → Client error
  • 500 series → Server error

You may often see ‘404 Not Found’, which is also a status code.

You can find more about HTTP status codes here

print(response.status_code)
# ⬆︎  Execution result: 200
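The series listed above can also be checked in code. The following is a minimal sketch: the helper name status_category is hypothetical (it is not part of requests), while requests.codes is the library's own lookup of named status codes.

```python
import requests

def status_category(status_code):
    """Return the series name for an HTTP status code (hypothetical helper)."""
    categories = {
        1: 'Informational',
        2: 'Success',
        3: 'Redirection',
        4: 'Client error',
        5: 'Server error',
    }
    # Integer division by 100 keeps only the hundreds digit (404 // 100 == 4)
    return categories.get(status_code // 100, 'Unknown')

print(status_category(200))  # Success
print(status_category(404))  # Client error
# requests.codes maps readable names to the numeric codes
print(requests.codes.ok, requests.codes.not_found)  # 200 404
```

In real code, you would pass response.status_code to a helper like this, or compare it directly with a value such as requests.codes.ok.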

When redirected, only the information from the final page is displayed.

In this case, you can use “response.history” to retrieve the redirect history as a list (with elements being Response objects).

Here is a code example (an instance where redirection occurred once at the URL destination):

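url = 'https://www.udemy.com/ja'
# headers must be defined before use; this User-Agent value is only an example
headers = {'User-Agent': 'something'}
response = requests.get(url, headers=headers)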

print(response.history)
# Execution result: [<Response [301]>]
# The result is a list of Response objects

print(response.history[0].status_code)
# Execution result: 301

Response headers

To fetch the response headers, which carry additional information about the response content:

print(response.headers)
# Execution result: {'Date': 'Thu, 15 Sep 2022 11:43:49 GMT', 
#            'Expires': '-1',...Omitted below

The headers can be obtained in dictionary format, so you can specify the key and retrieve it as follows:

print(response.headers['Content-Type'])
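One detail worth knowing is that response.headers is not a plain dict but a case-insensitive one, because HTTP header names are case-insensitive. The sketch below builds such an object by hand, with made-up values, so that it runs without a network call.

```python
from requests.structures import CaseInsensitiveDict

# response.headers is a CaseInsensitiveDict; this one is created manually
# (with example values) purely for illustration
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=UTF-8'})

print(headers['content-type'])          # lookup ignores case
print(headers.get('X-Missing', 'n/a'))  # .get() avoids a KeyError for absent headers
```

Using .get() is the safer choice when a header may be missing from the response.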

Response time

To fetch the response time:

print(response.elapsed)
# Execution result: 0:00:00.385989

Encoding

To get the encoding:

print(response.encoding)
# Execution result: Shift_JIS

Response content

To fetch the response content as a string:

print(response.text)
# Execution result: <!doctype html><html itemscope=""...Omitted below

If you encounter garbled characters, the encoding setting might not be appropriate.

requests sets the encoding from the charset given in the Content-Type of the HTTP response headers.

If the Content-Type is a text type but no charset is specified, the default value (ISO-8859-1) will be used.

To resolve this, the requests module has a feature that estimates the character encoding from the response body (apparent_encoding).

The code would be as follows:

response.encoding = response.apparent_encoding
# After executing the above command, decode the content as below
print(response.text)
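The fallback described above, and the garbling it can cause, can be reproduced offline. This is a rough sketch: the header dictionaries and the sample string are made up for illustration, and requests.utils.get_encoding_from_headers is the utility function that derives an encoding from a headers dictionary.

```python
from requests.utils import get_encoding_from_headers

# A text/* Content-Type without a charset falls back to ISO-8859-1
print(get_encoding_from_headers({'content-type': 'text/html'}))
# An explicit charset is returned as-is
print(get_encoding_from_headers({'content-type': 'text/html; charset=Shift_JIS'}))

# The same bytes decoded two ways: the wrong codec yields garbled text
raw = 'こんにちは'.encode('utf-8')
garbled = raw.decode('iso-8859-1')
correct = raw.decode('utf-8')
print(garbled == correct)  # False
```

This is why overriding response.encoding before reading response.text fixes the output: the bytes stay the same, only the codec used to decode them changes.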

For handling JSON data in the response:

The JSON data obtained from the web can include Unicode escape sequences (\u30b0\u30a4\u30f3 and others starting with \u), and handling them can be a little troublesome.

By calling the json() method of the Response object (response.json()), you can solve these issues and convert the data into a Python dictionary or list.

# For example, to get the Bitcoin price from BitFlyer:
url = 'https://api.bitflyer.com/v1/ticker'
params = {'product_code': 'btc_jpy'}
response = requests.get(url, params=params)
# Convert JSON
print(response.json())
# Execution result: {'product_code': 'BTC_JPY',...Omitted below

By the way, to check whether JSON data is being sent, you can verify the Content-Type in the response header:

print(response.headers['Content-Type'])
# Execution result: application/json; charset=utf-8
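To see how Unicode escape sequences are decoded without calling a live API, the sketch below builds a Response object by hand. Assigning the private _content attribute is a testing shortcut assumed here only to stay offline; the JSON payload is invented.

```python
import requests

# Hand-built Response so the example runs without a network call.
# Setting the private _content attribute is a testing shortcut only;
# in real code the body comes from requests.get() or requests.post().
response = requests.Response()
response.status_code = 200
response._content = b'{"name": "\\u30b0\\u30a4\\u30f3", "prices": [1, 2]}'

data = response.json()
print(type(data).__name__)  # dict
print(data['name'])         # the \u escapes are decoded into characters
print(data['prices'][0])    # nested values become Python lists and numbers
```

If the body is not valid JSON, response.json() raises an exception, so it is worth checking the Content-Type first when the format is uncertain.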

Save the data as a file

If you want to save the data as a file (for example, a Zip file), you can use the binary data saved in the “content” of the response object:

# The following URL provides a CSV containing city/town information in a ZIP file.
url = 'http://zipcloud.ibsnet.co.jp/zipcodedata/download?di=1661933424166'
response = requests.get(url)
# The first argument of the method below specifies the file path
with open('./ken.zip', 'wb') as f:
    f.write(response.content)

print(response.headers)
# Execution result: {'Content-Type': 'application/octet-stream; charset=UTF-8', 
#            'Content-Disposition': 'attachment; filename=ken-all202208.zip', 
#            ...Omitted below

For your reference, the response header information for this case is described below.

  • application/octet-stream: This indicates an arbitrary binary format (used when the file format is unknown)
  • The filename you’re trying to download can be obtained by looking at the ‘Content-Disposition’.
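As a rough illustration of reading the filename from ‘Content-Disposition’, the sketch below parses the header value with the standard library's email.message module. The helper name filename_from_content_disposition is hypothetical, and the header value is the one from the execution result above.

```python
from email.message import Message

def filename_from_content_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition value."""
    message = Message()
    message['Content-Disposition'] = header_value
    # get_filename() returns None when no filename parameter is present
    return message.get_filename()

print(filename_from_content_disposition('attachment; filename=ken-all202208.zip'))
# → ken-all202208.zip
```

In practice you would pass response.headers.get('Content-Disposition', '') to such a helper. Because the name is supplied by the server, it is safer to reduce it with os.path.basename before using it as a local path.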

I strongly recommend running a virus scan whenever you retrieve and save files.

Case-Specific HTTP GET Method Calls

Specifying only the URL

As already mentioned, here is the code again:

url = 'https://www.google.com/' # Example of Google
response = requests.get(url)

Specifying URL Parameters

This is the case of sending parameters with the HTTP GET method.

For URLs like the one below, the part displayed after the ‘?’ is the parameter part, with individual parameters separated by ‘&’.

https://www.google.com/?gl=us&hl=en&gws_rd=cr&pws=0
# In this case, gl=us&hl=en&gws_rd=cr&pws=0 is the parameter part

Parameters are set as a dictionary.

url = 'https://www.google.com/'
params = {'q': 'Python',
          'oe': 'utf-8'}
response = requests.get(url, params=params)

Case-Specific HTTP POST Method Calls

When the transmitted data is a string

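This is the case where the transmitted data consists of key/value strings. Passing a dictionary to the data argument sends it as form-encoded data.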

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
response = requests.post(url, data=data)

When we used requests.get, we set the transmitted data as a dictionary in the params argument.

In this case, the parameters are attached to the URL as a string and sent.

On the other hand, when set in the data argument, it is transmitted in the request body.

While you can send data with the params argument even with the post method, it is not typically necessary due to the nature of POST.
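The difference between the params and data arguments can be inspected by preparing two requests without sending them. This is a small sketch; the variable names are chosen only for this example.

```python
import requests

url = 'https://www.google.co.jp/'
payload = {'key1': 'value1'}

# Build both variants without sending anything
as_params = requests.Request('POST', url, params=payload).prepare()
as_data = requests.Request('POST', url, data=payload).prepare()

print(as_params.url)   # the parameters end up in the URL
print(as_params.body)  # and the body is empty (None)
print(as_data.url)     # the URL is unchanged
print(as_data.body)    # the parameters are form-encoded into the body
print(as_data.headers['Content-Type'])
```

The last line shows that requests also sets the Content-Type to application/x-www-form-urlencoded when a dictionary is passed to data.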

When the transmitted data is JSON data

When sending JSON data, the dictionary must first be serialized (encoded) into a JSON string.

import json

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
data_encode = json.dumps(data)
response = requests.post(url, data=data_encode)
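As an alternative to calling json.dumps yourself, requests also accepts a json argument. The sketch below compares the two approaches on prepared (unsent) requests; the variable names are assumptions for this example.

```python
import json
import requests

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}

# json= serializes the dict and sets Content-Type: application/json
with_json = requests.Request('POST', url, json=data).prepare()
# data=json.dumps(...) sends the same payload but sets no Content-Type
with_dumps = requests.Request('POST', url, data=json.dumps(data)).prepare()

print(with_json.headers.get('Content-Type'))   # application/json
print(with_dumps.headers.get('Content-Type'))  # None
print(json.loads(with_json.body) == json.loads(with_dumps.body))  # True
```

Some APIs reject JSON bodies that arrive without the application/json Content-Type, so when you use data=json.dumps(...) you may need to add that header yourself.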

When the transmitted data is a file

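When sending a file, open it in binary mode with the built-in open function and pass it in the files argument.

url = 'https://www.google.co.jp/'
# In case of an image file: the with statement closes the file
# once the request has been sent
with open('image.jpg', 'rb') as img_data:
    files = {'file': img_data}
    response = requests.post(url, files=files)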

By the way, there is no point in sending an image file to Google’s site.

You will likely receive a 413 response (Payload Too Large).
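To see what a file upload looks like on the wire, the sketch below prepares a request without sending it. In-memory bytes stand in for image.jpg, and the tuple form (filename, content, content type) is used so that the file name and type are explicit; the byte values are made up.

```python
import requests

url = 'https://www.google.co.jp/'
# In-memory bytes stand in for the contents of image.jpg (not a real image)
fake_image = b'\xff\xd8\xff\xe0 not a real JPEG'
files = {'file': ('image.jpg', fake_image, 'image/jpeg')}

prepared = requests.Request('POST', url, files=files).prepare()
# The body is multipart/form-data with a generated boundary
print(prepared.headers['Content-Type'].split(';')[0])
print(b'filename="image.jpg"' in prepared.body)
```

The files argument therefore produces a multipart/form-data body, which is the format HTML file-upload forms use.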

Case-Specific HTTP PUT Method Calls

The main role of the PUT method is to ‘replace’ existing resources.

Therefore, it is common to send an identifier when making a request.

When the transmitted data is a string

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
response = requests.put(url, data=data)

When the transmitted data is JSON

import json

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
data_encode = json.dumps(data)
response = requests.put(url, data=data_encode)

Case-Specific HTTP DELETE Method Calls

The main role of the DELETE method is to ‘delete’ resources.

Like PUT, it is common to send an identifier when making a request.

When the transmitted data is a string

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
response = requests.delete(url, data=data)

When the transmitted data is JSON

import json

url = 'https://www.google.co.jp/'
data = {'key1': 'value1'}
data_encode = json.dumps(data)
response = requests.delete(url, data=data_encode)

Common Procedures for All Methods

Specifying a timeout for the request (recommended)

If you set a timeout,

  • requests.exceptions.Timeout exception

will occur if there is no response from the other party within the specified time.

By default, the value is None, which means it will wait indefinitely for a response from the other side.

The official documentation also recommends setting a timeout.

response = requests.get(url, timeout=3)
# In the above case, set for 3 seconds

Specifying request headers

For instance, I will introduce an example of sending a user agent with a GET request.

You will set the headers as a dictionary.

url = 'https://www.google.co.jp/'
headers = {'User-Agent': 'something'}
response = requests.get(url, headers=headers)

Sending a request with Basic authentication

Here, I will introduce the code for Basic authentication.

This is the most basic method that requires a username and password.

The code below introduces the case for GET and POST transmission.

from requests.auth import HTTPBasicAuth

# The following is a site where you can test Basic authentication.
url = 'http://leggiero.sakura.ne.jp/xxxxbasic_auth_testxxxx/secret/kaiin_page_top.htm'
# Specify the username
username = 'kaiin'
# Specify the password
password = 'naisho'

# Case of GET transmission (HTTPBasicAuth is the class imported above)
response = requests.get(url, auth=HTTPBasicAuth(username, password))
# Case of POST transmission
data = {'key1': 'value1'}
response = requests.post(url, data=data, auth=HTTPBasicAuth(username, password))
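requests also accepts a plain (username, password) tuple in the auth argument as shorthand for HTTPBasicAuth. The sketch below prepares a request without sending it to show the resulting Authorization header; the example.com URL is a placeholder.

```python
import base64
import requests

username = 'kaiin'
password = 'naisho'
url = 'https://example.com/secret'  # placeholder URL; nothing is sent

# auth=(user, password) is shorthand for HTTPBasicAuth(user, password)
prepared = requests.Request('GET', url, auth=(username, password)).prepare()
authorization = prepared.headers['Authorization']

# Basic authentication is "username:password" encoded with Base64
expected = 'Basic ' + base64.b64encode(b'kaiin:naisho').decode('ascii')
print(authorization == expected)  # True
```

Because Base64 is an encoding and not encryption, the credentials are effectively sent in clear text, so Basic authentication should be used over HTTPS.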

Exception handling in the requests package

Because HTTP requests can result in

  • Connection errors
  • Timeout errors

among other issues, exception handling is mandatory.

Exceptions that occur during HTTP communication with requests are all

  • requests.exceptions.RequestException objects
  • or objects that inherit from this class.

Below are some commonly used exceptions in requests.

(All inherit from requests.exceptions.RequestException.)

  • ConnectionError: Occurs when there is a DNS error or connection disruption.
  • Timeout: Occurs when a timeout happens.
  • SSLError: Occurs with an SSL connection error.
  • HTTPError: Occurs when there is an incorrect HTTP response.
  • TooManyRedirects: Occurs when the number of redirects exceeds the maximum.
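The inheritance relationships among these exceptions can be checked directly with issubclass. This short sketch only inspects classes from requests.exceptions and makes no network calls.

```python
import requests

exc = requests.exceptions

# Every exception listed above derives from RequestException
for cls in (exc.ConnectionError, exc.Timeout, exc.SSLError,
            exc.HTTPError, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))

# Timeout has two more specific subclasses
print(issubclass(exc.ConnectTimeout, exc.Timeout))
print(issubclass(exc.ReadTimeout, exc.Timeout))
# SSLError is itself a kind of ConnectionError
print(issubclass(exc.SSLError, exc.ConnectionError))
```

Every line prints True. This matters for the order of except clauses: a more specific class such as SSLError has to be listed before ConnectionError, or it will never be reached.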

When using requests, we generally use a try-except statement to handle exceptions.

If an exception is not caught, the program terminates with an error at that point.

A basic code is as follows:

try:
    url = 'https://www.google.co.jp/'
    response = requests.get(url, timeout=5)
except requests.exceptions.RequestException as e:
    # Log output or exception handling
    print("RequestException: ", e)

When the HTTP status code indicates an error (400 series or 500 series),

  • calling raise_for_status()

raises an HTTPError exception. Other status codes, including redirects, do not raise.

The usage is as follows (see the raise_for_status() call):

try:
    url = 'https://www.google.co.jp/'
    response = requests.get(url, timeout=5)
    # If the HTTP status code is in the 400s or 500s, HTTPError occurs
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print("RequestException: ", e)
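Which status codes actually trigger raise_for_status() can be verified offline. The sketch below sets status_code on a hand-built Response object as a stand-in for a real server reply; the helper name raises_http_error is hypothetical.

```python
import requests

def raises_http_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    # A hand-built Response stands in for a real server reply
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return True
    return False

for code in (200, 301, 404, 500):
    print(code, raises_http_error(code))
# 200 False
# 301 False
# 404 True
# 500 True
```

Only the 400 and 500 series raise, so a redirect status such as 301 has to be checked separately if it matters to your program.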

If you want to handle each exception in detail, you can write as follows.

Among the except clauses, write the parent class, requests.exceptions.RequestException, at the very end, so that the more specific exceptions are matched first.

try:
    url = 'https://www.google.co.jp/'
    response = requests.get(url, timeout=5)
    # If the HTTP status code is in the 400s or 500s, HTTPError occurs.
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print("HTTPError:", e)
except requests.exceptions.ConnectionError as e:
    print("ConnectionError:", e)
except requests.exceptions.Timeout as e:
    print("Timeout:", e)
except requests.exceptions.RequestException as e:
    print("RequestException: ", e)
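The pieces above (timeout, raise_for_status(), and exception handling) can be combined into one small helper. This is only a sketch: the name fetch_text and the decision to return None on failure are assumptions made for this example, not a requests API.

```python
import requests

def fetch_text(url, timeout=5):
    """Return the response body as text, or None if the request fails.

    Hypothetical helper: the name and the choice to return None
    are assumptions made for this sketch.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # Log output or other error handling would go here
        print('Request failed:', e)
        return None
    return response.text

# A URL without a scheme is rejected before any network traffic occurs
result = fetch_text('not-a-valid-url')
print(result)  # None
```

The invalid URL raises MissingSchema, which also inherits from RequestException, so the single except clause handles it. Whether to return None, re-raise, or retry depends on your use case.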

Finally,

In this article, we have compiled various operations related to the requests package.

When performing HTTP communication from Python, you can also use the standard library, urllib.

However, from the standpoint of code simplicity, we highly recommend the requests package.

With the requests package, most everyday HTTP tasks, whether for web scraping, financial transactions, or other purposes, can be written in just a few lines.

We hope this article proves useful to everyone.

Looking forward to meeting you again!