Encoding Surprises: When requests Assumes Latin-1 Instead of UTF-8
Hardcoded Latin-1 encoding in HTTP auth headers causes UnicodeEncodeError for non-Latin usernames. The fix switches to UTF-8, which handles the full Unicode range.
The Problem
psf/requests issue #6102 exposes a subtle edge case in how requests encodes credentials for HTTP auth headers. The fix is only two lines, but the pattern behind it applies across projects.
PR: https://github.com/psf/requests/pull/7463
Status: Submitted (awaiting review)
Hardcoded character encodings are a ticking time bomb. Latin-1 can represent only the
code points U+0000 through U+00FF, so code that assumes latin-1 for string encoding works
for English, German, and most Western European text, but raises UnicodeEncodeError
for any input containing Chinese, Japanese, Korean, Arabic, or emoji characters.
import base64

# Before: Latin-1 breaks non-Latin characters
def basic_auth_header(username, password):
    # UnicodeEncodeError if username contains non-Latin chars
    raw = f'{username}:{password}'.encode('latin-1')
    return 'Basic ' + base64.b64encode(raw).decode()

# After: UTF-8 handles the full Unicode range
def basic_auth_header(username, password):
    raw = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode()
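To see the difference concretely, here is a small sketch that runs both encodings against a few sample usernames. The helper name try_encode and the sample credentials are made up for illustration; they are not part of requests.

```python
import base64

def try_encode(username, password, encoding):
    """Return the Basic auth header value, or None if the encoding fails.

    Hypothetical helper for illustration only.
    """
    try:
        raw = f"{username}:{password}".encode(encoding)
    except UnicodeEncodeError:
        return None
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Illustrative usernames: ASCII, Latin-1 range, CJK, emoji
samples = ["alice", "müller", "用户", "user😀"]

for name in samples:
    latin1 = try_encode(name, "secret", "latin-1")
    utf8 = try_encode(name, "secret", "utf-8")
    print(
        f"{name!r}: "
        f"latin-1={'ok' if latin1 else 'fails'}, "
        f"utf-8={'ok' if utf8 else 'fails'}, "
        f"same header={latin1 == utf8}"
    )
```

With these samples, the ASCII and Latin-1-range usernames encode under both encodings, while the CJK and emoji usernames fail under Latin-1. One thing worth noticing: a name like "müller" succeeds either way but produces different bytes, so a server that decodes credentials as Latin-1 would see a different username after the switch.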
Key Takeaway
Never hardcode Latin-1 for user-provided strings. Use UTF-8 instead: it is backward-compatible with ASCII and handles the full Unicode range. The error won’t appear in testing with English data, so test with non-Latin input as well.
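Because English-only test data hides this class of bug, one option is a round-trip check over deliberately non-Latin credentials. This is a minimal sketch, not the test from the PR: it repeats the UTF-8 version of basic_auth_header so the block runs on its own, and round_trips and SAMPLES are hypothetical names with made-up values.

```python
import base64

def basic_auth_header(username, password):
    # UTF-8 version from the example above
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def round_trips(username, password):
    """Check that the header decodes back to the original credentials."""
    header = basic_auth_header(username, password)
    scheme, _, token = header.partition(" ")
    decoded = base64.b64decode(token).decode("utf-8")
    return scheme == "Basic" and decoded == f"{username}:{password}"

# Illustrative credentials: ASCII, Chinese, Arabic, emoji
SAMPLES = [
    ("alice", "secret"),
    ("用户", "密码"),
    ("مستخدم", "pass"),
    ("user😀", "pw"),
]

for username, password in SAMPLES:
    assert round_trips(username, password), f"round trip failed for {username!r}"
```

Running the same loop against the Latin-1 version would raise UnicodeEncodeError on the second sample, which is the failure English-only data never triggers.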
Discovered while fixing psf/requests#6102. View the fix post for the specific diff.