Encoding Surprises: When requests Assumes Latin-1 Instead of UTF-8

Hardcoded Latin-1 encoding in HTTP auth headers causes UnicodeEncodeError for non-Latin usernames. The fix switches to UTF-8, which handles the full Unicode range.


The Problem

psf/requests issue #6102 concerns how requests encodes credentials for HTTP auth headers: the encoding is hardcoded to Latin-1. The fix is only two lines, but the pattern behind it applies across projects.

PR: https://github.com/psf/requests/pull/7463

Status: Submitted (awaiting review)

Hardcoded character encodings are a ticking time bomb. Code that assumes Latin-1 works for English, German, and most Western European text, but it raises UnicodeEncodeError as soon as the input contains Chinese, Japanese, Korean, or Arabic characters, or emoji.
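To make the boundary concrete, here is a small probe that checks which sample usernames survive a Latin-1 encode. The helper name fits_latin1 and the sample names are made up for illustration; nothing here comes from the requests codebase.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample usernames, one per script mentioned above.
samples = {
    "English": "alice",
    "German": "jürgen",
    "Chinese": "王小明",
    "Japanese": "たろう",
    "Arabic": "علي",
    "Emoji": "user🙂",
}

results = {label: fits_latin1(name) for label, name in samples.items()}
for label, ok in results.items():
    print(f"{label:8} {'ok' if ok else 'UnicodeEncodeError'}")
```

Only the English and German samples pass, because Latin-1 covers just the code points U+0000 through U+00FF.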

import base64

# Before: Latin-1 cannot represent characters above U+00FF
def basic_auth_header_latin1(username, password):
    # Raises UnicodeEncodeError if either value contains such a character
    raw = f'{username}:{password}'.encode('latin-1')
    # base64 output is pure ASCII, so decoding it as ASCII is safe
    return 'Basic ' + base64.b64encode(raw).decode('ascii')

# After: UTF-8 handles the full Unicode range
def basic_auth_header_utf8(username, password):
    raw = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')
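
As a quick sanity check on the "After" version, the sketch below builds a header for a non-Latin username and decodes it again. The decode_basic_auth helper is hypothetical and assumes the receiving side also decodes the credentials as UTF-8, which a real server may or may not do.

```python
import base64

def basic_auth_header_utf8(username: str, password: str) -> str:
    # Illustrative helper with the same shape as the "After" version above.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic_auth(header: str) -> tuple[str, str]:
    # Hypothetical server-side counterpart; assumes UTF-8 on the receiving end.
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only, so the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header_utf8("王小明", "s3cret")
print(header)
print(decode_basic_auth(header))
```

The header value itself stays pure ASCII, since base64 output only uses ASCII characters; only the bytes inside the token change with the encoding.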

Key Takeaway

Never hardcode Latin-1 for user-provided strings. Always use UTF-8: it is backward-compatible with ASCII and handles the full Unicode range. Tests that use only English data will not surface the error, so include non-ASCII input in test fixtures.
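
The ASCII compatibility claim is easy to verify, and the same check shows where the two encodings part ways. This is a minimal sketch with made-up usernames. It suggests that characters in the U+0080 to U+00FF range produce different bytes after the switch, so a server that still decodes credentials as Latin-1 would presumably see a different username for such accounts.

```python
ascii_name = "alice"
accented_name = "jürgen"  # ü is U+00FC, inside the Latin-1 range

# Pure ASCII text yields identical bytes under both encodings.
same_for_ascii = ascii_name.encode("latin-1") == ascii_name.encode("utf-8")

# Characters in U+0080..U+00FF take one byte in Latin-1 and two in UTF-8.
latin1_bytes = accented_name.encode("latin-1")
utf8_bytes = accented_name.encode("utf-8")

print(same_for_ascii)  # True
print(latin1_bytes)    # b'j\xfcrgen'
print(utf8_bytes)      # b'j\xc3\xbcrgen'
```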


Discovered while fixing psf/requests#6102. View the fix post for the specific diff.