Encoding Surprises: When requests Assumes Latin-1 Instead of UTF-8

Hardcoded Latin-1 encoding in HTTP auth headers causes UnicodeEncodeError for non-Latin usernames. The fix switches to UTF-8, which handles the full Unicode range.

The bottom line: Hardcoded Latin-1 encoding in HTTP auth headers causes UnicodeEncodeError for non-Latin usernames.


The Problem

psf/requests issue #6102 exposes a subtle edge case in how python handles boundary conditions. The fix is only 2 lines, but the pattern behind it applies across projects.

PR: https://github.com/psf/requests/pull/7463

Status: Submitted (awaiting review)

Hardcoded character encodings are a ticking time bomb. When code assumes latin-1 for string encoding, it works for English, German, and most Western European users — but breaks for anyone with Chinese, Japanese, Korean, Arabic, or emoji in their input.

import base64

# Before: Latin-1 breaks non-Latin characters
def basic_auth_header(username, password):
    raw = f'{username}:{password}'.encode('latin-1')
    # UnicodeEncodeError if username contains non-Latin chars
    return 'Basic ' + base64.b64encode(raw).decode()

# After: UTF-8 handles the full Unicode range
def basic_auth_header(username, password):
    raw = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode()

How to Apply This to Your Code

Audit Your Auth Code for Latin-1 Assumptions

  1. grep for encode('latin-1')grep -rn "encode.*latin" src/**/*.py | grep -v test. Every hit is a potential UnicodeEncodeError for international users. Replace 'latin-1' with 'utf-8'.

  2. grep for encode('iso-8859-1') — same issue, different alias. UTF-8 covers the full Unicode range.

  3. Check header construction code — not just auth. Any header that includes user-provided strings (User-Agent, X-Forwarded-For, custom headers) with hardcoded encoding can silently break.

  4. Add a test with non-Latin inputrequests.get(url, auth=HTTPDigestAuth('ユーザー名', 'パスワード')). If this errors or produces garbled headers, you’ve found the bug.

Detection Checklist

Pattern Risk Fix
encode('latin-1') in auth UnicodeEncodeError for CJK, Arabic, emoji encode('utf-8')
encode('iso-8859-1') in headers Same encoding issue encode('utf-8')
b64encode(str.encode('latin-1')) Auth tokens are invalid for non-ASCII users UTF-8 encode before base64
No non-ASCII test coverage Bug exists but undetected Add pytest case with CJK credentials

When to Check

  • After any upgrade to requests, urllib3, or httpx — library changes can break encoding assumptions
  • When adding international user support to any system with HTTP auth
  • During code review of any PR that touches auth.py or header construction
  • Any time you see encode('latin-1') in a codebase — it’s almost always wrong for modern applications

Key Takeaway

Never hardcode Latin-1 for user-provided strings. Always use UTF-8 — it’s backward-compatible with ASCII and handles the full Unicode range. The error won’t appear in testing with English data.


Discovered while fixing psf/requests#6102. View the fix post for the specific diff.