Regex Surprises: When gitleaks's Pattern Matching Breaks

Regular expressions with optional groups and alternative branches can produce unexpected matches on edge-case inputs, causing downstream failures.



The Problem

gitleaks/gitleaks issue #2121 exposes a subtle edge case in how gitleaks's pattern matching handles boundary conditions. The fix is a single line, but the underlying lesson applies well beyond this project.

PR: https://github.com/gitleaks/gitleaks/pull/2163

Status: Submitted (awaiting review)

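Regular expressions with optional groups and alternatives can behave unexpectedly when an input could satisfy more than one branch. A backtracking engine such as Python's re tries alternatives in order and accepts the first path that lets the overall match succeed, which is not necessarily the longest match or the one you intended. An optional group can also be skipped entirely, so a pattern built only from optional parts can succeed on an empty string.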

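import re

# Edge case: every element of this pattern is optional
pattern = r'(foo)?(bar)?'

# re.fullmatch accepts 'foobar', 'foo', 'bar', AND the empty string
result = re.fullmatch(pattern, '')
print(result is not None)  # True: the empty string matches
print(result.groups())     # (None, None): both groups were omitted

# re.match anchors only at the start, so it succeeds on any input
print(re.match(pattern, 'xyz'))  # <re.Match object; span=(0, 0), match=''>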
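Alternation has a related trap. Python's re tries branches from left to right and keeps the first one that lets the overall match succeed, so a shorter branch listed first can shadow a longer one. A minimal sketch; the `key`/`keys` patterns are illustrative and not taken from gitleaks:

```python
import re

# Branches are tried left to right; the first one that succeeds wins.
shadowed = re.match(r'key|keys', 'keys=abc')
print(shadowed.group())   # prints: key (the shorter branch matched first)

# Listing the longer alternative first changes the result.
reordered = re.match(r'keys|key', 'keys=abc')
print(reordered.group())  # prints: keys

# Requiring the match to span the whole input forces the engine
# to backtrack out of the first branch and try the second.
anchored = re.fullmatch(r'key|keys', 'keys')
print(anchored.group())   # prints: keys
```

Ordering longer alternatives first, or anchoring the pattern, are two common ways to keep a short branch from winning by accident.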

Key Takeaway

Always test a regex against empty and minimal inputs. When every element of a pattern is optional, the pattern as a whole is optional too: it can succeed without consuming a single character.
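One way to put that into practice is a small table of edge-case inputs checked against both a loose pattern and a tightened version. This is a sketch only: the helper `edge_case_report`, the `EDGE_CASES` list, and the tightened pattern are hypothetical illustrations, not gitleaks code, and the lookahead `(?=foo|bar)` assumes Python's re syntax.

```python
import re

# Hypothetical edge-case inputs: empty, whitespace, a single char, near-misses.
EDGE_CASES = ['', ' ', 'x', 'fo', 'ba', 'foo', 'bar', 'foobar']

def edge_case_report(pattern):
    """Map each edge-case input to whether re.fullmatch accepts it."""
    compiled = re.compile(pattern)
    return {text: compiled.fullmatch(text) is not None for text in EDGE_CASES}

loose = r'(foo)?(bar)?'
# The lookahead demands 'foo' or 'bar' at the start, so the empty match
# is rejected while the numbering of the two capture groups is unchanged.
strict = r'(?=foo|bar)(foo)?(bar)?'

for name, pattern in [('loose', loose), ('strict', strict)]:
    print(name, edge_case_report(pattern))
```

Only the empty-string entry differs between the two reports, which is exactly the case the loose pattern lets through. RE2-style engines such as Go's regexp do not support lookaheads; there, spelling out the alternatives (for example `foo(bar)?|bar`) achieves the same effect.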


Discovered while fixing gitleaks/gitleaks#2121. View the fix post for the specific diff.