Regex Surprises: When gitleaks's Pattern Matching Breaks
Regular expressions with optional groups and alternative branches can produce unexpected matches on edge-case inputs, causing downstream failures.
The Problem
gitleaks/gitleaks issue #2121 exposes a subtle edge case in how pattern matching handles boundary conditions. The fix is a single line, but the pattern behind it applies across projects.
PR: https://github.com/gitleaks/gitleaks/pull/2163
Status: Submitted (awaiting review)
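Regular expressions with optional groups and alternatives behave unexpectedly when an input can match through more than one branch. A backtracking engine tries the branches in order and settles on the first one that lets the overall match succeed, which is not always the branch the author intended.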
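import re

# Both groups are optional, so the pattern as a whole can match nothing at all.
pattern = r'(foo)?(bar)?'
# Matches 'foobar', 'foo', 'bar', and the empty string.
result = re.match(pattern, '')
print(result is not None)  # True: the match succeeds with both groups omitted
print(result.groups())     # (None, None)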
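The example above covers optional groups. Alternative branches have a similar trap, sketched below with Python's re module. The pattern and input are illustrative only and are not taken from the gitleaks fix.

```python
import re

# Python's re picks the leftmost alternative that succeeds,
# even when a later alternative would match more of the input.
unanchored = re.match(r'foo|foobar', 'foobar')
first_branch = unanchored.group(0)
print(first_branch)  # foo

# Requiring a full match makes the first branch fail at the end of
# the input, so the engine backtracks into the second branch.
anchored = re.fullmatch(r'foo|foobar', 'foobar')
whole_input = anchored.group(0)
print(whole_input)  # foobar
```

Ordering the longer alternative first, or anchoring the pattern, removes the ambiguity.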
Key Takeaway
Always test regex against empty and minimal inputs. When every group in a pattern is optional, the pattern as a whole matches the empty string, so parts that look mandatory are not.
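One way to make that check routine is a small helper that reports which trivial inputs a pattern accepts. This is a minimal sketch: the helper name accepts_trivial_input and the sample inputs are hypothetical, and the stricter pattern is just one possible rewrite of the toy example, not the gitleaks change.

```python
import re

def accepts_trivial_input(pattern, samples=("", " ", "a")):
    """Return the trivial samples that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [s for s in samples if compiled.fullmatch(s) is not None]

# Every group is optional, so the empty string slips through.
loose = accepts_trivial_input(r'(foo)?(bar)?')
print(loose)  # ['']

# Rewritten so that at least one of 'foo' or 'bar' must be present.
strict = accepts_trivial_input(r'foo(bar)?|bar')
print(strict)  # []
```

A check like this fits naturally into a unit test that runs against each pattern in a rule set.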
Discovered while fixing gitleaks/gitleaks#2121. View the fix post for the specific diff.