Regular expression Denial of Service - ReDoS
Last updated
Last updated
Copied from https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS****
The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time.
The Regular Expression naïve algorithm builds a Nondeterministic Finite Automaton (NFA), which is a finite state machine where for each pair of state and input symbol there may be several possible next states. Then the engine starts to make transition until the end of the input. Since there may be several possible next states, a deterministic algorithm is used. This algorithm tries one by one all the possible paths (if needed) until a match is found (or all the paths are tried and fail).
For example, the Regex ^(a+)+$
is represented by the following NFA:
For the input aaaaX
there are 16 possible paths in the above graph. But for aaaaaaaaaaaaaaaaX
there are 65536 possible paths, and the number is double for each additional a
. This is an extreme case where the naïve algorithm is problematic, because it must pass on many many paths, and then fail.
Notice, that not all algorithms are naïve, and actually Regex algorithms can be written in an efficient way. Unfortunately, most Regex engines today try to solve not only “pure” Regexes, but also “expanded” Regexes with “special additions”, such as back-references that cannot be always be solved efficiently (see Patterns for non-regular languages in Wiki-Regex for some more details). So even if the Regex is not “expanded”, a naïve algorithm is used.
A Regex is called “evil” if it can stuck on crafted input.
Evil Regex pattern contains:
Grouping with repetition
Inside the repeated group:
Repetition
Alternation with overlapping
Examples of Evil Patterns:
(a+)+
([a-zA-Z]+)*
(a|aa)+
(a|a?)+
(.*a){x} for x \> 10
All the above are susceptible to the input aaaaaaaaaaaaaaaaaaaaaaaa!
(The minimum input length might change slightly, when using faster or slower machines).