Perl Compatible Regular Expressions
{{Short description|Software library for interpreting regular expressions}}
{{Infobox software
| name = Perl Compatible Regular Expressions
| logo =
| author = Philip Hazel
| developer =
| released =
| ver layout = stacked
| latest release version = {{multiple releases
| branch1 = PCRE1
| version1 = 8.45
| date1 = {{Start date and age|2021|06|15}}
| branch2 = PCRE2
| version2 = 10.45
| date2 = {{Start date and age|2025|02|05}}
}}
| latest_preview_version =
| latest_preview_date =
| programming language = C
| operating_system = Cross-platform
| genre = Pattern matching library
| license = BSD
| website = {{Official URL}}
}}
Perl Compatible Regular Expressions (PCRE) is a library written in C that implements a regular expression engine inspired by the capabilities of the Perl programming language. Philip Hazel started writing PCRE in summer 1997. PCRE's syntax is much more powerful and flexible than either of the POSIX regular expression flavors (BRE and ERE) and than that of many other regular-expression libraries.
While PCRE originally aimed at feature-equivalence with Perl, the two implementations are not fully equivalent. During the PCRE 7.x and Perl 5.9.x phase, the two projects coordinated development, with features being ported between them in both directions.
In 2015, a fork of PCRE with a revised programming interface (API) was released. The original software, now called PCRE1 (the 1.xx–8.xx series), has had bugs mended but no further development. {{As of|2021}}, it is considered obsolete, and release 8.45 (June 2021) was announced as the final PCRE1 release. The new PCRE2 code (the 10.xx series) has had a number of extensions and coding improvements and is where development takes place.
A number of prominent open-source programs, such as the Apache and Nginx HTTP servers, and the PHP and R scripting languages, incorporate the PCRE library; proprietary software can do likewise, as the library is BSD-licensed. As of Perl 5.10, PCRE is also available as a replacement for Perl's default regular-expression engine through the
The library can be built on Unix, Windows, and several other environments. PCRE2 is distributed with a POSIX C wrapper, several test programs, and the utility program
Features
= Just-in-time compiler support =
The just-in-time compiler can be enabled when the PCRE2 library is built. Large performance gains are possible when, for example, the calling program uses it with compatible patterns that are matched repeatedly. The just-in-time compiler support was written by Zoltan Herczeg and is not available through the POSIX wrapper.
= Flexible memory management =
PCRE1 used the system stack for backtracking, which could be problematic, so this part of the implementation was changed in PCRE2: the heap is now used for this purpose, and the total amount can be limited. From release 10.30 (2017), the stack overflows that came up regularly with PCRE1 are no longer an issue in PCRE2.
= Consistent escaping rules =
Like Perl, PCRE2 has consistent escaping rules: any non-alphanumeric character may be escaped to mean its literal value by prefixing a backslash (\) before the character.
= Extended character classes =
Single-letter character classes are supported in addition to the longer POSIX names. For example,
= Minimal matching (a.k.a. "ungreedy") =
A
If the U flag (PCRE2_UNGREEDY) is set, quantifiers are ungreedy (lazy) by default, while a following ? makes them greedy.
= Unicode character properties =
Unicode defines several properties for each character. Patterns in PCRE2 can match these properties: e.g.
= Multiline matching =
= Newline/linebreak options =
When PCRE is compiled, a newline default is selected. Which newline/linebreak is in effect affects where PCRE detects
The newline option can be altered with external options when PCRE is compiled and when it is run. Some applications using PCRE give users a means of applying this setting through an external option, and the newline option can also be stated at the start of the pattern using one of the following:
- (*LF) Newline is a linefeed character. Corresponding linebreaks can be matched with \n.
- (*CR) Newline is a carriage return. Corresponding linebreaks can be matched with \r.
- (*CRLF) Newline/linebreak is a carriage return followed by a linefeed. Corresponding linebreaks can be matched with \r\n.
- (*ANYCRLF) Any of the above encountered in the data will trigger newline processing. Corresponding linebreaks can be matched with (?:\r\n?|\n) or with \R. See below for configuration and options concerning what matches backslash-R.
- (*ANY) Any of the above plus special Unicode linebreaks.
When not in UTF-8 mode, corresponding linebreaks can be matched with
In UTF-8 mode, two additional characters are recognized as line breaks with
- LS (line separator, U+2028),
- PS (paragraph separator, U+2029).
On Windows, in non-Unicode data, some of the
For example,
See below for configuration and options concerning what matches backslash-R.
= Backslash-R options =
When PCRE is compiled, a default is selected for what matches backslash-R (\R). A (*newline) option can be provided in addition to a backslash-R option at the start of a pattern, for example (*BSR_UNICODE)(*ANY)rest-of-pattern. The backslash-R options can also be changed with external options by the application calling PCRE2 when a pattern is compiled.
= Beginning of pattern options =
Linebreak options such as
= Backreferences =
A pattern may refer back to text matched earlier in the same match by a capturing subpattern. For example,
= Named subpatterns =
A sub-pattern (surrounded by parentheses, like
This feature was subsequently adopted by Perl, so now named groups can also be defined using
= Subroutines =
While a backreference provides a mechanism to refer to that part of the subject that has previously matched a subpattern, a subroutine provides a mechanism to reuse an underlying previously defined subpattern. The subpattern's options, such as case independence, are fixed when the subpattern is defined.
= Atomic grouping =
Atomic grouping is a way of preventing backtracking in a pattern. For example,
= Look-ahead and look-behind assertions =
{| id="lookbehind" class="floatright wikitable"
! Assertion !! Lookbehind !! Lookahead
|-
! Positive
| style="text-align:center;font-size:125%;font-family:monospace;"|(?<=pattern) || style="text-align:center;font-size:125%;font-family:monospace;"|(?=pattern)
|-
! Negative
| style="text-align:center;font-size:125%;font-family:monospace;"|(?<!pattern) || style="text-align:center;font-size:125%;font-family:monospace;"|(?!pattern)
|-
| colspan="3"|Look-behind and look-ahead assertions in Perl regular expressions
|}
Patterns may assert that previous text or subsequent text contains a pattern without consuming matched text (zero-width assertion). For example, /
Look-behind assertions cannot be of variable length, though (unlike Perl) each alternative branch can have a different fixed length.
= Escape sequences for zero-width assertions =
E.g.
= Comments =
A comment begins with
= Recursive patterns =
A pattern can refer back to itself recursively or to any subpattern. For example, the pattern
= Generic callouts =
PCRE expressions can embed (?Cn), where n is a number from 0 to 255. This calls out to an external user-defined function through the PCRE API and can be used to embed arbitrary code in a pattern.
Differences from Perl
= Until release 10.30, recursive matches were atomic in PCRE and non-atomic in Perl =
This meant that
= The value of a capture buffer deriving from the ? quantifier (match 1 or 0 times) when nested in another quantified capture buffer is different =
In Perl
= PCRE allows named capture buffers to be given numeric names; Perl requires the name to follow the rule of barewords =
= PCRE allows alternatives within lookbehind to be different lengths =
Within lookbehind assertions, both PCRE and Perl require fixed-length patterns.
That is, both PCRE and Perl disallow variable-length patterns using quantifiers within lookbehind assertions.
However, Perl requires all alternative branches of a lookbehind assertion to be the same length as each other, whereas PCRE allows those alternative branches to have different lengths from each other as long as each branch still has a fixed length.
= PCRE does not support certain "experimental" Perl constructs =
Such as
Recursion control verbs added in the Perl 5.9.x series are also not supported.
Support for the experimental backtracking control verbs (added in Perl 5.10) has been available in PCRE since version 7.3.
They are
Perl's corresponding use of arguments with backtracking control verbs is not generally supported.
Note however that since version 8.10, PCRE supports the following verbs with a specified argument:
Since version 10.32 PCRE2 has supported
= PCRE and Perl are slightly different in their tolerance of erroneous constructs =
Perl allows quantifiers on the
= PCRE has a hard limit on recursion depth, Perl does not =
With default build options
Perl uses the heap for recursion and has no hard limit for recursion depth, whereas PCRE2 has a compile-time default limit that can be adjusted up or down by the calling application.
= Verification =
With the exception of the above points, PCRE is capable of passing the tests in the Perl "t/op/re_tests" file, one of the main syntax-level regression tests for Perl's regular expression engine.
See also
{{Portal|Free and open-source software}}
Notes and references
= Notes =
The core PCRE2 library provides both matching and match and replace functionality.
Sure the
Caveat: If the pattern
\x{0085} \u0085
= References =
Final release of PCRE1: https://lists.exim.org/lurker/message/20210615.162400.c16ff8a3.en.html
Releases: https://github.com/PCRE2Project/pcre2/releases
Exim and PCRE: How free software hijacked my life (1999-12), by Philip Hazel, p. 7: https://www.ukuug.org/events/winter99/proc/PH.ps
{{blockquote|1=
What about PCRE?
- Written summer 1997, placed on ftp site.
- People found it, and started a mailing list.
- There has been a trickle of enhancements.}}
- Regular Expression - POSIX Standard: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
- Utilities § Pattern Matching Notation: https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_13
- Base Definitions § Basic Regular Expressions: https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap09.html#tag_09_03
- Rationale § Regular Expressions: https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/xrat/V4_xbd_chap09.html#tag_21_09
PCRE2 - Perl-compatible regular expressions (revised API) (2020), by University of Cambridge: https://pcre.org/pcre2.txt
Differences Between PCRE2 and Perl (2019-07-13), by Philip Hazel: https://www.pcre.org/current/doc/html/pcre2compat.html
Quote PCRE changelog (https://www.pcre.org/original/changelog.txt): "Perl no longer allows group names to start with digits, so I have made this change also in PCRE."
ChangeLog for PCRE2: https://www.pcre.org/changelog.txt
External links
- {{Official website}}
- PCRE - Development mailing list: https://groups.google.com/g/pcre2-dev
- PCRE - Bug Tracker: https://github.com/PCRE2Project/pcre2/issues
- Pattern Matching Using Regular Expressions (2010-03-02), by Nick Maclaren, Philip Hazel: https://www-uxsup.csx.cam.ac.uk/courses/moved.REs/paper.pdf
- pcre 8.43 (2019-04) - Windows Cygwin x86-64: https://www-uxsup.csx.cam.ac.uk/pub/windows/cygwin/x86_64/release/pcre/