Regex Basics


Published on February 20, 2009

Author: phpcodemonkey

Source: slideshare.net

Description

Ciarán Walsh's PHPNW08 slides:

In the right hands regular expressions can be a powerful tool, but it’s also far too easy for them to be used badly, or in the wrong situations.

This talk will kick off with a look at alternatives to regular expressions, for when the power of pattern matching is not required, and will also go over some cases when there are better alternatives available.
Then there will be a brief refresher on pattern syntax and some general tips and tricks to help when constructing regular expressions, before we go on to look at some situations where the use of pattern matching is a good fit, how to solve some common problems, and some common pitfalls when writing patterns.

Regular Expression Basics PHPNW 2008 Ciarán Walsh


What are regular expressions?

Regular expressions allow matching and manipulation of textual data.

Abbreviated as regex or regexp, or alternatively just “patterns”.

Regular Expression Basics: Literals

bus    Matches a ‘b’, followed by a ‘u’, followed by an ‘s’

Regular Expression Basics: Anchors

^    Matches at the beginning of a line
$    Matches at the end of a line

Regular Expression Basics: Character Classes

[abc]     Matches one of ‘a’, ‘b’ or ‘c’
[a-c]     Same as above (character range)
[^abc]    Matches one character that is not listed
.         Matches any single character

Regular Expression Basics: Alternation

a|b        Matches one of ‘a’ or ‘b’
dog|cat    Matches one of “dog” or “cat”

Regular Expression Basics: Quantifiers (repetition)

{x,y}    Matches a minimum of x and a maximum of y occurrences; the maximum can be omitted ({x,}), while older PCRE versions treat {,y} as literal text
*        Matches zero or more occurrences (any amount). Same as {0,}
+        Matches one or more occurrences. Same as {1,}
?        Matches zero or one occurrences. Same as {0,1}
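
The syntax covered so far (literals, anchors, character classes and quantifiers) can be tried out with a short script. This is only a sketch: the subjects and patterns are made up for illustration and are not from the slides.

```php
<?php
// Each entry: [pattern, subject, expected preg_match() result].
// All patterns and subjects here are illustrative assumptions.
$checks = [
    ['/bus/',       'omnibus', 1], // literal: "bus" appears in the subject
    ['/^bus/',      'omnibus', 0], // ^ anchors at the start, where "omn" is
    ['/bus$/',      'omnibus', 1], // $ anchors at the end
    ['/^[a-c]+$/',  'abcab',   1], // one or more of a, b, c and nothing else
    ['/^[^abc]$/',  'd',       1], // a single character that is not a, b or c
    ['/^ab{2,3}$/', 'abbb',    1], // b repeated between 2 and 3 times
    ['/^ab{2,3}$/', 'ab',      0], // only one b, below the minimum
    ['/^colou?r$/', 'color',   1], // ? makes the u optional
];

$results = [];
foreach ($checks as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject);
    $results[] = $actual;
    printf("%-14s on %-8s => %d (expected %d)\n", $pattern, $subject, $actual, $expected);
}
```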

Regular Expression Basics: Grouping

(…)    Groups the contents of the parentheses. Affects alternation and quantifiers. Allows parts of the match to be captured.

for|backward      “for” or “backward”
(for|back)ward    “forward” or “backward”
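
The difference grouping makes to alternation can be seen by running both patterns against the same subject. A minimal sketch, with the subject string chosen for illustration:

```php
<?php
// Ungrouped: the alternatives are "for" and "backward".
preg_match('/for|backward/', 'forward', $ungrouped);
echo $ungrouped[0], "\n"; // only the "for" part is matched

// Grouped: the alternatives are "for" and "back", each followed by "ward".
preg_match('/(for|back)ward/', 'forward', $grouped);
echo $grouped[0], "\n"; // the whole match
echo $grouped[1], "\n"; // the text captured by the parentheses
```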

Regular Expression Basics: Delimiters

/pattern/modifiers

/i    Makes the match case-insensitive

Performing a Match

preg_match('/Te(.)f?/i', 'text', $matches);

Returns number of matches (0 or 1)

$matches will contain captured groups
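
To make the return value and the contents of $matches concrete, the slide's call can be run as-is. The second, case-sensitive call is an added contrast, not something from the talk.

```php
<?php
// The slide's example: case-insensitive, so "Te" matches "te".
$count = preg_match('/Te(.)f?/i', 'text', $matches);
var_dump($count);   // 1: the pattern matched once
print_r($matches);  // [0] is the whole match, [1] is the captured group

// Added for contrast: without the i modifier "T" cannot match "t".
$countStrict = preg_match('/Te(.)f?/', 'text', $noMatches);
var_dump($countStrict); // 0: no match
print_r($noMatches);    // empty array
```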

Performing a Replacement

preg_replace('/some(text)/', '\1', $text)

Returns string after replacement

Can use backreferences with \0-\9
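
A short sketch of a replacement with backreferences, which are written \1, \2 and so on in the replacement string, with \0 standing for the whole match. The input string is an assumption made for illustration.

```php
<?php
$text = 'this is sometext here'; // illustrative input, not from the slides

// \1 refers to the first captured group, so "sometext" becomes "text".
$onlyGroup = preg_replace('/some(text)/', '\1', $text);
echo $onlyGroup, "\n";

// \0 refers to the whole match, here wrapped in brackets.
$wholeMatch = preg_replace('/some(text)/', '[\0]', $text);
echo $wholeMatch, "\n";
```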

Don’t Use Regular Expressions!

Don’t Abuse Regular Expressions!

[Slide background: a very long regular expression for matching email addresses, too garbled in extraction to reproduce]

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. (Jamie Zawinski)

Testing for a Substring

if (preg_match('/foo/', $var))
if (strpos($var, 'foo') !== false)

if (preg_match('/foo/i', $var))
if (stripos($var, 'foo') !== false)
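
The equivalence of the two approaches can be checked directly. This sketch uses a made-up input string; it also shows why the comparison has to be !== false, since strpos() returns 0 when the substring sits at the very start.

```php
<?php
$var = 'A plate of Food'; // illustrative input

// Case-sensitive: both approaches agree that "foo" is absent ("Food" has a capital F).
$regexSensitive  = preg_match('/foo/', $var) === 1;
$strposSensitive = strpos($var, 'foo') !== false;

// Case-insensitive: both approaches agree that "foo" is present.
$regexInsensitive   = preg_match('/foo/i', $var) === 1;
$striposInsensitive = stripos($var, 'foo') !== false;

// strpos() returns the offset, which is 0 (not false) for a match at the start.
$offsetAtStart = strpos('foobar', 'foo');

var_dump($regexSensitive, $strposSensitive, $regexInsensitive, $striposInsensitive, $offsetAtStart);
```

On PHP 8.0 and later, str_contains() is another option for the case-sensitive check.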

Validating an Integer: Regular Expression

if (preg_match('/^\d+$/', $value)) {
    // $value is a non-negative integer (digits only)
}

Intention is not immediately obvious

Not efficient

Validating an Integer: ctype (Character Type)

if (ctype_digit($value)) {
    // $value is a non-negative integer (digits only)
}

Native C library (fast)

Makes the intention obvious

Validating an Integer: Casting

$casted_value = intval($value);
if ($casted_value > 0) {
    // $casted_value is a positive (non-zero) integer
}

Intention is fairly clear

Casting is safe practice

Values that do not begin with a number will result in zero (a string such as '12abc' is cast to 12)
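
The three approaches to integer validation do not behave identically on every input, which is easiest to see side by side. The sample inputs below are assumptions chosen to show the differences; all of them are strings, as form input would be.

```php
<?php
// Illustrative inputs, not from the slides.
$inputs = ['42', '0', '-7', '12abc', 'abc', ''];

$rows = [];
foreach ($inputs as $value) {
    $rows[] = [
        $value,
        preg_match('/^\d+$/', $value) === 1, // regular expression
        ctype_digit($value),                 // ctype
        intval($value),                      // casting
    ];
}

foreach ($rows as [$value, $regex, $ctype, $cast]) {
    printf(
        "%-8s regex=%-5s ctype=%-5s intval=%d\n",
        var_export($value, true),
        var_export($regex, true),
        var_export($ctype, true),
        $cast
    );
}
```

Note that casting accepts '12abc' as 12 and '-7' as -7, while the other two checks reject both, so the choice depends on whether the goal is to validate the input or to coerce it.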

HTML Parsing

Using Regular Expressions

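Postcodes
/[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}/

IP Addresses

@^(\d{1,2})/(\d{1,2})/(\d{4})$@
(As extracted, this second pattern matches a slash-separated, date-like string such as 20/02/2009 rather than an IP address; the slide's labels appear to have been lost.)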
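
A quick way to see what the postcode pattern accepts is to run it over a few strings. This is a sketch only: the sample strings are illustrative, and the anchored variant with ^ and $ is an addition that is not on the slide.

```php
<?php
// The postcode pattern as shown on the slide.
$pattern = '/[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}/';
// Assumption: an anchored variant, for validating a whole string.
$anchored = '/^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}$/';

// Illustrative sample strings.
$samples = ['M1 1AA', 'SW1A 1AA', 'GIR 0AA', 'm1 1aa', 'send to M1 1AA today'];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = [
        preg_match($pattern, $sample),  // finds a postcode anywhere in the string
        preg_match($anchored, $sample), // requires the whole string to be a postcode
    ];
    printf("%-22s unanchored=%d anchored=%d\n", $sample, ...$results[$sample]);
}
```

The unanchored pattern suits searching within text, while the anchored one suits validating a single field. Neither accepts lower-case input unless the i modifier is added.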

Constructing Patterns

Writing patterns is a balance between matching what you do want and not matching what you don’t want.

You don’t need to use /…/ to denote a pattern!

preg_match('/<b><s>.+<\/s>.+<\/b>/', $html)
preg_match('@<b><s>.+</s>.+</b>@', $html)
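
Both delimiter styles describe the same pattern; the only difference is whether the slashes inside it need escaping. A minimal sketch with an assumed HTML fragment:

```php
<?php
$html = '<b><s>struck</s> bold</b>'; // illustrative input

// With / as the delimiter, every literal slash in the pattern must be escaped.
$withSlashes = preg_match('/<b><s>.+<\/s>.+<\/b>/', $html);

// With @ as the delimiter, the slashes can be written as they are.
$withAt = preg_match('@<b><s>.+</s>.+</b>@', $html);

var_dump($withSlashes, $withAt);
```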

Greediness

$html = <<<HTML
<span>some text</span><span>some more text!</span>
HTML;

preg_match("@<span>(.+)</span>@", $html, $matches);
echo $matches[0]; // greedy: matches both spans, up to the last </span>

preg_match("@<span>(.+?)</span>@", $html, $matches);
echo $matches[0]; // lazy: matches only <span>some text</span>
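
The effect of the greedy and lazy quantifiers is clearest when the captured groups are compared as well as the whole matches. A sketch using the same two-span string:

```php
<?php
$html = '<span>some text</span><span>some more text!</span>';

// Greedy: .+ takes as much as possible, so the match ends at the last </span>.
preg_match('@<span>(.+)</span>@', $html, $greedy);

// Lazy: .+? takes as little as possible, so the match ends at the first </span>.
preg_match('@<span>(.+?)</span>@', $html, $lazy);

echo $greedy[1], "\n"; // includes the markup between the two spans
echo $lazy[1], "\n";   // just the text of the first span
```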

You can make your pattern readable!

preg_match('`^(\w+)://(?:(.+?):(.+?)@)?(.+?)\.(\w+)$`', $s, $matches)

preg_match('`
    ^
    (\w+)://       # Protocol
    (?:
        (.+?)      # Username
        :          # :
        (.+?)      # Password
        @          # @
    )?             # Username/password are optional
    (.+?)          # Hostname
    \.(\w+)        # Top-level domain
    $
`x', $s, $matches);

Extracting Captures

preg_match('`^
    (?P<protocol>\w+)://
    (?:
        (?P<user>.+?) : (?P<pass>.+?) @
    )?
    (?P<host>.+?)
    \.(?P<tld>\w+)
$`x', $s, $matches);

Array
(
    [0] => http://foo:bar@baz.example.com
    [protocol] => http
    [1] => http
    [user] => foo
    [2] => foo
    [pass] => bar
    [3] => bar
    [host] => baz.example
    [4] => baz.example
    [tld] => com
    [5] => com
)
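
Named groups combine well with the x modifier, which ignores whitespace in the pattern and allows # comments. The following sketch assumes the subject URL seen in the slide's output and that the escapes in the pattern are \w and \. as reconstructed here.

```php
<?php
$s = 'http://foo:bar@baz.example.com'; // the subject shown in the slide output

$pattern = '`^
    (?P<protocol>\w+)://   # Protocol
    (?:
        (?P<user>.+?)      # Username
        :
        (?P<pass>.+?)      # Password
        @
    )?                     # Username/password are optional
    (?P<host>.+?)          # Hostname
    \.(?P<tld>\w+)         # Top-level domain
$`x';

preg_match($pattern, $s, $matches);

// Each named group is available by name as well as by number.
foreach (['protocol', 'user', 'pass', 'host', 'tld'] as $name) {
    printf("%-8s => %s\n", $name, $matches[$name]);
}
```

Reading $matches['host'] rather than $matches[4] means the code keeps working if another group is later added to the pattern.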

Variable Data

Escape variable data with preg_quote() before placing it in a pattern:

$value = preg_quote($value, '!');
if (preg_match("!>$value</(?:div|span)>!", $text))
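
The need for preg_quote() shows up as soon as the variable contains characters that mean something in a pattern. In this sketch the text and value are made up for illustration; the value contains $, . and parentheses.

```php
<?php
// Illustrative data, not from the slides.
$text  = '<div>Price: $5.00 (approx.)</div>';
$value = 'Price: $5.00 (approx.)';

// Unquoted: "$" is read as an end-of-subject anchor and "(...)" as a group,
// so the literal text is not found.
$unsafe = preg_match("!>$value</(?:div|span)>!", $text);

// Quoted: preg_quote() escapes the special characters (and the ! delimiter).
$quoted = preg_quote($value, '!');
$safe   = preg_match("!>$quoted</(?:div|span)>!", $text);

var_dump($unsafe, $safe);
```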

Performing Logic on Replacements

preg_replace('/\w+/e', 'strtoupper("\0")', 'foo bar baz')

(The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback is the supported approach.)

function upper_case_match($matches) {
    return strtoupper($matches[0]);
}

preg_replace_callback('/\w+/', 'upper_case_match', 'foo bar baz')
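
A runnable sketch of the callback approach follows. The named function mirrors the one on the slide; the second call, using an anonymous function to capitalise only the first letter of each word, is an added variation.

```php
<?php
// The callback receives the array of matches for each occurrence
// and returns the replacement string.
function upper_case_match(array $matches): string
{
    return strtoupper($matches[0]);
}

$upper = preg_replace_callback('/\w+/', 'upper_case_match', 'foo bar baz');
echo $upper, "\n";

// Added variation: an anonymous function that upper-cases only the
// first character of each word (\b\w matches a word's first character).
$capitalised = preg_replace_callback(
    '/\b\w/',
    function (array $matches): string {
        return strtoupper($matches[0]);
    },
    'foo bar baz'
);
echo $capitalised, "\n";
```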

Testing Tools

RegexBuddy

Reggy

http://rubular.com

Any Questions?
