PHPBuilder - 10 Easy Examples for Deciphering PHP Regular Expressions




10 Easy Examples for Deciphering PHP Regular Expressions

by: W. Jason Gilmore | July 20, 2010

Regular expressions are the PHP programmer's equivalent of being audited by the IRS. The mere thought of an encounter is enough to cause heart palpitations. This isn't entirely without reason; after all, regular expression syntax is almost as indecipherable as the United States tax code, resulting in an almost automatic need to consult a manual every time the programmer needs to perform a particularly complex parsing task.
Because of this opaque syntax, some would contend that regular expressions, unlike riding a bicycle or tying your shoes, are difficult to retain a working understanding of over the long term. I disagree. I wager that most programmers choose to seek out an immediate fix to their particular parsing problem (which is quite tempting given the online proliferation of code) rather than attempt to understand the regular expression syntax at a deeper level.
If you would like to decipher the mélange of backslashes, brackets, asterisks and other characters somehow capable of rooting out everything from email addresses to HTML tags, follow along with this tutorial, which introduces the topic using 10 number-oriented examples.

1. Finding Digits

PHP's preg functions use Perl-compatible regular expression (PCRE) syntax, which is modeled on the pattern-matching syntax that made Perl a favorite for string parsing. Among its dozens of features are character classes, which let you match any single character from a set you define. For instance, the class [0-9] matches any one digit from 0 through 9. To create an array consisting of all digits found in a string, you can pass the [0-9] character class to PHP's preg_match_all() function:
$string = "The 10am news will air in 3...2...1...";
preg_match_all("/[0-9]/", $string, $digits);
Examining the $digits array, which preg_match_all() fills by reference, you'll find that all five digits found in the string have been retrieved:
Array ( [0] => Array ( [0] => 1 [1] => 0 [2] => 3 [3] => 2 [4] => 1 ) )
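As a quick check of the example above, the following minimal sketch captures the return value of preg_match_all(), which is the number of full-pattern matches, and compares [0-9] with the \d shorthand. The variable names $count and $shorthand are illustrative, and the equivalence shown holds for plain ASCII input like this string.

```php
<?php
$string = "The 10am news will air in 3...2...1...";

// preg_match_all() returns the number of full-pattern matches
// and fills $digits by reference.
$count = preg_match_all("/[0-9]/", $string, $digits);

// \d is the shorthand digit class; on this ASCII string it
// finds the same characters as [0-9].
preg_match_all('/\d/', $string, $shorthand);

echo $count, "\n";                       // 5
echo implode(",", $digits[0]), "\n";     // 1,0,3,2,1
var_dump($digits[0] === $shorthand[0]);  // bool(true)
```

The matches live in $digits[0] because index 0 of the result array always holds the full-pattern matches.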

2. Finding Groups of Digits

You might have noticed that the previous example treated the 10 in 10am as two separate digits, which is probably not what you wanted. To match each run of consecutive digits as a single substring, append the + quantifier (meaning "one or more") to the character class:
$string = "The 10am news will air in 3...2...1...";
preg_match_all("/[0-9]+/", $string, $digits);
Execute the revised example and the $digits array will contain four matches:
Array ( [0] => Array ( [0] => 10 [1] => 3 [2] => 2 [3] => 1 ) )

3. Filtering Unwanted Strings

The previous two examples work well if you want to identify every digit, or every run of digits, in a string, but chances are you'll want to be a bit more selective. For instance, what if you were interested only in numbers consisting of four digits? Enforce this filter by appending a quantifier in braces to the character class; {4} requires exactly four consecutive matches of the class:
$string = "In 1492 3 ships set sail in the Atlantic.";
preg_match_all("/[0-9]{4}/", $string, $digits);
After executing this example, $digits will contain just one match:
Array ( [0] => Array ( [0] => 1492 ) )
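One detail worth checking: {4} requires four consecutive digits but does not require the run of digits to end there. The sketch below uses a hypothetical input string (not from the article) containing a six-digit number to show the effect.

```php
<?php
// Hypothetical input: one four-digit number and one six-digit number.
$string = "Order 1492 shipped, order 149200 is pending.";

preg_match_all("/[0-9]{4}/", $string, $digits);

// The first four digits of 149200 also match; the trailing "00"
// is too short to match again.
print_r($digits[0]); // 1492, 1492
```

So the pattern on its own finds four-digit sequences, not necessarily four-digit numbers.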

4. Expanding Your Range

You can use a variation of the syntax employed in the previous example to accept a range of lengths rather than one exact length. For instance, to find all integers consisting of anywhere between two and four digits, you can use:
/[0-9]{2,4}/
To find all integers consisting of at least three digits, use the following approach:
/[0-9]{3,}/
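To see both range patterns in action, here is a minimal sketch run against a hypothetical sample string chosen to contain runs of one through five digits. The string and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample: runs of 1, 2, 3, 4 and 5 digits.
$string = "7 42 365 2010 86400";

preg_match_all("/[0-9]{2,4}/", $string, $twoToFour);
preg_match_all("/[0-9]{3,}/", $string, $threeOrMore);

// {2,4} is greedy: it takes up to four digits, so 86400 yields 8640
// and the leftover single 0 is too short to match.
print_r($twoToFour[0]);   // 42, 365, 2010, 8640

// {3,} has no upper bound, so the whole of 86400 is matched.
print_r($threeOrMore[0]); // 365, 2010, 86400
```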

5. Increasing Your Selectivity

The four-digit pattern from example 3 works well when you know that the only four-digit sequences found in the string will indeed be year values, but what if the string looked like this?
In 2015 Cyberdyne created the T-1000 Terminator.
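Parsing this string using the four-digit pattern will result in two values being added to the $digits array, namely 2015 and 1000. The usual tool for this kind of filtering is the word boundary assertion \b, which prevents a match from starting or ending in the middle of a longer run of letters and digits (the 1000 in T1000, for example). A hyphen is not a word character, however, so \b on its own still matches the 1000 in T-1000. To avoid this mishap, also reject a preceding hyphen with a negative lookbehind:
$string = "In 2015 Cyberdyne created the T-1000 Terminator.";
preg_match_all("/(?<![\w-])[0-9]{4}\b/", $string, $digits);
print_r($digits);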
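A minimal sketch for checking how word boundaries treat the hyphen in this string. The unhyphenated variant of the sentence and the lookbehind pattern are illustrative assumptions, not part of the original example; the lookbehind assumes that a leading hyphen should disqualify a number.

```php
<?php
$hyphenated = "In 2015 Cyberdyne created the T-1000 Terminator.";
// Hypothetical variant without the hyphen, for comparison.
$joined     = "In 2015 Cyberdyne created the T1000 Terminator.";

$boundary   = '/\b[0-9]{4}\b/';
// Assumption: reject a match preceded by a word character or a hyphen.
$lookbehind = '/(?<![\w-])[0-9]{4}\b/';

preg_match_all($boundary, $hyphenated, $a);
preg_match_all($boundary, $joined, $b);
preg_match_all($lookbehind, $hyphenated, $c);

print_r($a[0]); // 2015, 1000 (a boundary exists between "-" and "1")
print_r($b[0]); // 2015       (no boundary between "T" and "1")
print_r($c[0]); // 2015       (the lookbehind rejects the leading hyphen)
```

Single-quoted strings are used for the patterns so that the backslashes reach the regular expression engine unchanged.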
