Regular expressions are the PHP programmer's equivalent of being audited by the IRS. The mere thought of an encounter is enough to cause heart palpitations. This isn't entirely without reason; after all, regular expression syntax is almost as indecipherable as the United States tax code, resulting in an almost automatic need to consult a manual every time the programmer needs to perform a particularly complex parsing task.
Because of the obscure syntax, some would contend that regular expressions are unlike riding a bicycle or tying your shoes, in that it is difficult to retain a working understanding of them over the long term. I disagree. I wager that most programmers seek out an immediate fix for their particular parsing problem (quite tempting given the online proliferation of code) rather than attempt to understand regular expression syntax at a deeper level.
If you would like to decipher the mélange of backslashes, brackets, asterisks and other characters somehow capable of rooting out everything from email addresses to HTML tags, follow along with this tutorial which introduces the topic using 10 numerically-oriented examples.

1. Finding Digits

PHP's preg_* functions are built on the PCRE (Perl Compatible Regular Expressions) library, which follows Perl's regular expression syntax, widely regarded as one of the most capable string-matching notations available. Among its many features are character classes, which let you seek out characters that meet a certain set of criteria. One such class, [0-9], matches any single character falling between the digits 0 and 9. For instance, to create an array consisting of all digits found in a string, you can pass the [0-9] character class to PHP's preg_match_all() function:
$string = "The 10am news will air in 3...2...1...";
preg_match_all("/[0-9]/", $string, $digits);
Examine the $digits array, which preg_match_all() fills by reference, and you'll find that all five digits in the string have been retrieved:
Array ( [0] => Array ( [0] => 1 [1] => 0 [2] => 3 [3] => 2 [4] => 1 ) )

2. Finding Groups of Digits

You might have noticed that the previous example treated the 10 in 10am as two separate digits, which is probably not what you want. To make the regular expression match each run of consecutive digits as a single substring, append the + quantifier, meaning "one or more", to the character class:
$string = "The 10am news will air in 3...2...1...";
preg_match_all("/[0-9]+/", $string, $digits);
Execute the revised example and $digits[0] will contain four elements:
Array ( [0] => Array ( [0] => 10 [1] => 3 [2] => 2 [3] => 1 ) )

3. Filtering Unwanted Strings

The previous two examples work well if you want to identify every digit or run of digits in a string, but chances are you'll want to be a bit more selective. For instance, what if you were interested only in numbers consisting of four digits? Enforce this filter by appending a quantifier to the character class, here {4}, meaning exactly four occurrences:
$string = "In 1492 3 ships set sail in the Atlantic.";
preg_match_all("/[0-9]{4}/", $string, $digits);
After executing this example, $digits[0] will consist of just one element:
Array ( [0] => Array ( [0] => 1492 ) )

4. Expanding Your Range

You can use a variation of the quantifier syntax employed in the previous example to accept a range of lengths. For instance, to find all digit sequences consisting of anywhere between two and four digits, you can use:
/[0-9]{2,4}/
To find all digit sequences consisting of at least three digits, leave the upper bound empty:
/[0-9]{3,}/
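As a minimal sketch of how these two quantifiers behave, the following runs both patterns against a sample string (the string is hypothetical, chosen only to include digit runs of several lengths). Note that quantifiers are greedy and that, without boundaries, a run longer than the upper limit is split into several matches:

```php
<?php
// Hypothetical sample string with digit runs of length 1, 2, 3, 4 and 6.
$string = "Room 7 holds 42 people, 365 chairs, 2024 pens and 123456 clips.";

// {2,4}: between two and four digits, taking as many as possible each time.
preg_match_all('/[0-9]{2,4}/', $string, $twoToFour);

// {3,}: three or more digits, with no upper limit.
preg_match_all('/[0-9]{3,}/', $string, $threeOrMore);

print_r($twoToFour[0]);   // 42, 365, 2024, 1234, 56
print_r($threeOrMore[0]); // 365, 2024, 123456
```

The six-digit run is reported as 1234 followed by 56 under {2,4}, which is worth keeping in mind before relying on the pattern to pick out whole numbers.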

5. Increasing Your Selectivity

The four-digit pattern from example 3 works well when you know that the only four-digit sequences found in the string will indeed be year values, but what if the string looked like this?
In 2015 Cyberdyne created the T-1000 Terminator.
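Parsing this string with /[0-9]{4}/ will result in two values being added to the $digits array, namely 2015 and 1000. The word boundary assertion \b, which matches between a word character (a letter, digit, or underscore) and a non-word character, keeps a pattern from matching digits embedded in a longer word or number. It is not sufficient by itself here, though: the hyphen in T-1000 is a non-word character, so a boundary exists just before the 1. Adding a negative lookbehind, (?<!-), rejects any match preceded by a hyphen, leaving only 2015:
$string = "In 2015 Cyberdyne created the T-1000 Terminator.";
preg_match_all("/(?<!-)\b[0-9]{4}\b/", $string, $digits);
print_r($digits);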
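To see what the word boundary assertion \b does in isolation, here is a small sketch comparing the same four-digit pattern with and without it. The sample string is hypothetical; it places digits directly after a letter and inside a longer number, the two cases \b is meant to filter out:

```php
<?php
// Hypothetical sample: "A1234" has digits glued to a letter,
// and "98765" is a five-digit number.
$string = "Order A1234 shipped in 2015, ref 98765.";

// Without boundaries, any four consecutive digits match.
preg_match_all('/[0-9]{4}/', $string, $unbounded);

// With \b on both sides, the four digits must stand alone as a "word".
preg_match_all('/\b[0-9]{4}\b/', $string, $bounded);

print_r($unbounded[0]); // 1234, 2015, 9876
print_r($bounded[0]);   // 2015
```

Letters, digits, and the underscore all count as word characters, so no boundary exists between the A and the 1 in A1234, nor between any two digits of 98765.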

6. Finding More Complex Patterns

As you've just learned, locating numbers, numbers consisting of a specified digit count, and even numbers in context is pretty easy. But what if you wanted to locate a more complex string, such as a U.S. social security number? Social security numbers follow a rigid pattern, consisting of three digits, a hyphen, two digits, another hyphen, and finally four more digits (e.g. 123-45-6789). You can locate strings such as this by combining many of the concepts we've discussed so far:
$string = "John is 28 and his social security number is 123-45-6789.";
preg_match_all("/\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b/", $string, $digits);

7. Accounting for Inconsistency

Suppose that for some reason certain documents formatted social security numbers using asterisks instead of hyphens, meaning they could potentially appear in the string looking like 123*45*6789, and even 123-45*6789. You can write a regular expression that accounts for the inconsistency by placing both permitted separators in a character class, [-*], thereby retrieving substrings consisting of all such variations:
$string = "123-45-6789 and 123*45-6789";
preg_match_all("/\b[0-9]{3}[-*][0-9]{2}[-*][0-9]{4}\b/", $string, $digits);
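One detail worth checking when writing a separator class like this: inside square brackets the | character is a literal pipe, not an "or" operator, so including it widens what the pattern accepts. The sketch below compares a class containing only the hyphen and asterisk with one that also contains a pipe, using a hypothetical string that includes a pipe-separated number:

```php
<?php
// Hypothetical input with three separator styles.
$string = "123-45-6789, 123*45*6789, 123|45|6789";

// Class with only the two intended separators.
preg_match_all('/\b[0-9]{3}[-*][0-9]{2}[-*][0-9]{4}\b/', $string, $strict);

// Class that also lists "|": the pipe is treated as a third allowed separator.
preg_match_all('/\b[0-9]{3}[-|*][0-9]{2}[-|*][0-9]{4}\b/', $string, $loose);

print_r($strict[0]); // 123-45-6789, 123*45*6789
print_r($loose[0]);  // 123-45-6789, 123*45*6789, 123|45|6789
```

Placing the hyphen first in the class keeps it from being read as a range operator.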

8. Correcting Errors

When searching for strings such as social security numbers, you probably should correct any data-entry errors you find along the way before using the data for other purposes. You can do so by capturing each group of digits in parentheses and referring to the captures as $1, $2, and $3 in the replacement string:
$string = "John is 28 and his social security number is 123*45-6789.";
echo preg_replace('/([0-9]{3})[-*]([0-9]{2})[-*]([0-9]{4})/', '$1-$2-$3', $string);
Executing this example will output the following corrected string:
John is 28 and his social security number is 123-45-6789.
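If you also want to know how many numbers were rewritten, preg_replace() accepts an optional limit argument followed by a by-reference count argument. The following is a minimal sketch under that assumption, using a hypothetical string with two inconsistently formatted numbers:

```php
<?php
// Hypothetical input containing two numbers with mixed separators.
$string = "IDs: 123*45-6789 and 987-65*4321.";

$pattern     = '/\b([0-9]{3})[-*]([0-9]{2})[-*]([0-9]{4})\b/';
$replacement = '$1-$2-$3'; // re-join the three captured groups with hyphens

// -1 means "no limit"; $count receives the number of replacements made.
$fixed = preg_replace($pattern, $replacement, $string, -1, $count);

echo $fixed, "\n"; // IDs: 123-45-6789 and 987-65-4321.
echo $count, "\n"; // 2
```

Note that the count includes numbers that were already correctly formatted, because the pattern matches them too and rewrites them to the same text.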

9. Taking Advantage of Alternative Syntax

The [0-9] character class is just one way to represent digits ranging between zero and nine. You'll also regularly encounter several alternative approaches that accomplish the same task yet use different syntax. For instance, all of the following examples will recognize a proper social security number:
/\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b/
/\b\d{3}-\d{2}-\d{4}\b/
/\b[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}\b/
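A quick way to convince yourself that the three notations are interchangeable for this task is to run each against the same input and compare the results. This sketch reuses the sample sentence from example 6; the variable names are illustrative only. Note that the POSIX class [:digit:] must itself sit inside a bracket expression, hence the double brackets:

```php
<?php
$string = "John is 28 and his social security number is 123-45-6789.";

$patterns = [
    '/\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b/',                   // explicit range
    '/\b\d{3}-\d{2}-\d{4}\b/',                            // Perl-style shorthand
    '/\b[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}\b/', // POSIX class
];

$results = [];
foreach ($patterns as $pattern) {
    preg_match_all($pattern, $string, $matches);
    $results[] = $matches[0]; // full matches only
}

print_r($results); // each entry holds the single match 123-45-6789
```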

10. Avoiding Regular Expressions

Because the need to parse strings and documents for certain substrings is so common, the PHP developers have put a lot of thought into how to streamline the process for certain tasks. For instance, if you merely want to determine whether a specific substring is located in a larger string, use a plain string function such as strpos(), or its case-insensitive counterparts stripos() and stristr(), rather than construct a regular expression. Or if you want to validate a complex string such as an email address, check out the Filter extension and its filter_var() function. By taking advantage of these conveniences you'll save a fair amount of time and frustration, and you will often improve your application's performance as well.
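Both alternatives can be sketched in a few lines. The example below assumes a case-insensitive substring check with stripos() and email validation with filter_var() and the FILTER_VALIDATE_EMAIL filter; the sample strings and addresses are made up for illustration:

```php
<?php
$haystack = "The 10am news will air in 3...2...1...";

// stripos() returns the match offset or false. Compare with !== false,
// because a match at offset 0 would otherwise look like "not found".
$found = stripos($haystack, 'NEWS') !== false;

// filter_var() returns the validated string on success, or false on failure.
$validEmail   = filter_var('jason@example.com', FILTER_VALIDATE_EMAIL);
$invalidEmail = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

var_dump($found);        // bool(true)
var_dump($validEmail);   // string(17) "jason@example.com"
var_dump($invalidEmail); // bool(false)
```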

Conclusion

With some practice, constructing even complex regular expressions will become as natural as riding a bicycle. Spend some time devising your own examples.

About the Author

Jason Gilmore is the founder of WJGilmore.com and the author of several popular books, including "Easy PHP Websites with the Zend Framework", "Easy PayPal with PHP", and "Beginning PHP and MySQL, Third Edition".