Hello hello hello, and welcome back. We've looked at strings, numbers and all sorts of other types of data, but we've not yet seen how to do something really important: look for and pull interesting parts out of the data we have. To do that we're going to use some magic from the Perl world called "Regular Expressions".
Huh, Regular What?
Put simply, a regular expression is a string in its own right, but one that has a special meaning. In some ways it's like a mini program that tells the regexp engine what to look for and how to find it.
Look at the first line of my opening paragraph above.
If you wanted to look for that "Hello hello hello" and treat it as a spelling error to correct, how are you going to find it?
Well, you could use

if($text == "Hello hello hello")

or you might use the text function "str_replace":

$text = str_replace("Hello hello hello", "Hello", $text);

and they would work fine, but what if the phrase became "Hello hello hullo"? Hmmm, now we don't have a match. This is where the power of regular expressions comes to the rescue.
How do Regular Expressions Work Then?
OK, so you're asking yourself: how can I match something that's not matchable unless I change what I'm looking for, which is not what I want to do?
The key is not to change what you're searching for, but to describe where the differences are allowed.
A regular expression is a set of search rules that allows for variations in the text being searched.
What exactly is the difference between "Hello hello hello" and "Hello hello hullo"? In this case it's only one letter, and that letter can be either an 'e' or a 'u'. If we had a way of saying "search for this phrase, but be aware that the 4th letter from the end could change", then we've pretty much defined what a regular expression is.
In our example we can easily go further and accept any character at that position, not just an 'e' or a 'u'. We do this by using the full stop operator '.', so our search pattern could be written like this:

"Hello hello h.llo"

The '.' will match any character at that position (by default, anything except a newline), but only one character. All the rest of the pattern has to match exactly.
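To make that concrete, here is a minimal sketch comparing the exact search from earlier with the '.' pattern. Note that preg_replace is not something we've covered in this series; I'm assuming it here only because it is the regular expression counterpart of str_replace, and the variable names are my own.

```php
<?php
// Two variants of the phrase; the second uses the 'hullo' spelling.
$phrases = ["Hello hello hello", "Hello hello hullo"];

$fixedExact = []; // results of the exact string search
$fixedRegex = []; // results of the regular expression search

foreach ($phrases as $phrase) {
    // str_replace only knows the one exact spelling.
    $fixedExact[] = str_replace("Hello hello hello", "Hello", $phrase);
    // The '.' accepts any single character after the final 'h'.
    $fixedRegex[] = preg_replace('/Hello hello h.llo/', 'Hello', $phrase);
}

print_r($fixedExact); // the 'hullo' variant is left untouched
print_r($fixedRegex); // both variants are corrected
```

The exact search misses the second phrase, while the single pattern catches both.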
There is, however, much more to the power of regular expressions than just single letters. We can search for whole groups of numbers, letters and combinations of words and symbols, and also for certain counts and lengths.
There's far more than we can cover in this article; developing a true mastery of regular expressions takes years. We only have time to cover what you need to know to get you started in PHP.
One piece of advice I will give: find a program that lets you see what you're doing as you construct regular expressions. I use the wonderful reg-ex coach available from coach/ There is an older version for Linux, but the latest versions are only maintained under Windows now. It does, however, run perfectly fine under Wine.
So What Else Can Regular Expressions do?
The best way for me to describe that is to show you a few examples:
Let's say we have the string:

"Long live PHP Builder in 2009"

We can find and extract the 2009 using:

^.*(\d\d\d\d)$

If we use this in PHP with the preg_match function:

$found = preg_match("/^.*(\d\d\d\d)$/", "Long live PHP Builder in 2009",$matches);

$found will be 1 if the text provided had 4 digits at the end of the string and 0 if it did not (preg_match only returns false if an error occurred). The / at either end of the pattern is how the regular expression engine knows where the pattern starts and finishes (more on that in just a moment). If a match is found then the array $matches will contain the following:

$matches[0] = "Long live PHP Builder in 2009"
$matches[1] = "2009"

Here's how the reg-ex pattern reads:

^ = at the start of the line
. = Read any character
* = for as many as you can, until
\d\d\d\d = you encounter 4 digits in a row
$ = at the end of the string

() = keep whatever matched the part of the pattern between the brackets as a separate result, in this case the 4 digits.
Or in English: look for 4 consecutive digits that occur at the end of the string, and retrieve them.
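Putting the pieces above together, here is a small runnable sketch of the same call. The echo messages and the three-way check on the return value are my own additions, based on preg_match returning 1, 0 or false.

```php
<?php
$text = "Long live PHP Builder in 2009";
$matches = [];

// Single quotes keep PHP from trying to interpret the backslashes itself.
$found = preg_match('/^.*(\d\d\d\d)$/', $text, $matches);

if ($found === 1) {
    echo "Whole match: {$matches[0]}\n"; // the entire matched text
    echo "Year: {$matches[1]}\n";        // the part captured by ( )
} elseif ($found === 0) {
    echo "No 4 digits at the end of the string\n";
} else {
    echo "The pattern itself could not be run\n";
}
```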
Here's another one:

$text = "Peter Shaw";
$regex = "/(Peter)\s(Sh(aw|ore))/";

I'll not repeat the preg line this time.
The rule here says: return the first word before the space, and match the word after the space if it's "Shaw" or the common misspelling "Shore". The pattern reads:

\s = look for the first space you encounter with
Peter = on the left side of it and
Sh = on the right side, followed by either
(aw|ore) = 'aw' OR 'ore'
In all cases keep the 2 found words.

The result in $matches will be

$matches[0] = "Peter Shaw" (or "Peter Shore")
$matches[1] = "Peter"
$matches[2] = "Shaw" (or "Shore")
$matches[3] = "aw" (or "ore")

Pay attention above to the (aw|ore) bit. This has to be in () to group the 2 parts either side of the OR decision, so even if you don't intend to look for that part, it still uses up a slot in the results.
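If you don't want that extra slot, PCRE (the engine behind the preg functions) lets you write the group as (?:aw|ore), which groups without capturing. That syntax isn't used anywhere else in this article, so treat the following as a side-by-side sketch rather than part of the main example. The variable names are my own.

```php
<?php
$text = "Peter Shore";

// Ordinary brackets: the inner (aw|ore) group takes up slot 3.
preg_match('/(Peter)\s(Sh(aw|ore))/', $text, $capturing);

// (?: ... ) still groups the two sides of the OR, but keeps no result.
preg_match('/(Peter)\s(Sh(?:aw|ore))/', $text, $nonCapturing);

print_r($capturing);    // four entries, the last one being "ore"
print_r($nonCapturing); // three entries, ending at "Shore"
```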
One more example:

$text = "the letter a is a vowel";
$regex = "/the\sletter\s[aeiou]\sis\sa\svowel/i";

This reads:

Search for "the letter " followed by one of the letters a,e,i,o,u and none other
Followed by " is a vowel"

On a positive match, $matches[0] will hold "the letter a is a vowel". There will be no other parts in $matches as there are no bracketed sections.
In case you're wondering, \s is a special character called a meta-character, and it means anything classed as white space.
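Here is a small sketch of that vowel pattern being run against a few strings. The sample strings are my own, and note the i after the closing /, which is a modifier that makes the whole match case-insensitive.

```php
<?php
$regex = '/the\sletter\s[aeiou]\sis\sa\svowel/i';

// Hypothetical test strings, one per case worth checking.
$samples = [
    "the letter a is a vowel", // plain match
    "The Letter E is a vowel", // matches because of the i modifier
    "the letter b is a vowel", // 'b' is not in [aeiou]
];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 for a match and 0 for no match.
    $results[$sample] = preg_match($regex, $sample);
}

print_r($results);
```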
The * symbol is also a meta-character and means match "0 or more occurrences" of whatever comes before it, e.g.:

A.*

The above example will match from the first 'A' it encounters through to the end of the line. The ^ and $ meta-characters mean the start and end of the text, so:

^A.*$

will match all of the text in a phrase, but only if it starts with an 'A' right at the beginning. That is different from the previous pattern, which matches from the first 'A' it encounters anywhere in the text and then on through the rest of the line. That brings us to my next point.
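As a rough sketch of that difference, assuming the two patterns being compared are A.* and ^A.*$, here is what each does with a string where the 'A' is not the first character. The sample strings are my own.

```php
<?php
$text = "Say hi to Alan today";

// Unanchored: starts at the first 'A' found and runs to the end of the line.
preg_match('/A.*/', $text, $unanchored);

// Anchored: the 'A' must be the very first character, so this finds nothing.
$anchoredMiss = preg_match('/^A.*$/', $text);

// The same anchored pattern on a string that does begin with 'A'.
$anchoredHit = preg_match('/^A.*$/', "Alan went to meet marsha", $anchored);

echo $unanchored[0], "\n";
echo $anchoredMiss, "\n";
echo $anchored[0], "\n";
```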
Regular expressions are greedy. They will try to match the largest amount possible at any given time in any given match string, which is why you only want to use * when it's really necessary. If you can, always try to narrow your search as much as possible, e.g.:

"Alan went to meet marsha"

To get the word 'Alan' use an expression of:


Or use the count control match meta characters:


What this expression says is: look for a 4 character word beginning with 'A' right at the beginning of the line, followed by a space and 1 or more characters.
A count such as {4} means exactly 4 of whatever comes before it, so following a '.' it means 4 characters of any description, and only 4 characters. It's also possible to specify ranges. Take a look at this example:


This example would specify an A followed by between 1 and 4 characters: no fewer than 1 and no more than 4. And this snippet:


This code would mean an 'A' at the beginning followed by at least 4 characters, possibly more.
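The count forms described above are written {n}, {min,max} and {min,}. Here is a hedged sketch of each, along with a greedy * for comparison. The exact patterns and the "Alexander" test string are my own assumptions, chosen to fit the descriptions, so don't read them as the only way to write these rules.

```php
<?php
$text = "Alan went to meet marsha";

// Greedy: .* grabs as much as it can, so the capture runs to the LAST space.
preg_match('/^(A.*)\s/', $text, $greedy);

// Exact count: 'A' plus exactly 3 more characters, then a space.
preg_match('/^(A.{3})\s/', $text, $exact);

// Range: 'A' followed by 1 to 4 characters (it takes 4 when it can).
preg_match('/^A.{1,4}/', "Alexander", $range);

// Open-ended: 'A' followed by at least 4 characters, possibly more.
preg_match('/^A.{4,}/', "Alexander", $atLeast);

// "Alan" only has 3 characters after the 'A', so this does not match.
$tooShort = preg_match('/^A.{4,}/', "Alan");

echo $greedy[1], "\n";
echo $exact[1], "\n";
echo $range[0], "\n";
echo $atLeast[0], "\n";
echo $tooShort, "\n";
```

Narrowing the rule from .* to .{3} is what turns "everything up to the last space" into just the first word.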
You can also combine counts with other rules. The thing being counted does not have to be a '.'; you can put a count after a character class like this:


This would match on a line beginning with 'A' and at least 4 of any of the characters in the square brackets in any order, but only the characters in the square brackets.
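As an illustration of a count applied to a character class, here is a sketch using a made-up class of [lan]. Both the class and the test words are hypothetical, picked only to show which strings pass and which fail.

```php
<?php
// 'A' at the start, then at least 4 characters drawn only from l, a, n.
$regex = '/^A[lan]{4,}/';

$allan  = preg_match($regex, "Allan", $m); // "llan" is 4 allowed characters
$mixed  = preg_match($regex, "Alnla");     // any order is fine
$short  = preg_match($regex, "Alan");      // only 3 characters after the 'A'
$albany = preg_match($regex, "Albany");    // 'b' is not in the class

echo $m[0], "\n";
echo "$allan $mixed $short $albany\n";
```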
We've really only just scratched the surface of regular expressions. It's a huge subject on which many books have been written. I urge you to read more about them, and you can always look to the PHP manual; the expressions section is at .phpt.
Next time will be the final part in our series, in which we wrap up and look at some practical examples of what we've learned so far.
It's also your chance to tell me what you'd like to cover. If there is a particular thing you've been trying to do, or a technique you're not sure how to make work, then please leave a comment using the form at the bottom of this page.
Between now and the final article, I'll be checking these comments, and I'll use them as a basis for what I put in the last article. Please note, however, that I'm not going to complete your project or your homework assignment for you, so please don't put things in like "please show me how to make a project that does xxxx". All I'm looking for are real world ideas based on common scenarios that you guys are currently learning.
Until next time
May your expressions remain regular
The ABC's of PHP
Introduction to PHP
What do I need to make it work?
Basic Script Building in PHP
How Variable Am I?
Strings & Text
Math & Number Handling in PHP
Introduction to Arrays and Hashes in PHP
Loops and Decisions in PHP
Advanced String Processing - How Regular Are Your Expressions