This article is intended for users with an intermediate to advanced knowledge of PHP, HTML, and GET requests.
In this article I'll show you how to use PHP to encode data for transit in a GET request. Most importantly, the encoding is reversible, so the receiving page can decode the data and actually use it.
If you're reading this article, you know that GET requests are a common way to pass data from one web page to another. They're the method of choice when the data is relatively short, because you can send it simply by writing it into a URL. The problem is that this is completely insecure: the data is written directly into the URL, which the client browser displays in its address bar.
You may already be familiar with the functions PHP offers for turning data into unreadable strings. Most of them are one-way hashing or checksum functions, such as sha1(), md5(), and crc32(). While these make data unreadable to humans, they also ensure that the page receiving the data can't read it either. That may be acceptable for data that only ever needs to be compared later, such as passwords (for which PHP's dedicated password_hash() is the better choice), but it's useless when you need to know what the data actually represents. Contact information, dates, user preferences, confidential client information, and so on all present this problem.
Key Advantages of this Encryption Method
1. Unlike some other methods, this one doesn't require sending a separate "key" along with the encoded data to tell the decoder how to decode it. Everything the decoder needs beyond the reference tables it already has on the server (which table to use, plus the data itself) is sent within a single string.
2. This script doesn't use the same encryption scheme every time it runs. If you encode the same word twice, there is only a 1 in 10 chance that both encodings use the same scheme. This element of randomness makes the encryption harder to crack.
How it Works
I'll first outline how the encoding/decoding scheme works in plain English (as plain as the subject matter allows), and then I'll take you through the PHP code. If you'd like to see the finished script now, you can download it below. The script consists of two functions: an encoder and a decoder.
There are many ways to go about encoding. One is to perform a series of mathematical operations on the data. That approach brings several complications, however, such as unpredictable output length and the awkwardness of doing arithmetic on non-numeric (string) data.
The method we'll use defines an array of static value pairs, each consisting of a decoded (plain) value and its encoded counterpart, which turns the array into a simple reference table. To support the full range of ASCII characters, both values are stored as numeric ASCII codes.
To encode a character, the function simply looks up that character's ASCII code in the array instead of performing a series of calculations; decoding is the same lookup in the opposite direction.
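For example, the lookup can be sketched with a toy table. The pairings below are invented and cover only the codes for 'a' through 'e' (97 to 101); a real table would span the whole supported range and be generated randomly. The function names lookupEncode() and lookupDecode() are hypothetical, not the names used in the finished script.

```php
<?php
// Toy reference table: plain ASCII code => encoded ASCII code.
// Invented pairings covering only 'a'..'e' (97..101); characters outside
// this table are not handled in this illustration.
$table = [97 => 100, 98 => 97, 99 => 101, 100 => 99, 101 => 98];

// Encoding: replace each character's code with the code it maps to.
function lookupEncode(string $plain, array $table): string
{
    $out = '';
    for ($i = 0, $n = strlen($plain); $i < $n; $i++) {
        $out .= chr($table[ord($plain[$i])]);
    }
    return $out;
}

// Decoding: the same lookup against the flipped table (encoded => plain).
function lookupDecode(string $encoded, array $table): string
{
    $reverse = array_flip($table);
    $out = '';
    for ($i = 0, $n = strlen($encoded); $i < $n; $i++) {
        $out .= chr($reverse[ord($encoded[$i])]);
    }
    return $out;
}

echo lookupEncode('bad', $table), "\n"; // prints "adc"
echo lookupDecode('adc', $table), "\n"; // prints "bad"
```

Decoding uses array_flip() so that each step is still a single array access; this only works because no two plain codes share the same encoded code.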
The array will have one element for each screen-printable character: ASCII codes 32 through 126 (95 elements in all). This is the basic printable character set, which includes the space character as well as all typable symbols.
The script is divided into 3 separate PHP files:
The first file contains the bulk of the script. It's where we declare the functions.
The second is a "single-use" script that only needs to be executed once. Its purpose is to generate the third file.
The third file serves as an include file for our encode/decode functions. It defines the static array (our reference table), which is produced by randomly shuffling the ASCII codes in our range so that, for example, character code 56 corresponds to some other random character code, such as 121. Generating this file yourself, rather than using a standard distributed copy, gives you your own unique shuffle of the ASCII codes, which makes the chance of anyone else using the same encoding scheme virtually nil.
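A minimal sketch of that shuffle, assuming the default 32 to 126 range: every code is paired with a code drawn from a shuffled copy of the same range, so each target is used exactly once and the mapping can always be reversed. shuffledTable() is a hypothetical helper name.

```php
<?php
// Pair every code in [$min, $max] with a random code from the same range,
// using each target exactly once (a random permutation), so the table is
// one-to-one and can be inverted for decoding.
function shuffledTable(int $min = 32, int $max = 126): array
{
    $codes = range($min, $max);
    $targets = $codes;
    shuffle($targets);                      // random order; keys reindexed 0..n-1
    return array_combine($codes, $targets); // e.g. 56 => 121, 57 => 40, ...
}

$table = shuffledTable();
echo count($table), "\n"; // prints "95"
```

The PHP manual notes that shuffle() does not generate cryptographically secure values, so the resulting table should not be treated as a strong secret.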
A Note on ASCII Ranges
You can customize the ASCII range that this script supports. Simply change the arguments passed to range() in the second line of
It's best to limit the ASCII range to the characters you actually need in your specific scenario, because every character within the range is a possible encoded replacement: even characters you never use can end up in the encoded string. That isn't necessarily a problem, but every byte outside the basic ASCII set has to be URL-encoded as a three-character escape (a % followed by two hex digits), which produces a longer encoded string. Reserved characters inside the basic set, such as the space, &, #, % and +, need URL-encoding as well. To keep transmitted data short, limit the range to the characters you foresee needing.
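To illustrate the escaping point, the sketch below passes a made-up encoded string through urlencode(). The parameter name data and the page receive.php are hypothetical.

```php
<?php
// Stand-in for encoder output; the characters are chosen to show escaping.
$encoded = 'a&b c#d+e';

// urlencode() escapes reserved characters (space becomes "+").
$url = 'receive.php?data=' . urlencode($encoded);
echo $url, "\n"; // prints "receive.php?data=a%26b+c%23d%2Be"

// A byte above 126 always becomes a three-character %XX escape.
echo urlencode(chr(200)), "\n"; // prints "%C8"

// parse_str() decodes a query string the way PHP does when filling $_GET,
// so the receiving page gets the original string back without urldecode().
parse_str('data=' . urlencode($encoded), $params);
var_dump($params['data'] === $encoded); // bool(true)
```

On the receiving page, $_GET values are already decoded, so calling urldecode() on them again could corrupt strings that contain % or +.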
In the vast majority of cases, GET requests only carry characters from the basic ASCII set. Using only the basic set, as opposed to, say, the basic + extended sets, will not only shorten the average transmission but also cut the reference array to less than half its size (95 entries instead of 224). The basic ASCII set still covers the space character as well as all typable symbols.
Conversely, you can extend the power of this script by widening the range to the entire table, 0 through 255. This allows you to encode things like file contents, emails, or multi-line form submissions: the full range includes line breaks (carriage return and line feed), tabs, and other non-printing characters, so formatting is preserved in those cases.
ASCII Range Reference
- Basic: 32 - 126 (95 characters)
- Basic + Extended: 32 - 255 (224 characters)
- All, including non-printing characters: 0 - 255 (256 characters)
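Because the reference table only has entries for codes inside the chosen range, it can help to check input before encoding it. The helper below, fitsRange(), is a hypothetical addition rather than part of the original script, and it assumes the encoder works byte by byte (as ord() does). Under that assumption a multi-byte UTF-8 character such as "é" arrives as two bytes (195 and 169), so it only fits a range that reaches into the extended codes.

```php
<?php
// Return true if every byte of $s lies within [$min, $max], i.e. the
// reference table has an entry for every character in the string.
function fitsRange(string $s, int $min = 32, int $max = 126): bool
{
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $code = ord($s[$i]);
        if ($code < $min || $code > $max) {
            return false;
        }
    }
    return true;
}

var_dump(fitsRange("Jane Doe, 555-0100"));          // bool(true)
var_dump(fitsRange("line one\nline two"));          // bool(false): "\n" is code 10
var_dump(fitsRange("line one\nline two", 0, 255));  // bool(true)
```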
We'll further secure the system by defining 10 different complete sets of ASCII value pairs in a multi-dimensional array, one of which is chosen at random on each call to the encode function. This way, the same data encoded twice, even by the same user, will use a different set 9 times out of 10, making the encoded string all the more confusing to look at.
To tell the decoder function which data set was used, we'll "tag" the encoded data with a character placed at a specific position within the encoded string. When the decoder reads the character at that position, it can determine which data set to use to decode the data. We'll also pad the left and right sides of the encoded string with random dummy digits to add some further confusion.
The decoder function basically does everything the encoder did, but in reverse. It picks apart the encoded string, discarding the characters at positions known to hold only padding and reading the characters at positions known to hold actual data. It then uses the reference array to find the original value corresponding to each encoded character, converting each one back and thereby reconstructing the original string.
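Putting these pieces together, here is one possible layout for the tagged, padded string. The specifics are assumptions, not the original script's choices: three random digits on the left, then the set number as a single digit, then the substituted data, then two random digits on the right. The helpers buildSets(), randomDigits(), tagEncode(), and tagDecode() are hypothetical names.

```php
<?php
// Assumed layout: [3 random digits][set digit][substituted data][2 random digits]
const PAD_LEFT = 3;
const PAD_RIGHT = 2;

// Ten independently shuffled tables (plain code => encoded code).
function buildSets(int $count = 10, int $min = 32, int $max = 126): array
{
    $codes = range($min, $max);
    $sets = [];
    for ($s = 0; $s < $count; $s++) {
        $targets = $codes;
        shuffle($targets);
        $sets[] = array_combine($codes, $targets);
    }
    return $sets;
}

function randomDigits(int $n): string
{
    $digits = '';
    for ($i = 0; $i < $n; $i++) {
        $digits .= (string) random_int(0, 9);
    }
    return $digits;
}

function tagEncode(string $plain, array $sets): string
{
    $tag = random_int(0, count($sets) - 1); // which table to use (0..9)
    $table = $sets[$tag];
    $body = '';
    for ($i = 0, $n = strlen($plain); $i < $n; $i++) {
        $body .= chr($table[ord($plain[$i])]);
    }
    return randomDigits(PAD_LEFT) . $tag . $body . randomDigits(PAD_RIGHT);
}

function tagDecode(string $encoded, array $sets): string
{
    $tag = (int) $encoded[PAD_LEFT];        // the digit at the known tag position
    $reverse = array_flip($sets[$tag]);     // encoded code => plain code
    // Drop the padding and the tag, keeping only the data characters.
    $body = substr($encoded, PAD_LEFT + 1, strlen($encoded) - PAD_LEFT - 1 - PAD_RIGHT);
    $plain = '';
    for ($i = 0, $n = strlen($body); $i < $n; $i++) {
        $plain .= chr($reverse[ord($body[$i])]);
    }
    return $plain;
}

$sets = buildSets();
$a = tagEncode('2024-05-01', $sets);
$b = tagEncode('2024-05-01', $sets);
echo $a, "\n", $b, "\n";         // two encodings of the same date, usually different
echo tagDecode($a, $sets), "\n"; // prints "2024-05-01"
```

A single-digit tag is what limits this layout to ten sets; more sets would need a wider tag field or a different tag alphabet.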
On to the Code
The first thing we'll write is the single-use script that generates the include file. I call this file mine.php (because it produces salt.php), but you can choose any name you like: you only ever run this page manually, and none of the other code calls it.
This defines the name of the file to create. You should choose a unique name, to prevent the possibility of someone else somehow including the file in their own script and cracking the encoding scheme. If you change the filename here, you must also change the name used in the include() statements. There is one include() statement in each function; both must match the new name, or the functions won't find the reference table.
This creates an array of values from 32 through 126, which is our ASCII range. You can change the arguments here to suit your own ASCII range requirements (see "A Note on ASCII Ranges" above).
$ascii = array_combine($ascii, $ascii);
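One way to complete the generator is sketched below. It assumes the include file defines a variable named $salt holding all ten tables; the original script's variable names and file layout may differ, and generateSaltFile() is a hypothetical function. var_export() with its second argument set to true returns the array as valid PHP source, which is what makes the generated file includable.

```php
<?php
// Write an include file that defines $salt: $count shuffled tables,
// each mapping every code in [$min, $max] to another code in that range.
function generateSaltFile(string $file, int $min = 32, int $max = 126, int $count = 10): void
{
    $ascii = range($min, $max);
    $ascii = array_combine($ascii, $ascii); // keys equal values: 32 => 32, 33 => 33, ...

    $sets = [];
    for ($s = 0; $s < $count; $s++) {
        $values = array_values($ascii);
        shuffle($values);                   // a new random order for this set
        $sets[] = array_combine(array_keys($ascii), $values);
    }

    // var_export(..., true) returns the array as PHP source code.
    $php = "<?php\n\$salt = " . var_export($sets, true) . ";\n";
    if (file_put_contents($file, $php) === false) {
        throw new RuntimeException("Could not write $file");
    }
}

// Run once by hand, for example:
// generateSaltFile('salt.php');
```

Variables defined in an included file take on the scope of the line where include() runs, so when each encode/decode function includes the file inside its body, $salt is local to that function.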