PHP: Encoding a URL before accessing it

I discussed a little about URL encoding in my recent post Facebook: Bug with URL encoding, although it seems like the bug still exists. In this post I will discuss how to encode a given URL before accessing it using cURL or fsockopen. The problem with URLs is that they might contain characters, such as spaces, that RFC 3986 does not allow. Our aim is to convert these invalid characters in a given URL to their percent-encoded values, so that we can access the URL using our regular HTTP request methods. For example, the URL [http://example.com/space space] should be converted to [http://example.com/space%20space] before we can access it using cURL. However, the URL [http://example.com/percents%25percent] is perfectly valid as it stands, since it doesn’t contain any of the disallowed characters.
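To make the distinction between the two example URLs concrete, here is a minimal sketch of a check for disallowed characters. The helper name has_disallowed_chars is hypothetical and not part of the function discussed in this post; it assumes the allowed set is the RFC 3986 unreserved and reserved characters plus ‘%’.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns true when
// $url contains a byte outside the RFC 3986 unreserved and reserved sets.
// A '%' is accepted as-is; this sketch does not verify that two hex digits
// actually follow it.
function has_disallowed_chars($url) {
    return preg_match('/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/', $url) === 1;
}

var_dump(has_disallowed_chars('http://example.com/space space'));        // bool(true)
var_dump(has_disallowed_chars('http://example.com/percents%25percent')); // bool(false)
```

A check like this only tells us whether encoding is needed; it does not do any encoding itself.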

The given URL can only contain characters which are allowed by RFC 3986: alphanumeric characters, the unreserved marks (- | . | _ | ~), and the reserved characters and delimiters (: | / | ? | # | [ | ] | @ | ! | $ | & | ' | ( | ) | * | + | , | ; | =), plus % which introduces a percent-encoded value. The following piece of code changes the disallowed characters in the URL to their corresponding percent-encoded values. The characters and delimiters which are allowed are left untouched.

/**
 * @param string $url
 *     The URL to encode
 *
 * @return string
 *     A string containing the encoded URL with disallowed
 *     characters converted to their percent-encodings.
 */
function encode_url($url) {
    // Reserved character => pattern matching its percent-encoded form.
    // '%' must stay last (see the note below).
    $reserved = array(
        ":" => '!%3A!ui',
        "/" => '!%2F!ui',
        "?" => '!%3F!ui',
        "#" => '!%23!ui',
        "[" => '!%5B!ui',
        "]" => '!%5D!ui',
        "@" => '!%40!ui',
        "!" => '!%21!ui',
        "$" => '!%24!ui',
        "&" => '!%26!ui',
        "'" => '!%27!ui',
        "(" => '!%28!ui',
        ")" => '!%29!ui',
        "*" => '!%2A!ui',
        "+" => '!%2B!ui',
        "," => '!%2C!ui',
        ";" => '!%3B!ui',
        "=" => '!%3D!ui',
        "%" => '!%25!ui',
    );

    // Encode everything, then restore the reserved characters.
    $url = rawurlencode($url);
    $url = preg_replace(array_values($reserved), array_keys($reserved), $url);
    return $url;
}

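One thing to notice here is that the ‘%’ character must be the last entry in the $reserved array. preg_replace applies the patterns in the order they are given, so restoring ‘%’ last makes sure that values which were already encoded in the input are neither decoded nor encoded a second time.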
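The effect of the ordering can be seen with a small experiment. The sketch below is an illustration only: encode_with_order is a hypothetical helper, it uses str_replace in place of preg_replace, and it restores just three of the reserved characters to keep the example short.

```php
<?php
// Hypothetical helper for illustration: same idea as encode_url(), but the
// order in which reserved characters are restored is passed in by the caller.
function encode_with_order($url, array $order) {
    $url = rawurlencode($url);
    foreach ($order as $char) {
        // rawurlencode($char) yields the percent-encoded form, e.g. '%3A' for ':'.
        $url = str_replace(rawurlencode($char), $char, $url);
    }
    return $url;
}

$input = 'http://example.com/a%3Ab'; // contains an already-encoded ':'

// '%' restored last, as in encode_url(): the existing escape survives.
echo encode_with_order($input, array(':', '/', '%')), "\n";
// http://example.com/a%3Ab

// '%' restored first: '%253A' collapses to '%3A', which is then turned into ':'.
echo encode_with_order($input, array('%', ':', '/')), "\n";
// http://example.com/a:b
```

In the second call the encoded colon from the input is silently decoded, which is exactly the loss that keeping ‘%’ last avoids.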

Note: This might not be the best solution but well, it works. If you are using something better, let me know in the comments.

11 thoughts on “PHP: Encoding a URL before accessing it”

  1. Nitin Post author

    @Sean and Tsung:

    $temp = urlencode("http://example.com/space space");
    echo $temp;

    Output: http%3A%2F%2Fexample.com%2Fspace+space

    I am sure we don’t want the output as above. urlencode encodes the input string as if it were to be included as data in the URL, so every non-alphanumeric character (apart from ‘-’, ‘_’ and ‘.’) is encoded, whether it is allowed in a URL or not. The encode_url function described above only encodes the disallowed characters and will therefore output:

    http://example.com/space%20space

    Cheers,

  2. Nitin Post author

    What about the valid characters present within the URL which are encoded by urlencode such as ‘=’, ‘?’ ?

    As I said, the problem with urlencode is that it encodes nearly every character which is not alphanumeric. But we want to leave untouched not only the alphanumeric characters but also the valid reserved characters and delimiters (‘/’, ‘=’, ‘:’). So, what the above function does in simple terms is:
    1) encodes the whole URL using rawurlencode().
    2) changes each percent-encoded value back to its equivalent character if that character is one of the reserved characters. For example, ‘%26’ is replaced by ‘&’ as ‘&’ is a reserved character.

    Cheers,

  3. Boolean Value

    Got it,

    Wasn’t doubting the usefulness of your function, but you may just want to explain it a little more clearly. Your second comment explains it perfectly, however 🙂

    Thanks, looks like a simple solution for a shortcoming in the language.
