I discussed URL encoding a little in my recent post Facebook: Bug with URL encoding, although it seems like the bug still exists. In this post I will discuss “how to encode a given URL before accessing it using CURL or fsockopen”. The problem with URLs is that they might contain characters, such as spaces, that RFC 3986 does not allow. Our aim is to convert these invalid characters in a given URL to their percent-encoded values, so that we can access the URL using our regular HTTP request methods. For example, the URL [http://example.com/space space] should be converted to [http://example.com/space%20space] before we can access it using CURL. However, the URL [http://example.com/percents%25percent] is perfectly valid as it doesn’t contain any of the disallowed characters.
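To see why the URL cannot simply be handed to PHP's built-in rawurlencode(), here is a small sketch using the example URL from above. The variable names are only for illustration.

```php
<?php
// Example URL from the post, containing one disallowed character (a space).
$url = 'http://example.com/space space';

// rawurlencode() encodes every reserved character, so the scheme and
// path delimiters get encoded along with the space.
$naive = rawurlencode($url);
echo $naive, "\n"; // http%3A%2F%2Fexample.com%2Fspace%20space

// What we actually want: only the space is converted.
$wanted = 'http://example.com/space%20space';
var_dump($naive === $wanted); // bool(false)
```

The encoded string is no longer usable as a URL, which is why the delimiters have to be left alone or restored afterwards.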
The given URL can only contain characters which are allowed by RFC 3986: alphanumeric characters, the unreserved marks ( - | . | _ | ~ ), and the reserved characters and delimiters ( : | / | ? | # | [ | ] | @ | ! | $ | & | ' | ( | ) | * | + | , | ; | = | % ). The following piece of code changes the disallowed characters in the URL to their corresponding percent-encoded values. The characters and delimiters which are allowed are left untouched.

```php
/**
 * @param $url
 *   The URL to encode
 * @return
 *   A string containing the encoded URL with disallowed
 *   characters converted to their percentage encodings.
 */
function encode_url($url) {
  // Allowed character => pattern matching its percent-encoded form.
  $reserved = array(
    ":" => '!%3A!ui',
    "/" => '!%2F!ui',
    "?" => '!%3F!ui',
    "#" => '!%23!ui',
    "[" => '!%5B!ui',
    "]" => '!%5D!ui',
    "@" => '!%40!ui',
    "!" => '!%21!ui',
    '$' => '!%24!ui',
    "&" => '!%26!ui',
    "'" => '!%27!ui',
    "(" => '!%28!ui',
    ")" => '!%29!ui',
    "*" => '!%2A!ui',
    "+" => '!%2B!ui',
    "," => '!%2C!ui',
    ";" => '!%3B!ui',
    "=" => '!%3D!ui',
    "%" => '!%25!ui', // must stay last
  );

  // Encode everything first, then restore the allowed characters.
  $url = rawurlencode($url);
  $url = preg_replace(array_values($reserved), array_keys($reserved), $url);
  return $url;
}
```
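One thing to notice here is that the ‘%’ character must be the last entry in the $reserved array. rawurlencode() turns an already encoded value such as %2F into %252F, and preg_replace() applies the patterns in array order. Restoring %25 last means none of the other patterns ever sees the restored %2F, so the already encoded values are neither decoded nor encoded again.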
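The effect of the ordering can be checked with a small sketch. The helper restore_reserved() and the two shortened maps are hypothetical and exist only for this demonstration; the helper performs the same two steps as encode_url() but takes the map as an argument.

```php
<?php
// Hypothetical helper: same two steps as encode_url(), with the map
// (and therefore the replacement order) supplied by the caller.
function restore_reserved($url, $map) {
  $encoded = rawurlencode($url);
  return preg_replace(array_values($map), array_keys($map), $encoded);
}

// Shortened maps, just enough entries to show the difference.
$percentLast  = array(":" => '!%3A!ui', "/" => '!%2F!ui', "%" => '!%25!ui');
$percentFirst = array("%" => '!%25!ui', ":" => '!%3A!ui', "/" => '!%2F!ui');

// The path contains an already encoded slash (%2F).
$url = 'http://example.com/a%2Fb';

echo restore_reserved($url, $percentLast), "\n";  // http://example.com/a%2Fb
echo restore_reserved($url, $percentFirst), "\n"; // http://example.com/a/b
```

With ‘%’ first, %252F is turned back into %2F too early and the later "/" pattern then decodes it, so the encoded slash is lost.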
Note: This might not be the best solution but well, it works. If you are using something better, let me know in the comments.
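One possible alternative, offered only as a sketch: instead of encoding everything and restoring the allowed characters, match just the disallowed bytes and encode those. The function name encode_url_single_pass() is hypothetical, and the character class is an assumed reading of the allowed set listed above (including the unreserved marks - . _ ~).

```php
<?php
// Hypothetical alternative to encode_url(): encode only the bytes that are
// not in the allowed set, and leave everything else (including '%') alone.
function encode_url_single_pass($url) {
  // Allowed: alphanumerics, - . _ ~, the reserved characters, and '%'.
  $disallowed = '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/';
  return preg_replace_callback(
    $disallowed,
    function ($match) {
      // $match[0] is a single byte; rawurlencode() returns its %XX form.
      return rawurlencode($match[0]);
    },
    $url
  );
}

echo encode_url_single_pass('http://example.com/space space'), "\n";
// http://example.com/space%20space
echo encode_url_single_pass('http://example.com/percents%25percent'), "\n";
// http://example.com/percents%25percent
```

Because ‘%’ is never touched here, there is no ordering constraint to remember, but I have only checked it against the two example URLs from this post.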