Tag Archives: encoding

PHP: Encoding a URL before accessing it

I touched on URL encoding in my recent post Facebook: Bug with URL encoding, and it seems the bug still exists. In this post I will discuss how to encode a given URL before accessing it using cURL or fsockopen. The problem with URLs is that they might contain characters, such as spaces, that RFC 3986 does not allow. Our aim is to convert these invalid characters in a given URL to their percent-encoded values, so that we can access the URL using our regular HTTP request methods. For example, the URL [http://example.com/space space] should be converted to [http://example.com/space%20space] before we can access it using cURL. However, the URL [http://example.com/percents%25percent] is perfectly valid as it is, since it doesn’t contain any of the disallowed characters.
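As a rough illustration of the idea, here is a minimal sketch, not a complete RFC 3986 implementation. The function name encode_url_sketch is made up for this example, and the decision to also escape a stray ‘%’ that does not start a valid escape is my own assumption. It percent-encodes only the bytes that are not allowed, and leaves the reserved delimiters and any existing %XX escapes untouched:

```php
<?php
// Hypothetical helper (name invented for this sketch): percent-encode only
// the bytes that RFC 3986 does not allow in a URL. Reserved delimiters and
// existing %XX escapes are left as they are.
function encode_url_sketch(string $url): string
{
    return preg_replace_callback(
        // First branch: any byte that is neither unreserved, reserved, nor '%'.
        // Second branch (an assumption of this sketch): a '%' that is not
        // followed by two hex digits, i.e. not part of a valid escape.
        '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]|%(?![0-9A-Fa-f]{2})/',
        function (array $m): string {
            // rawurlencode() turns a space into %20 (urlencode() would give '+').
            return rawurlencode($m[0]);
        },
        $url
    );
}

echo encode_url_sketch('http://example.com/space space'), "\n";
// prints http://example.com/space%20space
echo encode_url_sketch('http://example.com/percents%25percent'), "\n";
// prints http://example.com/percents%25percent (unchanged)
```

The pattern works byte by byte, so a multibyte UTF-8 character comes out as one %XX escape per byte. The result can then be passed to curl_setopt() as CURLOPT_URL in the usual way.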
Continue reading

Facebook: Bug with URL encoding

Today, while I was working on the URL encoding for the recently released Facebook-style Links module, I noticed a bug in the Link Attachments feature on Facebook. Before I explain, let us reproduce it:

Try to attach the following link on Facebook: http://google.com/search?q=blenders%26pride. This URL queries Google for ‘blenders&pride’. Facebook converts the above URL to http://google.com/search?q=blenders&pride, which is not equivalent: it queries Google for just ‘blenders’.

So, why does Facebook do this? Probably Facebook tries to encode the URL, replacing the characters which are not allowed by RFC 3986 with their percent encoding. But there are certain characters which should not be encoded, such as ‘/’, ‘?’, ‘#’ and ‘@’, which are reserved characters used as delimiters in the URL. So it apparently also decodes the percent-encoded forms of reserved characters back to the original characters, and that gives rise to the problem: ‘&’ is reserved too, so ‘%26’ becomes a literal ‘&’, which then acts as a separator between query parameters. Let us see an example:
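To make the difference between the two links concrete, here is a small sketch using PHP’s built-in parse_url() and parse_str(). It is not Facebook’s code; it only shows how a server would read each query string. The helper name query_params_sketch is made up for this example:

```php
<?php
// Hypothetical helper (name invented for this sketch): return the query
// parameters of a URL as an array, the way a receiving server would see them.
function query_params_sketch(string $url): array
{
    $params = [];
    parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
    return $params;
}

// The link as attached: %26 is an escaped '&' inside the value of q.
print_r(query_params_sketch('http://google.com/search?q=blenders%26pride'));
// q => "blenders&pride"

// The link after the escape has been decoded to a literal '&'.
print_r(query_params_sketch('http://google.com/search?q=blenders&pride'));
// q => "blenders", plus a second, empty parameter named "pride"
```

In the second case the ‘&’ splits the query string in two, so the search term is cut down to ‘blenders’.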
Continue reading