Drupal: FeedAPI Imagegrabber

Feeds Image Grabber (FIG) was released on 3rd March 2010, to support the Feeds module.

Introduction

FeedAPI Imagegrabber is an add-on module for FeedAPI. It consists of a parser which visits the original URL of each new feed item and retrieves the main image from the post. The image is then converted into a thumbnail using the ImageCache module and stored in a CCK field on the node created by FeedAPI.

The purpose of FeedAPI Imagegrabber is to make feeds more informative and more interesting for the user. As we all know, “comics are much better than novels”, so this module attaches to each feed item an appropriate image from its content URL. The goal is to mimic the thumbnail display of websites such as digg.com. This is achieved by using FeedAPI to turn RSS feed items into nodes, and then using FeedAPI Imagegrabber to attach to each node an appropriate image from the feed item’s webpage.

How it works

The easiest way to understand what FeedAPI Imagegrabber does is to consider how you would do the same thing manually:

  1. Refresh the feed.
  2. For each feed item, go to its original URL and save the image to display.
  3. Crop the image, convert it into a thumbnail, and upload it to a CCK Image field.

FeedAPI Imagegrabber automates the last two steps of this three-step process. It uses cURL to download the images and ImageCache to crop them, and then stores the result in the CCK ImageField. The most difficult part, which only humans can do well, is selecting the image, and I am constantly improving the heuristics for it.
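
The image-selection step can be sketched roughly as follows. This is only an illustration in Python, not the module's actual PHP code. It assumes the simple "largest image on the page" heuristic of the initial release, and the names ImageCollector, collect_image_urls and pick_largest are hypothetical. The image dimensions are passed in directly rather than obtained by downloading each image with cURL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Pages often use relative paths; resolve them against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(page_url, html):
    """Return the absolute URLs of all images found in the given HTML."""
    collector = ImageCollector(page_url)
    collector.feed(html)
    return collector.image_urls


def pick_largest(sizes):
    """sizes maps image URL -> (width, height); return the URL with the largest area."""
    if not sizes:
        return None
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])
```

In a real run, the sizes passed to pick_largest would come from the downloaded images themselves, and the winning image would then be handed on for cropping and storage.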

Download and Install

Visit the project page: FeedAPI Imagegrabber

Features and Future Releases

Visit the project page: FeedAPI Imagegrabber

History

I understand that history is boring to some, so this section is only for those who think otherwise. I started working on this module in December 2008. At the time, I had little or no idea about Drupal or its API, and it took me quite some time to get used to it. For the initial release, I settled on a very simple heuristic: select the largest image available on the web page. I completed the module in about a month and was very happy with my performance, but then a big blow delayed the release of FeedAPI Imagegrabber by two months. I had to include an external BSD-licensed script for converting relative URLs to absolute URLs, but Drupal refused to accept it. They said they allow only GPL-licensed code, and I tried to convince them that the two licenses are compatible. Unfortunately, I was unable to convince them, and I had to link to the BSD script externally on SourceForge instead. The module is now released, with the hope that it will enhance the visitor experience on websites around the globe.

You may find these posts useful:

Tutorial for FeedAPI ImageGrabber
PHP: Relative URL to Absolute URL
Open Source Software and Licenses

Do leave your comments to let me know what you think.

Note: Support queries and requests should be made through the FeedAPI Imagegrabber Forum. I may not be able to answer them here. 🙂

Updates:

  • The first stable version of the module was released on 7th April, 2009. Check it out now here!!
  • The RC1 release of Imagegrabber is now available for download on the project page. It is compatible with the RC1 release of Imagefield. (17 Apr, 2009)
  • Demonstration Website is up, and is available here. (20 Aug, 2009)
  • FeedAPI ImageGrabber v1.8 released. Download it from the project website here. (20 September 2009)
  • Tutorial for FeedAPI ImageGrabber published. (20 September 2009)
  • FeedAPI ImageGrabber v1.9 released. Download it from the project website here. (26 October 2009)
  • FeedAPI ImageGrabber now supports the Filefield Paths module. Download the development snapshot; it is stable enough for use on production sites.
  • As of 02 Dec 2009, development of FeedAPI ImageGrabber has been halted. Look out for the Feeds Image Grabber module, which provides the same functionality for the Feeds module, the successor of FeedAPI.

25 thoughts on “Drupal: FeedAPI Imagegrabber”

  1. Brad

    Just posted the first bug report over on the project. I’m really psyched to roadtest this for you, but the current code appears to be missing a step.

  2. Ben

    I really appreciate the Imagegrabber module. I’m especially anxious to see the implementation of controls to allow a publisher to define what section of a web site an image should be pulled from, on a feed by feed basis. What’s your timeline, do you think?

    1. Nitin Post author

      Thanks Ben, the feature requested is already in the TODO list. But unfortunately, I am busy with my internship for the next 2 months. After that this will be my first priority for the ImageGrabber.

  3. Ben

    @Nitin
    Thanks Nitin. I just modified the module on my installation to use the guid instead of the original_url to define the page from which the image is pulled. Do you foresee any reason why this would be problematic?

    I did it because I have a very specific RSS feed for which the guid and original link are different.

  4. Ben

    @Nitin
    Thanks! In this case, the RSS feed is weirdly crafted. The link element points to that site’s homepage, whereas the guid element points to the actual story/photo.

    So in your module, I have swapped this out, specifying that the module should use the guid URL instead of the link URL. Does that make sense?

    Otherwise it keeps grabbing the same image from the news source’s homepage each time.

    So far it seems to be working smoothly.

    It might be a nice addition (I know, when there’s time … 🙂 ) — the ability to identify which element in a feed should be used as the source of the image.

    Ben

  5. nitin

    @Ben

    Do you really think I should add so much intelligence to a simple module for encouraging some sites to craft their feeds weirdly?

    Initially, I also thought of adding the ability to identify the size of the image just by looking at the height and width attributes, but it is no longer a W3C recommendation to use these attributes, so I dropped the idea. So now I have to get the size by downloading the image.

    I think it is the responsibility of all of us to shape the current web into the incomplete dream of semantic web.

  6. nitin

    Ben :
    I’m especially anxious to see the implementation of controls to allow a publisher to define what section of a web site an image should be pulled from, on a feed by feed basis.

    News: I have started working on it. To keep it simple for common users, I am thinking of asking for the unique ‘id’ of the tag to identify the portion. I also plan to include an ‘I am feeling lucky’ option, in which ‘the first image between the tags mentioned’ is selected for the item. What do you think? Any ideas?

  7. Derek

    Great job, Nitin,

    “News: I have started working on it. To keep it simple for common users, I am thinking of asking the unique ‘id’ of the tag to identify the portion. I also plan to include ‘I am feeling lucky’ option in which ‘the first image between the tags mentioned’ is selected for the item. what do you think? ”

    I think this is a great idea.

    Right now, too many web sites (including almost all newspaper sites) have advertisements. So identifying where to grab the image on a per-feed basis is very important.

  8. norathar

    Hello,
    Just want to say I'm really enjoying using your add-on for FeedAPI.

    You may want to specify on the Drupal project page where to place the folder for the module. Does it belong in the sites/all/modules/feedapi sub-directory? Or does it belong in the sites/all/modules directory? Or does it matter? Key little details like that will keep total newbs like myself from banging their heads against the keyboard wondering why things aren't working.
    Thanks for sharing this with the Drupal community!

  9. nitin

    News: I am done with the backend part of the next version, although the interface is still left (and I am thinking a lot about it, so that it doesn’t become confusing for a normal user). Nevertheless, I have installed the new version here. Please submit your feed links (post them in the comments section) to see how the new ImageGrabber performs. It will also help me debug any problems with the ImageGrabber.
    The new ImageGrabber is designed to skip advertisement images and images in the sidebar, so you should get the image you want. The new version will also save a lot of the bandwidth that you pay for.

    I am planning to release it in the first week of September. Keep checking for updates.

    @norathar

    I will keep it in mind while releasing the next version of ImageGrabber. I am even thinking of putting some tutorials for it. Hope it helps.

  10. Pingback: Withdraw PayPal Money Directly to A Bank Account in India | Public Mind

  11. Pingback: Tutorial for FeedAPI ImageGrabber | Public Mind

  12. Nitin Post author

    @Ben and Derek

    The new release of FeedAPI ImageGrabber is available for download now. Have a look to see what it has to offer you!!

    @Norathar

    I have put a nice tutorial explaining everything that is needed to use ImageGrabber, but if you still need some help I am always here.

    Check out the demonstration of new FeedAPI ImageGrabber: http://publicmind.in/drupal

  13. Pingback: PHP: Select HTML elements with more than one css class using XPath | Public Mind

  14. Nick

    How can I get the grabbed images to appear in nodes or views? I want to see a grabbed image field or something in my view, and an option to display the grabbed image for that node in the content type edit. I thought this module was not working for some reason, but then I looked at a feed and there were the images.

    1. nitin

      I do not quite understand your problem, but you might be interested in looking at the code of the imagecache and imagefield modules to get an idea of how this might be achieved.

  15. Pingback: PHP: Relative URL to Absolute URL | Public Mind

  16. Pingback: Drupal: Feeds Image Grabber | Public Mind
