What have I done? (website screen captures/thumbnails in python and a lot of shell commands)


I’ve developed a tutorial/development methodology that walks the uninitiated Plonista through a crash course in Plone 3 development. I’m going to post about it in detail separately; this post is about part of the technology behind it. I’m using a simple XML schema to structure my content, and Python to transform it into XHTML.

One feature I wanted was a nice table of references. The tutorial relies pretty heavily on both dead-tree texts and online resources. I rigged up my transformation so that it would post an Amazon.com affiliate link (in the least obtrusive manner, of course) for the books I’ve referenced. So for each book there’s a nice picture of the cover, a link, and text explaining what part of the book I’m referencing, and why it’s important.

Problem was that external links and links to the plone.org documentation didn’t have any sort of image to represent them. I couldn’t work out anything I liked (made some icons, but they didn’t do it for me), so I thought it might be cool to use thumbnail screen captures of the pages I was referencing instead.

I know, this whole screencap/webthumb/snap shots thing is out of hand. Every link I mouse over everywhere seems to pop up some sort of DHTML fluff showing me a preview of the link I might be clicking on at some point. It’s especially annoying given that I like to check the urls of hyperlinks before I click them (silly me). I like to know where I’m going, but I don’t need to see where I’m going.

I bring up my disdain for the snap shot trend only to emphasize that I’m not doing it the way I see it done everywhere. I suppose I could, but I’m not. :) The overuse of snap shots also puts my frustration at finding a good way to create them into perspective: it’s so popular that a Google search for ‘website thumbnail’ comes up with a huge number of viable options. A cursory glance shows that none of them are packages you can run on your own; they’re all web services.

So, I decided to strike out on my own, and develop something from scratch.

Tool Chain

The rough steps to creating a screen capture are:

  1. launch a browser
  2. tell it to load a given url
  3. take a screen shot
  4. crop out window dressing and menus/controls
  5. resize to a thumbnail size

This could be a manual process, and it could work in MS Windows or on Unix (or OS X, for that matter). I was already developing my XSLT transformation on an Ubuntu VM, and I can get a fair representation of what just about any web site looks like on any platform, thanks to Firefox’s cross-platform sweetness. This led me to try automating an X session via command line execution.

Since I was using Unix, ImageMagick was a foregone conclusion for the image editing (the mogrify command specifically). I then discovered, by chance, an old post illustrating how ImageMagick can take screen shots.

I then found out that Firefox has long had more-or-less well-documented command line options.

So with this in mind, I refined my process plan a bit:

  1. start X
  2. launch Firefox, specifying what url to load
  3. call IM’s import command to take a screen shot of the Firefox window on the right display
  4. call IM’s mogrify command to crop out cruft
  5. call IM’s mogrify to resize to a thumbnail size
  6. quit Firefox (since otherwise it will load subsequent urls into new tabs)
  7. close X (if we’re all done… don’t bother if we’ll be taking more screen shots)

One problem with this is that I have no way to know what windows are open, or to manipulate them. For example, on my default install of X, Firefox was loading in an 800×600 window. I wanted to take my screen shots at 1024×768, so Firefox would have to be resized. wmctrl to the rescue!

wmctrl allows you to query and interact with the windows that your window manager is managing. It gives you X window IDs, and will let you close or resize windows by matching a word within their title. Good stuff.
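
To give a feel for what wmctrl reports, here is a minimal sketch of reading its window list. `wmctrl -l` prints one line per window: the window ID, the desktop number, the client host name, and the title. The function names and the sample text below are hypothetical, made up for illustration, and the sketch is written for a current Python 3 rather than lifted from my module.

```python
def parse_window_list(text):
    """Parse `wmctrl -l` output into (window_id, desktop, host, title) tuples."""
    windows = []
    for line in text.splitlines():
        # Split on whitespace at most three times so the title keeps its spaces.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not a window line
        window_id, desktop, host = parts[:3]
        title = parts[3] if len(parts) > 3 else ""
        windows.append((window_id, int(desktop), host, title))
    return windows


def find_window(windows, word):
    """Return the ID of the first window whose title contains word, or None."""
    word = word.lower()
    for window_id, _desktop, _host, title in windows:
        if word in title.lower():
            return window_id
    return None


# Hypothetical sample output, not captured from a real session.
SAMPLE = (
    "0x00600003 -1 ubuntu Blackbox Toolbar\n"
    "0x01000007  0 ubuntu Plone CMS: Documentation - Mozilla Firefox\n"
)
```

Calling `find_window(parse_window_list(SAMPLE), "firefox")` picks out the ID on the second line, which is the kind of value you can later hand to other tools that accept a window ID.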

I can get away with starting X with Firefox as its sole application by passing the full path to Firefox as an argument to xinit.

$ xinit /usr/bin/firefox &

But this is problematic because it doesn’t run a window manager, so wmctrl doesn’t work. On top of that, killing the X session without manually quitting Firefox makes Firefox ask “Do you want to restore your session” the next time it runs (bad when you’re not there to click “Start a New Session”). That’s not acceptable for a situation like mine.

This meant I needed a window manager. I started with icewm but didn’t care for it, so I fell back to my personal favorite when I’m in X, blackbox. I like it for this task especially because it’s lightweight, and doesn’t put any widgets on the desktop that get in the way.

So now the process looks like this (with the actual commands)

  1. start X:
    $ xinit /usr/bin/blackbox &
  2. launch Firefox, specifying what url to load:
    $ firefox http://www.plone.org/documentation &
  3. resize Firefox:
    $ wmctrl -r firefox -e 0,0,0,1024,768
  4. call IM’s import command to take a screen shot:
    $ import -display localhost:0.0 -window root screencap.jpg
  5. (Properly) close Firefox:
    $ wmctrl -c firefox
  6. call IM’s mogrify command to crop out cruft:
    $ mogrify -crop 1006x648+0+95 screencap.jpg
  7. call IM’s mogrify to resize to a thumbnail size:
    $ mogrify -resize 100 screencap.jpg
  8. close X:
    I never figured out how to do this short of grepping through the output of ps

This works, more or less, in a manual way.
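
The crop numbers in step 6 are just the window size minus the browser chrome. Here is a small sketch of that arithmetic, assuming a 1024×768 window with 95 pixels of menus and toolbars on top. The bottom and right margins are inferred from the leftover pixels (768 - 95 - 648 = 25 and 1024 - 1006 = 18); reading them as status bar and scroll bar is an assumption. The function name and its parameters are hypothetical.

```python
def crop_geometry(width, height, top=0, bottom=0, left=0, right=0):
    """Return an ImageMagick geometry string WxH+X+Y for the area inside the margins."""
    crop_width = width - left - right
    crop_height = height - top - bottom
    if crop_width <= 0 or crop_height <= 0:
        raise ValueError("margins leave nothing to crop")
    # The offset is the top-left corner of the area that is kept.
    return "%dx%d+%d+%d" % (crop_width, crop_height, left, top)


geometry = crop_geometry(1024, 768, top=95, bottom=25, right=18)
print(geometry)  # 1006x648+0+95
```

If your toolbars are a different height, only the margins need to change; the resulting string is what gets passed to mogrify -crop.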

One caveat: you can’t do this from a remote terminal (like an SSH session) unless you change the way X is configured. I found a journal entry that pointed me in the right direction. You need to set allowed_users to ‘anybody’ in your Xwrapper.config file (/etc/X11/Xwrapper.config on Debian and Ubuntu). Since I was on Ubuntu, I tried using dpkg to reconfigure X as the journal entry suggested. This worked for me (it brings up a UI that asks a couple of questions, including one about allowed users):

 $ sudo dpkg-reconfigure x11-common

Getting Down to it (Coding, that is)

Now that you see where my mind was when I started this endeavor, you can understand the next steps, and the ultimate python code that I wrote.

My initial instinct was to try a shell script. This process came down to, after all, a list of shell commands. But as I started to write one, things got messy and complex in a hurry, and getting X to close cleanly was a problem.

In addition, I wanted to integrate this more directly into my XSLT transformation script, so Python became the obvious answer (as it tends to) :)

I utilized the newish subprocess module to handle executing the various commands. I wrapped it all in a class so I could easily share information (like what display to point at) across methods. I wrote methods to parse wmctrl’s window list output, and refined the process a bit so it would grab a specific window, instead of the root, or the whole X desktop (so I just got the browser and kept the cropped bits to a minimum).
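
To illustrate the shape of that design (not the actual code in my module), here is a minimal sketch of a class that shares the display across methods and hands each command to subprocess. The class name, method names, and the injectable runner are all hypothetical, and it is written for a current Python 3. The wmctrl and import arguments are the same ones used in the manual steps, except that import is given a window ID instead of root.

```python
import os
import subprocess


class CaptureSession:
    """Hypothetical stand-in for the real class: one display shared by all methods."""

    def __init__(self, display="localhost:0.0", runner=subprocess.call):
        self.display = display
        # The runner is injectable so the commands can be inspected without X running.
        self.runner = runner

    def _env(self):
        # wmctrl picks its X server from the DISPLAY environment variable.
        return dict(os.environ, DISPLAY=self.display)

    def resize(self, title_word, width=1024, height=768):
        # wmctrl -e takes gravity,x,y,width,height
        geometry = "0,0,0,%d,%d" % (width, height)
        return self.runner(["wmctrl", "-r", title_word, "-e", geometry], env=self._env())

    def capture(self, window_id, path):
        # Grab a single window by ID instead of the whole root window.
        argv = ["import", "-display", self.display, "-window", window_id, path]
        return self.runner(argv, env=self._env())

    def close(self, title_word):
        # wmctrl -c asks the matching window to close gracefully.
        return self.runner(["wmctrl", "-c", title_word], env=self._env())
```

With the default runner, each method really executes the command, so wmctrl and ImageMagick have to be installed and an X server has to be listening on the chosen display. Passing a fake runner makes it easy to unit test the command lines on their own.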

I wrote unit tests for most of the functionality, so I’m confident that if your setup is like mine, you should have no trouble using the code.

Some features that go beyond the shell commands:

  • Can create a full-sized screen shot along with the thumbnail
  • Handles creating unique names for the files (does a hex md5 hash of the url)
  • Adds in wait time for the page to load
  • You can change WMs easily

It definitely is more flexible than it needs to be, and the code has a lot of potential:

  • You could set up a special profile just for taking captures (maybe run a special profile in a different language).
  • You could use it to take a screen capture when a Selenium test fails.
  • It could be adapted for any application, not just Firefox.
  • It’s designed so you can point it at any X server, not just localhost:0.0
    • This means you could do a multi-threaded batch generation by starting the X server on different display numbers and rotating the work between them
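
The batch idea in the last bullet could start with something as simple as dealing the urls out to the displays in turn. This is a rough sketch of that split only, with a hypothetical function name and made-up display numbers; starting the X servers and running one worker per display is left out.

```python
from itertools import cycle


def assign_displays(urls, displays):
    """Deal urls out to displays round-robin; returns {display: [urls]}."""
    if not displays:
        raise ValueError("at least one display is required")
    batches = {display: [] for display in displays}
    for url, display in zip(urls, cycle(displays)):
        batches[display].append(url)
    return batches


batches = assign_displays(
    ["http://a.example/", "http://b.example/", "http://c.example/"],
    ["localhost:0.0", "localhost:1.0"],
)
```

Each list in the result could then be handed to its own thread, with every thread pointing its captures at a different display.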

So the end result is a nice module that will generate screen captures and thumbnails of web sites.

I’ve uploaded the code to my Google Code repo. The source is GPL right now; let me know if you want to use it under different terms.

Check it out

I appreciate all feedback/comments/enhancements/etc. Fire away!

XSLT Integration

Here’s how I integrated it into my XSLT:


I opted to defer generation of the thumbnails to after the XSLT is all processed. It’s not as elegant as I’d like, but it works :)

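import libxslt

from webthumb import XScreenCap

xcap = XScreenCap()
urls = set()  # every url seen during the transform; captured afterwards

def linkwebthumb(context, url):
    """
    Return a URL that will represent a thumbnail screenshot of the URL contained in url

    Adds the url to the list of urls that need to be processed into screen shots
    """
    urls.add(url)
    return xcap.imagepath(url)

libxslt.registerExtModuleFunction('website-thumbnail', 'http://namespaces.joshjohnson.noresolve/util', linkwebthumb)

# ... the XSLT transformation itself runs here (not shown) ...

for url in urls:
    pass  # take the screen capture for this url (call not shown)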

This obviously leaves out the meat of the xslt processing stuff. Stay tuned, I plan to post about that in detail soon.


I then can call the website-thumbnail function from within my XSLT template (the ‘util’ namespace is defined in the opening <xsl:stylesheet> tag, not shown):

<!-- snip -->
  <xsl:template match="ref">
    <div class="reference">
      <xsl:choose>
        <!-- snip -->
        <xsl:when test="@type='external-site'">
            <h1>External website, <a><xsl:attribute name="href"><xsl:value-of select="@href" /></xsl:attribute><xsl:value-of select="@href" /></a></h1>
            <div class="refimage">
                <a><xsl:attribute name="href"><xsl:value-of select="@href" /></xsl:attribute>
                <img border="0">
                   <xsl:attribute name="src"><xsl:value-of select="util:website-thumbnail(string(@href))" /></xsl:attribute>
                   <xsl:attribute name="alt">Screen Capture of <xsl:value-of select="@href" /></xsl:attribute>
                </img></a>
                <br />
            </div>
            <div class="reftext" style="height:55px">
                <!-- snip -->
            </div>
            <br />
        </xsl:when>
        <!-- snip -->
      </xsl:choose>
    </div>
  </xsl:template>
<!-- snip -->

Again, there’s more to this than just the use of the thumbnail code… more on the ins and outs of xslt and libxslt when I post about it later :)

Potential Problems

  • There has to be a simpler way to do this :)
  • This should be fairly straight Unix command line stuff, and should port well. In spite of that, I haven’t tested this module on any platform besides Ubuntu. I think the version I’m running is 7.10, the server version (no desktop environment included). I installed X and all of the other applications directly from apt. The first thing you should do once you’ve set up the components I listed earlier is run the unit tests. You can do this by running the module directly:
    $ python __init__.py
  • This does not work on MS Windows (although I suppose you could point the code at some sort of X server on a MS Windows host).
  • I currently have no way of knowing when Firefox is finished loading a page, so I sleep after Firefox is spawned (20 seconds at the moment; it can be configured to be less). This makes generating thumbnails or running the unit tests take a lot longer than it could.
  • This is pure Python 2.5 code. It might run in a late 2.4, but that’s untested.
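
On the fixed sleep mentioned above: one way to shorten the wait, at least in part, would be to poll for some observable condition and give up after a timeout. This is only a sketch of the polling loop, with hypothetical names and written for a current Python 3. It does not solve the real problem, since deciding what condition means "the page has finished loading" is exactly the part I don’t have an answer for.

```python
import time


def wait_for(predicate, timeout=20.0, interval=0.5,
             sleep=time.sleep, clock=time.monotonic):
    """Call predicate until it returns true or timeout seconds have passed.

    Returns True if the predicate succeeded, False if the wait timed out.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A predicate could be as simple as "a window with the browser’s name in its title exists", which at least avoids sleeping the full 20 seconds when the window is already up.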