Work the Shell

Working with Image Files, Part II

Dave Taylor

Issue #211, November 2011

Adding caption generation to the image-inclusion script.

If you're a faithful reader of my LJ column with a good memory, you'll have realized this is the second time I've written a set of columns talking about my image-scaling script. My last article presented a major revision to the script that added the capability of scaling the image dimensions in the resultant HTML code and also warning if it produces a significantly smaller image than the existing file specifies.

One thing I pay close attention to with my Web sites is their search-engine friendliness. After all, why put all the effort into producing good content and then omit the last step or two that can help maximize your placement on search-engine results pages (SERPs, in the biz)?

It turns out that if your page loads slowly, you're less likely to nab a really great spot in the search results than if it's lightning fast. So, if you are loading a 73KB image and scaling it down 33%, for example, it'd be faster to scale the image file itself (even if you end up with multiple versions of the image at different sizes) and have the 39KB version or similar.

In this article, I want to turn our attention to something else: generating attractive captions. There are two ways to specify a caption from the command line: -c tells the script to use the filename as the caption, replacing each dash (-) or underscore (_) character with a space, and -C xx tells the script to use the user-specified value xx for the caption.

The latter is more accurate, but more work, so I typically use -c, particularly if I'm generating the image-inclusion HTML for a group or collection of images en masse. In that case, the filename is typically something like “mac-pages-hyphenation-control-1.png”.
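As a rough illustration of how two flags like these might be parsed, here is a minimal getopts sketch. The function name parse_caption_flags and the variables usefilename, caption and shiftcount are assumptions made up for this example, not names taken from the actual script:

```shell
# Hypothetical sketch: parse -c (caption from filename) and -C text
# (explicit caption). All names here are illustrative assumptions.
parse_caption_flags()
{
  usefilename=0 ; caption=""
  OPTIND=1                      # reset so the function can be called again
  while getopts "cC:" opt
  do
    case $opt in
      c ) usefilename=1 ;;      # -c: derive the caption from the filename
      C ) caption="$OPTARG" ;;  # -C xx: use xx as the caption
      * ) echo "Usage: scale [-c] [-C caption] args" >&2 ; return 1 ;;
    esac
  done
  # How many arguments the caller should shift past to reach the rest
  shiftcount=$(( OPTIND - 1 ))
}
```

After calling parse_caption_flags "$@", the caller can run shift $shiftcount and carry on with whatever arguments remain.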

Converting Filenames to Image Captions

The easy way to create a caption is to axe the filename suffix and replace all dashes or underscores with spaces. For the filename mentioned above, that'd give us “mac pages hyphenation control 1”, which isn't too bad. However, it would be better to fix the capitalization so the caption looks more like proper English.
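That first step can be sketched in a few lines. The helper name basic_caption is hypothetical, and the real script may well do this differently:

```shell
# Hypothetical helper: turn a filename into a lowercase caption by
# dropping the directory and suffix, then swapping - and _ for spaces.
basic_caption()
{
  base="${1##*/}"        # strip any leading directory
  base="${base%.*}"      # strip the filename suffix, if there is one
  echo "$base" | tr '_-' '  '
}

basic_caption mac-pages-hyphenation-control-1.png
# prints: mac pages hyphenation control 1
```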

The problem is, case-transliteration utilities in Linux are designed to be all-or-nothing enterprises, so translating “pages” into “PAGES” is easy, but producing “Pages” is a bit trickier.

To do that, the script breaks the all-lowercase caption into separate words, then breaks each word into its first letter and subsequent letters:

firstletter=$(echo "$word" | cut -c1 |
   tr '[:lower:]' '[:upper:]')
otherletters=$(echo "$word" | cut -c2-)
newcaption="$newcaption$firstletter$otherletters "

All of that is wrapped in the following for loop:

for word in $nicecaption

Ideally though, the sentence cap function should be smart enough to know that certain words shouldn't be capitalized, like “the”, “of” and “or”. That I solve as generically as possible:

if [ $wordcount -gt 0 ] ; then
  case $word in
   the|and|or|a|an|of|in) newcaption="$newcaption$word ";
        continue;  ;;
  esac
fi

Do you know why I check the word count in the resultant properly capitalized caption? Because if it's the first word in the caption, it should be capitalized. For example, “The Black and the Blue” is correct, not “the Black and the Blue”.

One problem needs to be fixed due to how I reassemble the sentence in the script: the removal of the final trailing space. There are a bunch of ways to do that, but I really like using rev twice and cutting off the very first character:

rev | cut -c2- | rev
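Here is a small sketch of that trick in isolation, next to a parameter-expansion alternative. Both function names are made up for illustration. Note that dropping the first character of the reversed string means asking cut for characters 2 onward, and that the variable has to be quoted or the shell will strip the trailing space before rev ever sees it:

```shell
# Hypothetical helpers showing two ways to drop a trailing space.

# Reverse, drop the (now leading) last character, reverse back.
trim_last_char()
{
  echo "$1" | rev | cut -c2- | rev
}

# Same result with no extra processes: remove one trailing space if present.
trim_trailing_space()
{
  echo "${1% }"
}

trim_last_char "Mac Pages "        # prints: Mac Pages
trim_trailing_space "Mac Pages "   # prints: Mac Pages
```

The rev version removes the last character whatever it is, while the parameter expansion removes it only if it's a space.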

The entire routine is neatly wrapped in a shell function:

FixCaption()
{
  # Title-case $nicecaption in place, leaving minor words lowercase
  newcaption="" ; wordcount=0
  for word in $nicecaption
  do
    # The first word is always capitalized, so only skip later ones
    if [ $wordcount -gt 0 ] ; then
      case $word in
       the|and|or|a|an|of|in) newcaption="$newcaption$word ";
          continue; ;;
      esac
    fi
    firstletter=$(echo "$word" | cut -c1 |
        tr '[:lower:]' '[:upper:]')
    otherletters=$(echo "$word" | cut -c2-)
    newcaption="$newcaption$firstletter$otherletters "
    wordcount=$(( $wordcount + 1 ))
  done
  # Quoted so the trailing space survives until rev and cut remove it
  nicecaption=$(echo "$newcaption" | rev | cut -c2- | rev)
}

It's complicated, but hopefully, still understandable!
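For comparison, the same capitalization rules can be sketched with a single awk call instead of several processes per word. This is an alternative technique, not what the script does, and the function name title_caption is hypothetical:

```shell
# Hypothetical alternative: title-case a caption in one awk invocation.
# The first word is always capitalized; later minor words stay lowercase.
title_caption()
{
  echo "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      if (i > 1 && $i ~ /^(the|and|or|a|an|of|in)$/) continue
      $i = toupper(substr($i, 1, 1)) substr($i, 2)
    }
    print
  }'
}

title_caption "the black and the blue"
# prints: The Black and the Blue
```

Because awk rebuilds the line with single spaces between fields, there's no trailing space to clean up afterward.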

Producing Good HTML Code

Are you wondering what we've created? Here's how this now gives us a nice, readable caption based on the well-named file:

$ scale -c 1 facebook-upload-photo-computer-1.png
<center><img
src="http://www.askdavetaylor.com/pics/
↪facebook-upload-photo-computer-1.png"
alt="facebook upload photo computer 1" border="0" 
 ↪width="604" height="204"/><div style="font-size:
↪80%;color:#777;">Facebook Upload Photo Computer
1</div></center>

I'll unwrap that so you can see the HTML with less headache:

<center>

<img
src="http://www.askdavetaylor.com/pics/facebook-upload-photo-computer-1.png"
alt="facebook upload photo computer 1" border="0" width="604" height="204"
/>

<div style="font-size:80%;color:#777;">

Facebook Upload Photo Computer 1

</div>

</center>

If you know HTML, you might be tempted to write tidier code, with the image and its caption together in a single div container. For that matter, modern HTML has figure and figcaption elements meant for exactly this job, but I don't use them because I like to have more control over how the actual text is positioned and rendered on the page. Old-school, I guess.
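If you did want the single-container approach, one way to sketch it from the shell is with the HTML5 figure and figcaption elements. The function name emit_figure and its argument order are assumptions for illustration only:

```shell
# Hypothetical sketch: emit an image and caption inside one <figure>.
# Arguments: $1 = image URL, $2 = caption, $3 = width, $4 = height
emit_figure()
{
  printf '<figure style="text-align:center;">\n'
  printf '  <img src="%s" alt="%s" width="%s" height="%s"/>\n' \
      "$1" "$2" "$3" "$4"
  # %% in a printf format string produces a literal percent sign
  printf '  <figcaption style="font-size:80%%;color:#777;">%s</figcaption>\n' "$2"
  printf '</figure>\n'
}
```

The styling values are the ones the script already uses for its caption div, so the rendered result should look much the same.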

I'm going to stop hacking the script here because it's already almost 200 lines, and I have to say that if a script is getting to be more than 100 lines or so, it might be time to consider moving the functionality into a Perl script or another programming language like C or C++. Not always, but shell scripts are really good until a certain point, then you're just wrestling with limitations rather than expanding your capabilities.

That's it for this month. Do you have a scripting challenge you'd like some help with or just an idea for a fun or interesting project we can tackle here in Linux Journal? If so, get off your duff and send me a message about it!

Dave Taylor has been hacking shell scripts for a really long time, 30 years. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.