Work the Shell

Movie Trivia—Finally!

Dave Taylor

Issue #174, October 2008

Use the shell to generate movie trivia from a movie database.

It's been one of those proverbial journeys of a thousand steps, but I think we're finally ready to start generating some movie trivia after spending the past few months doing all the underlying tool development. You'll recall that we're grabbing the top 250 movies list from Amazon's IMDb site, then getting the release year of each movie and storing it in a database.

Separately, we chewed on the interesting problem of coming up with adjacent years for a given year, recognizing that the older the movie, the wider the spread we want between years. Precious few people will mistakenly guess that a movie released in 2007 came out in 1983, but a movie released in 1947 could stymie people who might think it came out in 1931.

Now, it's time to put the pieces together.

Two Random Years

The last column dug into the year spread, ending with a script that produced a likely adjacent year for a given year. We need to refine this script, because what we want is three different year possibilities with no duplicates: two that are plausible but wrong, and one that's the correct year.

First, let's make the code that generates a reasonable adjacent year a script function:

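get_random()
{
  # Relies on three globals: $factor (the maximum spread in years),
  # $releasedate (the correct year) and $thisyear (the current year).
  delta="$(( $RANDOM % $factor + 1 ))"     # offset of 1..$factor years
  add="$(( $RANDOM % 2 ))"                 # coin flip: later or earlier
  if [ $add -eq 1 ] ; then
    closeyear="$(( $releasedate + $delta ))"
  else
    closeyear="$(( $releasedate - $delta ))"
  fi
  if [ $closeyear -gt $thisyear ] ; then   # no years from the future
    closeyear="$(( $releasedate - $delta ))"
  fi
}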
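Because get_random depends on three globals that aren't set in this column, here's a small harness for trying it out on its own. The inputs are hypothetical: a spread of 3 and a movie from last year, chosen so the "no future years" check actually gets exercised. None of the guesses it prints should equal the real year or land after the current year.

```shell
#!/bin/bash
# Hypothetical inputs, chosen so the future-year rule matters.
thisyear="$(date +%Y)"
releasedate=$(( thisyear - 1 ))
factor=3

get_random()
{
  delta="$(( RANDOM % factor + 1 ))"        # 1..factor years away
  add="$(( RANDOM % 2 ))"                   # 1 = later, 0 = earlier
  if [ "$add" -eq 1 ] ; then
    closeyear="$(( releasedate + delta ))"
  else
    closeyear="$(( releasedate - delta ))"
  fi
  if [ "$closeyear" -gt "$thisyear" ] ; then
    closeyear="$(( releasedate - delta ))"  # flip future guesses backward
  fi
}

# Print ten sample guesses on one line.
for i in 1 2 3 4 5 6 7 8 9 10 ; do
  get_random
  printf '%s ' "$closeyear"
done
echo
```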

Next, given that we can't gracefully return a value short of using a global variable, here's how we can leverage the function:

get_random
match1=$closeyear

That gets us the first year guess, easily enough. But the next guess needs to differ from the first. How? With a while loop:

match2=$match1        # needs an initial value
while [ $match2 -eq $match1 ] ; do
  get_random
  match2=$closeyear
done

This is slightly risky, because the loop never ends if get_random can't produce a different value. That can actually happen: if $factor is 1 and the movie came out this year, every call yields $releasedate - 1. But I'll ignore that edge case for now.
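One way to defuse that infinite loop, sketched here as an assumption rather than part of the original script, is to cap the retries (50 is an arbitrary choice) and fall back to a deterministic year when the cap is reached. The sketch deliberately uses the worst case: a movie from the current year with a $factor of 1, where get_random can only ever produce $releasedate - 1.

```shell
#!/bin/bash
# get_random as defined earlier, repeated so this sketch runs on its own.
get_random()
{
  delta="$(( RANDOM % factor + 1 ))"
  if [ "$(( RANDOM % 2 ))" -eq 1 ] ; then
    closeyear="$(( releasedate + delta ))"
  else
    closeyear="$(( releasedate - delta ))"
  fi
  if [ "$closeyear" -gt "$thisyear" ] ; then
    closeyear="$(( releasedate - delta ))"
  fi
}

# Worst case: this year's movie with a spread of 1.
thisyear="$(date +%Y)"
releasedate="$thisyear"
factor=1

get_random
match1=$closeyear

# Bounded version of the retry loop.
match2=$match1
tries=0
while [ "$match2" -eq "$match1" ] && [ "$tries" -lt 50 ] ; do
  get_random
  match2=$closeyear
  tries=$(( tries + 1 ))
done

# Fallback: one year below match1, skipping the real year if we hit it.
if [ "$match2" -eq "$match1" ] ; then
  match2=$(( match1 - 1 ))
  if [ "$match2" -eq "$releasedate" ] ; then
    match2=$(( match2 - 1 ))
  fi
fi

echo "$match1 $match2 $releasedate"
```

Since $match1 never equals $releasedate, the fallback is guaranteed to differ from both of the other values.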

Now we have three year values: two incorrect ones, $match1 and $match2, and the correct value, $releasedate. How do we hand them back to the calling routine sorted? Because sort orders lines rather than the words within a line, put each year on its own line, sort numerically, then let xargs join them back into a single line:

printf '%s\n' "$match1" "$match2" "$releasedate" | sort -n | xargs

And that's the function. Give it a year, and it'll return three: two that are close but wrong, and one that's correct. For example:

$ ./year-delta.sh 1975
1971 1975 1981

$ ./year-delta.sh 1999
1998 1999 2000

$ ./year-delta.sh 1938
1935 1938 1948

That's exactly what we want. Now, how to integrate this into the bigger script that grabs a random line from the IMDb database and then presents it in a workable fashion?
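Before moving on, here's one way the pieces might be assembled into a standalone year-delta.sh. It's a sketch, not the column's actual script: the rule that sets $factor (two years of spread, plus one more per decade of age) is a hypothetical placeholder for last month's calculation, chosen so $factor is never below 2 and the while loop always has a second year to find. The work lives in a function, year_delta, so the file can also be sourced.

```shell
#!/bin/bash
# year-delta.sh (sketch): given a release year, print it plus two
# plausible but wrong years, sorted numerically on one line.

get_random()
{
  # Offset of 1..$factor years, randomly added or subtracted.
  delta="$(( RANDOM % factor + 1 ))"
  if [ "$(( RANDOM % 2 ))" -eq 1 ] ; then
    closeyear="$(( releasedate + delta ))"
  else
    closeyear="$(( releasedate - delta ))"
  fi
  # Never offer a year in the future.
  if [ "$closeyear" -gt "$thisyear" ] ; then
    closeyear="$(( releasedate - delta ))"
  fi
}

year_delta()
{
  releasedate="$1"
  thisyear="$(date +%Y)"

  # Hypothetical spread rule: two years, plus one per decade of age.
  factor=$(( (thisyear - releasedate) / 10 + 2 ))
  if [ "$factor" -lt 2 ] ; then
    factor=2    # at least two candidate years, so the loop can finish
  fi

  get_random
  match1=$closeyear

  match2=$match1
  while [ "$match2" -eq "$match1" ] ; do
    get_random
    match2=$closeyear
  done

  # sort compares whole lines: one year per line, then rejoin.
  printf '%s\n' "$match1" "$match2" "$releasedate" | sort -n | xargs
}

# Run only when given a year, e.g. ./year-delta.sh 1975
if [ $# -gt 0 ] ; then
  year_delta "$1"
fi
```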

Extracting Data and Displaying It

Once you remember the trick of $(( $RANDOM % some-value )), it should be straightforward to get a random line from a data file:

lines="$(wc -l < "$filmdb" | sed 's/ //g')"    # line count, padding stripped
randline=$(( $RANDOM % $lines + 1 ))            # random line number, 1..$lines
match="$(sed -n "${randline}p" < "$filmdb")"    # print just that line

As I've written about before, wc is one of your best friends in script writing, because it's easy. But it's also frustrating that there's no way to turn off the superfluous white space it generates. That's why the first line pipes its result through sed to axe any added spaces. Somewhere, in a parallel universe to our own, there's an -n flag to wc that says “no padding” and makes this forevermore unnecessary. Sadly, we aren't in that universe, so just about every time you use wc, you have to strip out the white space at the same time.
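If you're scripting in bash (or another POSIX shell), there's one more way around the padding: arithmetic expansion ignores surrounding blanks, so wrapping the wc call in $(( )) yields a bare number. Here's a self-contained sketch that builds a tiny hypothetical database from titles mentioned in this column, then picks a random line from it.

```shell
#!/bin/bash
# Hypothetical four-line database in the same title|year format.
filmdb="$(mktemp)"
printf '%s\n' 'The Lord of the Rings: The Two Towers|2002' \
  'Some Like It Hot|1959' 'Mononoke-hime|1997' \
  'Planet of the Apes|1968' > "$filmdb"

# Arithmetic expansion discards any padding wc adds; no sed needed.
lines=$(( $(wc -l < "$filmdb") ))

randline=$(( RANDOM % lines + 1 ))
match="$(sed -n "${randline}p" < "$filmdb")"

echo "line $randline of $lines: $match"
rm -f "$filmdb"
```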

The result of these three lines is that match has a value similar to:

The Lord of the Rings: The Two Towers|2002

Now we need to split it into two fields, which is easily, if tediously, done:

title="$(echo "$match" | cut -d'|' -f1)"
relyear="$(echo "$match" | cut -d'|' -f2)"
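If the echo-and-cut approach feels tedious, the shell's own parameter expansion can do the same split without spawning any subprocesses: ${match%|*} trims everything from the last | onward, and ${match##*|} trims everything through the last |. A minimal sketch using the sample record shown above:

```shell
#!/bin/bash
match='The Lord of the Rings: The Two Towers|2002'

# Remove the shortest "|..." suffix to leave the title, and the longest
# "...|" prefix to leave the year.
title="${match%|*}"
relyear="${match##*|}"

echo "title: $title"
echo "year:  $relyear"
```

A nice side effect is that a stray | inside a title wouldn't break the year field, since both expansions key off the last |.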

Next, it's time to invoke the random years function that, if you recall, generates one correct and two incorrect years:

years=$($randomyears $relyear)

Finally, let's pull the three years into separate variables and then output an attractive trivia query:

year1="$(echo $years | cut -d' ' -f1)"
year2="$(echo $years | cut -d' ' -f2)"
year3="$(echo $years | cut -d' ' -f3)"

echo "IMDb Top 250 Movie #$randline: Was \"$title\"
released in $year1, $year2 or $year3?"
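As an aside, bash's read builtin can split all three years in one step, because it divides its input on whitespace by default. This sketch assumes $years arrives as a single line, as in the sample runs (read consumes only the first line of its input), and borrows its value from the first run shown below. The <<< here-string is a bash feature.

```shell
#!/bin/bash
years="1950 1959 1963"   # sample value from the first run below

# One builtin replaces three echo | cut pipelines.
read -r year1 year2 year3 <<< "$years"

echo "$year1 / $year2 / $year3"
```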

Not too shabby! Let's see how it works:

$ ./generate-trivia-question.sh
IMDb Top 250 Movie #82: Was "Some Like It Hot"
released in 1950, 1959 or 1963?

$ ./generate-trivia-question.sh
IMDb Top 250 Movie #118: Was "Mononoke-hime"
released in 1994, 1995 or 1997?

$ ./generate-trivia-question.sh
IMDb Top 250 Movie #250: Was "Planet of the Apes"
released in 1967, 1968 or 1969?

Perfect, perfect!
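As a recap, here's the whole pipeline condensed into one self-contained sketch you can run without the IMDb data. Two pieces are hypothetical: the four-line database (built from titles mentioned in this column) and fixed_years, a predictable stand-in for year-delta.sh that always puts the correct year in the middle, which you wouldn't want in real trivia.

```shell
#!/bin/bash
# Hypothetical four-line database in the same title|year format.
filmdb="$(mktemp)"
printf '%s\n' 'The Lord of the Rings: The Two Towers|2002' \
  'Some Like It Hot|1959' 'Mononoke-hime|1997' \
  'Planet of the Apes|1968' > "$filmdb"

# Stand-in for year-delta.sh with fixed offsets, so the demo is
# predictable; swap in the real script for actual trivia.
fixed_years()
{
  printf '%s\n' "$(( $1 - 3 ))" "$1" "$(( $1 + 2 ))" | sort -n | xargs
}
randomyears=fixed_years

# Pick a random line.
lines="$(wc -l < "$filmdb" | sed 's/ //g')"
randline=$(( RANDOM % lines + 1 ))
match="$(sed -n "${randline}p" < "$filmdb")"
rm -f "$filmdb"

# Split title|year, then fetch the three candidate years.
title="$(echo "$match" | cut -d'|' -f1)"
relyear="$(echo "$match" | cut -d'|' -f2)"
years="$($randomyears "$relyear")"
year1="$(echo $years | cut -d' ' -f1)"
year2="$(echo $years | cut -d' ' -f2)"
year3="$(echo $years | cut -d' ' -f3)"

question="IMDb Top 250 Movie #$randline: Was \"$title\"
released in $year1, $year2 or $year3?"
echo "$question"
```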

That's about all we have space for in this column, but we've come a long, long way from the URL for a Web page that lists some top movies to a nice little trivia engine that's fast and fun!

Next month, we'll look at how to inject the trivia into the Twitterstream. Want to see it in action? By the time you read this column, it'll be live at twitter.com/FilmBuzz (along with movie commentary and much more).

Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com, and he also offers up tech support at AskDaveTaylor.com. Follow him on Twitter if you'd like: twitter.com/DaveTaylor.