We continue building a Mad Libs tool and slowly come to realize that it's a considerably harder problem than can be neatly solved in a 20-line shell script.
Last month, I ended with a script that could take an arbitrary set of sentences and randomly select, analyze and replace words with their parts of speech with the intention of creating a fun and interesting Mad Libs-style puzzle game. With a few tweaks, giving it a simple few sentences on party planning, we get something like this:
If you're ((looking:noun)) [for] a fun ((way:noun)) [to] celebrate your next ((birthday:noun)) how ((about:adjective)) a pirate-themed costume party? Start by sending ((invitations:noun)) in the form of ((a:noun)) <buried:verb> ((treasure:noun)) {map} with {X} ((marking:noun)) {the} ((location:noun)) [of] your house, then {put} {a} sign on the ((front:noun)) ((door:noun)) [that] ((reads:noun)) "Ahoy, mateys" {and} ((fill:noun)) [the] ((house:noun)) [with] ((lots:noun)) of ((pirate:noun)) ((booty:noun))
In the current iteration of the script, it marks words chosen but discarded as being too short with {}, words where it couldn't unambiguously figure out the part of speech with [] and words that have what we defined as uninteresting parts of speech with <>.
If we display them as regular words without any indication that they've been rejected for different reasons, here's what we have left:
If you're ((looking:noun)) for a fun ((way:noun)) to celebrate your next ((birthday:noun)) how ((about:adjective)) a pirate-themed costume party? Start by sending ((invitations:noun)) in the form of ((a:noun)) buried ((treasure:noun)) map with X ((marking:noun)) the ((location:noun)) of your house, then put a sign on the ((front:noun)) ((door:noun)) that ((reads:noun)) "Ahoy, mateys" and ((fill:noun)) the ((house:noun)) with ((lots:noun)) of ((pirate:noun)) ((booty:noun))
Next, let's look at the output by simply blanking out the words we've chosen:
If you're ___ for a fun ___ to celebrate your next ___ how ___ a pirate-themed costume party? Start by sending ___ in the form of ___ buried ___ map with X ___ the ___ of your house, then put a sign on the ___ ___ that ___ "Ahoy, mateys" and ___ the ___ with ___ of ___ ___.
It seems like too many words are being replaced, doesn't it? Fortunately, that's easily tweaked.
What's a bit harder to tweak is that there are two bad choices that survived the heuristics: “a” (in “form of a buried treasure map”) and “about” (in “how about a pirate-themed costume party?”). Just make three letters the minimum required for a word that can be substituted? Skip adjectives?
For the purposes of this column, let's just proceed because this is the kind of thing that's never going to be as good as a human editor taking a mundane passage of prose and pulling out the potential for amusing re-interpretation.
The next step in the evolution of the script is to prompt users for different parts of speech, then actually substitute those for the original words as the text passage is analyzed and output.
There are a couple of ways to tackle this, but let's take advantage of tr and fmt to replace all spaces with newlines, so the script sees one word per line, then reassemble the words neatly into formatted text again.
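The split-and-reassemble idea can be sketched roughly as follows. This is only an illustration, not the actual madlib.sh: the function names madlib_pipeline and process_word are made up here, and process_word is a placeholder that simply marks any token of six or more characters (punctuation included) instead of doing real part-of-speech analysis.

```shell
# Hypothetical stand-in for the real word analysis: mark any token of
# six or more characters, and pass everything else through untouched.
process_word() {
  if [ "${#1}" -ge 6 ]; then
    printf '((%s))\n' "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Split stdin into one word per line, process each word, then let fmt
# reassemble the words into a wrapped paragraph.
madlib_pipeline() {
  tr ' ' '\n' | while read -r word || [ -n "$word" ]; do
    if [ -n "$word" ]; then
      process_word "$word"
    fi
  done | fmt
}
```

Something like `madlib_pipeline < madlib-sample-text-2` would then print the passage with the longer words marked. Note that the loop's standard input is the word stream and its standard output is the pipe to fmt, which is exactly what makes prompting the user awkward.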
The problem is that both standard input and standard output already are being mapped and redirected: input is coming from the redirection of an input file, and output is going to a pipe that reassembles the individual words into a paragraph.
This means we end up needing a complicated solution like the following:
/bin/echo -n "Enter a ${pos}: " > /dev/tty
read newword < /dev/tty
echo $newword
We have to be careful not to redirect to /dev/stdout, because that's already redirected, which means that a notation like >&1 would have the same problem of getting our input and output hopelessly muddled.
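Wrapped up as a function, the same trick might look like the sketch below. The name ask_for_word and its two optional device arguments are my own additions, not part of the original script. They default to /dev/tty and exist only so the function can be exercised with ordinary files when no terminal is available.

```shell
# Prompt on the terminal and read the reply from it, leaving the
# script's own stdin and stdout free for the word pipeline.
# Arguments: part of speech, then (hypothetical, optional) the device
# to read from and the device to write the prompt to.
ask_for_word() {
  pos=$1
  tty_in=${2:-/dev/tty}
  tty_out=${3:-/dev/tty}
  printf 'Enter a %s: ' "$pos" > "$tty_out"
  IFS= read -r newword < "$tty_in"
  # Only the replacement word goes to stdout, so it lands in the pipe.
  printf '%s\n' "$newword"
}
```

Inside the word loop, a call like `ask_for_word "$pos"` shows the prompt on the screen while the user's answer flows down the pipe in place of the original word.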
In practice, it actually works pretty well right off the bat:
$ sh madlib.sh < madlib-sample-text-2
Enter a noun: Starbucks
Enter a adjective: wet
Enter a adjective: sticky
Enter a noun: jeans
Enter a noun: dog
Enter a noun: window
Enter a noun: mouse
Enter a noun: bathroom
Enter a noun: Uncle Mort
That produced the following result:
If you're (( Starbucks )) for a fun way to celebrate your (( wet )) birthday, how (( sticky )) a pirate-themed costume (( jeans )) Start by sending invitations in the (( dog )) of a buried treasure map with X marking the (( window )) of your house, then put a (( mouse )) on the front (( bathroom )) that reads "Ahoy mateys" and fill the house with lots of pirate (( Uncle Mort ))
Now let's add some prompts, because if you're like me, you might not immediately remember the difference between a verb and an adjective. Here's what I came up with:
verb: an action word (eat, sleep, drink, jump)
noun: a person, place or thing (dog, Uncle Mort, Starbucks)
adjective: an attribute (red, squishy, sticky, wet)
Instead of just asking for the part of speech, we can have a simple case statement to include a useful prompt:
case $pos in
  noun ) prompt="Noun (person, place or thing: dog, Uncle Mort, Starbucks)" ;;
  verb ) prompt="Verb (action word: eat, sleep, drink, jump)" ;;
  adjective ) prompt="Adjective (attribute: red, squishy, sticky, wet)" ;;
  * ) prompt="$pos" ;;
esac
/bin/echo -n "${prompt}: " > /dev/tty
One more thing we need to add for completeness is to detect when we have plural versus singular, particularly with nouns. This can be done simply by looking at whether the last letter of a word is an s. It's not 100% accurate, but for our purposes, we'll let it slide as good enough:
plural=""
if [ "$(echo $word | rev | cut -c1)" = "s" ] ; then
  plural="Plural "
fi
Then, just modify the prompt appropriately:
/bin/echo -n "$plural${prompt}: " > /dev/tty
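Pulling the two pieces together, here's one way the prompt logic could be packaged. Treat it as a sketch: build_prompt is a hypothetical helper name, and I've swapped the rev and cut pipeline for a shell case pattern (*s), which performs the same last-letter test without spawning extra processes.

```shell
# Build the full prompt for a word and its part of speech.
# build_prompt is a hypothetical helper, not part of the original script.
build_prompt() {
  word=$1
  pos=$2
  case $pos in
    noun)      prompt="Noun (person, place or thing: dog, Uncle Mort, Starbucks)" ;;
    verb)      prompt="Verb (action word: eat, sleep, drink, jump)" ;;
    adjective) prompt="Adjective (attribute: red, squishy, sticky, wet)" ;;
    *)         prompt=$pos ;;
  esac
  # Same heuristic as above: a trailing "s" is treated as plural.
  plural=""
  case $word in
    *s) plural="Plural " ;;
  esac
  printf '%s%s: ' "$plural" "$prompt"
}
```

The caller would send the result to the terminal, for example `build_prompt "$word" "$pos" > /dev/tty`. The heuristic still misfires on singular words like "glass" or "reads", just as the original test does.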
Looking back at what we've done, however, there are a couple of problems. The most important is that although we have a tool that identifies part of speech, it's not particularly accurate, because it turns out that many words can be identified properly only from their use and context. A grammarian already will have identified some of the problems above! Even more than that, I suspect that however much we hack the script to make smarter word selections and identify context, the fact is that creating a really great Mad Libs involves human intervention. Given an arbitrary sentence, there are words that can be replaced to make it funny, and others that just make it incomprehensible.
Now, it wouldn't be too hard to write a somewhat less ambitious program that understood a Mad Libs type of markup language and prompted as appropriate, reassembling the results after user input. Perhaps “The <noun> in <place> stays mainly in the plain”, which turns into:
Noun (person, place or thing):
Noun (a place):
But, that I will leave as (ready for it?) an exercise for the reader!
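If you'd like a head start on that exercise, here's a rough sketch of one possible approach, not a finished tool. Everything in it is an assumption on my part: the function name fill_template, the choice to pass the template as an argument, and the decision to read answers from standard input and write prompts to standard error (a real version would more likely use /dev/tty as shown earlier). It also prompts with the bare tag name rather than the friendlier descriptions.

```shell
# Replace each <tag> in a template with a line of user input.
# fill_template is a hypothetical name; prompts go to stderr and
# answers are read from stdin to keep the sketch simple.
fill_template() {
  rest=$1
  out=""
  while :; do
    case $rest in
      *"<"*">"*)
        out=$out${rest%%"<"*}      # text before the next tag
        after=${rest#*"<"}         # everything after the "<"
        tag=${after%%">"*}         # the tag name itself
        rest=${after#*">"}         # text still to be processed
        printf '%s: ' "$tag" >&2
        IFS= read -r answer || answer=""
        out=$out$answer
        ;;
      *)
        break
        ;;
    esac
  done
  printf '%s\n' "$out$rest"
}
```

Calling `fill_template 'The <noun> in <place> stays mainly in the plain'` and answering "rain" and "Spain" puts the original sentence back together.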
Note: Mad Libs is a registered trademark of Penguin Group USA.