Work the Shell

Parsing Your Twitter Stream

Dave Taylor

Issue #190, February 2010

More work on the Twitter response bot.

Last month, we circled back to Twitter and started developing a shell script that lets you actually parse and respond to queries sent via Twitter. The idea was that if you were a store, for example, a tweet of “hours?” could be answered automatically with a response tweet of the store's hours—simple, but interesting nonetheless.

We ended last month with a script that does quite a bit in just a few lines:


#!/bin/sh

curl="/usr/bin/curl -s"
inurl="http://www.twitter.com/statuses/mentions.xml"
pw='PasswordGoesHere'
temp="/tmp/$(basename $0).$$"

trap "/bin/rm -f $temp" 0 1 2 15 # axe our temp file

$curl -u "davetaylor:$pw" $inurl | \
    grep -E '(<screen_name>|<text>)' | \
    sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
    sed 's/    <screen_name>//;s/<\/screen_name>//' | \
    awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
          else             { print "id="$0 }}' > $temp

while read buffer
do
    eval "$buffer"
    echo "Twitter user @$id sent message $msg"
done < $temp

exit 0

(Unfortunately, the script needs the Twitter account password hard-coded, so I've redacted mine here. Change “davetaylor” wherever it appears, in the curl call and in the first sed expression, to match your own Twitter account.)

This is a pretty tricky script, if I say so myself. Here you can see that we unwrap the XML sent by Twitter and use a complicated sequence of grep/sed/awk to turn it into two name=value pairs, instantiating msg and id.
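
To make the pipeline less opaque, here is a sketch that runs the same grep/sed/awk chain against a canned response instead of a live curl call. The XML is a hand-written assumption, trimmed to the two tags the script cares about and indented the way the sed expressions expect. The names sample_xml and parse_mentions are invented for this illustration.

```shell
#!/bin/sh
# Sketch only: sample_xml stands in for the curl call, and its content
# is made up for illustration (not captured from Twitter).

sample_xml() {
cat <<'EOF'
<statuses>
<status>
  <text>@DaveTaylor hours?</text>
  <user>
    <screen_name>lizhamilton</screen_name>
  </user>
</status>
<status>
  <text>@DaveTaylor address</text>
  <user>
    <screen_name>MommyBrain</screen_name>
  </user>
</status>
</statuses>
EOF
}

# Same filter chain as the script: keep the two tags, strip the markup,
# then pair each message line with the screen name that follows it.
parse_mentions() {
    grep -E '(<screen_name>|<text>)' | \
    sed 's/@DaveTaylor //;s/  <text>//;s/<\/text>//' | \
    sed 's/    <screen_name>//;s/<\/screen_name>//' | \
    awk '{if (NR % 2 == 1) { printf ("msg=\"%s\"; ",$0) }
          else             { print "id="$0 }}'
}

sample_xml | parse_mentions
```

With this sample, each message/name pair collapses into one line of shell assignments: msg="hours?"; id=lizhamilton on the first line and msg="address"; id=MommyBrain on the second. Those are the lines the while loop later feeds to eval.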

When I run the script, I see:


Twitter user @TedWahler sent message That sounds like a
very interesting article. When and where can I read
&quot;When Not To Identify your Group Memberships&quot; Dave?

Twitter user @naomimimi sent message i will send you some
of my amazing restedness after sleeping for 20 hours
yesterday. *bzzzt* feel better? :)

Twitter user @GaryBloomer sent message RE: Song. Dave,
don't know if you have an answer yet, but: Supertramp:
If Everyone Was Listening

A tiny tweak can show who sends you tweets (these are actually @ replies, which is what makes this work): simply change the echo in the final loop to just echo $id.

Want to find those shortened URLs and compile a list? That's a tiny bit more tricky, but you can use tr and grep to do the heavy lifting:

$ sh tweet-listen.sh | tr ' ' '\
> ' | grep 'http://'

http://twurl.nl/bco8tq
http://twurl.nl/bco8tq
http://bit.ly/12PvjV

Hey, someone must have retweeted or something for the same URL to show up twice!
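
If the repeats are unwanted, the same idea can be sketched as a small filter. This version writes the newline as \n, which tr understands, and adds sort -u to collapse duplicates. The name list_urls and the sample text are invented for illustration.

```shell
#!/bin/sh
# Sketch: split on spaces, keep the words containing http://,
# and drop duplicate URLs. list_urls is a hypothetical helper name.
list_urls() {
    tr ' ' '\n' | grep 'http://' | sort -u
}

# Invented sample input standing in for the script's output
printf '%s\n' \
    'Twitter user @a sent message see http://twurl.nl/bco8tq' \
    'Twitter user @b sent message RT http://twurl.nl/bco8tq' \
    'Twitter user @c sent message try http://bit.ly/12PvjV' | list_urls
```

In practice this would sit at the end of the pipeline, as in sh tweet-listen.sh | list_urls. Swapping sort -u for sort | uniq -c would show how often each URL appeared instead.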

What we want to do, though, is look for a specific pattern within the stream, so let's do that instead.

Looking for Patterns

The easy way is to change the while read buffer loop to do the parsing:


while read buffer
do
  eval "$buffer"
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"

  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"

  # else
  #   echo Twitter user @$id sent message $msg
  fi
done < $temp

Armed with that (and with some cooperative Twitter pals), I can now run the script and find out that:

Twitter user @MommyBrain asked for our address
Twitter user @lizhamilton asked what our hours are
Twitter user @valdezign asked what our hours are
Twitter user @bgindra asked what our hours are
Twitter user @MommyBrain asked what our hours are

Coolness, eh? Now, let's answer.
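
One caution before answering anyone: the loop runs eval on text that strangers wrote, so a tweet containing $(...) or backticks would be executed as a command. A hedged alternative is to split each line of the temp file with parameter expansion instead. The sketch below assumes the exact msg="..."; id=... layout that the awk step produces. The names parse_line and sep are invented here.

```shell
#!/bin/sh
# Sketch: pull msg and id out of one line of the temp file without eval.
# Assumes the line looks exactly like:  msg="some text"; id=username
parse_line() {
    line=$1
    sep='"; id='                 # text between the message and the name
    id=${line##*"$sep"}          # everything after the last separator
    msg=${line%"$sep"*}          # everything before it ...
    msg=${msg#'msg="'}           # ... minus the leading msg="
}

parse_line 'msg="hours"; id=lizhamilton'
echo "Twitter user @$id sent message $msg"

# A hostile tweet stays plain text instead of being run
parse_line 'msg="$(echo pwned)"; id=mallory'
echo "Twitter user @$id sent message $msg"
```

Inside the while loop, parse_line "$buffer" would take the place of the eval line, and the rest of the loop could stay as it is.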

Responding to Tweet Queries

From an earlier column “Pushing Your Message Out to Twitter” in the November 2008 issue of LJ (www.linuxjournal.com/article/10222), we have a script already lying around that lets you specify what message you'd like to send out on Twitter, so it's just a matter of assembling it properly:


while read buffer
do
  eval "$buffer"
  if [ "$msg" = "hours" ] ; then
    echo "Twitter user @$id asked what our hours are"
    $tweet "@$id our hours are Mon-Fri 9-5, Sat 10-4."

  elif [ "$msg" = "address" ] ; then
    echo "Twitter user @$id asked for our address"
    $tweet "@$id we're at 123 University Avenue, Anywhere USA"
  fi
done < $temp

In this instance, I'll repeat the earlier tweet script because it's both so succinct and so darn useful:

#!/bin/sh
# Twitter command line interface

user="DaveTaylor" ; pass='PasswordGoesHere'

curl="/usr/bin/curl"
$curl --silent --user "$user:$pass" --data-ascii \
    "status=$(echo $@ | tr ' ' '+')" \
    "http://twitter.com/statuses/update.json" > /dev/null

echo "(sent tweet $@)"
exit 0
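
The response loop calls $tweet, which none of the listings define; presumably it holds the command that runs this tweet script. A minimal sketch of that wiring follows, assuming the script is saved as tweet.sh in the current directory (the filename is a guess). It defaults to a dry run that only echoes, and the canned answers move into a case statement. The names reply_for and answer are invented for this illustration.

```shell
#!/bin/sh
# Sketch: define $tweet and look up canned replies.
# Dry run by default; for real use, set tweet="sh ./tweet.sh"
# (assumed filename for the tweet script).
tweet=${tweet:-"echo DRY-RUN:"}

# Print the canned reply for a one-word query; fail if there is none.
reply_for() {
    case "$1" in
        hours)   echo "our hours are Mon-Fri 9-5, Sat 10-4." ;;
        address) echo "we're at 123 University Avenue, Anywhere USA" ;;
        *)       return 1 ;;
    esac
}

# Send the reply, if any, to the user who asked.
answer() {
    id=$1 msg=$2
    if reply=$(reply_for "$msg") ; then
        $tweet "@$id $reply"
    fi
}

answer lizhamilton hours
answer MommyBrain address
answer someone weather      # no canned reply, so nothing is sent
```

Keeping the replies in one case statement means adding a new query word is a one-line change, and the dry-run default makes it safe to test without posting anything.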

The problem is a bit more complex than we've addressed so far, because when I asked people to send one-word queries, I also got things like “directions” and “directions!” rather than just the word by itself, unadorned by punctuation, quotation marks and so on.

This is something we'll need to deal with in the script, so we'll want to scrub the msg value to be just alphanumeric (or just alphabetic, if our set of canned response queries never includes a digit). This can be done with tr again, immediately after the eval $buffer statement:

msg="$(echo "$msg" | tr -cd '[:alpha:]')"

That's not quite right. When we get “directions”, the quotes arrive HTML-escaped, so each one is &quot; rather than a plain " character. Because tr deletes the & and the ; but keeps the letters, the result is quotdirectionsquot. Not good.

As with so much in the world of programming, things aren't as easy as you'd like them to be. We have to strip out the escaped quotes ourselves, before tr runs, as part of the scrubbing process. Now it looks like this:


msg="$(echo "$msg" | sed 's/&quot;//g' | tr -cd '[:alpha:]')"

It's a bit more complicated, but not terribly so.
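
Wrapped in a function, the scrubbing step is easy to try against the awkward inputs mentioned above. This is a sketch: scrub is a made-up name, and the final tr that folds everything to lowercase is an extra step not in the one-liner, added on the assumption that “Hours?” should match too.

```shell
#!/bin/sh
# Sketch: reduce a message to a bare lowercase word.
scrub() {
    printf '%s\n' "$1" |
        sed 's/&quot;//g' |            # drop HTML-escaped quotes first
        tr -cd '[:alpha:]' |           # then keep letters only
        tr '[:upper:]' '[:lower:]'     # assumption: match case-insensitively
}

for sample in '&quot;directions&quot;' 'directions!' 'Hours?' ; do
    echo "$sample -> $(scrub "$sample")"
done
```

The three samples come out as directions, directions and hours. In the loop, msg="$(scrub "$msg")" would go immediately after the eval line.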

The bigger issue is recognizing when we've already responded to a Twitter query to the bot. I'm sure no one's going to appreciate it if a query for “hours?” results in an answer every ten minutes for the next two weeks!

There are two ways to address that particular problem. One is to add timestamps to each tweet and figure out when we last auto-responded, but that sounds suspiciously like work. Instead, we can simply remember the most recent tweet to which we responded, including user ID, and use that as the starting point for subsequent auto-response parsing efforts.
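
As a rough illustration of that second approach (not the finished solution promised for next month), one option is to keep the newest line already answered in a small state file and stop reading the parsed mentions when that line comes around again. This assumes mentions arrive newest first and that the parsed lines have the msg="..."; id=... form shown earlier. The names new_mentions and remember_newest and the state file are all invented for illustration, and two identical queries from the same user would be indistinguishable.

```shell
#!/bin/sh
# Sketch: only act on mentions newer than the last one answered.
# Assumes the parsed mentions are listed newest first.

# Print lines from stdin until the remembered line shows up.
# $1 is the state file.
new_mentions() {
    last=""
    if [ -f "$1" ] ; then
        last=$(cat "$1")
    fi
    while IFS= read -r line ; do
        if [ "$line" = "$last" ] ; then
            break
        fi
        printf '%s\n' "$line"
    done
}

# Record the newest mention so the next run stops there.
# $1 is the state file, $2 the file of parsed mentions.
remember_newest() {
    if [ -s "$2" ] ; then
        head -n 1 "$2" > "$1"
    fi
}

# Demo with invented data in a scratch directory
dir=$(mktemp -d)
printf '%s\n' 'msg="address"; id=MommyBrain' \
              'msg="hours"; id=lizhamilton' > "$dir/mentions"

new_mentions "$dir/state" < "$dir/mentions"    # first run: both lines
remember_newest "$dir/state" "$dir/mentions"
new_mentions "$dir/state" < "$dir/mentions"    # second run: nothing new
rm -rf "$dir"
```

In the bot, the while loop would read from new_mentions "$state" < $temp rather than from $temp directly, with remember_newest called once the replies have gone out.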

I can't squeeze it in this month, but rest assured that next month we'll add this third piece and then talk about how to slip it into a cron job so that every N minutes our Twitter response bot answers any pending queries from the twitterverse.

Dave Taylor has been hacking shell scripts for a really long time. He's the author of the popular Wicked Cool Shell Scripts and can be found on Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com.