Podcast Recording Shootout

Dan Sawyer

Issue #169, May 2008

So many VoIP programs, so little time. What's a podcaster to do?

Do you have a podcast? Okay, dumb question. Of course you do—podcasting is the blogging of tomorrow. It's quick, it's easy, it's not tied to a computer screen, and your audience members can take you with them anywhere on their iPod-ish devices. Best of all, you don't have to worry about actually learning to spell in order to inflict your opinions on others. So long as you can speak clearly and have fun doing it, you too can have a podcast. So who wouldn't want to do it?

I mean, you have an opinion you want to express, right? Or you have a story you want to tell. Or you simply have a desire to see what will happen if you gradually fade the volume out on your podcast until it's near zero, encouraging your listeners to turn their headphones up, before you blast them with a channel-saturating guitar riff to wake them up. The point is, you have a podcast, or you want one.

One thing you begin to notice when you get into podcasting is that listening to your own voice is boring—really boring. It's cathartic to rant into a microphone for half an hour and then put it on iTunes for the world to hear, but after a while, it's really nice to have listeners call in, or have guests, or pick up a cohost in another state.

How can you do it? Telephony, naturally.

Now, I must emphasize that not just any telephony client will work. Ekiga and Skype are not created equal. Neither are Gizmo and Twinkle. That doesn't mean they aren't all good for something, but good for something isn't the issue here. We need good for podcasting, which is a whole other spool of fiber-optic cable.

In my podcasting and production career, I've run into a lot of remote conferencing, and I've found that pretty much any remote conferencing is done for one reason: you can't get the talent into your recording studio (humble as it may be).

Why that happens is another matter. For one of my podcasts, The Polyschizmatic Reprobates Hour (don't ask), my sometime-cohost lives halfway across the country, and to have any kind of intelligible real-time conversation, we needed a good telephony setup. This went double for when we needed to bring in guests for interviews. The basic requirements list is as follows:

  1. Good sound quality: this show is already going to be compressed to MP3; we don't want to start off with crappy sound in the first place.

  2. Ease of installation: most people still are fairly technophobic or tech-ignorant, and most people still run Windows. That means whatever telephony software you're using for your podcast conferencing, it has to be one that you can get guests up on in a few minutes. Longer or more troublesome than that, and you're going to hear the words of death: “Maybe we should do this another time.”

  3. Ease of dial-out/dial-in: sometimes, your guests just aren't going to be able to get on your VoIP network, and when that happens, you have to call them on a phone. In that case, you want the experience to go quickly and smoothly—there's nothing worse for your street cred than making a guest, who has carved out an hour for you, wait by the phone. Chances are you'll need to do this at some point. When you do, will it be quick and painless? Will the price be right?

  4. Ease of recording: of course, the best-sounding protocols on the slickest software in the world aren't going to get you anywhere if you can't record your conversations, and on this score, VoIP software is justly infamous. Because of the way most conference calls grab your sound ins and outs, it often kills the hardware duplexing your otherwise bright-and-shiny ALSA drivers usually support. But, a lot of people podcast over telephony, so there has to be a way.

  5. Carts: this is something from the old days when those of us who took broadcasting training at college radio stations actually had to juggle tapes. A cart was a tape cartridge on a continuous loop that contained station ID, sound effects, music beds or anything else we wanted to punch in to the broadcast. Nowadays with podcasting, most people just lay this stuff down in the final mix, but sometimes it's nice to be able to play things while the show is being recorded—sound effects, quotes from sources upon which you're commenting and so on. This is one of those nice-to-have-but-not-essential features, which does make life a lot easier.

Now, looking back over that list, the vast field of VoIP clients narrows substantially. Instead of a couple dozen to pick from, there are only two that will fit the bill, and neither of them is open source.

Skype vs. Gizmo

The two main contenders that are suitable for workhorse podcast use are Skype and Gizmo. Both are very easy to download and install. Both offer comparable rates on calls coming in from the phone network and going out again, both nationally and internationally (though Gizmo has a slight edge in this respect). Both are user-friendly and easy to get potential guests set up on so they can be on your show.

They both are usable. They both are workable. They both run quite well on Linux, Windows and Mac OS. Their feature sets are comparable in many respects. But, they are not the same.

The Technical Lowdown

Skype, now the prized stepchild of the eBay corporation, runs on a proprietary peer-to-peer networking back end that co-opts the user's system resources to route calls, up to the maximum of what it can grab that's not being used by other systems. This is comparable to how BitTorrent works, though unlike with BitTorrent, users have no control over how much in the way of bandwidth or system resources they want to allocate to the task. The practical upshot for this where performance is concerned is curiously double-edged. At the beginning of a Skype call, the connection typically is loud and clear, the mix is well proportioned, and the compression artifacts are very difficult to hear (and, if you're good with EQs, you can pretty much notch out the most obvious ones). However, as a call progresses, more of your personal bandwidth gets allocated to other network calls, and the quality of the audio gradually degrades. At low traffic times, this effect is barely noticeable, but at high traffic times, you may find yourself having to restart the call every 10–15 minutes as the quality falls below what you find acceptable (or intelligible).
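
For readers who would rather notch out a compression artifact in software than on a hardware EQ, here is a minimal sketch of a notch filter. It is not taken from any Skype tooling: it swaps in the widely used "Audio EQ Cookbook" biquad notch, and the function names, the Q value and the frequencies in the usage note are all assumptions made for illustration. You would have to find the offending frequency in your own recording by ear or with a spectrum view.

```python
import math


def notch_coefficients(freq_hz, sample_rate, q=10.0):
    """Return normalized biquad notch coefficients ((b0, b1, b2), (a1, a2)).

    Formulas follow the common "Audio EQ Cookbook" notch; q controls how
    narrow the notch is (higher q = narrower cut).
    """
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def notch_filter(samples, freq_hz, sample_rate, q=10.0):
    """Filter a sequence of float samples, removing a narrow band at freq_hz."""
    (b0, b1, b2), (a1, a2) = notch_coefficients(freq_hz, sample_rate, q)
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

Calling notch_filter(samples, 3000, 16000) on mono float samples would cut a narrow band around a hypothetical 3 kHz artifact in a 16 kHz recording while leaving the rest of the voice range largely alone. A wider or deeper cut than that is usually better done in a proper audio editor.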

Its networking setup isn't the only thing that's proprietary—it's also a closed system. Skype's network can't be dialed in to directly from any other voice-conferencing network. The standards are closed, and they're black-boxed. Although this isn't a problem that's directly relevant to podcasting, if you're looking for a general first-line VoIP package, it's something you'll want to keep in mind. Skype is like Vegas: what happens there, stays there—well, assuming its encryption algorithms are robust.

Gizmo, a service and application owned by SIPphone, Inc., has a somewhat different approach. Although the software itself is proprietary, it uses the open SIP protocol for its transport across the Net, and calls are routed directly over the SIPphone network between the individual call participants, rather than being routed through a peer-to-peer network. Because it uses SIP and Jabber, it can hook up with any software using either of these protocols fairly transparently.

Gizmo uses TLS and SSL encryption to discourage eavesdropping—open technologies whose strengths and limitations are well known. The corporate culture is deliberately geared toward transparency rather than toward opacity, which is an operating philosophy that warms the cockles of this Linux geek's heart. However, when it comes to encryption, Gizmo also loses a point, as it does not encrypt between Gizmo and non-Gizmo SIP clients.

The sound quality on Gizmo follows a different curve from Skype. Because Gizmo routes over the SIP network instead of through a peer-to-peer setup, it is more subject to the fickle winds of fate. When Net traffic is up, Gizmo calls tend to decay. When it's down, they do better. However, Gizmo does not progressively degrade performance over the course of a call or take your bandwidth for allocating to other calls on the network.

In terms of actual performance, the sound quality is usually a wash, but in multiparty conferences, Gizmo has consistently sounded better than Skype in my experience, particularly on extra-long calls.

Recording the Podcast

So, you've got your guest on the line, your cohost on the other line, and all three of you are happily chatting it up in the conference. The podcast is off to a great start—if you can manage to record it correctly. Sometimes, this isn't as easy as it looks.

Skype is notoriously difficult in this area. Although the latest version works on ALSA instead of OSS, on many distros it still doesn't always play nice. It doesn't work well with the Windows or Mac sound systems, either. With full duplex sound hardware, this should be a no-brainer, right? Simply dump the DSP to a file in parallel with running the conference. Alas, some programs want to be front and center, end of story. Skype is one of them. In order to record a Skype call, you have to do one of two things:

  1. Hijack the DSP with a middleware layer. There are a number of packages that'll do this—for a fee—on Windows and Mac. On Linux, I've only ever found one solution that works, and it's a kludge. Twisted Little GNOME has cleverly cobbled together LAME, OggEnc, SoX, Vsound and Skype in an elaborate (though very dependable) script, available at sourceforge.net/project/showfiles.php?group_id=146056&package_id=160795&release_id=358917. Unfortunately, this script is not well maintained and tends to break when Skype upgrades. Worse still, this is the only hijacking option that I've been able to find for Linux. The other method of recording Skype calls is suitable only for audio engineers and people that like playing around with too many cables.

  2. The two-computer mixdown: there are a few permutations of this, but basically, you'll need two computers, one to conduct the call (Box A) and the other to record it (Box B). To do the recording, one option is to split your mic into two feeds before it hits Box A, split the speaker out after it leaves Box A, and run the spare mic and speaker feeds to Box B as its left and right channels. The other option works only if you're running a mixing board: route your mic input to both Mains and Subs, plug the Box A output in to the board as a Subs-only source, and then send the Subs to Box B for recording (if you're not following this, don't worry; just be glad you're not an audio engineer).

Either way, if you intend to record a Skype call, be prepared to put up with a bit of misery.
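
If you go the split-cable route, Box B ends up with a stereo recording that has your mic on one channel and the call audio on the other, which still has to be folded down to mono for the podcast. The sketch below shows one way that might be done with Python's standard wave module. It assumes a 16-bit stereo WAV file with the host mic on the left channel and the call on the right; the function names, file names and gain values are hypothetical, and any audio editor can do the same job.

```python
import array
import sys
import wave


def mix_to_mono(samples, host_gain=0.5, call_gain=0.5):
    """Mix interleaved 16-bit stereo samples down to mono.

    Assumes left = host mic and right = call audio. Each channel is scaled
    by its gain, summed, and clipped to the 16-bit range.
    """
    mono = array.array("h")
    for i in range(0, len(samples) - 1, 2):
        mixed = round(samples[i] * host_gain + samples[i + 1] * call_gain)
        mono.append(max(-32768, min(32767, mixed)))
    return mono


def mixdown_wav(src_path, dst_path, host_gain=0.5, call_gain=0.5):
    """Read a 16-bit stereo WAV file and write a mono mixdown."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = mix_to_mono(samples, host_gain, call_gain)
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Something like mixdown_wav("boxb_capture.wav", "episode_mono.wav", host_gain=0.4, call_gain=0.6) would favor the caller slightly. Keeping the two voices on separate channels until this step is the one real upside of the cable mess: you can rebalance a quiet guest after the fact.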

Gizmo, by contrast, has a recording tap built in to the program, and when you press Record, it announces to all parties on the call that the call is being recorded. Thus, not only is recording the call painless, it also covers your backside legally (see the Legal Issues sidebar).

Carts and Extras

When it comes to live carts, on Skype, you're out of luck. Without third-party plugins, there isn't a thing you can do with Skype to make it play nice with other sound apps on the computer, and not a lot of those plugins are available for Linux.

With Gizmo, on the other hand, you have options. Gizmo comes with a cart interface where you can preload ten sound FX for playing at the touch of a button. You also can route XMMS through Gizmo and play your carts from there, if you need a longer playlist.

Skype and Gizmo also offer varying sets of extras to entice customers. Both have integrated text chat—a very useful feature for prepping your guests for their next question or conspiring with your cohost behind your guests' backs. Both have integrated file transfer—handy for sending outlines or PowerPoint slides to discuss.

Skype's two big standout extras are one-click video conferencing (even under Linux), which can double as a whiteboarding system, and extremely easy-to-set-up conference calls.

Gizmo's conference call system, by contrast, can be a bit twitchy, particularly when trying to bring in someone from an external phone network. On the other hand, with Gizmo, you get free voice mail, which is lovely for handling show feedback. On Skype, voice mail comes only with a subscription to Skype Pro.

Conclusion

Of the two, on technical merits, Gizmo is the clear victor over most of the field. Happily, it's also the winner on cultural merits. However, Skype is used more widely, and potential guests are more likely to be familiar with it. The different network architectures of the two services give an odd kind of redundancy: often, when one's sound quality stinks, the other's works gloriously. My advice: keep them both around. But, when it comes time to buy call-out credits or to get a call-in number, stick with Gizmo.

Dan Sawyer is the founder of ArtisticWhispers Productions (www.artisticwhispers.com), a small audio/video studio in the San Francisco Bay Area. He has been an enthusiastic advocate for free and open-source software since the late 1990s, when he founded the Blenderwars filmmaking community (www.blenderwars.com). He currently is the host of “The Polyschizmatic Reprobates Hour”, a cultural commentary podcast, and “Sculpting God”, a science-fiction anthology podcast. Author contact information is available at www.jdsawyer.net.