Making Linux Accessible for the Visually Impaired with Speakup

Ameer Armaly

Issue #140, December 2005

Speakup makes Linux more accessible to the visually impaired by integrating speech capabilities directly into the kernel.

During the past ten years, advances in many fields of technology have influenced the lives of all of us, and especially the world's blind population. Advancements in speech synthesis have made many different operating systems usable, Linux among them. One of the programs that makes this possible, and by far one of the best, is a screen review package called Speakup, written by Kirk Reiser with assistance from the user community. Speakup is unique in the sense that it integrates seamlessly into the kernel, allowing it to talk from startup to shutdown, and even to debug kernel errors, which I can testify to from personal experience. It also makes the installation of a Linux system much easier, because one does not usually require a serial console or sighted assistance to complete the installation process.

A screen review package is a program that takes the text displayed on the screen and outputs it in spoken words. The actual speaking is done by a speech synthesizer, which can come in either hardware or software versions. Hardware synthesizers are either external boxes with headphone jacks and volume knobs that plug into your computer via serial or USB ports, or ISA or PCI cards that have an output jack for a speaker or headphones. Software synthesizers are programs that handle all the processing of the text into spoken words and output it through the computer's sound card. Speakup supports both hardware and software synthesizers, though software synthesizers require a user-space program and thus can't load at kernel boot, as we'll discuss later. Speakup's key features include seamless integration, logical key layout, support for laptop keyboards, easy adjustability of speech settings and support for software synthesizers.

Features

Speakup is packed full of features, some of which you won't find in any other screen reader. In order to read text, Speakup uses an invisible review cursor. At the same time, however, Speakup tracks the system cursor, to facilitate navigation in menus, editors and similar situations. To perform tasks such as moving the review cursor around, Speakup uses the numeric keypad, hereafter referred to as the numpad.

The numpad Enter key silences speech until the next key press, which is very useful for quieting boot-up messages and/or frequently heard text. It also synchronizes the location of the review cursor with the system cursor, facilitating many different operations. Insert plus numpad Enter silences reading of new text until this combination is pressed again, but still allows you to move around the screen.

The numpad plus key reads the entire screen. The numpad 0, or insert, is used as a key modifier similar to Alt, Ctrl or Shift. Speakup also respects numlock, still allowing the user to enter numbers from the numpad if necessary. Numpad keys 7–9 go up a line, read the current line and go down a line, respectively. A similar arrangement is used for words on numpad 4–6, and with characters on numpad 1–3. The numpad slash marks a spot on the screen, and if there is a spot already marked, it copies the text into memory. Insert plus numpad slash inputs any previously copied text, which usually results in pasting it to the location of the system cursor.
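The numpad layout described above can be summarized as a lookup table. The following is only an illustrative sketch: the action descriptions and the describe helper are invented for this example and are not identifiers from Speakup itself.

```python
# Illustrative summary of the numpad review keys described in the text.
# The action strings are labels made up for this sketch, not Speakup names.
NUMPAD_REVIEW_KEYS = {
    "7": "previous line",
    "8": "current line",
    "9": "next line",
    "4": "previous word",
    "5": "current word",
    "6": "next word",
    "1": "previous character",
    "2": "current character",
    "3": "next character",
    "+": "read entire screen",
    "/": "mark a spot, or copy if a spot is already marked",
    "Enter": "silence speech until the next key press",
}

# Combinations that use numpad 0 (Insert) as a modifier.
INSERT_COMBINATIONS = {
    "/": "paste previously copied text",
    "Enter": "toggle silencing of new text",
}


def describe(key, insert_held=False):
    """Return the review action for a numpad key, per the tables above."""
    table = INSERT_COMBINATIONS if insert_held else NUMPAD_REVIEW_KEYS
    return table.get(key, "no review function in this sketch")
```

For instance, describe("8") gives the action for reading the current line, and describe("/", insert_held=True) gives the paste action.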

The numpad minus parks the review cursor. Parking means that the review cursor's location will not be moved unless the user moves it; this is useful for tracking text that changes but is not at the cursor, which would otherwise require you to move to it constantly. This functionality is also in the windowing system, which will be covered shortly. Numpad star toggles cursor tracking on and off. This is different from parking the review cursor, because parking does not affect what is actually spoken, just where the review cursor is. Cursor tracking always speaks what is at the cursor, which is optimal for menus and editors, but occasionally you may need to turn it off.

Laptops

For laptops, Speakup has a set of key assignments as well. These center around the Caps Lock key or the Windows logo key, if it is present on the keyboard. While the Caps Lock key is down, the letters U, I and O act as the numpad 7–9. Thus, you have a very similar arrangement to what you have on the numpad. Some things are different; for instance, Caps Lock plus Enter acts as numpad Enter, but overall it's very similar and easy to learn. From here on, the Caps Lock/Windows key and the numpad Insert key are referred to collectively as the Speakup key.

Adjusting Settings

Adjusting speech settings, such as volume, rate, pitch and tone, can be done in two ways.

The first, and probably the easiest, is to use the Speakup key plus the numbers on the number row. The Speakup key plus 1 and 2 decrement and increment the volume, respectively; 3 and 4 do the same with pitch; and finally 5 and 6 do the same with rate. The Speakup key plus F9 and F10 control punctuation, and the Speakup key plus F11 and F12 control the punctuation only for reading.
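The number-row adjustments can be sketched as a small table of (setting, step) pairs. The step size of 1 and the 0 to 9 range below are placeholders assumed for this sketch; the article does not state the real increments or limits, and apply_key is a hypothetical helper.

```python
# Speakup key + number-row digit -> (setting, step).
# Step sizes of 1 are an assumption made for this sketch.
ADJUST_KEYS = {
    "1": ("volume", -1),
    "2": ("volume", +1),
    "3": ("pitch", -1),
    "4": ("pitch", +1),
    "5": ("rate", -1),
    "6": ("rate", +1),
}


def apply_key(settings, digit, low=0, high=9):
    """Return a new settings dict after pressing Speakup key + digit.

    The low/high bounds are assumed limits, used only to keep the
    sketch's values in a sensible range.
    """
    updated = dict(settings)
    if digit in ADJUST_KEYS:
        name, step = ADJUST_KEYS[digit]
        updated[name] = max(low, min(high, settings[name] + step))
    return updated
```

Starting from {"volume": 5, "pitch": 5, "rate": 5}, pressing the Speakup key plus 2 in this model raises volume to 6 and leaves the other two settings alone.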

The Speakup key plus F5 lets you edit the “some” punctuation level. It works by toggling the punctuation that you press, as to whether it is spoken in the specified level. The Speakup key plus F6 does the same for the “most” punctuation level, and Speakup key plus F7 lets you edit what delimiters are used when moving by words; usually it is spacing and certain punctuation.
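The toggling behavior described above amounts to flipping membership in a set of characters. The sketch below models that idea only; the function names and the example characters are assumptions for illustration, not Speakup's internal representation.

```python
def toggle_punctuation(level_chars, pressed):
    """Return the level's character set with `pressed` toggled in or out."""
    chars = set(level_chars)
    # symmetric difference: removes the character if present, adds it if absent
    chars.symmetric_difference_update({pressed})
    return chars


def spoken_punctuation(text, level_chars):
    """List, in order, the characters in `text` that this level would speak."""
    return [ch for ch in text if ch in level_chars]
```

Pressing a character that is already in the level removes it, and pressing one that is not adds it, which matches the on/off editing the text describes.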

The other method of changing speech settings is to use the Speakup entry under /proc. Under /proc/speakup, there are the usual items, such as volume, rate, pitch, voice, version and synth_name, as well as some more-advanced items dealing with timing and other things. Some of these values are read/write, and some are read-only. For instance, version gives the current revision of Speakup, including the CVS build date if applicable, but synth_name can be used both to get and set the synthesizer in use. synth_direct is a write-only entry that sends all text directly to the synthesizer. It is even possible to load a new keymap while the system is running, rather than having to rebuild the kernel. There are also values for punct_some, punct_most and delimiters, which do the same things as the key functions described above. There is also a script called speakupconfig, which saves all of your entries in /proc/speakup for the particular synthesizer in use and allows you to restore these settings later, allowing automated loading of settings.
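Because the entries under /proc/speakup behave like small text files, they can be read and written with ordinary file operations. The sketch below assumes each entry holds a single value as text and that you have permission to write to it (typically root). The helper names are hypothetical, and the save and restore pair only mimics the idea behind speakupconfig; it is not that script.

```python
from pathlib import Path

SPEAKUP_DIR = Path("/proc/speakup")  # location named in the article


def read_setting(name, base=SPEAKUP_DIR):
    """Read one entry, such as "rate" or "synth_name", as a stripped string."""
    return (Path(base) / name).read_text().strip()


def write_setting(name, value, base=SPEAKUP_DIR):
    """Write a new value to a read/write entry."""
    (Path(base) / name).write_text(f"{value}\n")


def save_settings(names, base=SPEAKUP_DIR):
    """Snapshot the listed entries into a dict."""
    return {name: read_setting(name, base) for name in names}


def restore_settings(saved, base=SPEAKUP_DIR):
    """Write a previously saved snapshot back."""
    for name, value in saved.items():
        write_setting(name, value, base)
```

The base parameter exists so the functions can be tried against a scratch directory before pointing them at a live system.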

Windows

Speakup has a windowing system, which can be very useful in certain programs where a specific area of the screen that is not tracked by the cursor is updated frequently. The Speakup key plus F2 is used to set the window dimensions; the Speakup key plus F3 clears the window settings, allowing you to set a new one; and the Speakup key plus F4 silences the window, preventing it from being read automatically. However, you can read windows manually with the Speakup key plus the numpad plus key.
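Conceptually, a window is a rectangle on the text screen. As a rough model, assuming the screen is a list of equal-length strings and the window is given by inclusive row and column bounds (both assumptions of this sketch, not Speakup's data structures), reading a window manually looks like this:

```python
def window_text(screen_lines, top, left, bottom, right):
    """Return the text inside an inclusive rectangle of a text screen.

    Rows and columns are zero-based; `bottom` and `right` are included.
    """
    rows = screen_lines[top:bottom + 1]
    return [row[left:right + 1] for row in rows]
```

A status area in the corner of a full-screen program could be captured this way and spoken on demand, without following the system cursor.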

Work is now being done on color and highlighting recognition, which will allow ncurses-based programs to function even better than they do now, especially in menus. This means that text that is a different color from surrounding text will be given a higher priority, thus read first.

Help

There are several ways to get help on Speakup. First, you can load the module called speakup_keyhelp, and press the Speakup key plus F1. This puts you in a key identification mode, which can be exited by pressing the spacebar. When in this mode, Speakup speaks the description of any key that is assigned to a Speakup function, and allows you to arrow through the list of assignments. Another way to get help is to consult the guide provided with Speakup under Documentation in the kernel tree, or on the Web site. This document has many useful instructions, which can get a new user started with Speakup, as well as refresh an existing user's memory.

Installation

The number one thing that sets Speakup apart from other screen reader programs is the fact that it is literally part of the kernel. The install script applies a few patches to some kernel source files and copies the relevant Speakup sources to drivers/char in the kernel tree. Then, when make config is executed, there is a section for console speech output and Speakup. There you can choose which synthesizers you would like to build directly into the kernel or as modules, though software speech support can be built only as a module.

You can also select which synthesizer you want to be the default at startup. Thus, if you build everything into the kernel, you have a fully talking Linux system from startup to shutdown. This allows a blind person to install Linux without any sighted assistance whatsoever, because every step in the installation talks.

There are Speakup-modified ISO images for three major distros: Debian, Fedora and Slackware. Slackware has actually incorporated Speakup into its official installation setup, simplifying things even further. There is also a Speakup-enabled version of Knoppix, which is a basic Linux distro on CD. This allows people wanting a quick look at a Linux system simply to boot the CD, have it come up talking and not have to worry about installation unless they're interested. It also can be very useful for crash recovery.

Software Speech

As previously mentioned, Speakup supports software speech synthesizers with some user-space support. Some of the more famous software synthesizers include Festival, Flite, FreeTTS and IBM's ViaVoice Outloud, which is no longer supported. Software speech in Speakup centers around another program called Speech Dispatcher. Speech Dispatcher is a framework that provides a single interface to multiple software synthesizers. It does this through a series of programs that provide a Speech Dispatcher interface to elements such as Emacs, as well as libraries for a number of languages. It also has a TCP protocol for transmitting speech requests from a client to the server that does the actual output.
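To give a feel for what travels over that TCP connection, here is a sketch that only builds the bytes of a single speak request. The command names (SET SELF CLIENT_NAME, SPEAK, QUIT), the lone-dot terminator and the dot-doubling rule are recalled from Speech Dispatcher's SSIP documentation and should be checked against it before use. The client name is a made-up example, and a real client would read the server's reply after each command instead of sending everything at once.

```python
def ssip_speak_request(text, client_name="user:sketch:main"):
    """Build the raw bytes of one speak request (assumed SSIP framing).

    Nothing is sent over the network; this only shows the message layout.
    """
    lines = text.splitlines() or [""]
    # A text line that begins with "." gets an extra leading dot so it is
    # not mistaken for the end-of-message marker.
    escaped = ["." + line if line.startswith(".") else line for line in lines]
    parts = [
        f"SET SELF CLIENT_NAME {client_name}",
        "SPEAK",
        *escaped,
        ".",      # a lone dot ends the message text
        "QUIT",
    ]
    return ("\r\n".join(parts) + "\r\n").encode("utf-8")
```

The resulting bytes could be written to a socket connected to the Speech Dispatcher server, whose host and port depend on the local configuration.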

Speakup has a generic software synthesizer driver that provides a device called /dev/softsynth, which outputs the text that would normally be sent to a hardware synthesizer. A companion program for Speech Dispatcher, called speechd-up, takes the text from /dev/softsynth and sends it to Speech Dispatcher and a software synthesizer of the user's choice. Support exists for Festival, Flite, DECtalk software and generic synthesizers. You also can integrate other synthesizers with some tweaking of configuration files. Performance-wise, software synthesizers have a slight lag in responsiveness compared to hardware synthesizers, but the overall result is not that bad given the circumstances.
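The bridging role that speechd-up plays can be pictured as a simple relay loop: read whatever appears on the device and hand it to something that speaks. The sketch below is not how speechd-up is implemented. The relay_text function, the chunk size and the Latin-1 decoding are all assumptions for illustration, and any control sequences the device may emit are passed through untouched.

```python
def relay_text(source, speak, chunk_size=256):
    """Read bytes from `source` until it returns no data, passing text to `speak`.

    `source` is any binary file-like object; `speak` is any callable
    that accepts a string. Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # Latin-1 maps every byte to a character, so decoding cannot fail;
        # the device's real encoding is an assumption here.
        speak(chunk.decode("latin-1"))
        total += len(chunk)
    return total
```

On a system with the soft synth module loaded, source could be the result of open("/dev/softsynth", "rb", buffering=0) and speak could forward each piece to Speech Dispatcher; a read on the real device may block until new text arrives.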

The first step is to get Speech Dispatcher working, which is not hard at all; just compile it and you're set to go. You have to edit the configuration file to tell it what synthesizer you want to use; by default it uses Flite. Then, compile and install speechd-up. To start software speech, load the speakup_sftsyn module if you haven't already, and run speechd-up. If you do this through an init script, you still will get an early-talking system, though not entirely in the kernel.
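The last two steps can be scripted. This sketch assumes Speech Dispatcher is already running, that modprobe and speechd-up are on the PATH, and that the script runs with enough privileges to load modules. The start_software_speech helper and its dry_run default are inventions for this example; by default nothing is executed.

```python
import subprocess

STARTUP_COMMANDS = [
    ["modprobe", "speakup_sftsyn"],  # soft synth module named in the article
    ["speechd-up"],                  # bridge from /dev/softsynth to Speech Dispatcher
]


def start_software_speech(run=subprocess.run, dry_run=True):
    """Run (or, with dry_run, just list) the startup commands in order."""
    executed = []
    for command in STARTUP_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError if a command fails
            run(command, check=True)
        executed.append(" ".join(command))
    return executed
```

Calling start_software_speech(dry_run=False) from an init script is one way to get speech early in the boot sequence, as described above.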

Future

Many things are planned for Speakup in the future. As has been previously mentioned, work has been started on color recognition and highlight tracking, thanks to some folks at the American Printing House for the Blind. This will enable many menu-based programs to talk much more smoothly.

Another new feature that is planned is keyboard macros, allowing the user to accomplish many different tasks with the press of one key. We eventually want to have a screen memory find function, as well as a goto function to go to a specific set of coordinates on the screen.

Another matter that is under consideration and analysis is configuration files. These files would somehow have to be loaded in on execution of their corresponding program, and would contain voice, macro and other information necessary for the operation of that program.

All of these features and more are planned for Speakup in the future, provided that people are willing to help and contribute their time to the effort of making Linux accessible to the world's blind population.

Conclusion

Today, technology has revolutionized the lives of the world's blind population. Computers allow us to access data more easily than ever, and the arrival of the Internet into the mainstream has made communication and linking with others easier than ever before for everyone. Linux systems are economical by their nature, not requiring the absolute latest hardware to run well. This is especially helpful for the world's blind, who may not have access to as much funding as would be ideal. Now there is a cheap and workable solution for those people, a fully talking Linux system with Speakup; and with the introduction of software speech and Speech Dispatcher, it just got even cheaper.

Resources for this article: /article/8586.

Ameer Armaly is a sixteen-year-old junior in high school. He has been blind since birth, and enjoys programming, food and science fiction. He uses computers with the aid of talking programs that read the text aloud, sometimes as fast as 550 words per minute.