Low-Bandwidth Communication Tools for Science

Enrique Canessa

Clement Onime

Issue #75, July 2000

No access to the Internet? Browse the Web via e-mail instead!

Dissemination and management of knowledge is essential for scientific enterprise and sustainable development. For several decades, the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy, has paid special attention to the needs of developing countries to foster, through training and research, the progress of science.

The Centre long ago realized the importance of information retrieval systems on the Internet, including the distribution of in-house preprints, yearly activities and public access catalogs.

On a technical level, Linux provides us with a cost-effective alternative for promoting distance electronic collaboration (see Resources). Based on the Linux OS, virtual laboratories and the extensive use of digital communication tools can help reduce scientific isolation, while filling the need to transfer knowledge to developing countries in the Southern Hemisphere in an unprecedented way (see Resources).

Following these principles, we have started building prototype, on-line scientific tools to further enhance electronic collaboration and to support the use of web navigation and database search by e-mail. Below, we describe two tools that Salam ICTP offers the low-bandwidth scientific community. Both packages use state-of-the-art technologies and software developed in-house, and are distributed under the GNU General Public License (GPL).

www4mail—Web Navigation and Database Search by E-Mail

The ICTP www4mail software allows navigation and search of the entire Internet via e-mail, using any standard web browser and a MIME (Multipurpose Internet Mail Extensions)-aware e-mail program. At first glance, it may appear similar to one of the several web-to-mail software interfaces, but the www4mail program introduces new features not previously available. In short, the user browses filtered HTML pages received by e-mail, and whenever a link to another web site is selected, a new e-mail request is automatically passed to the www4mail server.

Written in modular Perl, the program allows retrieval of web pages, searching of arbitrary databases, filling out of web forms (using GET and POST to conduct web database searches) and following of links (on-line browsing), all by e-mail. It is multi-lingual, easy to manage and supports current Internet standards (MIME, HTML 4.0, etc.).

Developed from scratch on the Linux platform, www4mail has been used successfully on the BSD platform and contains some optional optimisations that are Linux-specific. For example, www4mail can monitor the system load average, directly from the Linux /proc file system and, at high load averages, queue requests for later processing.
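The load check described above can be illustrated with a short sketch. www4mail itself is written in Perl; the Python below is only an illustration of the idea, and the threshold value, function names and in-memory queue are assumptions rather than details of the actual program.

```python
from pathlib import Path

# Hypothetical cut-off; the real www4mail option name and value are not given here.
LOAD_THRESHOLD = 4.0


def parse_loadavg(text):
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    # The file holds a single line such as: "0.42 0.35 0.30 1/123 4567"
    return float(text.split()[0])


def should_queue(loadavg_text, threshold=LOAD_THRESHOLD):
    """Return True when the host is busy enough that a request should wait."""
    return parse_loadavg(loadavg_text) >= threshold


def handle_request(request, queue, loadavg_path="/proc/loadavg"):
    """Process a request now, or append it to `queue` when the load is high."""
    try:
        text = Path(loadavg_path).read_text()
    except OSError:
        # No /proc file system (e.g., on BSD): skip the optimisation entirely.
        text = "0.0"
    if should_queue(text):
        queue.append(request)
        return "queued"
    return "processed"
```

Falling back to a load of zero when /proc is missing mirrors the point made above that the optimisation is optional and Linux-specific.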

Here are some major features of www4mail:

  • sends replies as e-mail attachments or in the body of an e-mail message, depending on the type of request options sent by the e-mail client through the web browser

  • supports scripting, provided the browser can display it

  • delivers most types of web documents, including JavaScript and cookies

  • handles dynamic contents, parsing text HTML and source HTML

  • preserves the original layout of requested web pages

  • retrieves information from FTP sites and Usenet news servers

  • handles meta tags; that is, if a web page is redirected or relocated by the use of a meta statement, www4mail automatically warns about the possible relocation of the information and provides suitable links for the new location at the top of the reply page

  • handles frames, inserting suitable links to each framed document

  • supports user authentication for password-protected web/FTP sites

  • traps error messages and sends them back to the user

  • provides support for text-only access for compatibility with the alternative “Agora” and “GetWeb” web-mail servers

  • serves filtered requests to reduce bandwidth

  • supports the transfer of binary data

  • allows web pages to be downloaded as PostScript files, to be viewed or printed locally (see for manuals)
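To make the meta-tag feature in the list above more concrete, here is a minimal sketch of how a redirect declared with a meta statement might be detected and reported at the top of a reply page. It is not the www4mail code (which is Perl); the class and function names and the wording of the notice are assumptions made for illustration.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.org/new.html"
        _delay, _, rest = attr.get("content", "").partition(";")
        key, sep, value = rest.strip().partition("=")
        if sep and key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_redirect(page, base_url):
    """Return the absolute URL a page redirects to, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(page)
    if finder.target is None:
        return None
    return urljoin(base_url, finder.target)


def add_relocation_notice(page, base_url):
    """Prepend a warning with a link to the new location, if one is declared."""
    target = find_redirect(page, base_url)
    if target is None:
        return page
    link = html.escape(target, quote=True)
    return '<p>This page may have moved to <a href="%s">%s</a>.</p>\n%s' % (
        link, link, page)
```

A page without a refresh tag is returned unchanged, so the notice appears only when a relocation is actually declared.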


www4mail (see logo in Figure 1) was developed mainly to help researchers from developing countries browse the Web using only e-mail and slow Internet links. While the amount of information on the Web has grown exponentially in the last few years, there is still a large community of Internet users who have access only to e-mail, whose Internet providers do not offer full Internet connections (some of them still use UUCP) or who cannot afford an expensive account with full Internet capabilities. Many of these users live in rural areas of developing countries, and rely on e-mail to access essential medical and business information as well as for interpersonal communication and world news. Having the ability to query available databases (such as AltaVista, HotBot, etc.) or preprint repositories with one simple e-mail and receive the output in a few minutes (or hours) could help them tremendously with their scientific work.

At present, www4mail can be tested by sending an e-mail message to www4mail@wm.ictp.trieste.it, or to any other place where the gateway is installed (e.g., Bellanet-Canada, www.bellanet.org/email.htm), listing the requested URL(s) in the body of the message.
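A request of this kind is simply a plain-text message with one URL per line. The sketch below builds such a message with Python's standard library. The gateway address is the one given above; the sender address and the subject line are placeholders, and nothing here says whether the server looks at the subject at all.

```python
from email.message import EmailMessage

GATEWAY = "www4mail@wm.ictp.trieste.it"  # gateway address quoted in the text


def build_request(sender, urls, gateway=GATEWAY):
    """Return a plain-text e-mail whose body lists the requested URLs."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = gateway
    # Placeholder subject: an assumption, not a documented requirement.
    msg["Subject"] = "www4mail request"
    # One URL per line in the body of the message.
    msg.set_content("\n".join(urls) + "\n")
    return msg


if __name__ == "__main__":
    # user@example.org is a hypothetical sender address.
    request = build_request("user@example.org", ["http://www.w3.org/Math/"])
    print(request)
```

The message object can then be handed to whatever mail transport is available locally, for example Python's smtplib on a connected host or a UUCP-fed mail spool.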

Over 50 server configuration options are currently available for setting parameters such as the maximum quota per user, the gateway administrators, the maximum size of each request and the split size for large files. (Type help in the body of the e-mail message for further details.)
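As a rough illustration of what limits such as a per-user quota, a maximum request size and a split size imply, consider the sketch below. The function and parameter names are hypothetical and do not correspond to actual www4mail option names; the real option list is available from the server's help reply.

```python
def within_limits(request_size, used_today, max_request_size, max_quota):
    """Return True if a reply of `request_size` bytes may be sent to a user.

    All four parameters are hypothetical stand-ins for server options.
    """
    if request_size > max_request_size:
        return False
    return used_today + request_size <= max_quota


def split_reply(payload, split_size):
    """Cut a large reply into consecutive chunks of at most `split_size` bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [payload[i:i + split_size] for i in range(0, len(payload), split_size)]


if __name__ == "__main__":
    parts = split_reply(b"abcdefghij", 4)
    for number, part in enumerate(parts, start=1):
        print("part %d of %d: %r" % (number, len(parts), part))
```

Each chunk would then travel as a separate e-mail message, which keeps any single message small enough for slow or size-limited links.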

The installation procedure of the server is simple. For example, under Red Hat Linux, create a user account called www4mail (adduser www4mail), log on as user www4mail (su - www4mail), extract the tar archive in the home directory for www4mail (tar zxvf www4mail.tar.gz) and perform a few extra operations (e.g., to enable forwarding). It is necessary to create a link in the /etc/smrsh directory pointing to the executable /home/www4mail/bin/www4mail in order to keep the sendmail MTA (mail transfer agent) happy. To optimize its configuration, some preliminary monitoring is necessary.

www4mail has been very useful for many people from many different countries, often receiving over 12,000 requests per day. You can view weekly statistics at http://web.bellanet.org/www4mail/.

ScientificTalk—Real-Time Mathematics Discussions via Web

ScientificTalk (see logo in Figure 2) is a profession-specific prototype tool that lets scientists, students and teachers exchange information, including displayed math equations, synchronously via a web browser. The project focuses on users' interests in such things as mathematics and scientific notation. Our motivation follows an early goal for the Web to be a readable and writable collaborative medium.

Unfortunately, the large tag repertoire of the HTML 4.0 language does not cater to mathematics, since it cannot mark up complex mathematical expressions. Usually, to create technical documents with mathematical or scientific content, web authors resort to methods involving images (e.g., screen captures of equations), which means the sharing of scholastic and scientific material by lecturers, students, etc. is often a many-step process. There are a few available applets and plug-ins that can render MathML in a browser, but they are not necessarily designed for synchronous collaboration.

The Mathematical Markup Language, MathML, is a recommendation of the W3C, which provides a foundation for including mathematical expressions in web pages. As an application of the Extensible Markup Language (XML) and with adequate style-sheet support, MathML will ultimately make it possible for browsers to natively render math expressions, including in threaded on-line discussions. See W3C at http://www.w3.org/Math/ for a complete list of technical/scientific document viewers and renderers such as the Scientific MessageBoard WebEQ, the IBM techexplorer, EzMath editor and LaTeX2HTML.

ScientificTalk is a Perl script for a standard multiway graphical web chat. This CGI-based application is portable across platforms and allows the viewing of occupants, sending input to specific users, etc. While chatting on-line, it converts textual input or standard LaTeX—a popular computer language for composing formatted scientific text for high-quality printing—into HTML. The math displayed on the browser is rich because of the LaTeX typesetting and Ian Hutchinson's powerful TeX-to-HTML translator, tth, available at http://hutchinson.belmont.ma.us/tth/.
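The conversion step can be pictured with the following sketch, which is not the ScientificTalk code. It assumes that math is typed between dollar signs, as in ordinary LaTeX, and it takes the LaTeX-to-HTML translator as a parameter; in the real tool that job is done by tth, whereas the default used here is only a placeholder that shows the LaTeX source in italics.

```python
import html
import re

# Assumption: inline math is delimited by single dollar signs, as in LaTeX.
MATH_SEGMENT = re.compile(r"\$(.+?)\$")


def placeholder_math_to_html(latex):
    """Stand-in for a real translator such as tth: show the source in italics."""
    return "<i>" + html.escape(latex) + "</i>"


def message_to_html(message, math_to_html=placeholder_math_to_html):
    """Convert one chat message to HTML.

    Plain text is escaped so that it cannot inject markup; each math segment
    is handed to `math_to_html`, which returns ready-made HTML.
    """
    parts = []
    position = 0
    for match in MATH_SEGMENT.finditer(message):
        parts.append(html.escape(message[position:match.start()]))
        parts.append(math_to_html(match.group(1)))
        position = match.end()
    parts.append(html.escape(message[position:]))
    return "".join(parts)


if __name__ == "__main__":
    print(message_to_html("Is $a<b$ true & fair?"))
```

Keeping the translator pluggable means the chat logic can be tried without tth installed, and the placeholder can later be swapped for a function that calls the real translator.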

For those not familiar with LaTeX commands, ScientificTalk has an external symbols keyboard as well as a composer and messenger window for the user input. There is no need for extra plug-ins or high-speed networks; all input is passed in text mode only. On the client side, Netscape (v4.0 or greater) needs a simple character-set configuration (for details and a demo, connect to http://sv7.ictp.trieste.it/).

Although the ScientificTalk prototype has proven that it is possible to carry out synchronous math discussions on the Web between distant clients today, our to-do list is still long. For example, it would be useful to save a complete session as a LaTeX file (in order to restart an on-line discussion from a given session or collaboratively write LaTeX documents on the Web), display plots from a given function, create small transparent .gif files, and extend its language capabilities to symbols in other domains, such as chemistry.

Concluding Remarks

More opportunities for learning and growth are available if we can share ideas via a computer environment that is responsive to our professional needs. For example, using simplified scientific notation on the Web can lead to faster, more effective results. Electronic tools, designed for collaboration and based on Linux, will continue to play an important role in an increasingly interconnected world. Off-line browsing via web-to-e-mail servers such as www4mail is still a reality in remote areas of the world, and most likely will remain so, as the number of Internet users is expected to double to 300 million by the year 2005.

Acknowledgements

Resources

Dr. Enrique Canessa (canessae@ictp.trieste.it) is a theoretical physicist currently working as a scientific consultant at the ICTP. His main areas of research and interest are in the field of condensed matter and scientific software applications. He has been lost in the Internet since 1987.

Clement Onime