How Did It All Come Together?

I needed to pick a standard distribution to get my home grown system converted to ELF. Caldera had released two previews of their Linux Desktop product at a reduced price and functionality. Since it looked like a good deal, I bought it hoping that they'd go ELF by Preview II (they did). I had a lot of fun participating in their voluntary 1.0 Beta program which included a copy of their 1.0 system.

vh-man2html started with an e-mail message I sent to the Caldera-users mailing list. This mailing list was established by Caldera as a forum, where sometimes Caldera staff can be found contributing information and responding to questions, for their users. (I should add that this forum is informal and isn't an official support channel.) I submitted a wish list for additions to the product. One of my wishes was:

Man Pages—integrate man pages into the help browser via a Man-page CGI converter. full text search? (As well as man -k search).

Caldera seemed quite open to input, so in my message I also suggested:

There's opportunity here for some of us customers to create some of what is necessary. It might be helpful to know if any of these are already being worked on at Caldera/Red-Hat.

As a result of this suggestion, another contributor to the list, Matthias Watermann (Matthias@oln.Comlink.APC.org), suggested RosettaMan might be the answer:

http://www.cs.berkeley.edu/~phelps/tcltk/index.html

He offered up an example script which got me started on my own quest for a solution. I had some success with RosettaMan, which works on formatted man pages, formatting them as necessary using the man command. This technique is well suited to many of the “commercial” Unix's where it has become common practice to distribute man pages in formatted form only. However, Linux does come with source, and both Red Hat and Caldera chose to store all man pages in source form. Having the source available means that a more accurate translation to HTML is possible. Lack of source, plus distribution restrictions with RosettaMan, led me to wonder if there was an alternative that could use the man source.

I used my web browser to search for a man source to HTML converter. Among the references I turned up was a summary of man to HTML converters at Richard Verhoeven's (rcb5@win.tue.nl) home page at the Eindhoven University of Technology:

http://wsinwp01.win.tue.nl:1234/maninfo.html

Not only did Richard summarize several of the converters available, he also offered his own efforts with no restrictions on commercial use. Richard's code produced nice HTML right out of the box; thus, it looked like most of the hard work was already done.

From my efforts with RosettaMan I had a couple of awk scripts to produce two kinds of indices into the man pages. The mansec (man section) script searches the man-path directory hierarchies, building a name-only index of man pages. The manwhatis script processes the whatis file found in each man hierarchy to build a name-title index.

I slightly modified Richard's program to make it more configurable—I added a Makefile, my scripts and the necessary magic to build Red Hat rpm packages. I checked with Richard to make sure I wasn't taking too many liberties with his work, and he seemed only too happy to see his work being put to use.

During testing I had some fun with the man-1.4 package, which is the standard man pager package shipped with Red Hat. The makewhatis script from man-1.4 was taking 30 minutes to create the necessary whatis files. By reducing the number of times the script ran an awk subscript, I was able to reduce this time to 1.5 minutes. The trick here was to pass awk of list of files for it to open and process within one awk run. Previously makewhatis was firing up a new awk for each file, so that the process creation and initialization time was dominating the run time. I corresponded with Andries Brouwer (Andries.Brouwer@cwi.nl), the man-1.4 maintainer, who was most helpful in getting the the script refined and incorporated into man-1.4g.

During March and April I released vh-man2html-1.0 and 1.1 to Caldera users via Caldera's web site. I added full text search in version 1.1 by using a program called glimpse. At this stage I found that vh-man2html had problems formatting some man pages. The source for the man pages used on most Linux systems comes in one of two forms. Most pages use the man macros which are the original troff macros that are found on ATT-derived Unix's. But several pages, most notably the networking ones, are written using the Berkeley Unix mandoc macros. Richard's program was only written to translate the man macros. Fortunately, Richard's code was structured in such a way that I was able to add additional code to parse the mandoc macros. This process took another month of spare evenings. This task was further complicated as I was not able to find good documentation of the mandoc manuals. Anyone have any pointers?

By the beginning of May I'd tested vh-man2html on every man page installed on my system, and had refined its ability to handle both man and mandoc pages. At this point I gave it the name vh-man2html and released version 1.3. Version 1.4 adds the refinement of being able to deal with man sources that have been compressed with compress or gzip.