Let's go back five years and remember how things were. In 1992 the World Wide Web was just getting started. The first effective Web browser, Mosaic, was just becoming available. Nearly all the scholarly literature was being delivered in the traditional paper form. But we were becoming aware of the possibilities offered by this new medium for information dissemination. Some journals began to deliver their tables of contents over the Internet. Other journals were themselves being delivered the same way - although this only worked for those journals which could be represented in ASCII format. The Red Sage project, and other projects which depended upon proprietary formats and software, were under way - a mistaken approach, as it turns out. Nevertheless, it all seemed like magic to have what amounted to instantaneous delivery of information via this new medium.
But now it is five years later. It's time to become more sophisticated. As the magic of receiving scholarly journals (or even just the tables of contents) from the World Wide Web wears off, we, as consumers of electronic information, have to wipe the stars from our eyes and take a hard look at what is being offered under the rubric of "electronic journal." I have recently seen one library's Web page listing their so-called "electronic journals." Under this heading they list two types of electronic journals: those with electronic tables of contents, and full-text journals.
Here's my take on this listing. Tables of contents are fine, and are definitely a step up from paper delivery, but it escapes me how they can be classified as an "electronic journal." What I find even more disturbing is the tendency to lump all full-text electronic journals together in one category. There are certainly two classes of full-text electronic journals: those which are page images, and those which have a full suite of electronic features and provide information which is impossible to transmit via the printed page.
Admittedly, having the full printed page transmitted nearly instantaneously via the Web is a great step forward. It is this approach which the Red Sage project used and which is now available via Adobe's PDF format. Although the format of the printed page does not fit on a computer screen well, it is possible to read a PDF document on the screen, and it looks pretty good when printed out locally. Nevertheless, the PDF format is nothing more than electronic delivery of paper pages. And while the file size for a PDF document is about ten times smaller than the same document in PostScript, it is still about ten times larger than the same amount of information coded in ASCII or HTML. While many libraries have good Internet connectivity, trying to download a ten-page PDF document can be excruciatingly slow over a modem at home.
But size is not the issue here; functionality is. Navigating and reading a PDF document is still a cumbersome process. Most PDF versions of journals still do not contain links to the references, much less to the citation list or to data tables. Adobe says the new version of Acrobat will handle things like movies and links, but publishers are only now beginning to make this happen, albeit with much trumpeting and fanfare. But that is not sufficient any more. PDF journals remain just what they have always been: page images of the paper version of the articles, delivered over the Internet. Five years ago that would have been wonderful. Not so today. Readers want more. For instance, readers have responded enthusiastically to the two full-featured electronic journals which appeared in 1995, the Astrophysical Journal and the Journal of Biological Chemistry. Access logs show that readers of the electronic ApJ look at the HTML version six times more often than they access the PDF version. Subscriptions to JBC rose 10% the year after the advent of the electronic version, reversing a downward trend of many years.
Why is this so? A well-designed electronic journal has links for navigating within the journal. Screen images are hard to read, at best. To be competitive with paper, electronic articles have to incorporate tools for the reader and features which can't be duplicated on paper. For instance, the best electronic journals have links from the reference list into the text where the reference originated. The same goes for figures. This makes it possible to go first to the reference list and then jump into the middle of the text, or to start with the figures and jump to the text explaining an interesting figure. Try to do that in the paper version, and you spend a lot of time searching the text for the references.
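The two-way linking between text and reference list described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any of the journals mentioned: the function name `link_citations`, the `[n]` citation markers, and the `cite-n` / `ref-n` anchor names are all assumptions made for the example.

```python
import html
import re


def link_citations(body: str, references: list[str]) -> str:
    """Return HTML in which each [n] marker links to reference n, and each
    cited reference links back to its first citation in the text.

    The [n] marker syntax and the anchor ids are hypothetical conventions."""
    first_cite = {}  # reference number -> anchor id of its first citation

    def replace(match: re.Match) -> str:
        n = int(match.group(1))
        if not 1 <= n <= len(references):
            return match.group(0)  # leave unknown markers untouched
        if n in first_cite:
            # Later citations link forward only; the anchor already exists.
            return f'<a href="#ref-{n}">[{n}]</a>'
        first_cite[n] = f"cite-{n}"
        return f'<a id="cite-{n}" href="#ref-{n}">[{n}]</a>'

    text_html = re.sub(r"\[(\d+)\]", replace, html.escape(body))

    items = []
    for n, ref in enumerate(references, start=1):
        back = ""
        if n in first_cite:
            back = f' <a href="#{first_cite[n]}">back to text</a>'
        items.append(f'<li id="ref-{n}">{html.escape(ref)}{back}</li>')

    return f"<p>{text_html}</p>\n<ol>\n" + "\n".join(items) + "\n</ol>"
```

A reader who starts at the reference list can follow "back to text" into the middle of the article, and a reader in the text can jump to the reference; uncited references simply get no back-link. The same pattern would apply to figures.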
But more than that, our readers tell us that being able to jump right to the abstract (and even the full text in many cases) of the referenced articles makes the electronic journal a powerful research tool. Add to that a continuously updated citation list which gets carried with each article, and you suddenly have a whole new capability being delivered right to your desktop. Then add, when appropriate, movie and sound clips and, presto, it is a whole new world. True, we have to learn how to use it effectively, but already the potential should be obvious.
If the full-featured, linked HTML journal is so superior, why don't more publishers go that route? The answers are conservatism, difficulty and money. Publishers have developed procedures over decades, and they find it difficult to change. It is noteworthy that the two full-featured journals mentioned above were developed outside of the normal publishing enterprise. The ApJ was the result of a special electronic publishing team set up by the American Astronomical Society outside of their normal journal operations. The JBC electronic format was developed by HighWire Press, which originated out of the Stanford University library. Both groups set out to make something exceptionally useful to the readers in the scientific community. Only when we knew what we wanted did we ask how best to incorporate what we needed into the publishing process.
In the case of the ApJ, the AAS worked with our publisher, the University of Chicago Press, to develop a whole new process which produced the electronic, archival database of manuscripts first, and then derived both the electronic screen version and the paper version from this archival electronic database. HighWire Press, in order to get started rapidly, in essence grafted the electronic products onto the end of the traditional paper process. Both methods required significant development, basically a revolutionary new approach. And both organizations are continuing to experiment and adapt to the electronic environment. It has been, for the University of Chicago Press and the AAS, a difficult job to revamp the process while still publishing 25,000 pages per year. Larger publishers face an even more difficult task.
But maybe money is the factor driving larger publishers to produce PDF journals. After the paper journals are typeset, it is (relatively) easy and inexpensive to produce a PDF version. It is just a matter of capturing the page images which are already prepared and putting them out as a series of files. Who wouldn't choose to make a PDF journal when they realize that adding the full range of electronic features will require a three-year development effort and a significant investment in resources, not to mention requiring the entrenched staff to adopt a whole new philosophy? Ironically, we find that once you make the effort to re-engineer the complete publishing process, you can actually produce both paper and electronic versions for less than the cost of paper alone using the more traditional procedures.
Where does that leave the users? It leaves them with less than they deserve. By embracing the PDF journals and not demanding better, the libraries and their user communities are setting the stage, both for slow progress now, and for later disaster when they discover that PDF is not a satisfactory archival format. It is this latter problem that concerns me the most.
We can probably all remember electronic formats which have disappeared. Has anyone tried to play a Beta-format videotape recently? Do you remember eight-inch floppy disks? Even a drive for 5 1/4 inch floppy disks is hard to find these days. The electronic world is changing more rapidly than ever before. Yet scholarly literature has to be readable 50 or 100 years from now. Just as technology is the cause of this problem, so it can be the solution. But you have to plan for managing an electronic journal so that it never becomes unreadable. This is not possible with PDF; it is only possible when all the features, structure, special characters, etc. are coded in the original electronic product.
And this is where the University of Chicago Press has taken the lead. Right now one accepted standard is the Standard Generalized Markup Language (SGML). The UCP translates the electronic manuscripts from the authors into SGML and does all subsequent operations using SGML. The result is an archive, which the public never sees, containing all the information necessary to derive automatically the screen version, the PDF version for printout and the paper typeset version. Once the archive exists, it becomes possible to translate the archive into any new standard and to rederive the entire public electronic version to utilize new browser tools as they become available. Automatic translation offers the only hope of keeping the journal updated at a reasonable cost. Before they pay good license money, libraries should be asking publishers about the longevity of their material. There is a series of obvious questions. How long will the publisher make the material available? Are their links (if they have them) structured to ensure they will work into the next century? Is their archive in a robust format like SGML which lends itself to updating and automatic translation? Who do they envision will manage the archive to keep it up to date?
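The idea of deriving every public version from one structured archive can be illustrated with a minimal Python sketch. It is not the University of Chicago Press process. It uses XML as a stand-in for SGML, because the Python standard library can parse XML directly, and the element names (`article`, `title`, `abstract`, `section`, `heading`, `para`), the sample content, and the function names are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# A tiny stand-in for one archived article (sample content, invented markup).
ARCHIVE = """<article>
  <title>Sample Article Title</title>
  <abstract>A one-sentence abstract.</abstract>
  <section>
    <heading>Introduction</heading>
    <para>First paragraph of the introduction.</para>
  </section>
</article>"""


def to_html(root: ET.Element) -> str:
    """Derive a screen (HTML) version from the archival structure."""
    parts = [
        f"<h1>{html.escape(root.findtext('title'))}</h1>",
        f'<p class="abstract">{html.escape(root.findtext("abstract"))}</p>',
    ]
    for sec in root.findall("section"):
        parts.append(f"<h2>{html.escape(sec.findtext('heading'))}</h2>")
        for para in sec.findall("para"):
            parts.append(f"<p>{html.escape(para.text)}</p>")
    return "\n".join(parts)


def to_plain_text(root: ET.Element) -> str:
    """Derive a plain-text version from the same archival structure."""
    title = root.findtext("title")
    lines = [title, "=" * len(title), "", root.findtext("abstract"), ""]
    for sec in root.findall("section"):
        lines += [sec.findtext("heading"), ""]
        lines += [para.text for para in sec.findall("para")]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Supporting a new delivery format means adding one function here;
# the archive itself never has to be re-keyed.
RENDERERS = {"html": to_html, "text": to_plain_text}


def derive_all(archive_xml: str) -> dict[str, str]:
    """Rederive every public version from the single archived source."""
    root = ET.fromstring(archive_xml)
    return {name: render(root) for name, render in RENDERERS.items()}
```

Calling `derive_all(ARCHIVE)` returns both versions from the one source. The point of the design is that the structure lives in the archive, so a future format is a matter of writing another renderer rather than re-creating the articles.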
These are serious concerns. The Commission on Preservation and Access has done its best to call attention to the problem of ensuring long term archiving and access. But librarians and users should be educated about what constitutes a good electronic journal. Right now we are in a transition period. By shopping smarter and demanding quality electronic publications, we can both shorten the transition time during which we only get PDF versions, and ultimately ensure continued access to the electronic scholarly literature well into the future. It is time that the library community understands the differences between good and poor electronic journals.
It is also time to start educating the readers as to the great possibilities inherent in full featured electronic journals and the potential for the eventual loss of the literature if we remain satisfied with what I call "archivally crippled" electronic journals. Within two years we will come to understand that paper archives will not suffice as a storage medium for much of the important content of electronic scholarly journals. We should not wait that long to start asking publishers to produce effective, archival quality electronic journals.
Acknowledgments:
Development of the AAS electronic journals has been supported in part by a grant from the National Science Foundation and a cooperative agreement with the National Institute of Standards and Technology.
Thanks are owed to the other members of the AAS Electronic Publishing Development Team: