The Web makes it possible for everyone to put their material up where everyone can see it, usurping, as it were, the traditional job of the publisher, that of distributing information. But, other tasks of the traditional publisher are not being taken care of by most people putting information on the Web. Self-publishing on the Web can be seductively rapid and promotes the exchange of current information without regard to physical location. A well organized and seamlessly interlinked, distributed online information system provides benefits to the users which go far beyond the mere availability of single articles. As the system developed for astronomy demonstrates, such a system can revolutionize research and study capabilities. But, without adherence to open standards, provision for easy maintenance, a good measure of quality control, and production in a robust format it is impossible to weld the individual articles into a working system. The products of individual Web authors, and even many Web publishing organizations, fall far short of what is required to provide a well linked, functioning, and long-lived information system.
The capabilities of the Web are enormously seductive. We can prepare a paper or other work in HTML and, literally, reach the eyeballs of a million people around the world. Seeing our work on the Web gives most of us a rush which exceeds that of seeing our book featured in the window of the local bookstore. It is quite enjoyable to plug into the Internet in some far-away place and show off our latest work to friends or colleagues. But once we get over the initial thrill of seeing ourselves on the screen, we should be asking ourselves some sober questions about Web publishing. Why are we doing it? What purpose are we trying to achieve? How long do we expect our work to be read? How do we expect potentially interested readers to find our work? More importantly, how can we place our own work within a logical collection of material on the same or similar subject so that the readers can put it in context?
Let me at once define the limits of this paper. I will be discussing scholarly publishing. I am not discussing trade publications, entertainment, or personal pages (except insofar as they provide a home for scholarly material one may produce). For scholars, the Web provides wonderful opportunities to interact with colleagues, to collect and share data, and to make their work available to a broad audience of peers, colleagues and students, as well as to the public at large. Particularly in this scholarly arena, the worth of a work will be enhanced if it is located (in the logical, not the geographical sense) with other works on the same subject, where it can be found by colleagues working on the same topic and can, through the references and comments of future scholars, be incorporated into the edifice of scholarly knowledge.
The great power of the Web lies in being able to search over a wide variety of disparate sources, finding, extracting, collating and combining different material to generate new knowledge. The question is how to make this possible. The days of having to physically acquire a copy of everything of interest are gone. It should not be necessary to develop and maintain single, monolithic collections of material any more. The Web provides the power to distribute material among a number of places because all the material should be searchable and recoverable, no matter at which site it exists. There are a few examples of such information resources, comprising electronic journals, databases, and bibliographic services, which are changing forever the methods of working with data. One such example is the field of astronomy, where the American Astronomical Society has been publishing its journals in electronic form, starting four years ago (Boyce and Dalterio, 1996; Boyce et al., 1997; Boyce et al., 1998). The developments in astronomy are instructive in showing the potential of effective electronic delivery of information. The results are amazing, and the feedback from the community is instructive.
First, let us consider the system of scholarly information dissemination as it had come to exist before the advent of the WWW. In many disciplines, particularly in the science, technology and medicine (STM) area, the body of knowledge within a discipline is carried in a number of scholarly journals, each composed of articles which refer to previous articles, often in other journals. Tracking down all the relevant work on a given subject was often a time-consuming task. Still, the system of journals printed on paper provided a workable record of the state of knowledge and progress within a field.
The publishers of the vast number of scholarly journals have provided a number of services which bear enumerating. As well as disseminating information as widely as possible (or as widely as the paper paradigm will permit), good publishers have traditionally provided:
* a system for accessing information,
* clarity and effectiveness of presentation,
* a system of standardized exposition, and
* production of material in a long-lasting format.
All of these services have enhanced the ability of scholars to disseminate their results, locate and identify information relevant to the topic of interest, and ensure that information will be available to future generations of scholars.
Although this is not a new topic (2), it does not receive the discussion it deserves. As the author has pointed out in several recent talks (see Boyce, 1999), the paper versions of the scholarly journals have traditionally served a multitude of purposes, not all of which are readily apparent.
This last function is subtle, but critically important. The journals, in effect, define how to write an original research paper: how to cite past work, describe your procedures, summarize new data or findings, and draw conclusions. In practice, the journals provide a necessary and important self-regulatory role. My colleagues often judge the importance of their latest project by asking themselves if it is a significant enough piece of knowledge that it would be accepted for publication in the Astrophysical Journal (ApJ). To draw upon another example from the field of physics, if the well known and prestigious journal, Physical Review (PR), had not existed, the well-known xxx preprint server at the Los Alamos National Laboratory would not function. Everyone who posts their work to the preprint server is writing ultimately for publication in the Physical Review -- or another similarly prestigious journal. The omnipresent example of a universally accepted set of norms keeps the quality of the preprints high, both in style and substance.
The system of scholarly information is undergoing a rapid transformation. The old system which has grown up since World War II is being modified. There is a certain movement, particularly within the university community, to encourage the development of a system which will replace the current system of journals as the avenue for exchanging scholarly information. A number of suggestions along this line have been made (Harnad, 1990; Okerson and O'Donnell, 1995) but, in general, they look at the problem from a narrow perspective: usually either that of a librarian whose budget has been stretched beyond the breaking point by the rising costs of journals, or that of a researcher who is anxious to become aware of the latest information as rapidly as possible. As new models for exchanging scholarly information are tried, let us do so with the goal that the new system should provide all the functions of the present system as well as open up new capabilities.
The Web is upon us. We, as an academic community, are starting to embrace new paradigms for information transfer. I agree with Kelly (1999) that the scientific community, as a whole, has been remarkably timid in trying new methods and new tools to improve our ability to disseminate, find, retrieve, and use information. The new tools and capabilities should offer tremendous improvement in the distribution and exchange of knowledge, if only we can use them to advantage.
Astronomy provides one example of the power and synergy which result from linking the whole system. We now have four years of experience in publishing a thoroughly linked, scholarly journal on the Web (Boyce et al., 1997). Our journal has a number of advanced features such as versions in HTML (for reading on the screen) and PDF (for printing out), data tables in machine readable format, and video clips where appropriate. But from the user's standpoint, by far the most important feature is the abundance of links incorporated into the HTML version.
The links to references in past papers and to future papers which cite the present article are consistently rated by our readers as the most important feature of the electronic version.
The links to past references are an obvious need, paralleling the normal practice in paper journals, with the difference being that electronic links are nearly instantaneous, bringing the abstract (and even the full text) of the referenced article directly to the reader's screen within seconds. But the electronic journal need not be content with simple backward links. For the last three years, we have been including links to future papers, that is, to later papers which cite the present article, a service which cannot be duplicated in the world of paper journals.
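Forward links of this kind can be derived mechanically from the backward references: once every article carries a stable identifier, inverting the reference lists gives, for each article, the later papers which cite it. The following is a minimal sketch in Python, not the journals' production system; the identifiers and the in-memory reference table are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def build_forward_links(references):
    """Invert {article: [articles it cites]} into {article: [articles citing it]}."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    # Sort the citing papers so the output is stable from run to run.
    return {article: sorted(citers) for article, citers in cited_by.items()}


# Hypothetical identifiers, for illustration only.
references = {
    "paper-1997-A": ["paper-1995-X", "paper-1996-Y"],
    "paper-1998-B": ["paper-1995-X", "paper-1997-A"],
}

forward_links = build_forward_links(references)
for article, citers in sorted(forward_links.items()):
    print(article, "is cited by", citers)
```

Because the inversion has to be repeated whenever new articles appear, it is practical only when identifiers are assigned uniformly enough for the step to run without human intervention.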
But this was just the beginning for us. The well established astronomical databases, which are organized by the names of stars, nebulae, clusters and galaxies, have, for the last fifteen years, included the references to the literature where the data items were published (Genova, 1998). It has proven simple to incorporate links from the databases to the electronic versions of the articles, and vice versa, from the journals directly to the databases.
Searching for information is of critical importance, so with support from NASA, astronomers have built a searchable database of abstracts covering 150 years of the core literature in astronomy, starting with the first issue of the Astronomical Journal in November, 1849. Eichhorn et al. (1998) have described this service, the Astrophysics Data System (ADS). Note that since we are managing the 1900 transition successfully, we are, of necessity, Y2K compliant.
But having titles and abstracts is not enough for most users, so the ADS has nearly completed the task of scanning and making available the historical or "legacy" literature. The major difference between this effort and JSTOR is that the historical astronomical literature in the ADS has active backward reference and forward citation links which knit the whole system together for seamless use. The historical literature is linked both forward and backward in time with the new electronic journals being published for the AAS by the University of Chicago Press.
The concept behind the ADS is not unique. Medicine has the same facility, with the PubMed database making the links. In the case of astronomy, all of the core literature is available on line, and so are the majority of the important databases. The coverage is more complete than for any other field of which I am aware.
To summarize, many of the electronic astronomical journals are linked to each other directly, and all are linked indirectly through the ADS abstract system. The ADS also provides search capability and links to abstracts and full text, whether in the electronic journals or the scanned page images of the historical literature within astronomy. With NASA support, this collection is available for free. The ADS also provides the links from abstracts to the machine readable data tables which reside in the online astronomical databases (CDS, NED, ADC), which can be searched by astronomical object. We call this system of protocols and links Urania. It is not a collection of objects; it is the underlying, enabling protocols, an important distinction. Needless to say, Urania is entirely a Web creature which could not function effectively if it were to be created on paper. The unique characteristic of astronomy's system is the tight interlinking among all the distributed information sources.
There are three keys to making this system work:
1. Common, open standards for naming digital articles.
2. Name resolution for robust linking and for managing mirror sites. For lack of any broad standard for identifiers, astronomy developed its own interim standard identifier (Schmitz et al., 1995) at least ten years ago.
3. Significant cooperation on the part of all the adherents, including the willingness on the part of single organizations to compromise in order to make the whole system function more effectively.
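The first two keys can be sketched together: a standard name is built from the bibliographic facts of an article, and a resolver turns that name into an address at whichever mirror is nearest the reader. The sketch below is illustrative only. The fixed-width field layout (year, journal, volume, qualifier, page, author initial) is an assumption modelled on fixed-width bibliographic codes, and the mirror addresses and function names are hypothetical.

```python
def make_identifier(year, journal, volume, page, author_initial):
    """Build a fixed-width, 19-character article name (assumed field layout)."""
    return (
        f"{year:04d}"                    # 4 characters: publication year
        + journal.ljust(5, ".")          # 5 characters: journal abbreviation
        + str(volume).rjust(4, ".")      # 4 characters: volume number
        + "."                            # 1 character: qualifier slot, unused here
        + str(page).rjust(4, ".")        # 4 characters: first page
        + author_initial                 # 1 character: first author's initial
    )


# Hypothetical mirror sites; real deployments would keep their own table.
MIRRORS = {
    "us": "https://mirror-us.example.org/article/",
    "eu": "https://mirror-eu.example.org/article/",
}


def resolve(identifier, region, mirrors=MIRRORS, default="us"):
    """Map a location-independent name to a URL at the reader's nearest mirror."""
    base = mirrors.get(region, mirrors[default])
    return base + identifier


ident = make_identifier(1998, "ApJ", 500, 525, "S")
print(ident)
print(resolve(ident, "eu"))
```

The point of the design is that links store only the name, never the address. A mirror can then be added, moved, or retired by editing the resolver's table, without touching a single published article.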
Figure 1 demonstrates that the interlinked astronomical information system can be entered at many points, as illustrated by the red arrows entering from the left. One can browse the journals -- jumping to the abstracts of the references and forward citations, then reading the full text, or going to the relevant data in the online databases as shown with the green arrows.
One can search the abstract collection, get to the full text, the online journals and the data. Or knowing the reference, one can go directly to the historical collection -- full page images of all the core journals in astronomy -- most of whose citations are linked to the abstracts and full text of the referenced work.
Or, one can enter the databases by the name of the astronomical object of interest, retrieve the published data on that object and link immediately into the articles where the data were originally published. One of the great tools, particularly useful in this form for astronomy, is the ability (now only in prototype form) to search over a huge collection of data for a list of all objects which meet certain characteristics (e.g. are in a certain region of the sky, and are brighter than a certain magnitude, but emit a large amount of X-ray energy, and have more than the expected amount of infrared radiation). This capability for mining the existing databases and catalogs (see Ortiz et al., 1998) in order to discover new members of a class of objects is changing the way astronomers do their research. The time which used to be spent on tedious literature searches can now be used more productively, converting this information into real knowledge about the universe.
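A search of the kind just described amounts to applying several selection criteria at once to every object in a catalog. The following Python sketch shows the idea on a tiny in-memory catalog. The record fields, units, thresholds and object names are all hypothetical, and the prototype service mentioned above is not claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class CatalogObject:
    name: str
    ra_deg: float            # right ascension, degrees
    dec_deg: float           # declination, degrees
    magnitude: float         # smaller values mean brighter objects
    xray_flux: float         # arbitrary units
    infrared_excess: float   # observed minus expected, arbitrary units


def select_candidates(catalog, ra_range, dec_range, max_magnitude,
                      min_xray_flux, min_infrared_excess):
    """Return objects inside the sky region that pass every threshold.

    Simplification: the region is a plain rectangle that does not wrap
    around right ascension 0/360 degrees.
    """
    ra_lo, ra_hi = ra_range
    dec_lo, dec_hi = dec_range
    return [
        obj for obj in catalog
        if ra_lo <= obj.ra_deg <= ra_hi
        and dec_lo <= obj.dec_deg <= dec_hi
        and obj.magnitude <= max_magnitude
        and obj.xray_flux >= min_xray_flux
        and obj.infrared_excess >= min_infrared_excess
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    CatalogObject("Obj-A", 10.0, -5.0, 12.0, 8.0, 1.5),
    CatalogObject("Obj-B", 11.0, -4.0, 17.0, 9.0, 2.0),   # too faint
    CatalogObject("Obj-C", 200.0, 30.0, 11.0, 9.0, 2.0),  # outside the region
    CatalogObject("Obj-D", 12.0, -6.0, 13.0, 0.1, 2.0),   # weak X-ray source
]

matches = select_candidates(catalog, (5.0, 15.0), (-10.0, 0.0),
                            max_magnitude=14.0, min_xray_flux=5.0,
                            min_infrared_excess=1.0)
print([obj.name for obj in matches])
```

In a real system each match would carry the standard article identifiers for the papers where its data were published, which is what lets the reader move from a list of candidate objects directly into the literature.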
One can consider the Urania information system, as powerful as it is for users, to be a first step; a prototype of one of many such collections of even more sophisticated interlinked information resources which I expect will appear in the future.
The successful set of preprint servers at LANL was started by Paul Ginsparg (1996), a theoretical physicist himself, who was dissatisfied with the failure of the traditional publishers in physics to move rapidly toward methods of electronic publishing. Within the disciplines of physics, astronomy and math, the preprint servers are very popular. They provide a means of rapid communication of the latest results, and secondarily provide an overview of what people are working on. In other words, they are providing the first two functions of the traditional journal: news about the field and rapid notification of recent results.
Although it is the desire of Paul Ginsparg (1994) to have the preprint servers supplant the traditional scholarly journals, there are a number of reasons why this probably will not happen. In astronomy, the community has shown a willingness to use the preprint servers to stay abreast of breaking developments and, at the same time, to use the electronic versions of the established journals as the repository of the core knowledge of the field and to validate the reputation of the authors. Our experience seems to indicate that having both systems exist side by side and even, hopefully, work together is the choice our community of users is making.
There are a number of reasons why the scholarly journals, at least in astronomy, should be the vehicle of choice for authors. First is the fact that only the journal articles become part of the whole distributed and linked information system. While the articles in the preprint servers can, and do, refer to each other, the system is more cumbersome than in the journals, and can depend upon the willingness and skill of the author to make the links. Searching in the preprint server is by author, title and keyword, whereas the journals offer full text searching and the ADS offers searching of the full published abstracts. In fact, the ADS now provides full text searching of the abstracts of the preprints on the xxx preprint server, with links directly into the preprints, but scarcely anyone is using this service, preferring to search the published abstracts.
But even more important is the problem introduced by multiple versions of an article. Which version of the article does the citation you see in a paper refer to? It is confusing rather than helpful when a criticism of an article is rendered moot because the author has changed the article after the criticism was written and posted. The preprint servers added version control after this began happening, but, to my mind, it still leaves the preprint server as a mechanism for rapid communication, but not for archival storage of one's work.
Archival? The preprint servers call themselves archives, but the formats used to prepare the articles are not archival in nature. They are servers of information, not true archives. PDF is one format used widely, but it cannot be claimed to be truly archival, or even capable of being read twenty years from now. Many authors use LaTeX to prepare their articles, another format which cannot claim to be of true archival quality. All electronic information will have to be migrated to new forms as the software and hardware used for reading it evolve. Perhaps LaTeX and PDF will be translatable because they simply represent page images of material. That is the only hope for survival of material stored in those forms. To my mind, this is not a sure thing. And who will pay for the translation?
But the underlying reason I have for not supporting the preprint servers as repositories of any but transient information is their philosophy of treating articles as separate (and virtually independent) entities composed of page images.
Even though the preprints are being delivered electronically, the concept of the preprint server is rooted in the old thinking which derives from the world of the paper journal. It is clear from the experience with the astronomical information system that the world of electronic information will be vastly different five years from now. Articles are no longer independent entities. They are tied to other articles forward and backward in time, and soon to pieces of other articles such as data tables or video clips. Already, many articles are closely tied to various databases. Eventually, they will be tied to information sources which we have not yet envisioned. I find the format and the philosophy of the preprint servers, as they now exist, to be more rooted in the past than one might expect; hence, relatively uninteresting except for serving as the scientists' "electronic hallway" where the latest information is exchanged.
But, of course, Paul Ginsparg has shown a remarkable capacity to turn technology to the advantage of the users of information. And this is a hopeful sign for the continued evolution of the electronic information dissemination system.
The current system: what's missing?
Having demonstrated what can be done in the way of effective information dissemination, let us examine what is lacking in the way most scholarly information is now being distributed on the Internet. With authors able to post their work directly on the world's "bulletin board," many of the tasks performed by the traditional publisher are being attended to poorly, if at all, by most purveyors of information.
Much "self-published" material on the Web is poorly presented, is not part of a stable system of interlinked information, cannot be easily found, and certainly will not last or be readable a decade from now. The consequences of this failure to produce locatable information on the Web, in a format which can be maintained, with links to relevant information and data, are already making the Web an inefficient medium for transferring information.
This problem is exacerbated by the single-minded and complete dependence of most young people upon the electronic resources. Their motto is, "If it isn't on the Web, it doesn't exist at all." (Stevens-Rayburn, 1998)
Let us look at some of the aspects which go into making Urania a success.
1. I already mentioned the links as being the number one feature of our electronic journals. Insertion of links only becomes possible if it can be accomplished nearly automatically, i.e. at very low cost. The adoption of a standardized name for each article is critical to the ability to link to backward references and forward citations nearly automatically.
2. Using the electronic capabilities of the ADS, it becomes feasible to check each link for accuracy during the production process. By doing so we have significantly reduced the number of wrong references in both the paper and electronic editions of our journals. The same result has been reported by other publishers at a recent workshop on linking (NISO, 1999).
3. Links in astronomy work reliably, a trait made possible through the use of a simple system of name resolution, in concert with the standard logical identifier. Name resolution not only helps in the intelligent use of multiple mirror sites to reduce transmission delays, but it also ensures that the links among astronomical resources will remain viable over time.
4. All of the world's major astronomical journals have adopted a uniform set of keywords. Part of the editorial process includes the assignment of article keywords. Such a controlled vocabulary greatly facilitates the use of search engines and other information discovery techniques to locate relevant information for users.
5. The very existence of a structured set of information resources both facilitates the interlinking process (the establishment of multiple pathways by which to arrive at a particular article) and encourages authors to consider the addition of specialized links as they create their article.
6. Finally, and most importantly, nearly all of the material prepared for the scholarly journals in astronomy is prepared in a robust, richly tagged SGML format which can be migrated to a new standard format, and so maintain the archive, through the use of a program of automatic translation. At one stroke, this transforms the chore of maintaining an electronic archive from an impossible task to one which can be accomplished within the current operating budget of a journal. In our experience, we have rederived the publicly distributed instances of the AAS journals about every eighteen months, just to keep up with the advancing technology of the browsers. And, because of the redesign of our publishing process and our reliance upon well implemented SGML, it has been remarkably inexpensive to do so. We expect that managing our electronic archive will consume much less than one percent of our operating budget. For this, we will have a journal which will remain accessible into the indefinite future.
Conclusion
Nowadays, it is even more imperative to use the Web effectively, not just to advertise the author's presence, but to ensure that the author's information can be found, will be part of a broad system of electronic information, and will survive into the future.
The tasks fulfilled by the traditional publisher, which have evolved and matured over centuries to make an effective paper-based information system, are still required. Yet it is clear that the processes and materials of traditional publishing are changing. In the long run, the traditional structure of information itself is being revolutionized. Whether the traditional publishers can adapt to using the new medium and new formats of information exchange is very much an open question. Some STM publishers are adapting well to the new era of distributed, interlinked information. Others are not.
In any case, authors should be aware that simple posting of material to their own Web site is not an effective method for information transfer. The material is much less likely to be found by an interested reader, and the longevity is certainly not sufficient for scholarly work. Posting to a preprint server is only slightly better.
Notes:
(1) The author is currently a visiting professor at the Centre de Données astronomiques de Strasbourg, Université Louis Pasteur, Strasbourg, France.
(2) The author first became conscious of the importance of considering the broad set of functions of a journal while listening to a talk by Washington Taylor (http://publish.aps.org/EPRINT/KATHD/taylor.html) at a workshop on Electronic Preprints held at Los Alamos, Oct. 14-15, 1994.
References
Boyce, Peter B. (1999) Electronic Scholarly Journals, a talk given at Université Louis Pasteur.
Boyce, Peter B. (1998) Urania, a Linked, Distributed Resource for Astronomy, published in Library and Information Services in Astronomy III, ASP Conference Series, Vol. 153, Editors: U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez; Electronic Editor: H. E. Payne.
Boyce, Peter B., et al. (1997) Electronic Publishing: Experience is Telling Us Something, Serials Review, 23, 1 (as submitted).
Boyce, Peter B. and H. Dalterio (1996) Electronic Publishing of Scientific Journals, Physics Today, 49, 42.
Eichhorn, Guenther, et al. (1998) The Astrophysics Data System, published in Library and Information Services in Astronomy III, ASP Conference Series, Vol. 153, Editors: U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez; Electronic Editor: H. E. Payne.
Genova, Francoise (1998) The CDS Information Hub, published in Astronomical Data Analysis Software and Systems VII, ASP Conference Series, Vol. 145, Editors: R. Albrecht, R. N. Hook and H. A. Bushouse.
Ginsparg, Paul (1996) Winners and Losers in the Global Research Village, a talk given at the Unesco-ICSU Press Expert Conference on Electronic Publishing, Paris, France, 21 Feb, 1996.
Ginsparg, Paul (1994) After Dinner Remarks, a talk given at a workshop on Electronic Preprints held at Los Alamos, Oct. 14-15, 1994.
Harnad, Stevan (1990) Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry, Psychological Science 1: 342, and reprinted in several places.
NISO (1999) Workshop on Linkage from Citations to Electronic Journal Literature.
Okerson, Ann and James O'Donnell (1995) Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing, Web edition of book published by ARL.
Ortiz, Patricio, et al. (1998) Astronomy: Data Mining, a talk given at Astronomy Data Analysis Software and Systems Conference VIII (ADASS VIII), Urbana, Illinois, 03 Nov, 1998.
Schmitz, et al. (1995) NED and SIMBAD Conventions for Bibliographic Reference Coding, published in "Information & On-line Data in Astronomy", D. Egret & M.A. Albrecht, Eds., Kluwer Acad. Publ., 259.
Stevens-Rayburn, Sarah (1998) Electronic Information Resources - Myth and Reality, published in Library and Information Services in Astronomy III, ASP Conference Series, Vol. 153, Editors: U. Grothkopf, H. Andernach, S. Stevens-Rayburn, and M. Gomez; Electronic Editor: H. E. Payne.
(ADC) Astronomical Data Center
(ADS) Astrophysics Data System searchable abstract database for the astronomical literature
(ApJ) The Astrophysical Journal is published by the University of Chicago Press for the American Astronomical Society
(CDS) The Centre de Données astronomiques de Strasbourg
(JSTOR) JSTOR is a journal archiving service
(NED) The NASA Extragalactic Database
(PR) The Physical Review is published by the American Physical Society
(Urania) Urania is the worldwide set of distributed, interlinked information resources for astronomy
(xxx) The Los Alamos preprint server