Information and the Undergraduate Science Student

This started out with a request that I talk about validity and student searching --the pitfalls of the Web as an information source, and what we do about teaching students about research literatures. Thinking about those issues led me to consider some wider problems in information access. So I want to engage you in thinking about problems great and small, which do in fact center on the teaching of undergraduate science courses.
We live in interesting times. Scientific publishing is booming, as measured by the number of papers being published in scientific journals, or by the sagging shelves in library periodical collections. Web-based search capabilities and electronic delivery of full text have vastly increased accessibility of specialized literatures in all fields. Lots of journals have online versions, many augmented with multimedia links. Search engines like AltaVista point us toward millions (indeed a recent estimate says a trillion) of pages of stuff on the web. An embarrassment of riches.

So what's wrong with this picture? For the sake of argument, I'll suggest that these riches aren't reaching undergraduate students of science.

Faculty decry the superficiality and deficiencies of judgement in student use of electronic resources, and for their part, students are confused by the array of choices they must navigate on the way to finding what they seek.

It seems to me that, despite the riches, even science students aren't generally very interested in 'science news', aren't motivated to seek out current knowledge on the frontiers, and don't include lifetime patterns of knowledge acquisition in their perception of what needs to be done to become a scientist.

And I'll further suggest that college faculty aren't (generally) making very good use of the riches either, for their own research and teaching. To some degree this is a challenge for librarians, who are the primary gatekeepers and (often) the main instructors in the intricacies of searching and retrieval, but there are also some structural problems with the edifice of scientific literatures, and somebody has to start thinking about ways to address them.
A good part of the problem at the undergraduate level is one of parallel universes: in most science disciplines, courses are textbook-based --which is to say that they are not literature-based. Most of the primary literature is inaccessible to most undergraduate students, and research frontiers are pretty much irrelevant to the problems of lower-level courses, which have to concentrate on building firm foundations for those who will go on to advanced study, and have no time to linger over the general education of non-majors.

On the other hand, the tertiary literature of articles in Scientific American and the non-primary parts of Science and Nature (which are written at a level that is generally accessible to undergraduates and might seem appropriate for the general education context) don't often reach undergraduate eyes, be they those of majors or non-majors. For most departments, courses which emphasize general knowledge and the development of scientific literacy are an unthinkable luxury. The result seems to be majors who have little opportunity to develop a broad context for their disciplinary knowledge, and non-majors who may have a smattering of chemical or biological background, but have insufficient basis for understanding the broad outlines of the frontiers of scientific research, and aren't well prepared for a lifetime of intelligent consumption of news about advancing science and technology.

The secondary (review) literature should serve as the bridge to primary research literature for majors, and could broaden the context for non-majors, but most undergraduates aren't motivated to seek out review articles or to develop intellectual curiosity about the evolution of disciplines.

The picture of literature access is further complicated by the fact that a lot of what passes for information comes from the web these days, and it's common to hear tales of chicanery and uncritical use of stuff that's untrustworthy or just plain wrong. What can be done to repair this? On the positive side, the last decade has changed information access quite fundamentally, in the direction of democratization that should result in improved knowledge and access for all. What skills do students need to develop to realize these potentials?

So there are issues of how people do, could, should and might use information resources. There's also a great upheaval just about to happen in scholarly publishing, though its dimensions are pretty unclear. I want to argue that the student-information problem and the publishing problem are linked in important ways, which haven't been much explored in the somewhat panicky dialog that's been going on around the budgetary process (the "Serials Crisis", which has seen journal subscriptions cut at nearly every university). The deeper question, rarely addressed, is how to encourage more and better use of the resources to which so many budgetary dollars are allocated.

Our periodical collection makes a good case in point. The Science Library receives about 350 titles, nearly 100 of which are available to our users in online full-text form. Generally speaking, we have a title because a faculty member requested it at some point; we do serials reviews every 5 years or so, as a result of which we prune the lists. New journals get added as faculty request them (generally with some curricular justification), and we're almost unique among North American libraries in not having to cancel subscriptions to keep within budgetary constraints. So every day new issues of journals come into the library to be shelved in Current Periodicals, and every month a cartful of titles is sent out for binding, and then filed away in the stacks. The sad fact is that most of the current periodicals are never touched --some faculty and some community patrons do browse the shelves, but students are almost never seen browsing. In any case, browsing is a minor mode of information access for most people, and database searching is what leads most users to the journal articles they do use. So the skills of navigation, search, retrieval and evaluation are important to teach and develop --again, predominantly a responsibility of librarians.

How much do we really know about the use of our journal holdings? How has online full-text changed things? Our data are spotty, and the landscape of available resources changes frequently, so comparisons from year to year are perilous. We do keep track of the reshelving of bound journals, many of which are taken off the shelves to be photocopied for InterLibrary Loan. We ask users not to reshelve bound journals, and every day we scan in the barcodes of volumes left lying on tables and carts. In a year of tallying bound volumes left for reshelving, 2582 were counted:
psychology: 850
biology: 801
general: 373 (Science, Nature, Scientific American, etc.)
chemistry: 368
geology: 88
physics: 61
math: 27
computer science: 9
Psychology and Biology, which account for nearly two-thirds of the use, have between them about half of the "science majors". These two departments have courses in research and literature, for which students are required to use primary literature --the other departments don't.
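
The arithmetic behind "nearly two-thirds" is easy to check. What follows is only a minimal sketch in Python: the counts are the ones listed above, while the names `reshelved`, `REPORTED_TOTAL` and `share` are my own inventions for illustration. Note that the listed categories sum to 2577, slightly less than the 2582 total reported, so the share here is computed against the reported total.

```python
# Bound-volume reshelving tallies, copied from the list above.
reshelved = {
    "psychology": 850,
    "biology": 801,
    "general": 373,  # Science, Nature, Scientific American, etc.
    "chemistry": 368,
    "geology": 88,
    "physics": 61,
    "math": 27,
    "computer science": 9,
}

REPORTED_TOTAL = 2582  # total stated in the text


def share(fields, counts=reshelved, total=REPORTED_TOTAL):
    """Fraction of the total accounted for by the named fields."""
    return sum(counts[f] for f in fields) / total


listed_total = sum(reshelved.values())
psych_bio = share(["psychology", "biology"])

print(f"listed categories sum to {listed_total}")
print(f"psychology + biology: {psych_bio:.1%} of {REPORTED_TOTAL}")
```

Either denominator gives a figure a little under 65%, which is consistent with "nearly two-thirds".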

...and we keep track of electronic access to databases for which we pay. The general pattern is one of increasing use of electronic access tools, though a closer look indicates that most of the database searching activity is connected with two or three courses --and the greatest use is in connection with a course I teach for the Biology Department on Use and Understanding of Biological Literature, for which students have to use a range of databases. In general we can say that there's little 'recreational' or curiosity-based use of electronic databases: students use them when they have to, and (with a few exceptions) faculty use them infrequently.

Bibliographic databases --no full text
Cambridge Scientific Abstracts Medical and Biological Sciences: June 1998 - June 1999, 920 logons [more than twice as many as the previous year], 2042 searches [more than twice as many as the previous year]

FirstSearch: 1174 searches in Science databases [948 the previous year, an increase of about 24%, even though CSA has replaced FirstSearch as the searching tool of choice for Biology]

Online full-text sources
Academic Press (IDEAL) journals: between Jul '98 and May '99, 3267 sessions [60% increase over last year], 1063 .pdf downloads [more than double the previous year's use]

JSTOR journals: this archiving service, which serves .pdf page images from the entire run of included journal titles, has seen usage rise dramatically (more than quadrupling) from the previous year --between August 1998 and June 1999, 7332 accesses and 493 articles printed (Economics and Ecology were the two biggest use areas)

Are these modest usage figures typical of liberal arts colleges?
The only easy comparative measurements I have compare
We have some measurements of use of titles we don't have, via InterLibrary Loan and document delivery statistics, and these confirm the general impression of limited use of journal literatures by most faculty and most students:
Fewer than half of the science faculty requested any ILL in 1998, and 3 of the 20 who did accounted for more than half of the ILL requests by science faculty. About two-thirds of all science ILL requests were initiated by students.

We don't use document delivery services very widely, though faculty do have ad lib. access to UnCover, to retrieve articles by FAX. Very little use is made of this service.

So in general it looks like periodical literatures are not heavily used by our students and faculty. From a dollars-and-cents point of view, is the $230,000+ we spend annually on science periodicals worthwhile? Or, to put it another way, what could be done to increase use, to get more out of the considerable investment, to ensure that we commit funds to really useful resources? If it's worthwhile to put effort into increasing use, the question may be how to raise the salience and relevance of what the periodicals contain, but that's primarily a problem for teaching faculty, not librarians... or is it? Maybe what's really needed is a reconceptualization of what literatures are for, and how they could be connected to the day-to-day problems of teaching undergraduates --a problem that involves faculty, librarians, and the publishers and database vendors who supply the product. But where is such a reconceptualization to come from, and who would do it? These are all very interesting questions, because they gnaw at the very roots of how we teach (and learn) science, and how we integrate information into the teaching and learning.

An added complication to this question of the salience of journals for undergraduate science teaching is the rapid growth of online access to literatures. Just over the horizon is another enormous information problem, in the form of online journals. It's widely believed that scholarly journals will eventually be readily available online, often suggested (foolishly, I think) that the future will be paperless, and devoutly hoped by many that the information contained in online journals will be free --in stark contrast to the present situation, in which most journal publishers charge high (and rapidly-rising) rates for journals, most don't provide an online-only option, and/or they charge more for the online version. Various experiments are underway by publishers, subscription services, archiving organizations and even government agencies, but the pricing model of the future is still unclear. No matter what happens with journal publishers, it's clear that electronic access is proliferating rapidly, in a variety of forms, and this development presents a variety of challenges to teachers of undergraduate sciences.

Every day the NEWJOUR Digest listserv brings me announcements of a dozen or more new electronic titles (many of them online versions of print journals, but some online-only), some available only to paid subscribers. Even the most grasping of the publishers permit non-subscribers to view tables of contents and abstracts, but there's certainly no single indexing source for all, and only a few disciplines have anything as comprehensive as PubMed. So the problem is where to begin a search.

How, as a practical matter, does one teach the finding, retrieval, and assessment skills that are necessary to navigate the jungles and swamps of information? Of these, the greatest challenge is assessment and evaluation of what is retrieved, which requires that the searcher have enough perspective and basic knowledge to make judgements of relevance and content. So it's necessary to build up from general knowledge, and to help students develop their own skills as readers and writers, AND it's necessary to have better ways to develop and communicate context.

There's a real missing piece in the world of access to journal literature, a piece that could be filled in, at considerable expense, if its importance were recognized and it were integrated into the suite of information-access tools available to end users. I refer to citation indexing, specifically in the form of ISI's Web of Science, which offers a friendly interface to the Science Citation Index database.

I shouldn't assume that everybody shares my enthusiasm for this tool, but you would if you had a chance to play with it. In brief, citation indexing allows the searcher to see a retrieved article in its intellectual context, linked both to its antecedents (sources which it cites) AND subsequent work (sources which cite it). The pedagogical significance of this view is that the collaborative and interlinked nature of science is manifest; the utilitarian significance is that a searcher can quickly locate coherent specialized literatures, and can follow the influence of a source through time and across disciplinary boundaries. Citation indexing is sold as a tool for demonstrating the importance of a person's work (especially useful in the context of tenure and promotion), and quantifying the "impact" or "immediacy" of a journal title; but it is really much more important as another --and very effective-- means to search and, potentially, as an empirical basis for exploring, researching, and understanding the structure and evolution of the sciences.
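
For those who think in data structures, the two-way view that citation indexing provides can be sketched in a few lines. This is only an illustration under my own assumptions, not a description of how Science Citation Index is built: the paper labels A through E are invented, and `build_citation_index` and `later_work` are hypothetical helper names.

```python
from collections import defaultdict


def build_citation_index(citations):
    """Turn (citing, cited) pairs into two lookup tables.

    references[p] -> the antecedents that p cites
    cited_by[p]   -> the subsequent work that cites p
    """
    references = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in citations:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def later_work(paper, cited_by):
    """Everything that cites `paper`, directly or through a chain of citations."""
    seen = set()
    stack = [paper]
    while stack:
        current = stack.pop()
        for citer in cited_by.get(current, ()):
            if citer not in seen:
                seen.add(citer)
                stack.append(citer)
    return seen


# Invented toy data: each pair reads "first paper cites second paper".
toy_citations = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D")]
references, cited_by = build_citation_index(toy_citations)

print(sorted(references["D"]))             # what D builds on
print(sorted(cited_by["A"]))               # who cites A directly
print(sorted(later_work("A", cited_by)))   # A's influence through time
```

A conventional bibliographic database gives you only the first table (an article and its reference list); the second table, and the ability to walk forward from it, is what a citation index adds.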

The problem with this magnificent and even essential tool is that it's extremely expensive --even with consortial pricing, more than $20,000 per year for a site license for Web of Science --which would allow end users to explore the citation landscapes of fields of interest. Few colleges are likely to come up with that kind of money unless it's really clear that the results are measurably 'worth it', though many research universities have ponied up --because who-cites is recognized as critical information for researchers. I'd argue that it should be important information for undergraduate science students too, because it offers a basis for empirical study of the structure of the sciences. Science Citation Index is accessible via DIALOG, but the per-search pricing structure and complex search interface are such that it has to remain a specialist's tool, not available for the sort of end-user exploratory searching that would make it useful to undergraduates.

So here's the problem citation indexing should help us address:

How shall we think about an edifice as vast as the intellectual geography of the sciences? The names of disciplines (chemistry, geology, physics...) seem to imply boundaries between subject matter which are clear and impermeable, although we know they are not. Subdisciplines are mapped crudely --oligonucleotide chemistry is a specialty within organic chemistry, which is itself a subspecies of chemistry-- and not generally analyzed empirically in terms of the structure of shared citations, or descent from intellectual antecedents. Very few people have anything like a sophisticated understanding of the research frontiers of more than a single discipline, or a handful of subdisciplines, and increasing specialization exacerbates the situation. Popularizers, generalists and science journalists serve useful functions for beginners, but rarely take a synoptic view of the terrain. Something else is necessary, an analytical approach to citation data which has as its goal a visual representation --an empirical mapping and search interface-- of the temporal evolution and present relations among research specialties.

Macramé serves as a handy metaphor if we seek to describe the literature of a subfield: the various articles reporting research are linked to others in a complex web of direct and indirect connections. Each year produces a further complexity of referential interconnection, and the overall patterns can be read if one gains the necessary perspective.

Specialists know (and know how to find out about) the research frontiers of their own specialty: they read relevant literature and keep in touch with others working in the same area through meetings and correspondence, and the apprenticeship path for advanced students in a discipline ensures that this model of specialized information use will continue. But beginners and interested laypeople face different challenges: the primary literature of a specialty is simply inaccessible to non-specialists, and judging what is relevant in professional literature and reliable in popular literature is difficult.

Users of Science Citation Index usually navigate from author names or specific articles (asking who has cited a particular article), or via keyword searches, and typically aren't much concerned with the grander-scale patterns of clusters of journal titles or the degree to which sets of articles cite one another. These patterns encode the fine structure of the advance of science, and offer a means to identify subfields, chart the evolution of research frontiers, and aim the searcher at the seminal articles of a subdiscipline. Systematic mining of citation data for clusters, carried out by expert systems, could provide a basis for improvements in "find more like..." and relevancy ranking functions (and thus add value to ISI products), but the same analysis could also address the important pedagogical frontier of general science literacy.
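
One simple way to look for such clusters is to count how often two sources appear together in the same reference list (co-citation counting). I offer it here only as a rough sketch of the general idea, not as the method any existing product uses: the reference lists are invented toy data, and `cocitation_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of sources, how many papers cite both of them."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Invented toy data: citing paper -> the sources in its reference list.
toy_reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
    "P4": ["X", "Y"],
    "P5": ["X", "Y", "A"],
}

pairs = cocitation_counts(toy_reference_lists)

# Pairs cited together at least twice hint at a shared specialty.
strong = sorted(pair for pair, n in pairs.items() if n >= 2)
print(strong)
```

In this toy case the strongly linked pairs fall into two groups (A, B, C on one side and X, Y on the other), with a single paper bridging them. On real citation data the same counting, followed by a proper clustering step, is the sort of analysis that could suggest where subfields lie and which articles are central to them.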

So I'd like to see courses in Scientific Information, for majors in specific disciplines and for non-majors as well. The disciplinary courses would address questions focused more on primary and secondary literatures, while those for non-majors would focus more on tertiary and secondary literatures and interdisciplinary questions. Such courses ought to address the tools and skills of exploring, locating, evaluating and presenting evidence.

It's useful to consider the rapidity with which searching tools have evolved, and that exercise suggests the speed with which that evolution will continue. The giant steps of (the now-ubiquitous) online library catalogs and keyword searching brightened the early 1990s; since 1994 the explosion of the WWW has pushed everything to new modalities of access and new expectations on the part of information seekers. A recent article in Nature states that

Scientists are increasingly using search engines to locate research of interest: some rarely use libraries, locating research articles primarily online; scientific editors use search engines to locate potential reviewers. Web users spend a lot of their time using search engines to locate material on the vast and unorganized web... (Lawrence and Giles, Nature 8 July 1999 pg 107)
But the search engines aren't comprehensive, are not particularly up to date, and don't overlap very much. And 'success' for a searcher requires fairly elaborate strategies and careful evaluation of retrieved documents. 'Advanced mode' searching is an essential skill to avoid wading through lots of irrelevant items. "Find more like..." algorithms (PubMed is a good example) are an important addition.

So what does all this come down to? We NEED better search tools, better understanding of how literatures are structured, clearer examples of the intellectual processes of science for students to learn from and emulate. The best we can do:

The approach I have taken in 6 years of teaching Use and Understanding of Biological Literature (required of all Biology majors), for which each student is assigned a topic by a biology department faculty advisor, has been to use a range of searching tools to lead students through a process of knowledge development, starting in quaternary and tertiary literatures and moving through secondary to primary, and including the full range of electronic resources. I have always considered this as a basically expository process, requiring students to write about the process of discovery as well as to present bibliographic results in the appropriate form. In the last iteration of the course I had students present all their work in the public medium of web pages --which necessitated the teaching of basic web skills (much easier than it used to be), but built a pride in the work done (and relied upon shame as well, since students looked at each other's pages). I wish I could say that I've been successful in inspiring biology majors to personal intellectual involvement with biological literatures, but that's not what being an undergraduate is about. I'll settle for a feeling of competence with the world of electronic access, which students generally report at the end of the course. Some examples: Bio 182 web page, and Adam Yablonski's pages.

Asked about the web pages at the end of the class, 28/47 said that they had looked at each other's pages and found it useful to do so. Only two of the 47 had ever made a web page before this class.