Digital Libraries

N.B. This page is 'historical'. More current material, tracking continuing work in this area, can be found in my 2003- log file.

Computer Science 402 (Course Home Page)
Spring term 2001

This is where I'll keep track of my version of what we're doing in the course. I've been accumulating a Log File for a couple of months, and I'll continue to add to it as I find material.

Here's the Quintessence of the Problem with which we're wrestling:

...after taking approximately 300,000 years for humans to generate 12 exabytes [an exabyte is over 1 million terabytes or a million trillion bytes] of information, the next 12 exabytes will be accumulated in just two and a half years... only about 20 percent of the world's data resides in relational databases; the rest is in a combination of flat files, audio, video, prerelational, and unstructured formats -- not to mention the mountains of paper-based data just waiting to be digitized. The result of incorporating all these different data types and sources is that data management is changing into a broader category of managing content that includes all data types... (InfoWorld 16 April 2001, pg 42)
What to read? Where to begin?
...always an interesting problem, and perhaps it's more important to think about how to keep track of what one encounters. General introductions that I found useful:
Bruce R. Schatz, Information Retrieval in Digital Libraries: Bringing Search to the Net (Science Volume 275, Number 5298, Issue of 17 Jan 1997, pp. 327-334), is a concise review of historical developments.

Ben Shneiderman, Codex, Memex, Genex: The Pursuit of Transformational Technologies (International Journal of Human-Computer Interaction 10:2 [1998], pp. 87-106); Multi-dimensional information visualizations from the On-line Library of Information Visualization Environments; and a large collection of Shneiderman's CS technical reports from U. Md.

Some classics that everybody should encounter:
Vannevar Bush, As We May Think (The Atlantic Monthly Volume 176, No. 1, Issue of July 1945, pp. 101-108)

J. C. R. Licklider, Man-Computer Symbiosis (IRE Transactions on Human Factors in Electronics, March 1960, pp. 4-11) and The Computer as a Communication Device (Science and Technology, April 1968)

Information Central for the course may be the NSF Digital Libraries Initiative home page.

...and a local project: Building a Digital Library of Spatial Data and Images for Rockbridge County

Vocabulary is an interesting part of any exploration of a subject. We need to collect and come to agreement about the meaning and range of an open-ended set of terms that are used in the literature and discourse of digital libraries. I think we should collect this Vocabulary systematically, realizing that definition is just the beginning, and that understanding how terms are used is the real objective. Can we contrive a way to build a dynamic glossary into the course, to contain expositions of terminology as it comes up?

Consider the example of the word informatics, which seems to crop up frequently. We can interrogate various online sources to get an idea of senses, of frequency, of context, of evolution... and each gives us different kinds of answers and provokes different questions.

Here's what the OED says:

informatics informætiks. [tr. Russ. informátika (A. I. Mikhailov et al. 1966, in Nauchno-tekhnicheskaya Informatsiya XII. 35), f. information: see -ics.] (See quot. 1967.) Cf. information science (information 8). Hence infor'matical a., informa'tician.

     1967 FID News Bull. XVII. 73/2 Informatics is the discipline of science which investigates the structure and properties (not specific content) of scientific information, as well as the regularities of scientific information activity, its theory, history, methodology and organization.

     1970 Times 2 Sept. 9 It was agreed..that an introduction to Informatics should form an integral part of general education.

     1972 Jrnl. Librarianship IV. 177 The name Informatics satisfies several criteria for the designation of a new discipline.

     1972 Jrnl. Librarianship IV. 177 Other terms can be derived from it, such as Informatician for a person who is engaged in activities in this field..and the adjective informatical, to describe the attributes of the field.

     1973 Times Lit. Suppl. 28 Sept. 1133/1 The problem falls into two parts: the preparation of decisions, which is a matter of informatics, and the making of the decisions themselves, which is a matter of 'politics'.

...but if we search the Web, the vast majority of links are to sites concerned with 'medical informatics' (though other compounds certainly occur: 'social informatics', 'driving informatics', 'legal informatics').

A search of Annual Reviews turns up 28 occurrences of 'informatics' (the first in 1995, in AR Anthropology and AR Public Health) and 41 of 'bioinformatics' (the first in 1996, but most in the last two years).

SciFinder Scholar (Chemical Abstracts) finds 3967 occurrences; the oldest is from 1970, but the real growth begins in 1986.
Science has 82 occurrences of 'informatics' since 1995.

JSTOR finds the first occurrence of 'informatics' in Science to be 1979 (a company name, in an article's bibliography), and 'bioinformatics' in a 1990 article on the human genome project.

William Arms, Digital Libraries (Z692.C65 A76 2000), might have been a text for the course.

A link to the original organizational scheme for Roget's Thesaurus (with links to the text as well), as an example of a schema for "all human knowledge".

...and my take on types of digital libraries

A clumsy and preliminary summary of findings for Assignment 2

My candidate for a Digital Library: JSTOR

Important information sources for Projects:

D-Lib Forum, the ACM Digital Library, and the IEEE Digital Library,
and a Memex-inspired site: Memex and Beyond from Brown