Metadata

http://home.wlu.edu/~blackmerh/acs2004/metadata.html
(prepared for an ACS Summer Workshop
"Planning Digital Collections for Education and Research"
Southwestern University 25-26 June 2004)

I want to disclaim expertise in these areas, and claim instead the honorable mantle of a student of emergent technologies generally, and digital information systems in particular. I continue to discover and read and try to absorb the implications of a wide range of materials, and I'm perpetually at or beyond the edges of my own safety/comfort zones. In two weeks or two months I'll probably see many of the issues differently...

Metadata

Gail Hodge cuts to the chase: Metadata is structured information
that describes, explains, locates, or otherwise makes it easier
to retrieve, use or manage an information resource.
(http://www.niso.org/news/Metadata_simpler.pdf)

Many writers have been exquisitely boring on this subject, but it's really pretty interesting once one gets past the "data about data" cliché, and into the practicalities. On the way, let's take a detour through Wikipedia's entry and look at the bottom of the page... and compare with METADATA®. A lost cause...

I've heard several people estimate that creating and managing metadata may amount to 80% of a project's costs (two-thirds is another estimate), so we really need to ask what goes into those costs. The essence is that users want improved access, and improved access depends on better specification of the qualities and contents of the items in a collection. The more access points are provided, the better the items are described, and the better the granularity of the metadata fits the searcher's purpose, the more likely it is that a searcher will locate relevant materials (see Roy Tennant on Granularity, from Library Journal).

Other elements in the cost:

ensuring consistency requires specialist catalogers: not an add-on to an existing job!
essential to create simple rules and clear workflow paths
need to collect use metadata (log analysis, usability studies)
metadata maintenance is perpetual

The world of metadata is awash in hieratic knowledge and specialized terminology, and decoding the acronyms in this realm is a constant challenge. MetaMap from U. Montréal is a lovely example of presentation of complex information, and is actually helpful in getting one's bearings in the landscape.

Metadata is often invoked as essential for orderly collections, especially those that will be searched by "users". It can be argued that a collection used only by its creator doesn't need explicit or systematic metadata... but it's remarkable how often something that begins as a private collection grows into a public resource, and in any case it's a good thing to learn a few basics of the metadata standards that are being talked about pretty widely.

Different Types of Metadata and Their Functions
(from Anne J. Gilliland-Swetland "Defining Metadata")
(http://www.getty.edu/research/institute/standards/intrometadata/2_articles/index.html)

Type: Definition (examples listed beneath)

Administrative: Metadata used in managing and administering information resources
- Acquisition information
- Rights and reproduction tracking
- Documentation of legal access requirements
- Location information
- Selection criteria for digitization
- Version control and differentiation between similar information objects
- Audit trails created by recordkeeping systems

Descriptive: Metadata used to describe or identify information resources
- Cataloging records
- Finding aids
- Specialized indexes
- Hyperlinked relationships between resources
- Annotations by users
- Metadata for recordkeeping systems generated by records creators

Preservation: Metadata related to the preservation management of information resources
- Documentation of physical condition of resources
- Documentation of actions taken to preserve physical and digital versions of resources, e.g., data refreshing and migration

Technical: Metadata related to how a system functions or metadata behave
- Hardware and software documentation
- Digitization information, e.g., formats, compression ratios, scaling routines
- Tracking of system response times
- Authentication and security data, e.g., encryption keys, passwords

Use: Metadata related to the level and type of use of information resources
- Exhibit records
- Use and user tracking
- Content re-use and multi-versioning information


(this is sometimes reduced to the three basics: Descriptive, Administrative, and Structural)

There are many metadata standards defined by disciplines and communities of practice: Biology [NBII], Spatial Data [FGDC], Darwin Core for natural history collections, Encoded Archival Description [EAD], Metadata Encoding & Transmission Standard [METS]... and many others. For an example of an EAD record on the Web, see A Guide to the Virginia Canals and Navigations Society Collection 1978-1983, a Finding Aid linked via W&L's OPAC.

See an excellent Canadian Government introduction to the general subject, and Introduction to Metadata: Pathways to Digital Information (edited by Murtha Baca, Getty Research Institute)

For images, see Automated Exposure: Capturing Technical Metadata for Digital Still Images (RLG white paper by Günter Waibel and Robin Dale)

Dublin Core is probably the most commonly mentioned metadata standard: 15 elements, extensible ('qualified DC'), readily applicable to many domains, adaptable to others, and widely used in Web settings. Not all things to all people, but a good place to begin.
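
To make the "15 elements" concrete, here is a minimal sketch in Python. The element names are those of the Dublin Core Metadata Element Set; the check_record helper and the sample record are my own hypothetical illustrations, not part of any Dublin Core tool.

```python
# The fifteen elements of the simple (unqualified) Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def check_record(record):
    """Return the keys in `record` that are not simple Dublin Core elements.

    Every element is optional and repeatable, so values are lists and an
    empty record is still acceptable.
    """
    return sorted(key for key in record if key not in DC_ELEMENTS)


# Hypothetical record describing this handout (values are illustrative).
handout = {
    "title": ["Metadata"],
    "date": ["2004-06"],
    "type": ["Text"],
    "format": ["text/html"],
    "identifier": ["http://home.wlu.edu/~blackmerh/acs2004/metadata.html"],
    "language": ["en"],
    "keywords": ["metadata", "Dublin Core"],  # not a DC element; belongs in "subject"
}

print(check_record(handout))  # ['keywords']
```

Because nothing is mandatory, a check like this can only catch stray field names. Consistency of the values themselves is still the cataloger's job.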

Dublin Core main pages and Projects, and some of the [seemingly] most important documents:

Using Dublin Core
DC Metadata Element Set
DCMI Metadata Terms
Some helper apps:
Dublin Core metadata editor
Type the URL of the page you want to describe... This service will retrieve a Web page and automatically generate Dublin Core metadata, either as HTML <meta> tags or as RDF/XML, suitable for embedding in the <head>...</head> section of the page. The generated metadata can be edited using the form provided and converted to various other formats (USMARC, SOIF, IAFA/ROADS, TEI headers, GILS, IMS or RDF) if required. Optional, context sensitive, help is available while editing.

Dublin Core Metadata Template
This service is provided by the Nordic Metadata Project in order to assure good support for the creation of Dublin Core metadata to the Nordic "Net-publisher" community. If you use the metadata created by this form and follow our examples, term lists and recommendations, your HTML documents will carry high quality metadata.
(see also Short and Simple Template)
DONOR metadatagenerator (also offers DONOR metadata generator "for lokal use" (tar.gz) from Netherlands Royal Library)

To translate/map from one standard to another, one may use a crosswalk:

Crosswalks: The Path to Universal Access? (Mary Woodley)
All About Crosswalks (Jean Godby)
MARC to Dublin Core Crosswalk
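
At bottom a crosswalk is a lookup table from fields in one scheme to elements in another. The sketch below assumes MARC data has already been parsed into (tag, subfield, value) triples. The mapping is a deliberately small, illustrative subset that I chose for the example, not the published MARC to Dublin Core crosswalk, and the sample values are invented.

```python
# Illustrative subset only: (MARC tag, subfield code) -> Dublin Core element.
# A published crosswalk covers many more fields and special cases.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("856", "u"): "identifier",
}


def crosswalk(marc_fields):
    """Translate (tag, subfield, value) triples into a Dublin Core dict.

    Fields with no entry in the table are returned separately instead of
    being silently dropped, since loss in translation is the usual hazard.
    """
    dc, unmapped = {}, []
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            unmapped.append((tag, code, value))
        else:
            dc.setdefault(element, []).append(value)
    return dc, unmapped


# Hypothetical input, for illustration only.
sample = [
    ("245", "a", "Canal records"),
    ("650", "a", "Canals"),
    ("650", "a", "Inland navigation"),
    ("260", "c", "1983"),
    ("300", "a", "2 boxes"),
]
dc_record, leftovers = crosswalk(sample)
print(dc_record)
print(leftovers)
```

Reporting the leftovers makes visible what the simpler target scheme cannot hold, which is the practical question behind "universal access".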

Dublin Core Management (Andy Powell presents three models for the way in which metadata can be managed across a Web-site and describes some of the tools that are beginning to be used at UKOLN to embed Dublin Core metadata into Web pages)

It's important to consider the uses and utilities of metadata, beyond the obvious point of the librarian's mania to create efficient and effective descriptions of items in a collection. Appropriately accessible and exposed, metadata makes it possible for searchers to find what they seek. This is true for the Web, of course: search engines look at (among other things) metatags, and appropriate <meta> elements in a page's <head> are a means to increase the 'findability' of Web pages, by providing fodder for Web crawlers to index. See Meta Resources from jarmin.com for a good treatment of metatags.
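
As a rough illustration of what such metatags look like, this Python sketch renders a small record as HTML for a page's <head>. It follows the common convention of a schema.DC link plus one meta tag per value, named DC.element. The function name and the sample values are hypothetical.

```python
from html import escape


def dc_meta_tags(record):
    """Render a {element: [values]} dict as HTML lines for a page's <head>."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            # Escape &, <, > and quotes so the value is safe inside an attribute.
            lines.append(
                '<meta name="DC.{}" content="{}">'.format(
                    element, escape(value, quote=True)
                )
            )
    return "\n".join(lines)


# Hypothetical values, for illustration only.
page = {
    "title": ["Metadata"],
    "subject": ["Metadata", 'Digital libraries & "findability"'],
}
print(dc_meta_tags(page))
```

Repeated elements simply become repeated meta tags, which is how a page can carry several subjects.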

Another important aid to findability is contributing harvestable metadata to Open Archives aggregators. Tom Whaley will show you an example from Alsos of the process of generating and exposing the metadata for a record, but the end result is easily seen by a quick search in NSDL. If I do a search for 'trinity', I get 141 hits... the first five are from Alsos, because NSDL harvested Alsos metadata.
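
Harvesting of this kind runs over OAI-PMH, where a harvester's request is an ordinary HTTP GET with a verb and a few arguments. Here is a minimal sketch of building a ListRecords request in Python. The base URL is a placeholder, not a real repository, and the function name is my own.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `from_date` (YYYY-MM-DD) limits the harvest to records changed since
    that date, which is what makes incremental harvesting possible.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# repository.example.edu is a placeholder host, used only for illustration.
print(list_records_url("http://repository.example.edu/oai", from_date="2004-06-01"))
# http://repository.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2004-06-01
```

The oai_dc prefix asks for unqualified Dublin Core, the one format every OAI-PMH repository is required to offer.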

One of the aggregators we should keep an eye on is OAIster, which did contain Alsos records a few months ago but seems not to at present. OAIster is in its infancy, and in a year or two it will be much more useful as a source for material that otherwise might escape searchers and search engines. Example: a search for agroecology returns 18 hits, many from the Center for Agroecology & Sustainable Food Systems, University of California, Santa Cruz ...but others from abstracts on various servers which have provided metadata for OAIster to harvest. See OAIster's Information for Data Providers:

The end result is the inclusion of your metadata in our service -- a service that now has over 3 million metadata records from more than 300 institutions -- thus making your collections much more publicly available than they are at present...

The world of the Open Archives Initiative will grow very rapidly in the next few years. Take a look at Open Archives Initiative Metadata Harvesting Project of the University of Illinois at Urbana-Champaign, and specifically their Digital Gateway to Cultural Heritage Materials (try a search for 'cento'). And see Roy Tennant's The Expanding World of OAI (Library Journal, February 2004).

See also BioMed Central's OAI page and their materials on data mining in the sciences.

Bottom line: anybody building a digital library should be thinking about what's required to make metadata harvestable --and harvest-worthy.
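
On the provider side, "harvestable" means in practice being able to emit each record as oai_dc XML. The sketch below, using only Python's standard library, serializes a record into the oai_dc container. The namespace URIs are the standard ones, while the function name and sample values are hypothetical. A real OAI-PMH response also wraps this in record and metadata elements and adds a schemaLocation, all omitted here.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record):
    """Serialize a {element: [values]} dict as an <oai_dc:dc> XML string."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
xml_text = to_oai_dc({"title": ["Metadata"], "date": ["2004-06-25"]})
print(xml_text)
```

"Harvest-worthy" is the harder half: the XML is easy to generate, but the values inside it are only as good as the cataloging behind them.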

Collection examples relevant to various participants/projects:

American Memory photograph collections, and others
(Search Keywords for Black-and-White Photos, Browse the Subject Index, Creator Index, Geographic Location Index --see search interface, and see also All Collections search interface)
how the LoC cataloged the Berliner Collection ("Emile Berliner and the Birth of the Recording Industry is a selection of more than 400 items from the Emile Berliner Papers and 108 Berliner sound recordings from the Library of Congress's Motion Picture, Broadcasting and Recorded Sound Division")

Pictures of World War II (National Archives and Records Administration) "Pictures are listed by subject and campaign", and there's no search capability
Aerial Reconnaissance Archives ("The searchable catalogue will be launched in the future at a time when we can more accurately predict sustainable levels of demand and install appropriately sized hardware and software to fulfil this. We apologise for the inconvenience.")
Photograph Archives J. Willard Marriott Library, University of Utah ("Over 900 Collections are now accessible online! KEYWORD SEARCH our digitized photographs..." --alphabetized and browsable by subject)
Information systems of interest to Drama and Speech Communication from U. Waterloo (N.B.: Theatre Image Collections Online) --see also another Theatre image collections online from Theatre and Drama resources, WWW Virtual Library
United Methodist News Archives (searchable, but no browse function) and Archives of DePauw University and Indiana United Methodism

Bryan Alexander contributed this nice example of a digital library project that is still in process:

The Online Burma/Myanmar Library (http://www.ibiblio.org/obl/) is a database which functions as an annotated, classified and hyperlinked index to full texts of individual Burma documents on the Internet. It also houses a growing collection of articles, conference papers, theses, books, reports, archives and directories on-site (e.g. the 17MB archive of the Burma Press Summary). The Librarian requests help from specialists to refine the structure and add content.
Need for the Library
The Internet currently holds more than 100,000 Burma-related documents, from short news items to complete books, scattered over more than 500 websites (not all of which have internal search functions) run by the UN system, governments, academic institutions, media sites, listserv archives, human rights and other NGOs, activist groups and individuals. The volume is growing rapidly as more and more organisations choose to publish on the Internet. Even using modern search engines it is difficult and time-consuming to research this widely-scattered material. There is clearly need for a central index.
Structure
This is what the Online Burma/Myanmar Library seeks to provide. Launched in October 2001, it is organised on a database (using MySQL software, in combination with PHP) into 60 top-level categories based on traditional library classifications, with a hierarchy of some 850 sub-categories. These hold approximately 4000 links (mostly annotated, with keywords and descriptions) to individual documents, and about 400 links to websites which in turn give access to another 100,000 or so documents...
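
To picture the kind of structure described (a hierarchy of categories holding annotated links), here is a small sketch. It is entirely my own guess at a workable shape, not the Library's actual schema, and it uses Python's built-in SQLite rather than MySQL and PHP so that it runs anywhere. Table names, columns and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)  -- NULL for a top-level category
);
CREATE TABLE link (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    url TEXT NOT NULL,
    title TEXT NOT NULL,
    description TEXT,  -- the annotation
    keywords TEXT
);
""")

# Hypothetical sample data: one top-level category with two nested levels.
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "Economy", None),
    (2, "Agriculture", 1),
    (3, "Rice", 2),
])
conn.executemany(
    "INSERT INTO link (category_id, url, title) VALUES (?, ?, ?)",
    [
        (2, "http://example.org/a", "Farm report"),
        (3, "http://example.org/b", "Rice prices"),
    ],
)


def links_under(conn, category_id):
    """Return titles of links filed under a category or any of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION ALL
            SELECT category.id FROM category
            JOIN subtree ON category.parent_id = subtree.id
        )
        SELECT link.title FROM link
        JOIN subtree ON link.category_id = subtree.id
        ORDER BY link.title
    """, (category_id,)).fetchall()
    return [title for (title,) in rows]


print(links_under(conn, 1))  # ['Farm report', 'Rice prices']
```

A single parent_id column is enough to hold a hierarchy of any depth, and browsing a top-level category then means collecting the links of its whole subtree.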

Looking Ahead

I seem always to accumulate piles of books and papers that are fraught with portent for the future, and the 'digital library' piles certainly outrun what I know enough to talk about in mid-June 2004. Here I'll mention some subjects and materials that I'm in the process of trying to integrate, largely as a heads-up for topics that will surely assert themselves in the near future.

Extracts from Primary Multimedia Objects and 'Educational Metadata': A Fundamental Dilemma for Developers of Multimedia Archives (Paul Shabajee, from D-Lib June 2002, and the source of the Rafflesia image that started me off)
Personalisation and Recommender Systems in Digital Libraries: Joint NSF-EU DELOS Working Group Report May 2003 Jamie Callan et al.
Digital libraries must move from being passive, with little adaptation to their users, to being more proactive in offering and tailoring information for individuals and communities, and in supporting community efforts to capture, structure, and share knowledge...
...if the library can tailor its services and materials for a wider range of users, the impact and utility of the library is magnified greatly. The next generation of digital libraries must support a wide range of personalized services that support the activities of a wide range of users.

If there's a single resource to keep an eye on, it's probably D-Lib Magazine, 11 issues a year and all available online and free. It's there that I found a pointer to the DigiCULT article linked above, which was a real eye-opener: many pennies dropped and realizations dawned. DigiCULT, which "monitors and assesses research and technological developments in and for the cultural heritage sector in Europe" is itself a valuable site to visit and revisit.
Another organization to know about is ERPANET (Electronic Resource Preservation and Access Network), a European consortium --see, for example, XML as a Preservation Strategy (report of an October 2002 workshop)

In the last year's reading, I encountered many items that bear directly (or sometimes almost directly) on issues of planning digital libraries. Here are the most essential of them:

Roy Tennant's new Managing the Digital Library is a collection of his columns from Library Journal... but it's also a marvelous entrée for the subject. See his recent columns ...and links to most of the original versions of the (sometimes extensively rewritten) sections of the book.
Environmental Scan: Pattern Recognition (OCLC [2003]) seems essential big-picture reading:
"...produced for OCLC’s worldwide membership to examine the significant issues and trends impacting OCLC, libraries, museums, archives and other allied organizations, both now and in the future. The scan provides a high-level view of the information landscape, intended both to inform and stimulate discussion about future strategic directions."
The most significant challenge facing academic libraries undertaking these institutional repository projects is not technical, however. The major challenge is cultural. Too few initiatives include all the stakeholders --faculty, library staff, IT staff and instructional designers [to say nothing of students!!]-- and there is no common view of what an institutional repository is, what it contains and what its governance structure should be. Faculty have rarely involved librarians in developing teaching materials, digital or otherwise, and have not routinely made these available within the library infrastructure. Librarians have not routinely created metadata for such material. (64)
A Survey of Digital Library Aggregation Services (Martha Brogan, CLIR [2003])
The Digital Library: A Biography (Daniel Greenstein and Suzanne E. Thorin, CLIR [2002])
Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age (Clifford A. Lynch, CNI [2003])
The Open Archival Information System Reference Model: Introductory Guide (Brian Lavoie, OCLC Technology Watch Report [2004])
Deep Infrastructure Supports Digital Library Services (Paul Conway, Duke University [2004])
Duke University has recognized the strategic importance of the digital library as a change agent... Digital libraries may prove to be tremendous forces for needed change in teaching and learning and, particularly, for the transformation of the roles that traditional libraries play on and off campus. Duke University is embracing digital library services as a strategic mechanism for advancing deep information technology infrastructure on campus.
...The Digital Library @ Duke is a major component of the university’s overall strategic plan... “…seizing the opportunities of new technologies to enhance traditional resources and services and to build new roles for the library, presenting it as the resource of first resort for scholars and as the shared intellectual center of the university” ...
...the digital library is conceived as a resource environment, accessible through computing tools in buildings on campus and on individual desktops on and off campus. Duke’s digital library program becomes the essential mechanism for uniting people and ideas and presenting information that lives across the full spectrum of storage media.

Interoperability between Library Information Services and Learning Environments -- Bridging the Gaps (Neil McLean and Clifford Lynch, CNI and IMS Global Learning Consortium [2004])
...recently... libraries have been investing more heavily in bringing other materials such as digitized rare and historical materials, and institutional research and learning resources, into the distributed information environments. There is growing acceptance that simply making resources available on the network without an additional layer of services may not be very effective... resources are made available at interfaces with low levels of interconnectedness between them. This in turn puts the burden of interconnection back on the user, and it means that in many cases the potential value of interconnection is not realized. (3)
...within any information process, it may be necessary to interact with several services which do not coordinate their activities. Until recently, these services have been conceived and designed as standalone systems, rather than as parts of a fabric of information resources on a network. So, for example, there are services which allow people to discover the documents of interest to them, and there are packages which can format requests for dispatch to such services. These may not be linked up in such a way that the end-to-end process can be automated. (4)
...while the inclusion of learning objects in library collections is one issue, there is a large disconnect between the traditional focus of the e-learning community on these typically relatively small objects and the growing need to collect, archive, and repurpose much larger and more complex objects at the level of a collaboration or a course...(5)
Until recently, most learning and information content was tightly bound in learning management systems [LMSs]... transparent links between library systems and learning management systems have been rudimentary... Much of the current thinking is based on a fairly library-centric view of being able to "push" information resources into the LMS. There has been little thought given to the learner activity perspective where the learner may wish to draw on any number of information resources either prescribed, or of his or her choosing, at any given moment in the learning activity. There is a need, therefore, to develop more innovative use scenarios in order to map the dynamic functionality required in a "pull" runtime environment. (6)
In essence, academic institutions are only just beginning to grapple with the implications of developing the digital campus that includes the two important concepts of digital information management and e-learning management. Central to both of these key management challenges is the need to organize and manage the creation, flow, and use of content. In most institutions content is managed in silos that have little institutional interoperability... (8)

For an example of integrated digital library development, see About Digital Libraries from U. Minnesota --see especially University Libraries Metadata Core