Integrating the Pulse of the South and the Digital South

Hugh Blackmer
3 March 2003

(Death by Memo: The only way I know to work ideas out is to write what I'm thinking and then try to get others to read and respond. Not an elegant solution (because a lot of very tentative stuff gets floated and later superseded), but it's what I'm in the habit of doing. The trail is sometimes pretty long, as in the case of Links to various documents in the sequence I've written on GIS in the ACS context.)

The model I've been developing for the Digital South is one of a container into which data and information can be placed, and from which data and information can be drawn: a digital library. Some of the content is explicitly GIS-related; some has geographical coordinates or bounds, but isn't in the GIS realm in terms of format or likely users; and some is in between (the Reader Diary exemplifies this).

The critical feature and innovation is an efficient and effective means to collect metadata --to ensure that resources entered into the data structure are consistently and adequately documented, so as to be in accord with the standards that allow this data structure to interlink with others, as that becomes practical.

Just who is to establish metadata standards, and who is to build the conduits for creation and management of metadata, remains a matter for discussion and negotiation. I'm not at all sure that I understand enough of the technicalities to design the framework, but I have to start somewhere if only to identify the areas I need to explore more fully. So here's my outline of what I believe the metadata landscape to consist of:

Every format or medium has characteristics that keepers of such resources recognize as integral to "keeping". For librarians, the book is an understood medium, effectively represented by a MARC record, with numbered fields. The structure of the MARC record is readily adapted to handle other media found in libraries (DVDs, serials, special forms of 'book', boxes in archives, etc.). Nobody but librarians cares about the MARC record or its specific character, but a MARC record is essential for a resource to find its way into an online catalog.
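To make the numbered-field idea concrete, here is a toy sketch (in Python, with invented values) of a few common MARC tags; a real MARC record also carries a leader, indicators, and subfield codes ($a, $b, ...), which are omitted here:

```python
# A few of the numbered MARC fields, with hypothetical values.
# Tag meanings: 100 = main entry (personal name), 245 = title statement,
# 260 = publication information, 650 = topical subject heading.
marc_record = {
    "100": "Blackmer, Hugh",
    "245": "Integrating the Pulse of the South and the Digital South",
    "260": "Lexington, Va. : Washington and Lee University, 2003",
    "650": "Digital libraries",
}

# An online catalog indexes on these numbered fields rather than on
# free-form labels like "author" or "title".
title = marc_record["245"]
```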

Approximations to MARC records can be made according to other standards, and there are procedures, like crosswalks, to move between MARC and some of these other standards. The most familiar is probably the Dublin Core record, some 15 fields that can be applied to many media and can be an intermediate step to the creation --by a librarian-- of a MARC record. A Dublin Core record can be an adequate metadata standard for other forms of retrieval that don't demand the detail of the MARC record. See NSDL metadata page for more details.
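As a sketch of what a Dublin Core record amounts to, here are the fifteen elements laid out as a Python dictionary; the values describing an imagined scanned map are entirely hypothetical:

```python
# The fifteen Dublin Core elements, each optional and repeatable.
# All values below are invented, for illustration only.
dc_record = {
    "title": "Rockbridge County, Virginia, road map",
    "creator": "Virginia Department of Transportation",
    "subject": ["roads", "Rockbridge County (Va.)"],
    "description": "Scanned county road map.",
    "publisher": "Washington and Lee University Library",
    "contributor": [],
    "date": "2003-03-03",
    "type": "Image",
    "format": "image/tiff",
    "identifier": "http://example.edu/maps/rockbridge-roads",
    "source": "Paper original held in Special Collections",
    "language": "en",
    "relation": [],
    "coverage": "Rockbridge County, Virginia",
    "rights": "Public domain",
}
```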

Dublin Core seems to have the status of an emerging standard, and there's also a conjunction which I don't fully grasp of Dublin Core with RDF and XML. My sense is that this conjunction is the locus of the work we need to attach ourselves to in digital library development.
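One concrete face of that conjunction: a Dublin Core record can be serialized as RDF/XML, with XML supplying the syntax, RDF the description model, and Dublin Core the element vocabulary. A minimal Python sketch, using only the standard library (the item URL and field values are invented):

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def dc_to_rdfxml(about, fields):
    """Serialize a dict of Dublin Core fields as minimal RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": about})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

xml = dc_to_rdfxml("http://example.edu/item/1",
                   {"title": "Reader Diary", "creator": "Hugh Blackmer"})
```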

The insight and vision of the Pulse of the South as outlined by Jon Evans implies the digital library structure of Digital South, without specifying the connection. And the Digital South implies some level of connection with the world of MARC and online cataloging, but just what would be involved in leaping the gap still needs to be clarified.

"The gap" is an interesting thing to consider, because it's not just one gap --indeed, it's really a gradient of specificity and adherence to standards, of options and requisites, of procedures that require degrees of mediation by experts. Thus, I can name my files and folders by any conventions I care to invent or borrow. When I want to share parts of my file-and-folder world with others, it's helpful if tehre's some agreement on conventions. mWhen multiple collaborators are contributing information, it becomes necessary to define standards of naming conventions, and then enforce by consensus. When the products of such collaborations are released to larger publics, it becomes necessary to use the conventions of those larger systems.

That's really four levels, increasingly requiring adherence to standards. What's needed is a clear idea of how to move a resource from one level to another --procedures, which amount to ways to vet content, but also require the specification of forms.
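The promotion idea could be sketched as a table of required fields per level: a record is promotable once nothing in the target level's requirement set is missing. The particular field sets below are my invention, purely illustrative:

```python
# Hypothetical required-field sets for the four levels described above:
# personal files need nothing, shared folders need a title, collaborative
# collections need basic attribution, and public release needs a fuller
# Dublin Core record.
LEVEL_REQUIREMENTS = {
    "personal": set(),
    "shared": {"title"},
    "collaborative": {"title", "creator", "date"},
    "public": {"title", "creator", "date", "description",
               "format", "identifier", "rights"},
}

def missing_for_level(record, level):
    """Fields still needed before a record can be promoted to `level`."""
    filled = {k for k, v in record.items() if v}
    return LEVEL_REQUIREMENTS[level] - filled

record = {"title": "Campus base map", "creator": "GIS task force"}
missing_for_level(record, "shared")   # nothing missing
missing_for_level(record, "public")   # date, description, format, ...
```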

I'm describing a path of convergence, the most elaborate requisites of which are defined by the fields of the MARC record. It's probably most practical to use the framework of Dublin Core as the definer of content at each level --that is, to extend Dublin Core downward to the individual collection, and to provide the means --an interface that permits the creator of an item (a file, a collection, etc.) to start the process of building a Dublin Core record.
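Such an interface might begin by seeding a Dublin Core record with whatever the system can guess from the file itself, leaving the rest to the creator. A hypothetical Python sketch (the path and field choices are invented):

```python
import datetime
import mimetypes
import os

def seed_dc_record(path):
    """Start a Dublin Core record for a file, pre-filling what the
    system can guess; the creator supplies the rest later."""
    fmt, _ = mimetypes.guess_type(path)
    return {
        "title": os.path.splitext(os.path.basename(path))[0],
        "creator": "",  # to be supplied by the contributor
        "date": datetime.date.today().isoformat(),
        "format": fmt or "application/octet-stream",
        "identifier": path,
    }

seed = seed_dc_record("collections/maps/rockbridge_roads.tif")
```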

Some media provide interfaces to their metadata formats --ArcCatalog, for example-- and presumably there's some specifiable relationship between the Federal FGDC metadata standard (see Content Standard for Digital Geospatial Metadata) and Dublin Core (see Hillman FGDC to DC). I need to find out about that possible crosswalk, and also discover the analogous intermediate steps for GIS metadata creation. XML is a piece of that, no doubt. Where are the data, and how and under what circumstances can I access them?
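For illustration only, here is roughly what such a crosswalk looks like as a lookup table; the FGDC element names are abbreviated and the mapping is my paraphrase of the published crosswalks, not an authoritative rendering:

```python
# Sketch of an FGDC (CSDGM) to Dublin Core crosswalk. Element names
# abbreviated; mapping illustrative, not authoritative.
FGDC_TO_DC = {
    "title": "title",
    "originator": "creator",
    "abstract": "description",
    "publisher": "publisher",
    "pubdate": "date",
    "theme_keyword": "subject",
    "place_keyword": "coverage",
    "bounding_coordinates": "coverage",
    "online_linkage": "identifier",
    "use_constraints": "rights",
}

def crosswalk(fgdc_record):
    """Translate a flat FGDC-style record into Dublin Core fields;
    several FGDC elements can land in the same DC element."""
    dc = {}
    for field, value in fgdc_record.items():
        if field in FGDC_TO_DC:
            dc.setdefault(FGDC_TO_DC[field], []).append(value)
    return dc

dc = crosswalk({"title": "County roads", "originator": "VDOT",
                "place_keyword": "Rockbridge County"})
```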

So are there nice neat apps for metadata creation in GIS, or is it necessary to use/misuse the long form? There were some ArcView extensions that seemed to do the necessary, and I wonder if there aren't ways to produce intermediate-level records that will DO for the practical purposes of users, and be refinable to the canonical requirements when it's time to do that, when a particular record is 'promoted' to public status.

CAN we create XML-RDF-DC tools to apply to GIS data? Can we make them easy enough for users that it's reasonable to enforce their use in collection building for Digital South/Pulse of the South?

Another issue or two with Pulse of the South: the notion that we build toward a data structure that is space AND time is a fine one, but it's a long-run goal. The idea of beginning with a core of information surrounding each campus is very attractive, and is probably best achieved not by contributions from each of the 16 participants, but by a task force creating a core, which can then be augmented by contributions as institutions and individuals are (or can be) inspired to make them.

Some examples of such a core:

What we need to achieve is uniform collections that can just be used straight-up, with no further fiddling, in ArcGIS --a place to start, with a set of clear instructions for what to do next to augment with data from other sources, AND the concrete steps to take to put the results of augmentation into a form(at) for upload to the larger collection.

The census materials should be for the states (not just the immediate locality), and presumably these and others are available by pointing to remote sources from which they can be downloaded. If so, they wouldn't need to be repackaged and locally stored, but the necessary preprocessing may make it more sensible to include them in a core collection.

We want people to be able to get to and make easy use, even desktop use, of such data, without having to know how to download and decompress and manage them in personal file space. We want clear and universally applicable procedures for installing as read-only files on network drives, if the collections are to be used outside a GIS lab setting. Similarly, we need clear procedures for how to make derived materials available (thus, if a user JOINs a dataset, how to create a .shp file from the results, how to make that accessible to others, and so on).
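The JOIN step, at least, is easy to sketch in isolation: an attribute join attaches tabular data to features by a shared key, which is the intermediate product a user would then want to export as a new .shp file. All values below are invented:

```python
import csv
import io

# Feature attribute table (as read from a shapefile's .dbf) and a
# census-style table sharing a FIPS key. Values are hypothetical.
features = [
    {"FIPS": "51163", "name": "Rockbridge"},
    {"FIPS": "51005", "name": "Alleghany"},
]
census = io.StringIO("FIPS,pop\n51163,21000\n51005,13000\n")

# Index the tabular data by the join key, then merge row-by-row.
table = {row["FIPS"]: row for row in csv.DictReader(census)}
joined = [{**f, **table.get(f["FIPS"], {})} for f in features]
```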

5 March
Facilities like the 2002 National Transportation Atlas Data Shapefile Download Center and others at TranStats exemplify both the resources and the problem: lots is available, but what to DO if one wants to get some data for use in mapping?