Candidate Profiles for UCB METS Objects

Overview of METS Profiles.  Jerry McDonough and the METS Editorial Board have developed a METS Profile schema to provide a formal means for institutions implementing METS to express the rules and encoding conventions governing their various categories of METS objects.  The hope is that lead institutions will develop and publish METS profiles in fairly short order, and that these will help encourage some consistency in METS objects across institutions.  METS objects, of course, will all have a certain consistency just by virtue of being METS objects.  But the METS schema allows wide latitude in the structuring and encoding of METS objects, and it does not specify controlled vocabularies for many key attribute values.  Insofar as research libraries want to share METS objects with each other, and make them available to other communities (such as the learning technology community), they need 1) to develop METS profiles to ensure consistency among their METS objects and to publish their encoding practices to potential users of those objects; and 2) to align their METS profiles with those of the other institutions and communities with whom they wish to share METS objects and METS tools.  One possible way of accomplishing the latter would be for institutions that want to share METS objects to derive their specific METS profiles from common “model” profiles.

 

Miscellaneous considerations.

Categories of METS objects. The categories of METS objects that need to be represented by discrete profiles are not immediately obvious.  It seems likely, however, that some kind of categorization on the basis of content will be appropriate.  The candidate profiles referenced from this document are primarily content based. 

 

Relationships between METS profiles. METS profiles can build on other METS profiles, and the METS profile schema provides for expressing relationships between profiles.  The referenced candidate profiles posit two basic kinds of relationships:  “extends/extended by”, and “subset”. 

“Extends/Extended by” relationship.  A profile “extends” another profile if it provides for another kind of content in addition to the content type or types represented by the extended profile. In the candidate profiles, for example, the ModelPagedTextObjectProfile extends the ModelImagedObjectProfile.  The latter provides for image-only content, while the former provides for both image and text content. The ModelPagedTextObjectProfile expands certain controlled vocabularies specified by the ModelImagedObjectProfile, and provides for more elaborate structures in the <structMap>.

 

“Subset” Relationship.  A profile is a subset of another profile if it specifies all of the same parameters as the parent profile, and adds further parameters and restrictions of its own.  An object implementing a subset profile will therefore also satisfy the parent profile.  For example, in the case of the candidate profiles, the UCBImagedObjectProfile is a subset of the ModelImagedObjectProfile.  The UCBImagedObjectProfile is identical to the ModelImagedObjectProfile in almost all respects, but it prescribes a specific set of supported extension schemas.  Any object satisfying the UCBImagedObjectProfile will also satisfy the ModelImagedObjectProfile.
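
The subset relationship can be illustrated with a minimal Python sketch.  It is not part of any candidate profile: the Profile class, the dictionary summary of a METS object, and the sample identifier are all hypothetical, and the three rules are simplified versions of requirements described later in this document (a single <structMap> of “physical” or “mixed” type for the model profile, plus an ARK-valued OBJID for the UCB profile).  The point is only that a subset profile inherits every rule of its parent and adds more.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A rule is any predicate over a (hypothetical) dictionary summary of a METS object.
Rule = Callable[[dict], bool]


@dataclass
class Profile:
    name: str
    rules: List[Rule] = field(default_factory=list)
    parent: Optional["Profile"] = None  # the profile this one is a subset of

    def all_rules(self) -> List[Rule]:
        # A subset profile carries every rule of its parent plus its own.
        inherited = self.parent.all_rules() if self.parent else []
        return inherited + self.rules

    def satisfied_by(self, obj: dict) -> bool:
        return all(rule(obj) for rule in self.all_rules())


# Simplified stand-ins for two of the model profile's requirements.
model = Profile(
    "ModelImagedObjectProfile",
    rules=[
        lambda o: len(o["struct_map_types"]) == 1,
        lambda o: set(o["struct_map_types"]) <= {"physical", "mixed"},
    ],
)

# The UCB profile adds a restriction; it removes nothing from the parent.
ucb = Profile(
    "UCBImagedObjectProfile",
    rules=[lambda o: o.get("objid", "").startswith("ark:/")],
    parent=model,
)

# "ark:/12345/example" is a made-up identifier used only for illustration.
with_ark = {"struct_map_types": ["physical"], "objid": "ark:/12345/example"}
without_ark = {"struct_map_types": ["physical"]}

print(ucb.satisfied_by(with_ark), model.satisfied_by(with_ark))
print(ucb.satisfied_by(without_ark), model.satisfied_by(without_ark))
```

In this sketch the first object satisfies both profiles, while the second satisfies only the model profile.  The reverse case, an object that satisfies the subset profile but not its parent, cannot occur, because the subset's rule list always contains the parent's.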

 

Preliminary Candidate Profiles for image and text content. This document references candidate profiles for METS Objects with only image content (ModelImagedObjectProfile.xml and UCBImagedObjectProfile.xml) and for METS objects with only image and/or text content  (ModelPagedTextObjectProfile.xml and UCBPagedTextObjectProfile.xml). 


 

Model profiles vs UCB Profiles. 

Model Profiles. Rick Beaubien has been working on the model METS profiles on behalf of the METS Editorial Board. These profiles are intended to establish some possible general standards for certain kinds of METS objects, specifically those with image-only content or with both image and text content.  The hope is that once these profiles have been completed and registered in the METS profile registry, they might help to encourage some consistency in the application of METS across repositories.  They reflect input from other METS Editorial Board members, particularly Jerry McDonough, Robin Wendler at Harvard, and Bruce Washburn at RLG.  It is anticipated that METS objects representing image and/or text content produced by at least the UC System libraries, Harvard, RLG and NYU will satisfy these model profiles.

 

UCB profiles. The UCB profiles are subsets (as defined above) of the model profiles, and reflect the specific encoding conventions, both current and anticipated, governing METS objects being produced in the Library at UC Berkeley.  These METS objects include those being produced for the California Cultures project, and hence represent materials held by numerous UC repositories and museums, not just the UC Berkeley Library. The UCB profiles as written try to allow some room for change and growth, so that UCB will not constantly need either to update the profiles or to create new ones.

 

Overview of Model Profiles.

ModelImagedObjectProfile.  This is a fairly simple profile intended primarily as a basis on which to build more complex profiles.  However, it could be used directly for METS objects which only have associated image content files.  The salient features of this profile are:

·         It provides for image content only

·         It specifies controlled vocabularies for the <file> USE attribute and the <structMap> TYPE attribute.

·         It neither requires nor excludes descriptive and/or administrative metadata.  It does not attempt to specify appropriate schemas for expressing descriptive, technical, source, rights, or digital provenance metadata.

·         It specifies that the METS <fileSec> should be organized into <fileGrp> elements that represent image content of similar format. (For example, one <fileGrp> for tiff masters, one <fileGrp> for hi-res jpegs, one <fileGrp> for medium-res jpegs, one <fileGrp> for gif thumbnails.)

·         It specifies that there will be one and only one <structMap>, and that this will be of either “physical” or “mixed” type.  The type “physical” designates a purely physical structure.  For example:  a single image without subdivisions; an image divided into recto and verso views; a photo album divided into page views divided into views of the individual photographs on the page; a pamphlet divided into page views. The type “mixed” designates a mixed structure.  For example, a book divided into chapters, divided into page views.

·         It excludes the use of the <area>, <par> and <seq> elements from the <structMap>.  Each <fptr> element in objects conforming to this profile must point directly to its associated content file via its FILEID attribute.
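
The <structMap> rules listed above lend themselves to a mechanical check.  The following is a rough Python sketch, not a tool supplied with the profiles: the function name and the sample document are invented for illustration, the sample is deliberately stripped down (no <FLocat> elements, no metadata sections), and the TYPE values are taken literally from the wording above.  It uses only the standard library and the METS namespace URI.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
ALLOWED_TYPES = {"physical", "mixed"}       # as quoted in the text above
EXCLUDED_ELEMENTS = ("area", "par", "seq")  # not permitted in the <structMap>

# Made-up, stripped-down METS document used only to exercise the checks.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="f1"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div>
      <mets:fptr FILEID="f1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def check_imaged_object(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []

    # One and only one <structMap>.
    struct_maps = root.findall("mets:structMap", NS)
    if len(struct_maps) != 1:
        problems.append(f"expected 1 <structMap>, found {len(struct_maps)}")

    for sm in struct_maps:
        # TYPE must be "physical" or "mixed".
        if sm.get("TYPE") not in ALLOWED_TYPES:
            problems.append(f"<structMap> TYPE {sm.get('TYPE')!r} not allowed")
        # No <area>, <par> or <seq> anywhere below the <structMap>.
        for tag in EXCLUDED_ELEMENTS:
            if sm.find(f".//mets:{tag}", NS) is not None:
                problems.append(f"<{tag}> is excluded by this profile")

    # Every <fptr> must point directly at a <file> through FILEID.
    file_ids = {f.get("ID") for f in root.iterfind(".//mets:file", NS)}
    for fptr in root.iterfind(".//mets:fptr", NS):
        if fptr.get("FILEID") not in file_ids:
            problems.append(f"<fptr> FILEID {fptr.get('FILEID')!r} matches no <file>")

    return problems


print(check_imaged_object(SAMPLE))
```

The sketch covers only the structural rules; the controlled vocabulary for the <file> USE attribute is left out because its values are defined in the profile itself rather than in this overview.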

 

ModelPagedTextObjectProfile.  This profile extends the ModelImagedObjectProfile.  Its salient features are:

·         It provides for image content and/or text content.

·         It specifies controlled vocabularies for the <file> USE attribute and the <structMap> TYPE attribute.  The <file> USE attribute values build on those specified in the ModelImagedObjectProfile, but include additional values to cover text content.

·         It neither requires nor excludes descriptive and/or administrative metadata.  It does not attempt to specify appropriate schemas for expressing descriptive, technical, source, rights, or digital provenance metadata.

·         It specifies that the METS <fileSec> should be organized into <fileGrp> elements that represent image or text content of similar format. (For example, one <fileGrp> for tiff masters, one <fileGrp> for hi-res jpegs, one <fileGrp> for medium-res jpegs, one <fileGrp> for gif thumbnails, and one <fileGrp> for TEI transcriptions.)

·         It allows for multiple <structMap> elements of any type: “physical”, “logical” or “mixed”. The type “physical” designates a purely physical structure.  For example:  a single image without subdivisions; an image divided into recto and verso views; a photo album divided into page views divided into views of the individual photographs on the page; a pamphlet divided into page views. The type “logical” designates an intellectual structure.  For example, a book divided into chapters.  The type “mixed” designates a mixed structure.  For example, a book divided into chapters, divided into page views.

·         It allows for the use of the <area> and <seq> elements in the <structMap>, but excludes the use of the <par> element.
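
Because this profile admits <seq> and <area>, a division may reach its content file indirectly rather than through the FILEID of the <fptr> itself.  The Python sketch below shows one way a consumer might cope with both forms.  Everything in it is illustrative: the function names, the sample document, its LABEL values and the “ch1” anchor are invented, and the sample omits most of what a real METS object would carry.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
ALLOWED_TYPES = {"physical", "logical", "mixed"}  # as quoted in the text above

# Made-up sample: a physical map pointing at a page image, and a logical map
# pointing at a region of a transcription through <seq>/<area>.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp><mets:file ID="img1"/></mets:fileGrp>
    <mets:fileGrp><mets:file ID="tei1"/></mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div LABEL="Page 1"><mets:fptr FILEID="img1"/></mets:div>
  </mets:structMap>
  <mets:structMap TYPE="logical">
    <mets:div LABEL="Chapter 1">
      <mets:fptr>
        <mets:seq>
          <mets:area FILEID="tei1" BEGIN="ch1" END="ch1" BETYPE="IDREF"/>
        </mets:seq>
      </mets:fptr>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def check_struct_maps(xml_text):
    """Return a list of problems with the <structMap> elements."""
    root = ET.fromstring(xml_text)
    problems = []
    for sm in root.findall("mets:structMap", NS):
        if sm.get("TYPE") not in ALLOWED_TYPES:
            problems.append(f"<structMap> TYPE {sm.get('TYPE')!r} not allowed")
        # <area> and <seq> are allowed here; <par> is not.
        if sm.find(".//mets:par", NS) is not None:
            problems.append("<par> is excluded by this profile")
    return problems


def files_by_division(xml_text):
    """Map each <div> LABEL to the file IDs it references, directly or via <area>."""
    root = ET.fromstring(xml_text)
    result = {}
    for div in root.iterfind(".//mets:div", NS):
        ids = []
        for fptr in div.findall("mets:fptr", NS):
            if fptr.get("FILEID"):                    # direct pointer
                ids.append(fptr.get("FILEID"))
            for area in fptr.iterfind(".//mets:area", NS):
                ids.append(area.get("FILEID"))        # pointer via <area>
        result[div.get("LABEL")] = ids
    return result


print(check_struct_maps(SAMPLE))
print(files_by_division(SAMPLE))
```

For the sample, the first call reports no problems and the second maps “Page 1” to img1 and “Chapter 1” to tei1.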

 

Overview of UCB Profiles.

UCBImagedObjectProfile.  This profile is a subset of the ModelImagedObjectProfile.  It is identical to that profile in most respects, except:

·         It specifies the use of the MODS, MIX and METSRights extension schemas for embedded descriptive metadata, image technical metadata and rights metadata respectively.  Conforming objects need not include these kinds of metadata; but they must use the specified extension schemas when they do.

·         It specifies that the <mets> root element must contain an OBJID attribute with a valid ARK value that uniquely identifies the object. (The Archival Resource Key, or ARK, identifier is a naming scheme for persistent access to digital objects. For more information see: http://www.cdlib.org/inside/diglib/ark/ )

 

UCBPagedTextObjectProfile.  This profile is a subset of the ModelPagedTextObjectProfile.  It is identical to that profile in most respects, except:

·         It specifies the use of the MODS, MIX, TextMD and METSRights extension schemas for embedded descriptive metadata, image technical metadata, text technical metadata and rights metadata respectively.  Conforming objects need not include these kinds of metadata; but they must use the specified extension schemas when they do.

·         It specifies that the <mets> root element must contain an OBJID attribute with a valid ARK value that uniquely identifies the object. (The Archival Resource Key, or ARK, identifier is a naming scheme for persistent access to digital objects. For more information see: http://www.cdlib.org/inside/diglib/ark/ )
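
The OBJID requirement shared by both UCB profiles can be spot-checked with a few lines of Python.  This is a sketch under stated assumptions: the function name is invented, the sample identifier is a placeholder rather than a real ARK, and the regular expression is only a loose approximation of ARK syntax (the label “ark:/”, a numeric name assigning authority number, a slash, and a name).  It cannot establish that an identifier is unique or that it resolves.

```python
import re
import xml.etree.ElementTree as ET

# Loose approximation of ARK syntax (an assumption, not the full specification).
ARK_PATTERN = re.compile(r"^ark:/\d+/\S+$")


def objid_is_ark(xml_text):
    """Return True if the root element carries an OBJID that looks like an ARK."""
    root = ET.fromstring(xml_text)
    objid = root.get("OBJID", "")
    return bool(ARK_PATTERN.match(objid))


# "ark:/12345/abc123" is a made-up identifier used only for illustration.
sample = '<mets xmlns="http://www.loc.gov/METS/" OBJID="ark:/12345/abc123"/>'
print(objid_is_ark(sample))
```

A stricter check would need the full ARK specification referenced above and, for uniqueness, access to the repository in which the identifiers are registered.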