Overview of METS Profiles. Jerry McDonough and the METS Editorial Board
have developed a METS Profile schema to provide a formal means for institutions
implementing METS to express the rules and encoding conventions governing
their various categories of METS objects.
The hope is that lead institutions will develop and publish METS profiles
in fairly short order, and that these will help encourage some consistency
in METS objects across institutions. METS
objects will, of course, all have a certain consistency simply by virtue of
being METS objects. But
the METS schema allows wide latitude in the structuring and encoding
of METS objects, and it does not specify controlled vocabularies for many
key attribute values. Insofar as
research libraries want to share METS objects with each other, and make them
available to other communities (such as the learning technology community),
they need 1) to develop METS profiles
to ensure consistency among their METS objects and to publish their encoding
practices to potential users of their METS objects; and 2) to align their
METS profiles with those of the other institutions and communities with whom they wish
to share METS objects and METS tools. One possible way of accomplishing the
latter would be for institutions that want to share METS objects to derive
their specific METS profiles from common “model” profiles.
Miscellaneous considerations.
Categories of METS
objects. The categories of METS objects that need to be
represented by discrete profiles are not immediately obvious. It seems likely, however, that some kind of
categorization on the basis of content will be appropriate. The candidate profiles referenced from this
document are primarily content based.
Relationship between
METS profiles. METS profiles can build on other METS
profiles, and the METS profile schema provides for expressing relationships
between profiles. The referenced
candidate profiles posit two basic kinds of relationships: “extends/extended by” and “subset”.
“Extends/Extended by”
relationship. A
profile “extends” another profile if it provides for another kind of content in
addition to the content type or types represented by the extended profile. In
the candidate profiles, for example, the ModelPagedTextObjectProfile extends the
ModelImagedObjectProfile. The latter
provides for image-only content, while the former provides for both image and
text content. The ModelPagedTextObjectProfile expands certain controlled vocabularies
specified by the ModelImagedObjectProfile, and provides for more elaborate
structures in the <structMap>.
“Subset” Relationship. A profile is a subset of another profile if
it adopts all of the parameters of the parent profile and adds further
parameters and restrictions of its own. An object that conforms to a subset profile will therefore also satisfy
the parent profile.
For example, in the case of the candidate profiles, the
UCBImagedObjectProfile is a subset of the ModelImagedObjectProfile. The UCBImagedObjectProfile is identical to
the ModelImagedObjectProfile in almost all respects, but it specifies a
specific set of supported extension
schemas. Any object satisfying the
UCBImagedObjectProfile will also satisfy the ModelImagedObjectProfile.
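The subset relationship can be pictured as rule inheritance. The following is a minimal Python sketch and not part of the METS profile schema: the `Profile` class, the dictionary representation of an object, and the two sample rules are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A rule is any predicate over a (toy) description of a METS object.
Rule = Callable[[dict], bool]


@dataclass
class Profile:
    """Hypothetical stand-in for a METS profile: a name plus a list of rules."""
    name: str
    rules: List[Rule] = field(default_factory=list)
    parent: Optional["Profile"] = None  # set when this profile is a subset of another

    def all_rules(self) -> List[Rule]:
        # A subset profile keeps every rule of its parent and adds its own.
        inherited = self.parent.all_rules() if self.parent else []
        return inherited + self.rules

    def satisfied_by(self, obj: dict) -> bool:
        return all(rule(obj) for rule in self.all_rules())


# Illustrative rules only; real profiles state their requirements in prose and XML.
model = Profile(
    "ModelImagedObjectProfile",
    rules=[lambda o: o["content_types"] <= {"image"}],
)
ucb = Profile(
    "UCBImagedObjectProfile",
    rules=[lambda o: o["extension_schemas"] <= {"MODS", "MIX", "METSRights"}],
    parent=model,
)

obj = {"content_types": {"image"}, "extension_schemas": {"MODS"}}
# Because ucb inherits model's rules, satisfying ucb implies satisfying model.
print(ucb.satisfied_by(obj), model.satisfied_by(obj))
```

The converse does not hold in this sketch: an object may satisfy the parent profile while failing one of the subset's additional restrictions.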
Preliminary Candidate Profiles for image and
text content. This document references candidate profiles
for METS objects with image content only (ModelImagedObjectProfile.xml and
UCBImagedObjectProfile.xml) and for METS objects with image and/or text
content (ModelPagedTextObjectProfile.xml
and UCBPagedTextObjectProfile.xml).
Model profiles vs UCB
Profiles.
Model Profiles. Rick
Beaubien has been working on the model METS profiles on behalf of the METS
Editorial Board. These profiles are intended to establish some possible general
standards for certain kinds of METS objects—specifically those with image only,
or both image and text content. The
hope is that once these profiles have been completed and are registered in the
METS profile registry they might help to encourage some consistency in the
application of METS across repositories.
They reflect input from other METS Editorial Board members, particularly
Jerry McDonough, Robin Wendler at Harvard, and Bruce Washburn at RLG. It is anticipated that METS objects
representing image and/or text content produced by at least the UC System libraries,
Harvard, RLG and NYU will satisfy these model profiles.
UCB profiles.
The UCB profiles are subsets (as defined above) of the model profiles, and
reflect the specific encoding conventions—both current and
anticipated—governing METS objects
being produced in the Library at UC Berkeley.
These METS objects include those being produced for the California
Cultures project and hence represent materials held by numerous UC repositories
and museums, and not just the UC Berkeley Library. The UCB profiles as written
try to allow some room for change and growth, so that UCB will not
constantly need either to update the profiles or to create new ones.
Overview of Model
Profiles.
ModelImagedObjectProfile. This is a fairly simple profile intended primarily
as a basis on which to build more complex profiles. However, it could be used directly for METS
objects which only have associated image content files. The salient features of this profile are:
·
It provides for image content only.
·
It specifies controlled vocabularies for the
<file> USE attribute and the <structMap> TYPE attribute.
·
It neither requires nor excludes descriptive
and/or administrative metadata. It does
not attempt to specify appropriate schemas for expressing descriptive,
technical, source, rights, or digital provenance metadata.
·
It specifies that the METS <fileSec>
should be organized into <fileGrp> elements that represent image content
of similar format. (For example, one <fileGrp> for tiff masters, one
<fileGrp> for hi-res jpegs, one <fileGrp> for medium-res jpegs, one
<fileGrp> for gif thumbnails.)
·
It specifies that there will be one and only one
<structMap>, and that its TYPE will be either “physical” or “mixed”.
The type “physical”
designates a purely physical structure.
For example: a single image
without subdivisions; an image divided into recto and verso views; a photo
album divided into page views divided into views of the individual photographs
on the page; a pamphlet divided into page views. The type “mixed” designates a structure that combines logical and physical divisions. For example, a book divided into chapters,
divided into page views.
·
It excludes the use of the <area>,
<par> and <seq> elements from the <structMap>. Each <fptr> element in objects
conforming to this profile must point directly to its associated content file via
its FILEID attribute.
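The <structMap> rules listed above lend themselves to a mechanical check. The following Python sketch, using only the standard library, is one possible way to test them; the function name and the sample document are hypothetical, and the check covers only the <structMap> and <fptr> rules, not the controlled vocabularies or <fileGrp> organization.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"  # METS namespace in ElementTree's {uri}tag form


def check_imaged_object_structure(mets_xml: str) -> list:
    """Return a list of violations of the structMap rules above (empty if none)."""
    root = ET.fromstring(mets_xml)
    problems = []

    struct_maps = root.findall(M + "structMap")
    if len(struct_maps) != 1:
        problems.append(f"expected exactly one structMap, found {len(struct_maps)}")

    # IDs of all <file> elements, so FILEID references can be resolved.
    file_ids = {f.get("ID") for f in root.iter(M + "file") if f.get("ID")}

    for sm in struct_maps:
        if sm.get("TYPE") not in ("physical", "mixed"):
            problems.append(f"structMap TYPE {sm.get('TYPE')!r} is not physical or mixed")
        for tag in ("area", "par", "seq"):
            if sm.find(".//" + M + tag) is not None:
                problems.append(f"<{tag}> is excluded from the structMap")
        for fptr in sm.iter(M + "fptr"):
            if fptr.get("FILEID") not in file_ids:
                problems.append("<fptr> must point to a <file> via FILEID")
    return problems


# Hypothetical minimal object: one image file, one physical structMap.
sample = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp><file ID="img1"/></fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div><fptr FILEID="img1"/></div>
  </structMap>
</mets>"""

print(check_imaged_object_structure(sample))
```

Returning a list of messages rather than a single boolean makes it easier to report every violation in one pass.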
ModelPagedTextObjectProfile. This profile extends the ModelImagedObjectProfile.
Its salient features are:
·
It provides for image content and/or text
content.
·
It specifies controlled vocabularies for the
<file> USE attribute and the <structMap> TYPE attribute. The <file> USE attribute values build
on those specified in the ModelImagedObjectProfile, but include additional
values to cover text content.
·
It neither requires nor excludes descriptive
and/or administrative metadata. It does
not attempt to specify appropriate schemas for expressing descriptive,
technical, source, rights, or digital provenance metadata.
·
It specifies that the METS <fileSec>
should be organized into <fileGrp> elements that represent image or text
content of similar format. (For example, one <fileGrp> for tiff masters,
one <fileGrp> for hi-res jpegs, one <fileGrp> for medium-res jpegs,
one <fileGrp> for gif thumbnails, and one <fileGrp> for TEI
transcriptions).
·
It allows for multiple <structMap>
elements of any type: “physical”, “logical” or “mixed”. The type “physical” designates a purely
physical structure. For example: a single image without subdivisions; an
image divided into recto and verso views; a photo album divided into page views
divided into views of the individual photographs on the page; a pamphlet
divided into page views. The type
“logical” designates an intellectual structure. For example, a book divided into chapters. The type “mixed”
designates a structure that combines logical and physical divisions. For
example, a book divided into chapters, divided into page views.
·
It allows for the use of the <area> and
<seq> elements in the <structMap>, but excludes the use of the
<par> element.
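The looser <structMap> rules of this profile can be checked in the same spirit. The Python sketch below is an illustration under stated assumptions: the function name and sample document are hypothetical, and the "at least one <structMap>" test reflects the METS schema's own requirement rather than anything the profile adds.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"  # METS namespace in ElementTree's {uri}tag form
ALLOWED_TYPES = {"physical", "logical", "mixed"}


def check_paged_text_structure(mets_xml: str) -> list:
    """Return a list of violations of the structMap rules above (empty if none)."""
    root = ET.fromstring(mets_xml)
    problems = []

    struct_maps = root.findall(M + "structMap")
    if not struct_maps:
        problems.append("no structMap found")

    for index, sm in enumerate(struct_maps, start=1):
        if sm.get("TYPE") not in ALLOWED_TYPES:
            problems.append(f"structMap {index}: TYPE {sm.get('TYPE')!r} is not allowed")
        # <area> and <seq> are permitted here; only <par> is excluded.
        if sm.find(".//" + M + "par") is not None:
            problems.append(f"structMap {index}: <par> is excluded")
    return problems


# Hypothetical object with a physical and a logical structMap.
sample = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp><file ID="img1"/></fileGrp>
    <fileGrp><file ID="txt1"/></fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div><fptr FILEID="img1"/></div>
  </structMap>
  <structMap TYPE="logical">
    <div><fptr><seq><area FILEID="txt1"/></seq></fptr></div>
  </structMap>
</mets>"""

print(check_paged_text_structure(sample))
```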
Overview of UCB
Profiles.
UCBImagedObjectProfile. This profile is a subset of the ModelImagedObjectProfile.
It is identical to the ModelImagedObjectProfile in most respects, except:
·
It specifies the use of the MODS, MIX and
METSRights extension schemas for embedded descriptive metadata, image technical
metadata and rights metadata respectively.
Conforming objects need not include these kinds of metadata, but they must
use the specified extension schemas when they do.
·
It specifies that the <mets> root element
must contain an OBJID attribute with a valid ARK value that uniquely identifies
the object. (The Archival Resource Key, or ARK,
identifier is a naming scheme for persistent access to digital objects. For
more information see: http://www.cdlib.org/inside/diglib/ark/
)
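The OBJID requirement could be spot-checked along the following lines. This is a loose syntactic sketch only: it assumes the classic `ark:/NAAN/name` form with a 5- or 9-digit NAAN, and it does not confirm that the identifier resolves or is unique. The function name and the sample identifier are hypothetical, and the profile's own definition of a "valid" ARK may be stricter or looser than this pattern.

```python
import re
import xml.etree.ElementTree as ET

METS_ROOT = "{http://www.loc.gov/METS/}mets"

# Assumed shape: "ark:/" + 5- or 9-digit NAAN + "/" + a non-empty name without spaces.
ARK_PATTERN = re.compile(r"ark:/(\d{5}|\d{9})/\S+")


def objid_looks_like_ark(mets_xml: str) -> bool:
    """True if the <mets> root carries an OBJID that matches the assumed ARK shape."""
    root = ET.fromstring(mets_xml)
    if root.tag != METS_ROOT:
        return False
    objid = root.get("OBJID")
    return objid is not None and ARK_PATTERN.fullmatch(objid.strip()) is not None


# Made-up identifier for illustration; it does not name a real object.
sample = '<mets xmlns="http://www.loc.gov/METS/" OBJID="ark:/12345/x0example"/>'
print(objid_looks_like_ark(sample))
```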
UCBPagedTextObjectProfile. This profile is a subset of the ModelPagedTextObjectProfile.
It is identical to the ModelPagedTextObjectProfile in most respects, except:
·
It specifies the use of the MODS, MIX, TextMD
and METSRights extension schemas for embedded descriptive metadata, image
technical metadata, text technical metadata and rights metadata
respectively. Conforming objects need
not include these kinds of metadata, but they must use the specified extension
schemas when they do.
·
It specifies that the <mets> root element
must contain an OBJID attribute with a valid ARK value that uniquely identifies
the object. (The Archival Resource Key, or ARK,
identifier is a naming scheme for persistent access to digital objects. For
more information see: http://www.cdlib.org/inside/diglib/ark/
)