Glossary

This page is meant to be a quick reference guide for users or consumers of digital content. We have made a concerted effort to keep the language of the definitions as accessible as possible, but because of the complex nature of digital work, this may not always be the case. If you have any questions about the information on this page, or if you would like additional clarification or information, please feel free to contact us!

Administrative Metadata: Metadata used for managing the digital object. It provides more information about the object's creation and details the rules governing its use.

Administrator: A person or entity authorized to define users and their roles within an inventory. An administrator also has the rights of a submitter and a consumer.

API: Application Programming Interface- A set of instructions or rules that enable two operating systems or software applications to communicate.

Authenticity: A digital object must be authentic to the original; it must be verified through an authentication process.

Authentication: The procedure for verifying the integrity of an object.

Authority Control: The entity of control in a collection that maintains the integrity of the headings and vocabulary.

Boolean Query: A search that allows a user to combine terms with operators such as AND, OR, and NOT within a single search string.
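
As a minimal sketch of how Boolean operators combine search terms, the Python example below treats each term as the set of record IDs that contain it. The index and record IDs are invented for illustration and do not come from any real collection.

```python
# Hypothetical inverted index: search term -> set of record IDs containing it.
index = {
    "maps": {1, 2, 5},
    "civil": {2, 3, 5},
    "war": {2, 5, 7},
}

# AND keeps only the records that match both terms.
civil_and_war = index["civil"] & index["war"]

# OR keeps the records that match either term.
maps_or_war = index["maps"] | index["war"]

# NOT removes the records that match the excluded term.
war_not_maps = index["war"] - index["maps"]

print(sorted(civil_and_war))  # [2, 5]
print(sorted(maps_or_war))    # [1, 2, 5, 7]
print(sorted(war_not_maps))   # [7]
```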

CGI: Common Gateway Interface- A standard for applications to work in tandem with web servers. In the interface customization tool kit, CGI refers to the application that executes the search and generates the retrieval set from the collection of METS records.

Collection: A number of documents assembled in a single physical or virtual location by one or more persons, or by a corporate entity, and arranged in some kind of systematic order to facilitate retrieval.

Consumer: A person or client system authorized by the producer to view or disseminate objects from the Digital Preservation Repository.

Content File: A file that is either born digital or produced using various kinds of capture application software. Audio, image, text, and video are the basic kinds of content files. Versions of a content file may be dispersed across several file formats.

Controlled Vocabulary: A list of preferred terms from which a term must be selected when assigning subject headings or descriptors in a bibliographic record as authorized by the authority control.

Crosswalk: A human-generated chart or diagram indicating equivalencies and relationships between the data elements of two or more metadata standards. In other words, a document stating the equivalent fields across two different standards.
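
A crosswalk can also be written down in machine-readable form. The sketch below assumes a deliberately simplified mapping from three MARC 21 field tags to Dublin Core elements. The mapping, the `crosswalk` function, and the sample record are illustrative assumptions, not an authoritative crosswalk.

```python
# Illustrative, simplified crosswalk: MARC 21 field tag -> Dublin Core element.
# This mapping is an assumption for demonstration, not an official table.
MARC_TO_DC = {
    "245": "title",
    "520": "description",
    "650": "subject",
}


def crosswalk(marc_record):
    """Convert a {tag: value} record to Dublin Core, skipping unmapped tags."""
    dc_record = {}
    for tag, value in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc_record[element] = value
    return dc_record


# Hypothetical record; tag 999 has no equivalent in the mapping and is dropped.
record = {"245": "Hamlet", "650": "Tragedy", "999": "local note"}
print(crosswalk(record))  # {'title': 'Hamlet', 'subject': 'Tragedy'}
```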

CSS: Cascading Style Sheets- A style sheet language used to modify the format and presentation of marked up languages such as HTML or XML.

Curation: To take care of, to manage, or to provide access to.

Data Content Standard: Rules for determining and formulating data values within metadata elements. Examples include the Anglo-American Cataloging Rules (AACR2), Cataloging Cultural Objects (CCO), Describing Archives: a Content Standard (DACS), and Graphic Materials (GIHC).

DDI: Data Documentation Initiative- An effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for data sets in the social and behavioral sciences.

Data Interchange Standard: Used to define the encoding, storage, transmission, and interchange of data values represented within a data structure standard.

Data Structure Standard: Standards that define metadata elements. Examples of data structure standards include Dublin Core, MODS, and MARC21.

Data Value: A discrete unit of data within a metadata element, i.e., the data encoded within a tag.

Descriptive Metadata: Metadata used for the discovery and interpretation of the digital object.

Digital Library: A library in which a significant proportion of the resources are available in machine-readable format and accessible by means of computers.

Digital Object: An entity in which one or more content files and their corresponding metadata are united, physically and/or logically, through the use of a digital wrapper.

DOI: Digital Object Identifier - A stable, persistent identifier for a digital object that can be resolved to the object's current location (URL).

Digital Preservation: The managed activities necessary for ensuring the long-term retention and usability of digital objects.

Digital Provenance Administrative Metadata: Administrative metadata that is the history of migrations, transformations, or translations performed on a digital library object's content files from their original digital capture or encoding. It should contain information regarding the ultimate origin of the content files.

Digital Wrapper: A structured XML-based text file that binds digital object content files and their associated metadata together and that specifies the logical relationship of the content files.

DIP: Dissemination Information Package -An external representation of an object exported from the Digital Preservation Repository, optionally including an Archival Information Package, Submission Information Package, and object metadata.

Document: A generic term for a physical entity consisting of any substance on which is recorded all or a portion of one or more works for the purpose of conveying or preserving knowledge.

DSpace: An open source software package that provides the tools for management of digital assets, and is commonly used as the basis for an institutional repository.

DTD: Document Type Definition - A common way of defining the structure, elements, and attributes that are available for use in an SGML or XML document that conforms to the DTD. For example, the TEI DTD governs the structure, elements, and attributes of a TEI document.

Dublin Core: A simple set of metadata elements used as a common meeting ground between richer, more granular metadata standards from diverse groups.

Element: A component of metadata, or a component of a data structure defined by a DTD or schema. In XML, an element is everything from the element's start tag through the element's end tag, inclusive.
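
To make the parts of an XML element concrete, here is a short sketch using Python's standard `xml.etree.ElementTree` module. The sample element and its `lang` attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A single XML element: start tag (with one attribute), content, and end tag.
xml_text = "<title lang='en'>Hamlet</title>"

element = ET.fromstring(xml_text)

print(element.tag)     # title
print(element.attrib)  # {'lang': 'en'}
print(element.text)    # Hamlet
```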

Emulation: The imitation of a computer system, performed by a combination of hardware and software, that allows programs to run between incompatible systems.

EAD: Encoded Archival Description - A nonproprietary standard for encoding, in SGML or XML, the finding aids used in archives, libraries, museums, and other repositories.

Expression: The form in which a creative work is realized. For example, a single variant of the text of a literary work (such as Shakespeare's Hamlet).

Fedora: Flexible Extensible Digital Object Repository Architecture - “Fedora provides a core repository service (exposed as web-based services with well-defined APIs). In addition, Fedora provides an array of supporting services and applications including search, OAI-PMH, messaging, administrative clients, and more.”

File Inventory Metadata: A list of all files (content files and their corresponding metadata) comprising the digital object.

Finding Aid: A guide or inventory to a collection held in an archive, museum, library, or historical society. It provides a detailed description of a collection, its intellectual organization and, at varying levels of analysis, of individual items in the collection.

Folksonomy: A system of classification in which users collaboratively create, assign, and manage tags to annotate and categorize information content. A folksonomy is often visualized in a “tag cloud” on the Internet.

FRBR: Functional Requirements for Bibliographic Records- Provides a framework for relating the data that are recorded in bibliographic records to the needs of those records. It uses an entity-relationship model of metadata for information objects, instead of the single flat record concept underlying current cataloging standards. The FRBR model includes four levels of representation: work, expression, manifestation, and item.

FTP: File Transfer Protocol - A standard network protocol used to copy a file from one host to another over a TCP/IP-based network.

GIF: Graphics Interchange Format - A bitmap image format that has come into widespread usage on the World Wide Web due to its wide support and portability.

Granularity: The level of detail at which an information object or resource is viewed or described.

Handle: A particular kind of data type whose value refers to (or "points to") another value stored elsewhere in the computer memory. In digital repositories, the term also refers to a persistent identifier issued through the Handle System.

Harvest: The process by which software can collect metadata packages from remote locations that describe information resources available at those locations.

Hierarchy: The arrangement of classes in a classification system, from the most general to the most specific.

HTML: HyperText Markup Language - The predominant markup language for web pages. It uses markup tags to describe the structure and content of a page.

Hyperlink: An embedded link that points to a whole document or to a specific element within a document.

Islandora: An open source framework developed by the University of Prince Edward Island's Robertson Library. Islandora combines the Drupal and Fedora open source software applications to create a robust digital asset management system.

Ingest: The process by which a digital object or metadata package is absorbed by a different system than the one that produced it.

Interoperability: The ability of different types of computers, networks, operating systems, and applications to work together effectively, without prior communication, in order to exchange information in a useful and meaningful manner.

Item: A single concrete exemplar of a manifestation of an expression of an intellectual or artistic work, in most cases a single physical object (such as a copy of an edition of a single-volume monograph).

JPEG: Joint Photographic Experts Group - A compression format for image files that is most commonly used with digital cameras.

JPEG 2000: Joint Photographic Experts Group 2000 - The second generation of JPEG that offers an increased rate of compression and superior processing capabilities.

LCSH: Library of Congress Subject Headings- A bibliographic thesaurus for subject headings maintained by the Library of Congress.

Lossless: A term that describes a data encoding process that allows the exact original data of an object to be reconstructed from the compressed data without any loss of data.
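
A minimal sketch of a lossless round trip, using the DEFLATE compression in Python's standard `zlib` module. The sample data is invented. The point is only that decompression returns exactly the original bytes.

```python
import zlib

# Hypothetical sample data; repetitive content compresses well.
original = b"digital preservation " * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(compressed) < len(original))  # True: the data got smaller
print(restored == original)             # True: nothing was lost
```

A lossy format such as JPEG offers no such guarantee: the decoded image only approximates the original.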

Lossy: A term to describe a data encoding method that loses data during the compression process.

MADS: Metadata Authority Description Schema- An XML schema maintained by the Library of Congress that can be used to provide metadata about agents (people, organizations), events, and terms (topics, geographics, genres, etc.).

Manifestation: The result of a single act of physical embodiment/production of a specific expression of a creative work. For example, an edition of one of the variant texts of a literary work (such as the 1993 Yale University Press edition of Hamlet).

MARC21: MAchine Readable Cataloging - A data structure and interchange standard for the representation and communication of bibliographic and related information in machine-readable form.

MARCXML: An XML schema for representing MARC 21 records in XML.

Markup Language: A system for annotating a text in a way that is syntactically distinguishable from that text.

Metadata: Structured information about an object, a collection of objects, or a part of an object such as an individual content file. See also administrative metadata, descriptive metadata, preservation metadata, technical metadata, and usage metadata.

METS: Metadata Encoding and Transmission Standard - A standard for encoding descriptive, administrative, and structural metadata about objects within a digital library, expressed using XML. METS is the emerging national standard for wrapping digital library materials. It is being developed by the Digital Library Federation (DLF) and is maintained by the Library of Congress.

Metadata Harvest: The harvest of existing metadata records from resource repositories, such as through OAI, to gather metadata for query results or index creation.

MODS: Metadata Object Description Schema - An XML schema, and a data structure and interchange standard, used for the creation of original resource description records. It may also be used as an alternative method for representing MARC data. MODS was developed by the Library of Congress' Network Development and MARC Standards Office.

Metasearching: The act of searching more than one database simultaneously through the use of metasearch software.

Migration: The transfer of digital objects from one hardware or software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and use them in the face of constantly changing technology.

Mirroring: The process of making exact replicas of resource items, such as web pages, with slight modifications to hyperlinks as needed to reproduce the behavior of the items. This is similar to using the "save as" function from a browser to save a local copy of the page, including its contents and images.

Namespace: A unique name that identifies an organization that has developed an XML schema, identified by a Uniform Resource Identifier (URL or URN).

NISO: National Information Standards Organization - A non-profit association accredited by the American National Standards Institute which identifies, develops, maintains, and publishes technical standards to manage information in our digital environments.

OAI: Open Archives Initiative - An organization that develops and promotes an interoperability-based framework and associated standards for the dissemination of content. The essence of the open archives approach is to enable access to web-accessible material through interoperable repositories for metadata sharing, publishing, and archiving.

OAIS: The Reference Model for an Open Archival Information System - A very loose standard that lays out a framework for the features that should be in an archival system.

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting - A protocol defined by the Open Archives Initiative. It provides a method for content providers to make records for their items available for harvesting by service providers, such as centralized search services.
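
In practice, a harvester sends an OAI-PMH request as an ordinary web address with a few parameters. The sketch below builds a `ListRecords` request that asks for unqualified Dublin Core (`oai_dc`). The endpoint address and the `list_records_url` helper are hypothetical; only the `verb`, `metadataPrefix`, and `set` parameter names come from the protocol.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optionally restrict the harvest to one set in the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```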

OCLC: Online Computer Library Center - “Is a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing information costs.”

OCR: Optical Character Recognition - Technology that “reads” images of text, such as newspapers or books, to provide searchable text for locating words or phrases. While the overall quality of OCR results is good, the results may vary depending on how well the software “reads” the original materials.

Ontology: A relationship structure that is similar to a taxonomy, but instead of being hierarchical, it is based upon relationships. There is no rigid structure to confine the organization of an ontology.

Persistent Link: An Internet address that remains unchanged over time.

PDF: Portable Document Format - An open standard for document exchange, originally designed by Adobe and now published as an ISO standard (ISO 32000).

PHP: Hypertext Preprocessor - A general-purpose scripting language used for making dynamic and interactive Web pages.

PID: Persistent IDentifier - A unique identifier assigned to an object to help identify it in the database and in the digital landscape at large.

PNG: Portable Network Graphics - A bitmapped image format that employs lossless data compression.

Preservation Metadata: Metadata used to describe the prescribed resources within an object.

Provenance: The agency, office or person of origin of records, i.e. the entity which created, received or accumulated and used the records in the conduct of business or personal life.

Raster: A data structure representing a generally rectangular grid of pixels, or points of color, viewable via a monitor, paper, or other display medium.

RDF: Resource Description Framework - A general-purpose framework, commonly expressed in XML, for describing objects, their properties, and their relationships. It is used as the basis for the "Semantic Web".

Schema: A common way of defining the structure, elements, and attributes that are available for use in an XML document that conforms to the schema.

SGML: Standard Generalized Markup Language - An ISO standard technology for defining generalized markup languages for documents.

SQL: Structured Query Language - A database computer language designed for managing data in relational database management systems.

SIP: Submission Information Package- An external object representation prepared by the producer for the purpose of ingest into the Digital Preservation Repository, where it will be converted automatically to an Archival Information Package.

Tag: A short, formal name used to indicate data structure or metadata elements.

Tagging: Adding and/or inserting metadata or data structure to an object.

Technical Metadata: Metadata that describes the technical attributes of the digital file.

TEI: Text Encoding Initiative- An initiative that publishes Document Type Definitions catering to a wide range of academic electronic text projects. Books, manuscripts, collections of poetry, and other kinds of literary and linguistic texts for online research and teaching that are available electronically are encoded in TEI.

Thesaurus: A type of controlled vocabulary that records relationships among its terms, such as broader, narrower, and related terms.

TIF: Tagged Image File Format (also abbreviated TIFF) - A format for high color-depth images.

URI: Uniform Resource Identifier - A string of characters used to identify a name or a resource on the Internet.

URL: Uniform Resource Locator - A type of URI that specifies where an identified resource is available and the mechanism for retrieving it.
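
The "where" and the "mechanism" are both visible in the parts of a URL. As a small sketch, Python's standard `urllib.parse` module can split a URL into those parts. The address below is a made-up example.

```python
from urllib.parse import urlparse

# Hypothetical address, used only for illustration.
parts = urlparse("https://repository.example.org/objects/42?format=xml")

print(parts.scheme)  # https  (the retrieval mechanism)
print(parts.netloc)  # repository.example.org  (the host holding the resource)
print(parts.path)    # /objects/42  (the location on that host)
print(parts.query)   # format=xml  (extra parameters for the request)
```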

URN: Uniform Resource Name - A type of URI intended to serve as a persistent, location-independent resource identifier and designed to make it easy to map other namespaces.

User: A login identity used to authenticate a person or client system as a submitter, consumer, or administrator for an inventory.

Validation: A process to check one or more aspects of a submission for schema errors, file format problems, and ingest parameter inconsistencies that might affect its suitability for preservation. Results of a validation may include any combination of structural analysis information, warning messages, or fatal errors that prevent an object from being ingested.

Vector: Vector graphics formats represent images using geometric shapes such as points, lines, curves, and polygons, rather than as an array of pixels. Vector graphics are considered complementary to raster graphics, which are typically used for the representation of photographic images.

Work: A distinct intellectual or artistic creation, independent of any concrete realization or expression of its content (for example, Hamlet as opposed to a specific text of the play).

XHTML: eXtensible HyperText Markup Language - A stricter and cleaner version of HTML.

XML: eXtensible Markup Language - A set of rules for encoding documents in machine-readable form.

XSD: XML Schema Definition- An alternative to a DTD that describes the structure of an XML document.

XSLT: eXtensible Stylesheet Language Transformations - A language that can be used to transform an XML document into another form, such as PDF or HTML. XSLT stylesheets work as a series of templates that produce the desired formatting effect each time a given element is encountered.

Some definitions are borrowed from the California Digital Library (CDL) and the Online Dictionary for Library and Information Science.