Digital Preservation Policy

Critical Processes and OAIS Mandatory Responsibilities

  1. Introduction
    • This document traces the critical processes York University Libraries (YUL) uses to meet the "mandatory responsibilities" of a digital repository as described in OAIS, and identifies which of those processes are necessary for the repository to fulfill each responsibility.
  2. OAIS 3.1: "Negotiate for and accept appropriate information from information Producers."
    • YUL has a clearly defined process for negotiating with producers and ensuring that it acquires appropriate information. See the Rights Policy for more information.
  3. OAIS 3.1: "Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation."
    • YUL obtains rights from individual producers that give the repository control over all of the information deposited by the producer. The nature and scope of these rights vary by submitter. In cases where the repository takes responsibility for the preservation of information, the rights include provisions for YUL to receive a local copy of the information and host it in perpetuity. In some cases, the repository obtains the right to modify information in order to ensure long-term preservation and accessibility. See the Rights Policy for more information.
  4. OAIS 3.1: "Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided."
  5. OAIS 3.1: "Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information."
  6. OAIS 3.1: "Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original."
    • YUL has policies and procedures for the long-term preservation of information. See the Preservation Implementation Plan and the Definition of AIP for more information about the repository’s ingest, data management, and archival storage processes. AIPs are not deleted as part of the repository’s normal operations.
    • The repository maintains backups of all content. See the Backup Plan for more details.
    • The repository is an integral component of disaster recovery planning for YUL.
    • YUL negotiates submission policies and procedures with individual producers. See the Rights Policy and the Definition of SIP for more information.
    • The repository has policies and procedures for the dissemination of information to its Designated Community. See the Definition of DIP for more information. To maintain the understandability and accessibility of disseminated information, YUL carries out extensive usability testing and solicits feedback from its Designated Community.
    • To ensure authenticity, each AIP is linked to a specific object and source file by information in the preservation metadata. This information is not visible to the Designated Community in the DIP, but can be made available if necessary. The repository’s DIPs are always generated from a single AIP.
  7. OAIS 3.1: "Make the preserved information available to the Designated Community."
    • YUL disseminates the information to its Designated Community through its own user interfaces. See the Definition of DIP for more information about the repository's dissemination process. Depending on the license, access may be restricted to users affiliated with the York University community. See the Access Policy for more information.

Acknowledgements

Adapted from and inspired by:

License

CC0

CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

URI Policy

Policy Statement

URIs created by York University Digital Library

  • York University Digital Library uses a systematic convention to generate unambiguous, unique identifiers for digital objects in its repository. This convention creates a stable name for, or reference to, each object that can be permanently associated with that object, regardless of future changes to organizational structure or to digital access protocols.
  • This is in conformance with section 4.2.4 of Metrics for Digital Repository Audit and Certification (CCSDS, June 2009) which states that a compliant repository "shall have and use a convention that generates persistent, unique identifiers for all AIPs" and "its components."
  • This convention will ensure that “each AIP can be unambiguously found in the future” and that “each AIP can be distinguished from all other AIPs in the repository.”

Implementation

Islandora object

York University Digital Library canonical URIs are consistently constructed in the following manner:

  • /islandora/object/PID

These URIs are aliased using Islandora Pathauto to the following pattern:

  • [fedora:pid]/[fedora:label]

Example:

  • Photograph: New Woodbine : racehorses train for opening of season
  • Canonical URI: http://digital.library.yorku.ca/islandora/object/yul:88675
  • Aliased URL: http://digital.library.yorku.ca/yul-88675/new-woodbine-racehorses-train-opening-season
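
The alias in the example above can be approximated in a few lines. This is a minimal sketch, not Pathauto itself: it assumes the colon in the PID becomes a hyphen, the label is lowercased and split on non-alphanumeric characters, and common short words are dropped. The `STOP_WORDS` set and the `alias_for` function are hypothetical; the actual word list is whatever YUDL's Pathauto configuration specifies.

```python
import re

# Hypothetical subset of short words dropped from aliases; the real list
# comes from the site's Pathauto settings.
STOP_WORDS = {"a", "an", "as", "at", "by", "for", "from", "in", "is",
              "of", "on", "the", "to", "with"}


def alias_for(pid: str, label: str) -> str:
    """Approximate the [fedora:pid]/[fedora:label] alias pattern."""
    # "yul:88675" -> "yul-88675"
    pid_part = pid.replace(":", "-").lower()
    # Lowercase the label, split on runs of non-alphanumerics,
    # and drop empty tokens and stop words.
    words = [w for w in re.split(r"[^a-z0-9]+", label.lower())
             if w and w not in STOP_WORDS]
    return f"{pid_part}/{'-'.join(words)}"


print(alias_for("yul:88675", "New Woodbine : racehorses train for opening of season"))
# yul-88675/new-woodbine-racehorses-train-opening-season
```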

Islandora object datastream

York University Digital Library object datastream canonical URIs are consistently constructed in the following manner:

  • /islandora/object/PID/datastream/DATASTREAM_NAME/view
  • /islandora/object/PID/datastream/DATASTREAM_NAME/download

Their aliases follow the pattern:

  • [fedora:pid]/[fedora:label]/datastream/DATASTREAM_NAME/view
  • [fedora:pid]/[fedora:label]/datastream/DATASTREAM_NAME/download

Example:

  • Photograph: New Woodbine : racehorses train for opening of season
  • Canonical URI: http://digital.library.yorku.ca/islandora/object/yul:88675/datastream/JPG/view
  • Aliased URL: http://digital.library.yorku.ca/yul-88675/new-woodbine-racehorses-train-opening-season/datastream/JPG/download
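
A minimal sketch of building the canonical object and datastream URIs described above. The helper names `object_uri` and `datastream_uri` are hypothetical, and the PID check is a simplified assumption about what a `namespace:id` PID looks like, not Fedora's complete PID grammar.

```python
import re

BASE_URL = "http://digital.library.yorku.ca"
# Simplified "namespace:id" check, assumed for illustration; it does not
# cover every PID form Fedora accepts (e.g. %-escaped characters).
PID_PATTERN = re.compile(r"^[A-Za-z0-9.-]+:[A-Za-z0-9.~_-]+$")


def object_uri(pid: str) -> str:
    """Canonical object URI: /islandora/object/PID."""
    if not PID_PATTERN.match(pid):
        raise ValueError(f"not a valid PID: {pid!r}")
    return f"{BASE_URL}/islandora/object/{pid}"


def datastream_uri(pid: str, dsid: str, action: str = "view") -> str:
    """Canonical datastream URI ending in /view or /download."""
    if action not in ("view", "download"):
        raise ValueError("action must be 'view' or 'download'")
    return f"{object_uri(pid)}/datastream/{dsid}/{action}"


print(datastream_uri("yul:88675", "JPG"))
# http://digital.library.yorku.ca/islandora/object/yul:88675/datastream/JPG/view
```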

Publicly available datastream names

Audio:

  • TN (Thumbnail)
  • PROXY_MP3 (Streaming quality MP3)

Book:

  • TN (Thumbnail)
  • ORIGINAL_PDF (Only for Buddhism Across Boundaries: Buddhist Periodicals and Books from Colonial Burma collection)

Images:

  • TN (Thumbnail)
  • JPG (Medium-sized JPG)
  • OCR (OCR'd text)

Metadata:

  • MODS (Descriptive metadata)
  • DC (Descriptive metadata)
  • TECHMD_FITS (Technical metadata)
  • RELS-EXT (Fedora Object to Object Relationship)

Video:

  • TN (Thumbnail)
  • MP4 (Streaming quality MP4)

Web ARChive:

  • TN (Thumbnail)
  • JPG (Medium-sized JPEG)
  • WARC_CSV (WARC index)
  • WARC_FILTERED (Filtered WARC)
  • OBJ (WARC file)
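
The lists above amount to a small configuration table. The sketch below encodes them as a Python mapping with a lookup helper; the content-type keys and the `is_public` function are hypothetical, metadata datastreams are assumed to be public for every content type, and the collection-specific restriction on ORIGINAL_PDF is not modeled.

```python
# Publicly available datastream IDs, keyed by (hypothetical) content-type names.
PUBLIC_DATASTREAMS = {
    "audio": {"TN", "PROXY_MP3"},
    "book": {"TN", "ORIGINAL_PDF"},  # ORIGINAL_PDF: one collection only (not modeled)
    "image": {"TN", "JPG", "OCR"},
    "metadata": {"MODS", "DC", "TECHMD_FITS", "RELS-EXT"},
    "video": {"TN", "MP4"},
    "web_archive": {"TN", "JPG", "WARC_CSV", "WARC_FILTERED", "OBJ"},
}
METADATA_DATASTREAMS = PUBLIC_DATASTREAMS["metadata"]


def is_public(content_type: str, dsid: str) -> bool:
    """True if dsid is listed as public for content_type; metadata counts for all types."""
    return dsid in METADATA_DATASTREAMS or dsid in PUBLIC_DATASTREAMS.get(content_type, set())


print(is_public("audio", "PROXY_MP3"), is_public("audio", "OBJ"))  # True False
```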

Registry of file formats

Policy Statement

YUDL requires that the format of each submitted file be identified immediately, to help mitigate the risk posed by format obsolescence. To this end, YUDL uses DROID, JHOVE, the file utility, Exiftool, PRONOM, NLNZ Metadata Extractor, ffident, and Tika through the FITS software package.

While YUDL is not dependent on or restricted to any particular format or group of formats, it aims to use well-known, widely accepted formats that support long-term preservation. If a submitter wants to use a specific format not meeting these criteria, an agreement must be reached between the submitter and YUDL.

Implementation Examples

YUDL uses FITS for format identification during ingest, when a file format is associated with each file.

Example characterization and reference to format registry:

<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.7.4 (fits-mcgath fork)" timestamp="02/07/13 4:26 PM">
  <identification>
    <identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" toolversion="0.7.4 (fits-mcgath fork)">
      <tool toolname="Jhove" toolversion="1.9" />
      <tool toolname="file utility" toolversion="5.09" />
      <tool toolname="Exiftool" toolversion="9.13" />
      <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" />
      <tool toolname="ffident" toolversion="0.2" />
      <tool toolname="Tika" toolversion="1.3" />
      <version toolname="Jhove" toolversion="1.9">5.0</version>
    </identity>
  </identification>
  <fileinfo>
    <size toolname="Jhove" toolversion="1.9">33543972</size>
    <creatingApplicationName toolname="Exiftool" toolversion="9.13">Adobe Photoshop Elements 2.0</creatingApplicationName>
    <lastmodified toolname="Exiftool" toolversion="9.13" status="CONFLICT">2007:03:09 11:00:49-05:00</lastmodified>
    <lastmodified toolname="Tika" toolversion="1.3" status="CONFLICT">2007-03-09T11:00:48</lastmodified>
    <filepath toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/mnt/DIY/Archives/ASC/tiffs/02000-02999/ASC02000.tif</filepath>
    <filename toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/mnt/DIY/Archives/ASC/tiffs/02000-02999/ASC02000.tif</filename>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">b2b263bf5207481e42ac5945538ec985</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">1173456049000</fslastmodified>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.9" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.9" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <byteOrder toolname="Jhove" toolversion="1.9" status="SINGLE_RESULT">little endian</byteOrder>
      <compressionScheme toolname="Jhove" toolversion="1.9">Uncompressed</compressionScheme>
      <imageWidth toolname="Jhove" toolversion="1.9">7108</imageWidth>
      <imageHeight toolname="Exiftool" toolversion="9.13">4716</imageHeight>
      <colorSpace toolname="Jhove" toolversion="1.9">BlackIsZero</colorSpace>
      <orientation toolname="Jhove" toolversion="1.9" status="SINGLE_RESULT">normal*</orientation>
      <samplingFrequencyUnit toolname="Jhove" toolversion="1.9" status="CONFLICT">in.</samplingFrequencyUnit>
      <samplingFrequencyUnit toolname="Tika" toolversion="1.3" status="CONFLICT">Inch</samplingFrequencyUnit>
      <xSamplingFrequency toolname="Jhove" toolversion="1.9" status="CONFLICT">6000000/10000</xSamplingFrequency>
      <xSamplingFrequency toolname="Exiftool" toolversion="9.13" status="CONFLICT">600</xSamplingFrequency>
      <xSamplingFrequency toolname="NLNZ Metadata Extractor" toolversion="3.4GA" status="CONFLICT">600.0</xSamplingFrequency>
      <ySamplingFrequency toolname="Jhove" toolversion="1.9" status="CONFLICT">6000000/10000</ySamplingFrequency>
      <ySamplingFrequency toolname="Exiftool" toolversion="9.13" status="CONFLICT">600</ySamplingFrequency>
      <ySamplingFrequency toolname="NLNZ Metadata Extractor" toolversion="3.4GA" status="CONFLICT">600.0</ySamplingFrequency>
      <bitsPerSample toolname="Jhove" toolversion="1.9" status="CONFLICT">integer</bitsPerSample>
      <bitsPerSample toolname="Exiftool" toolversion="9.13" status="CONFLICT">8</bitsPerSample>
      <samplesPerPixel toolname="Jhove" toolversion="1.9">1</samplesPerPixel>
      <scanningSoftwareName toolname="Jhove" toolversion="1.9">Adobe Photoshop Elements 2.0</scanningSoftwareName>
      <YSamplingFrequency toolname="Tika" toolversion="1.3" status="SINGLE_RESULT">600.0</YSamplingFrequency>
    </image>
  </metadata>
</fits>
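
FITS output like the record above can be summarized programmatically, for example to record the identified format at ingest and to flag fields where the bundled tools disagree (status="CONFLICT"). The following is a minimal sketch using Python's standard xml.etree.ElementTree; the `summarize_fits` name and the choice of fields are assumptions, and `SAMPLE` is a trimmed excerpt of the record above.

```python
import xml.etree.ElementTree as ET

FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Tagged Image File Format" mimetype="image/tiff" toolname="FITS" />
  </identification>
  <fileinfo>
    <lastmodified toolname="Exiftool" status="CONFLICT">2007:03:09 11:00:49-05:00</lastmodified>
    <lastmodified toolname="Tika" status="CONFLICT">2007-03-09T11:00:48</lastmodified>
    <md5checksum toolname="OIS File Information" status="SINGLE_RESULT">b2b263bf5207481e42ac5945538ec985</md5checksum>
  </fileinfo>
  <metadata>
    <image>
      <xSamplingFrequency toolname="Jhove" status="CONFLICT">6000000/10000</xSamplingFrequency>
      <xSamplingFrequency toolname="Exiftool" status="CONFLICT">600</xSamplingFrequency>
    </image>
  </metadata>
</fits>"""


def summarize_fits(xml_text: str) -> dict:
    """Extract the identified format, MD5, and element names marked CONFLICT."""
    root = ET.fromstring(xml_text)
    identity = root.find("fits:identification/fits:identity", FITS_NS)
    md5 = root.findtext("fits:fileinfo/fits:md5checksum", namespaces=FITS_NS)
    # Strip the "{namespace}" prefix and de-duplicate conflicting element names.
    conflicts = sorted({el.tag.rsplit("}", 1)[-1]
                        for el in root.iter() if el.get("status") == "CONFLICT"})
    return {
        "format": identity.get("format") if identity is not None else None,
        "mimetype": identity.get("mimetype") if identity is not None else None,
        "md5": md5,
        "conflicts": conflicts,
    }


summary = summarize_fits(SAMPLE)
print(summary["format"], summary["conflicts"])
# Tagged Image File Format ['lastmodified', 'xSamplingFrequency']
```

Applied to the full record above, the conflict list would also include bitsPerSample, samplingFrequencyUnit, and ySamplingFrequency.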

Definition of DIP

Dissemination Information Package (DIP)

  • OAIS describes a DIP as "the Information Package, derived from a part, or all, of one or more AIPs, received by the Consumer in response to a request to the OAIS."
  • York University Digital Library's (YUDL) DIPs are always generated from a single AIP.
  • User access to archival objects is provided through the YUDL website.
  • The user, depending on their level of access, may see basic object metadata and an access version of the digital object.
  • Context information is provided in the form of links to other items in a given collection.
  • The DIP is retrieved using the URI for the corresponding AIP. In turn, the AIP contains metadata tying it back to the SIP.

Definition of AIP

Archival Information Package (AIP)

  • The information package consisting of the Content Information (CI), Preservation Description Information (PDI), Packaging Information (PI), and Descriptive Information (DI) that is archived by York University Libraries (YUL).
  • The level of content in a York University Digital Library (YUDL) AIP can vary, depending on the amount of content provided by the submitter.
  • This description uses the OAIS Information Model to illustrate the completeness of our conceptual model, and describes, in general terms, what a YUDL AIP looks like.

Content Information (CI)

  • The Content Data Object (CDO) is generally stored with the primary preservation metadata file, which is held in Fedora Commons.
  • Representation Information is maintained for each CDO, recording its file format, format version, and a reference to a format registry that describes how to interpret the file. See the Registry of file formats for more information.

Preservation Description Information (PDI)

  • Reference Information - Identifiers are stored for each object, identifying it locally (e.g. YUDL PID) and globally (e.g. URI).
  • Provenance Information - For each object, provenance metadata records the history of preservation events in the object's lifetime, beginning at ingest into the YUDL repository and including any preservation actions taken on the object (e.g. replacement due to corruption, format migration).
  • Context Information - As appropriate, information on how a CDO relates to other CDOs or to other conceptual entities; for example, a newer version of an object may supersede an older one.
  • Fixity Information - Fixity information is generated at ingest so that YUDL can later determine whether the item remains in the same state as when it was ingested. This information can be used to verify the integrity of an object copied within the system (for example, when its storage location changes) and for periodic integrity checks.

Packaging Information (PI)

  • YUDL preservation metadata packages both the descriptive and preservation metadata together.

Descriptive Information (DI)

  • Depending on the type of CDO, the format of this descriptive metadata can vary (MODS or Dublin Core), but is selected to maximize findability. In all cases, the descriptive metadata will be recreated within the preservation metadata.
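
To make the mapping onto the OAIS Information Model concrete, the following is a conceptual sketch of an AIP as Python dataclasses. It is not YUDL's storage format (the preservation metadata is held in Fedora Commons); the class names, field names, and the example PID are hypothetical and only mirror the components described above. The file name and format come from the FITS example in the Registry of file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreservationDescription:
    """PDI: reference, provenance, context, and fixity information."""
    reference: Dict[str, str]                                        # e.g. PID and URI
    provenance: List[Dict[str, str]] = field(default_factory=list)   # events, oldest first
    context: List[Dict[str, str]] = field(default_factory=list)      # relationships to other objects
    fixity: Dict[str, str] = field(default_factory=dict)             # algorithm -> checksum


@dataclass
class ArchivalInformationPackage:
    content_file: str             # Content Data Object (CDO)
    format_name: str              # Representation Information: file format
    format_version: str
    pdi: PreservationDescription
    descriptive: Dict[str, str] = field(default_factory=dict)  # MODS or DC fields

    def record_event(self, event: str, detail: str = "") -> None:
        """Append a provenance event such as ingest or format migration."""
        self.pdi.provenance.append({"event": event, "detail": detail})


aip = ArchivalInformationPackage(
    content_file="ASC02000.tif",
    format_name="Tagged Image File Format",
    format_version="5.0",
    pdi=PreservationDescription(reference={"pid": "yul:12345"}),  # hypothetical PID
)
aip.record_event("ingest")
print(aip.pdi.provenance)  # [{'event': 'ingest', 'detail': ''}]
```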

Definition of SIP

Submission Information Package (SIP)

  • The information package that is delivered to York University Digital Library for use in the construction of one or more AIPs.
  • The format of the SIP may vary from submitter to submitter, based on each submitter's willingness and ability to provide the content and metadata in a specific format.
  • For a given Content Type, any requirements or restrictions on the type of content that can be contained in the SIP will be described in that Content Type's Preservation Action Plan.

Backup Plan

1. Policy Statement

As part of York University Libraries' (YUL) implementation of the Bit-stream Copying Preservation Strategy (as detailed in the Preservation Implementation Plan), YUL is committed to regularly backing up both its data storage areas and its operational areas (e.g. databases, application files). These backups are intended to serve as the basis for restoring York University Digital Library (YUDL) materials in the event of a disaster or data corruption.

Data backup at YUL is coordinated through Library Computing Systems and University Information Technology. Because the data is stored on physical hardware in the YUL data centre, it is backed up with the same hardware and software as general university systems.

2. Implementation

2.1 Database Backup

This backup strategy applies to content that is stored in a database. Primarily, this refers to objects located in YUDL's databases (MySQL).

2.1.1 MySQL Database Backup Strategy

  • Database dumps are backed up daily and taken off site. Each backup is kept for 60 days.

2.2 Application Backup

2.2.1 Fedora Commons Backup Strategy

  • Fedora Commons is backed up daily and taken off site. Each backup is kept for 60 days.

2.2.2 Drupal (Islandora)

  • Drupal is backed up daily and taken off site. Each backup is kept for 60 days.

2.2.3 Solr

  • Solr is backed up daily and taken off site. Each backup is kept for 60 days.

2.3 Objects

  • Fedora objects are backed up daily and taken off site. Each backup is kept for 60 days.
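
Every backup class above follows the same 60-day retention rule. A minimal sketch of that rule, assuming one dated backup per day and that a backup is due for removal once it is 60 or more days old; the function name and the dates are hypothetical, and this does not describe how the university backup system actually enforces retention.

```python
from datetime import date, timedelta
from typing import Iterable, List

RETENTION_DAYS = 60  # "Each backup is kept for 60 days."


def expired_backups(backup_dates: Iterable[date], today: date) -> List[date]:
    """Return dates (oldest first) of backups that have reached the end of retention."""
    return sorted(d for d in backup_dates if (today - d).days >= RETENTION_DAYS)


# Ninety consecutive daily backups ending on a hypothetical "today".
today = date(2024, 3, 31)
daily = [today - timedelta(days=i) for i in range(90)]
expired = expired_backups(daily, today)
print(len(expired), len(daily) - len(expired))  # 30 60
```

Under this assumption exactly 60 daily backups remain available at any time.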

2.4 Verification

  • The Digital Assets Librarian and YUL Library Computing Systems coordinate quarterly disaster recovery drills to verify the systems and their backups.

Fixity procedures

Policy Statement

York University Libraries is committed to maintaining the integrity of the objects in its care. This includes creating checksums for all archival format objects (plus their associated datastreams) ingested into the repository, and regularly checking the fixity of those objects.

Implementation

At ingest, a SHA-1 checksum is calculated for the archival format object and stored alongside the object in the repository.

Each day, the checksums of a set number of files in the repository are recalculated, using the same algorithm, and compared with the stored values, which are expected to match. Any mismatch between a calculated and a stored value is reported to the repository manager.
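
A minimal sketch of the daily check, assuming the stored SHA-1 values are available as a mapping from file path to hex digest. The policy does not say how many files are checked per day or how they are chosen, so `batch_size` and the round-robin `daily_batch` selection are hypothetical; mismatches are returned rather than reported, leaving notification of the repository manager to the caller.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-1 hex digest, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def daily_batch(paths: List[Path], day_number: int, batch_size: int) -> List[Path]:
    """Select day N's slice; consecutive days walk through all files and wrap around."""
    ordered = sorted(paths)
    if not ordered:
        return []
    start = (day_number * batch_size) % len(ordered)
    count = min(batch_size, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


def check_fixity(stored: Dict[Path, str], day_number: int, batch_size: int) -> List[Path]:
    """Return files in today's batch whose SHA-1 no longer matches the stored value."""
    failures = []
    for path in daily_batch(list(stored), day_number, batch_size):
        try:
            current = sha1_of(path)
        except FileNotFoundError:
            current = None  # a missing file is treated as a fixity failure
        if current != stored[path]:
            failures.append(path)
    return failures
```

With n files and a batch of b files per day, every file is checked once every ceil(n / b) days.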
