Digital Library Process:
The Process...Why Digitize?
There is a wealth of information in older printed materials and in special collections documents such as letters and diaries. However, many of these documents are in poor condition, and are too fragile for frequent use. It is important to capture a digital copy of these works before they deteriorate completely. Researchers can use a digitized version for most purposes, saving wear and tear on the original. By digitizing the unique documents in special collections and archives, we make them available to a far wider audience. Researchers no longer have to travel to the place where the document is held; the document can come to them. People who would never be allowed to handle rare documents - schoolchildren, college students, casual researchers, hobbyists - can actually use these historical artifacts in their studies.
Digitization is a long and complicated process. There are many steps involved, as illustrated in the flowchart below. Every project is different, but the four basic stages are:
Stage 1. Select Material
Stage 2. Convert normal text into electronic text
Stage 3. Format electronic text for the Internet
Stage 4. Create website for access and navigation
Stage 1
Selection
In order to be considered for digitization, materials must go through a selection process. To be eligible, materials should fulfill the following criteria:
- Meet the research needs of faculty, students, and scholars within and beyond the OSU Community. In assessing what material meets the needs of our constituency, consideration should be given to the scholarly content of the material; the uniqueness of the material; and the demand for the material.
- Benefit from increased access and contribute to the Library's service and collection development missions. Materials that are difficult to access in their original formats or that would benefit from increased speed or depth of access via electronic delivery formats should be given priority.
- Have clear ownership and copyright clearance. Before a digitization project is undertaken, the Library needs to secure sound legal advice about the ownership and rights to reproduce or publish materials electronically.
- Be of interest to potential partners. Materials that would be of interest to campus and outside partners, both collaborators on the content and potential sources of funding and other support, should be given strong consideration.
A specific checklist of attributes, access, infrastructure and preservation concerns is included in the "Suggested Collections/Materials to be Digitized" form, available on the Library's web site at digital.library.okstate.edu/suggest.html, or from the Suggestions link on the navigation bar. The Collection Development Committee will decide which suggested materials will be chosen for digitization, applying established collection development criteria and policies. Selection for digitization requires that materials have enduring value and be available in sufficient quantity to form a significant and unique research corpus. Further, the decision to digitize must take into account many factors, as evidenced by the criteria on the "Suggested Collections/Materials to be Digitized" form.
In selecting materials, the OSU Library will actively seek out partners, both collaborators on specific projects and supporting partners to supply funding or technical assistance. The Library will approach institutions such as the Oklahoma Department of Libraries, the State Historical Society, other academic libraries, and other organizations in Oklahoma or out of state for long-range planning on digitization projects. It will also approach foundations and corporate sponsors, and the Director of Library Development and Outreach will facilitate the Library's efforts to prepare grants and solicit monies from funding agencies and corporations. In addition, the Library respects the cultural traditions of different ethnic and racial groups in preparing its digital collections; it will consult tribal or other interested organizations before digitizing potentially sensitive materials.
Copyright: The #1 Concern
Securing copyright permission is an overriding concern with all projects. The most immediate problem involving copyright and digitization is identifying which collections or parts of collections can be legally mounted on our web server. The difficulty of establishing copyright clearance is not grounds for automatic dismissal of potential projects; however, ease of establishing permission will influence the priority of projects. Digitization projects with clear rights or easily obtained rights should be undertaken first. While these projects are under way, rights can be sought for subsequent projects.
Preservation
Many of the materials to be digitized will be in a deteriorating state. We will perform all necessary repairs to the original materials before beginning digitization. Preservation of the original is our primary concern, and we will take every precaution to protect the originals from damage. While digitization of fragile materials can prevent wear and tear on the original and can thus act as a preservation tool, it is in no way a substitute for the original material.
Stage 2
To Scan or Re-key?
The condition of the materials will determine how they are converted to electronic form. Very fragile materials, anything printed before 1940, and any manuscripts will have to be re-typed, because the optical character recognition ("OCR") software used to convert a scanned image to text will be unable to recognize the textual characters. We use an overhead scanning device that is less damaging to books than a flatbed scanner. If the print is clear enough to OCR, the documents will be scanned, OCR'd, and saved as text files. Whether scanned and OCR'd or re-keyed, all text will be proofread. Our goal is 99.95% accuracy.
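The conversion rule and the accuracy goal described above can be sketched in a few lines of Python. This is only an illustration: the `Item` fields, the function names, and the choice to measure accuracy per character are our assumptions, not a description of the Library's actual workflow or tools.

```python
from dataclasses import dataclass

# 99.95% accuracy expressed as whole numbers: at most 5 errors
# per 10,000 characters (assumes accuracy is counted per character).
MAX_ERRORS = 5
PER_CHARS = 10_000


@dataclass
class Item:
    """Hypothetical description of one document awaiting conversion."""
    year: int              # year the item was printed or written
    is_manuscript: bool
    is_very_fragile: bool
    print_is_clear: bool   # clear enough for OCR software to read


def conversion_method(item: Item) -> str:
    """Return 're-key' or 'scan+ocr' following the rule in the text."""
    if item.is_very_fragile or item.is_manuscript or item.year < 1940:
        return "re-key"
    return "scan+ocr" if item.print_is_clear else "re-key"


def allowed_errors(total_chars: int) -> int:
    """Largest error count that still meets the 99.95% goal."""
    return total_chars * MAX_ERRORS // PER_CHARS


def meets_goal(total_chars: int, error_chars: int) -> bool:
    """True if proofreading found few enough errors to meet the goal."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return error_chars * PER_CHARS <= total_chars * MAX_ERRORS


if __name__ == "__main__":
    print(conversion_method(Item(1925, False, False, True)))
    print(conversion_method(Item(1975, False, False, True)))
    print(allowed_errors(200_000))
    print(meets_goal(10_000, 6))
```

Under these assumptions, a 200,000-character text could contain no more than 100 wrong characters after proofreading.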
Stage 3
Web Design & XML
Standards for metadata, scanning and storage developed by the Colorado Digitization Project (now a part of the BCR Collaborative Digitization Program) will be utilized. The BCR CDP Best Practices & Publications are available at http://www.bcr.org/cdp/best/index.html. It is most desirable to employ non-application-specific encoding, such as XML, as this is the standard used by the major digitization projects internationally. XML (Extensible Markup Language) is an application-, platform- and vendor-independent format that allows you to mark up a text's structure rather than just specify the layout and appearance as we do in HTML. By using XML, we achieve several goals:
- The structural mark-up indicates the major divisions of the text (e.g., "chapter", "section", "verse") AND various characteristics of the text (names of people and places, dates, spelling irregularities).
- The file is in an archival format that will easily migrate to new platforms as they emerge.
- XML is emerging as the new standard on the Web. We anticipate that there will be affordable software available in the near future that will allow us to take advantage of XML's structural nature (e.g., fielded searching).
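To make the idea of structural mark-up and fielded searching more concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The sample text, the element names (`chapter`, `section`, `name`, `date`, `sic`) and the attributes (`type`, `when`, `corr`) are made up for illustration; they are not the Library's actual encoding scheme.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element and attribute names are illustrative only.
SAMPLE = """\
<text>
  <chapter n="1">
    <section n="1">
      <p>Written by <name type="person">Jane Doe</name> in
      <name type="place">Exampleville</name> on
      <date when="1900-01-01">Jan. 1st, 1900</date>, and
      <sic corr="received">recieved</sic> a week later.</p>
    </section>
  </chapter>
</text>
"""


def names_of_type(root, kind):
    """Fielded search: text of every <name> whose type equals `kind`."""
    return [el.text for el in root.iter("name") if el.get("type") == kind]


def normalized_dates(root):
    """Machine-readable dates stored in the `when` attribute."""
    return [el.get("when") for el in root.iter("date")]


def spelling_corrections(root):
    """Map each irregular spelling to its recorded correction."""
    return {el.text: el.get("corr") for el in root.iter("sic")}


root = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    print(names_of_type(root, "person"))
    print(names_of_type(root, "place"))
    print(normalized_dates(root))
    print(spelling_corrections(root))
```

Because the names, dates and irregular spellings are tagged rather than merely styled, a search can be limited to one field (for example, places only), which plain HTML layout mark-up would not allow.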
Stage 4
Finished Product
In order to display the XML files on the web, we must prepare stylesheets that will tell browsers how to display the files. The staff will design the website, and we are then ready to present the collection on the web. Depending on the size and complexity of a project, and because of our dedication to preservation and accuracy, it can take several months to complete a project. Once a project is finished, however, the final product may be used and enjoyed by countless people for years to come. Visit our collections to view the results of our efforts.
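The role a stylesheet plays can be sketched as a mapping from structural elements to display elements. The Python below is a stand-in for that idea only, not a stylesheet language and not our production method; the `TAG_MAP` entries and the `render` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from structural element names to HTML tags.
TAG_MAP = {
    "text": "div",
    "chapter": "section",
    "head": "h2",
    "p": "p",
    "name": "em",
}


def render(el):
    """Recursively turn a structural XML element into an HTML string."""
    tag = TAG_MAP.get(el.tag, "span")      # unknown elements become <span>
    parts = [f'<{tag} class="{el.tag}">', escape(el.text or "")]
    for child in el:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text following the child
    parts.append(f"</{tag}>")
    return "".join(parts)


if __name__ == "__main__":
    source = ET.fromstring("<p>By <name>Jane Doe</name>.</p>")
    print(render(source))
```

The XML file itself stays unchanged; only the mapping decides how each structural element appears, so the same archival file can be given a new presentation later without re-encoding the text.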