Saturday, November 19, 2005

Full resolution and digital content management systems

My attempt at briefly and concisely explaining full resolution images and digital content management systems (specifically ContentDM and MDID):

Best practices recommend that digitized images be stored as large tiff files. Tiff files are typically saved uncompressed or with lossless compression, which is why they are considered archival masters. Generally, tiff files are not intended to be stored on a web server.

Most digitization standards and best practices consider full resolution images to be between 1000 and 3000 pixels on the long side, with 3000 pixels most common. Exceptions are made for printed materials and for original images that cannot support that level of detail. The term full resolution does not imply file type. On most current monitors, an image 1000 pixels wide fills the screen. Therefore, larger images (1000-3000 pixels) can be “zoomed in” to show greater detail. The needs of the project will determine what resolution (pixel size) of image is appropriate.

Digital content management systems (like ContentDM and MDID) were originally intended as a means of discovery for the original full resolution tiff file being stored off line. MDID users can download the largest image stored on the server, and ContentDM users can zoom in to the level of detail supported by the size of the image stored on the server. The images stored on these servers should be jpegs. Tiffs require too much storage space and, due to the large file size, have slow download times. MDID will not allow uploads larger than 16 MB, which most tiff images surpass.
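To see why a full resolution tiff tends to run past a 16 MB limit, here is a rough back-of-the-envelope sketch. The assumptions are mine, not from any MDID documentation: an uncompressed 24-bit RGB image (3 bytes per pixel), a 3:2 aspect ratio, and "16 MB" read as 16 x 1024 x 1024 bytes. Real tiff files also carry header and metadata overhead, and compression would change the numbers.

```python
# Assumption: "16 MB" means 16 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 16 * 1024 * 1024


def uncompressed_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Estimate raw pixel data size; 3 bytes per pixel assumes 24-bit RGB."""
    return width_px * height_px * bytes_per_pixel


def exceeds_upload_limit(width_px, height_px, limit_bytes=UPLOAD_LIMIT_BYTES):
    """Return True if the estimated uncompressed size is over the limit."""
    return uncompressed_size_bytes(width_px, height_px) > limit_bytes


if __name__ == "__main__":
    # A hypothetical scan that is 3000 pixels on the long side.
    print(uncompressed_size_bytes(3000, 2000))
    print(exceeds_upload_limit(3000, 2000))
```

Under these assumptions a 3000 x 2000 scan comes to 18,000,000 bytes of pixel data, which is already over the limit before any file overhead is counted.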

In the digitization workflow, after images are scanned as full resolution tiffs, the next step is to create derivative jpeg images to upload to the server. The resolution (pixel size) of these derivative files depends on the needs of the project. ContentDM users can incorporate this step into the image upload by using image optimization in full resolution archiving. MDID users need to do this before uploading the files.
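The derivative step comes down to picking a target size for the long side and scaling the other side to match. A minimal sketch of that arithmetic follows; the function name and the 1000 pixel target are only illustrations, and this is not how ContentDM or MDID does it internally. An image editor or batch tool would do the actual resampling and jpeg encoding.

```python
def derivative_size(width_px, height_px, target_long_side):
    """Scale dimensions so the long side equals target_long_side.

    Keeps the aspect ratio and never enlarges an image that is
    already at or below the target.
    """
    long_side = max(width_px, height_px)
    if long_side <= target_long_side:
        return width_px, height_px
    new_width = max(1, round(width_px * target_long_side / long_side))
    new_height = max(1, round(height_px * target_long_side / long_side))
    return new_width, new_height


if __name__ == "__main__":
    # A hypothetical 3000 x 2000 master reduced to 1000 on the long side.
    print(derivative_size(3000, 2000, 1000))
```

For a 3000 x 2000 master and a 1000 pixel target this gives a 1000 x 667 derivative; a portrait master of 2000 x 3000 gives 667 x 1000.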

Full resolution archiving in ContentDM provides a means of organization for large tiff images that are used to create the derivative images stored on the server. ContentDM populates one of the metadata fields with information on where the full resolution tiff file is stored. MDID metadata creators need to provide this information by hand. Without full resolution archiving in ContentDM, the original tiff file is uploaded, and the location of the original file is not noted in the metadata. Because ContentDM will only show 600-1000 pixels of any image, it is important to be able to find the full resolution image being stored off line.

Draft digitization standards from the Indiana Digital Library Summit

Sunday, November 13, 2005

Conference impressions

I was lucky enough to attend two conferences this week.

The first conference was held at Ball State University in Muncie, Indiana, and dealt with digital library projects -- more specifically with ContentDM. The presentations I found the most helpful were those that demonstrated specific projects and solutions to problems. It’s useful for me to see the thought processes of others dealing with similar projects and problems. And since so many of us are using the same software, it’s good to see the capabilities others have found.

The second conference was the Midwest chapter meeting of the Visual Resources Association, held at the Indianapolis Museum of Art and at Indiana University. There were two presentations, both of which were extremely helpful.
Kenneth Crews presented “Copyright and education: trends, developments, and future directions.” After hearing his name several times over the past few years, it was nice to see him in person. I learned that the case of Bridgeman v. Corel determined that there is no copyright protection on reproductions of two dimensional public domain works. This is why visual resource librarians can digitize their slides of public domain works. But it also means that libraries don’t own copyright on their digital reproductions of those slides. Photographs that include a public domain work and additional elements chosen by the photographer are copyright protected. Also, photographs of three dimensional works have copyright protection. He also spoke about Kelly v. Arriba Soft Corp. (search engines can provide thumbnails of copyrighted works), and cases involving Texaco and Kinko's. Finally, he provided us with his Checklist for Fair Use.
Eileen Fry at Indiana University presented a workshop on the CCO metadata scheme. It’s too bad that we can't input XML metadata into ContentDM. It was good to see how the VRA pros handle certain metadata challenges, and maybe I can incorporate certain elements of this into our metadata definitions and fields.

Saturday, November 05, 2005

Some thoughts

I was at a meeting this week and heard two interesting opinions:
1. Feeds are the new way to discover content, and people don't enter web sites from the front page anymore.
2. Dynamically generated web pages (and related databases) are the wave of the future.

My thoughts about these opinions:
1. True, especially with regard to newspapers. This means that it's even more important for content to be discoverable through sources other than the home page of the website. My impression of librarians so far is that we expect people to discover our sites, but don't really know how to attract our users other than through individual contact.
2. Seems like old news to me, but I was surprised at how many people were creating static web pages by hand to deliver content. It seems like such an inflexible approach. It's so much easier to use a database to make global changes and keep information updated.

My self-imposed project of the week was to explore digitization standards from various institutions. However, our database decided that a certain field only needed to be 950 characters, and that br tags didn't really need the <>, so I didn't get to explore digitization until Friday afternoon, and my Friday afternoons are useless for projects like this.

I did put together some links for more exploration next week:
The first three I read before I started my job, but now that I'm aware of certain issues I will probably pick up more:
A framework of guidance for building good digital collections -- a good resource, and I need to spend time thinking about each point and how it relates to our project.
Handbook for Digital Projects: a management tool for preservation and access. VI: Technical primer -- a good introduction to digitization -- and from Chapter VII:
Working with photographs -- an introduction to digitizing photographs

University of Tennessee's digitization standards -- I need to find more specific documentation like this
Harvard's digitization services bibliography -- some of these are outdated, but a good place to start

And I have to include this because it's a nice set of suggestions for creating descriptive metadata about photographs from the University of Washington for a similar project. Maybe I should hand it out to our project metadata creators.