Saturday, September 24, 2005


It seems that while Google Print manages to get itself into hot water with the publishers, and librarians try to prove why we are better than Google, Amazon is trying to stay out of the spotlight. I just discovered a feature on Amazon that was released back in April! Maybe it passed by without me noticing it (it was a busy summer), but it looks like the Washington Post only noticed it last month. Amazon has enhanced its Search Inside feature to include a list of SIPs (Statistically Improbable Phrases) and CAPs (Capitalized Phrases). According to Amazon:

"Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements.

Click on a SIP to view a list of books in which the phrase occurs."

"Capitalized Phrases, or "CAPs", are people, places, events, or important topics mentioned frequently in a book. Along with our Statistically Improbable Phrases, Capitalized Phrases give you a quick glimpse into a book's contents.

Click on a Capitalized Phrase to view a list of books in which the phrase occurs. You can also view a list of references to the Capitalized Phrase in each book.

For example, if you're looking at a Sherlock Holmes mystery, you can click on "Professor Moriarty" to see a list of books that feature or mention Holmes's nemesis."

The SIPs basically create subject headings for the book, and the CAPs include every person named in the book (which is more than librarians do when creating MARC records). Both of these fields link to the same phrases in other books. It kind of reminds me of our catalogs, only created by a computer. Although the SIPs don't use a controlled vocabulary, it would only take some mapping to link each one to other books on the same subject.
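Amazon doesn't publish its actual formula, so the following is only a rough sketch of the idea in the description quoted above: count how often a phrase occurs in one book and compare that rate with its rate across all the books. Everything here is an assumption made for illustration, including the function names, the choice of two-word phrases, the minimum count of two, and the tiny sample "books".

```python
from collections import Counter


def bigrams(text):
    """Split text into lowercase two-word phrases."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(books, title, top=3):
    """Rank the phrases of one book by how much more often they occur
    in that book than in the whole collection (which includes the book)."""
    book_counts = Counter(bigrams(books[title]))
    corpus_counts = Counter()
    for text in books.values():
        corpus_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        if count < 2:
            continue  # assumption: ignore phrases that appear only once
        book_rate = count / book_total
        corpus_rate = corpus_counts[phrase] / corpus_total
        scores[phrase] = book_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Made-up sample texts, standing in for full books.
books = {
    "tax guide": "capital gains of the estate and capital gains of the trust",
    "mystery": "the hound of the moor and the hound of the night",
}

print(improbable_phrases(books, "tax guide"))
print(improbable_phrases(books, "mystery"))
```

In this toy collection "of the" appears in both books, so it ranks below "capital gains" for the tax guide and below "the hound" for the mystery, which is the "improbable relative to all books" behavior Amazon describes.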

I was surprised when I found this feature, and hadn't heard about it before. While it seems that I hear about another cool Google feature every day, Amazon is quietly releasing their new features.

Saturday, September 17, 2005


This week's exploration took me into the world of OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). What a powerful tool for librarians! I was exploring this topic because I am facilitating the transfer of our metadata to a service provider on Monday. I found a great tutorial that explained the entire concept, from the basics through to how to use it. My only question is why didn't any of my teachers in library school mention it? Actually, I think someone mentioned it once, but didn't go into any detail. Maybe they didn't teach it because it's hard to understand theoretically (as with most of technology). It's much easier for me to understand this type of thing if I have hands-on practice. Maybe library schools should provide computer-based services for their students to play with, so that we can see how concepts relate to each other, and how they are used, like MARC records, METS, integrated library systems, webpages, databases, OAI files. . . .

The week's effort paid off when I learned about Google Sitemaps. They accept OAI files. I can submit the URL of our OAI file so that Google can learn about the structure of our website. They say it won't increase the ranking of our site, but it will let them know more about it.
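For anyone else who learns better hands-on, here is a minimal sketch of what a harvester does, assuming a repository at the made-up address https://example.org/oai. The ListRecords verb, the oai_dc metadata prefix, and the XML namespaces come from the OAI-PMH specification; the endpoint, the function names, and the sample response are hypothetical and only there for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai"  # hypothetical repository endpoint

# Namespace for Dublin Core elements inside an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list all records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# A hand-written, heavily trimmed response, standing in for what a
# real repository would send back.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles(xml_text):
    """Pull the Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iterfind(".//dc:title", NS)]


print(list_records_url(BASE_URL))
print(titles(SAMPLE_RESPONSE))
```

The URL built by list_records_url is the kind of address a service provider would request from a repository; the sketch does not make any network call itself.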

Saturday, September 10, 2005


When I created this blog, I decided on the spur of the moment to call it Metalibrarian, short for Metadata Librarian. However, the more I’ve thought about it, the more the title makes sense. This blog is a librarian writing about librarians or librarianship. A Google search on metalibrarian turns up a few hits (although this blog isn’t there yet), but it seems that no one has used this word often. I did find an interesting article by Stephen Abram from Information Outlook (June 2004): “What About Us? The Meta Librarian: Information for Information Pros.” In it, he discusses librarians’ need for information about our profession. He makes a nice point by saying, “Every article, book, list posting, discussion thread, and blog entry is a gift to the profession.” Librarians need to share information about our projects with each other so that we can benefit from each other’s experiences. As a new librarian, I agree completely with this statement. Every little piece of information I can gather helps me learn about my profession and makes me think about what I’m doing, and the long-range consequences of my decisions.

I also spent the week thinking about the prefix “meta.” Where did it come from, and why do we use it the way we do? According to Wikipedia, “The current English usage is accidental, deriving from the classification of Aristotle's works to include the category of metaphysics, which could more or less be described as the study of the physics of physics itself. This was initially merely the extras left over from the physics category.” According to this entry, the Greek prefix meta has several meanings including a prepositional use meaning with, or the verb use connoting change.

So where did the word metadata come from? The Oxford English Dictionary doesn’t give metadata its own entry, but lists the word in the definition for the prefix meta. According to their article, “metadata” was first used in 1987 in the Philos Trans Royal Soc. (OED abbreviation) with the sentence, “The challenge is to accumulate data..from diverse sources, convert it to machine-readable form with a harmonized array of metadata descriptors and present the resulting database(s) to the user.”

I also learned that the word Metadata is the trademarked name of a company. They trademarked the name in 1986, before the word had a common usage.

The more time I’ve spent learning about metadata, the more I realize how little other people know about it. Even more surprising is the fact that few librarians even understand what metadata is. Maybe the metadata librarians’ gift to the field should be educating other librarians as to what our job is really about. After all, shouldn’t all librarians understand what other librarians do?