Sunday, June 19, 2005

For a forgettable memory

[Forwarding from SPARC Open Access Forum.]

Dr T B Rajashekar - A Tribute

Dr Tarikere Basappa Rajashekar, who was the Associate Chairman of the National Centre for Science Information (NCSI) at the Indian Institute of Science, Bangalore, was killed in a road accident near Bangalore on 3 June 2005.

Raja, as he was known to his close friends, was an achiever. When many others were talking about digital libraries, he started working on building one. He was clear about what he wanted to do and went about it with dedication and commitment; what is more, he knew his limitations and never attempted anything in which he did not have the required expertise. He was serious about his work and would never waste his time or energy on fruitless pursuits.

When new technologies came in quick succession and transformed the way we handle information, Raja was quick to learn those technologies and apply them intelligently in the areas of database management and information dissemination.

Raja's role in the creation of the nation's first computerized current awareness service in the early 1980s at NCSI is commendable. He was one of the earliest in India to use COBOL and database technology, in both of which he had rich experience and knowledge, in library applications. He had exemplary skill in programming and an innate ability to understand, apply and disseminate newer programming paradigms, staying current all the time.

Right from the early days of NCSI, Raja not only led the young team from the front with his insistence on discipline and timely delivery of quality services but also played a key role in capacity building. He was largely responsible for the content and curriculum of the 18-month training programme on information and knowledge management at NCSI, which is unlike any other programme taught anywhere else in India. The curriculum always reflected the most recent developments. Not only did he teach the key courses (information and knowledge organization, digital library and information services in enterprises, internet information resources and services) but he also helped his younger colleagues acquire the skills to teach state-of-the-art courses. Many of the professionals trained by him occupy important positions across the country and elsewhere and have been contributing immensely to the growth of information science. The fact that they stayed in touch with him and continue to hold him in reverence attests that Raja was not only a great and inspiring guru but also a fine gentleman. Raja had become synonymous with NCSI.

Raja set up India's first interoperable institutional open access archive, but was dismayed at the rather slow pace at which it was filling. When it was pointed out that he should be more proactive and meet with faculty and talk to them about the archive, he took the suggestion in the right spirit, and at the time of his tragic death there were more than 2,000 papers in the archive. He had conducted many training programmes on setting up open access archives. Raja was also largely responsible for marrying the Greenstone digital library software with an earlier version of the Eprints software (when the latter did not support full-text searching), and a few months before his death he and his colleague Francis Jayakanth, in collaboration with researchers at Old Dominion University, developed two approaches to make CDS-ISIS databases OAI compliant. Raja was also the first in India to set up an e-mail based electronic discussion forum for library and information science (LIS) professionals. Since its inception in 1994, he moderated the LIS-Forum democratically.

Born on 2 November 1954, Raja took his bachelor's degree in library science from Mysore University and the postgraduate Associateship in documentation and information science from the Documentation Research and Training Centre, Bangalore. After a few years at the National Informatics Centre, New Delhi, he had a brief stint at the British Council Library in New Delhi, where he impressed everyone with his technical savvy. It was then that he moved to the Indian Institute of Science, Bangalore, where he worked quietly and earned a reputation as a performer. Along the way he also took a doctoral degree from Poona University.

A very shy person, Raja would keep himself away from the limelight. Deeply committed to his family, when he took a sabbatical in 2000 he did not move out of
Bangalore. He worked for Informatics (India) Ltd, a Bangalore-based company, and developed an excellent multidisciplinary current awareness tool, an e-journal
portal and gateway called J-Gate, to aggregate thousands of journals. He also wrote a series of essays on digital libraries.

Raja was on many committees, including CODATA, and had contributed to the development of INFLIBNET and INDEST. He was also elected Fellow of the Society of Information Science (India).

When he was on course to achieve much more, fate snatched him away from us. The large number of condolence messages received from within the country and elsewhere is an indication of the regard he had earned. He once remarked to his students that what mattered in life was what one left behind for others to remember and continue. By that yardstick he has done extremely well. The best tribute LIS professionals in this country could pay to Raja is to set up institutional open access archives as soon as possible and fill them with papers, modernize their curricula, and teach their students the values he practiced.

Subbiah Arunachalam & N Balakrishnan

Sunday, June 05, 2005

Target Your Brand

Library Journal

Target Your Brand
Build an identity that works in the age of the superstore

As bookstores and the Internet march forward, the library community continues to question and forecast its role in society. Innovative libraries nationwide have seized the opportunity to reinvent themselves, bringing a new level of excitement to the industry. Yet puzzlement remains on what strategies, what roles will work in communities where alternatives are so readily available.

A cue from corporate America is to deal with a changing competitive structure by more effectively managing the brand the library holds in the community. A slippery concept that's often confused and misused, a brand is the definition of your institution that exists in the mind of the customer, says Chris Pulleyn, chief executive officer of Buck & Pulleyn, a Rochester, NY–based agency specializing in brand strategy. Your library's brand is the space you've captured in the minds of customers—it's all the things that come to mind, all the expectations they have, when they hear the word library.

The Borders lesson

"There's home, office, and a third place," says Jenie Dahlmann, the Borders' corporate communications manager. Borders wants to be that third place. "It's a place where you can relax and explore…where you can stay in a comfortable, community atmosphere. Everything we do in the store fosters that."

Borders excels in creating what could be anyone's third place; everything about the environment says, "Stay awhile." The light is flattering, the colors are warm and modern, and the merchandise mix is interesting and stimulating, without being overwhelming. Customers are invited to sit on benches in the store's wide aisles, examining media of all types, with no pressure to buy…like a library.

Lines are blurring as alternatives appear for what has traditionally been the sole domain of the library. Indeed, Dahlmann says, elements of Borders' "Stay and explore" strategy have been inspired by libraries. Informal lectures on a variety of topical issues round out a monthly programming schedule that includes book signings, poetry readings, and book discussion groups. Cafés include wireless connections for businesspeople in transit. But these bookstores are certainly not the only competitor to traditional library service: the Internet has become so firmly ingrained as America's default research tool that the New York Times proclaimed, "Google has become a verb, a way of life, and a new add-on for the brain."

Brand promises

Is a brand that exists in the mind of the customer out of our reach? The entire discipline of brand management has sprung up around this concept, providing a methodology for shaping that mental image and leveraging it to gain a competitive edge.

Good brand management is the crux of the business model, driving the strategy behind the experience you provide the customer. It begins with understanding what your institution currently means to customers (brand identity) and what you want it to mean (brand aspiration). While plenty of organizations know how they want to be viewed, what mental space they want to capture, staying true to the brand aspiration in every aspect of the customer experience—from graphics to service to collection—is what separates those that succeed from those that are merely ambitious.

The brand aspiration is the benchmark against which all outreach is measured. Most libraries have a mission statement. If they're brand driven, the mission statement puts into words how the brand aspiration will be delivered. It makes what Pulleyn calls a "brand promise" to the customer. Web presence, programming, policies, services, collection, and expectations of staff are the ways in which the promise is either kept or broken.

Think Target

To see the power of an effective strategy, consider the outrageously successful Target brand.

Target stepped out of the pack of retail also-rans when it spotted an opportunity in the market to make discount shopping better than it was. It could capture a segment of customers who had found discounters lacking. Target made a promise to customers to make discount shopping more stylish, more efficient, more pleasant…cool. That brand—captured in its slogan "Expect more, pay less" and now firmly embedded in the minds of customers—is driven through all aspects of the Target retail experience.

The merchandise mix is consistently stylish, with low-rent products from high-rent designers displayed within a traditional discount store structure. The appeal is decidedly youthful (a nod to the market that's the ultimate arbiter of what's cool and what's not), yet, more traditional styles and products are also available to ensure that a wide variety of needs are met. Keeping the aim at youth also allows Target a continuing source of fresh customers and freedom from the threat of an aging customer base—a particularly pertinent issue in the library world.

The service at Target is structured like a discount store—primarily self-serve—but it is exceptionally efficient, with processes that make returns and checkout faster. Again, the customer can "expect more" in a "pay less" framework.

Target has defied what had been the natural laws of discounters: it has infiltrated the world of high style, with magazine coverage that places Target merchandise in the seasonal "must-haves" columns, and expanded its customer base to include those who had never before set foot in a discount store. Target has made discount shopping cool.

But is it "branded"?

Brand management is often narrowly and inappropriately defined as the development of a consistent look and logo, but that definition misses the power of a real brand strategy. Would Target have the dominance it enjoys in the marketplace if it relied solely upon its signature red and white bull's-eye? American corporate history is littered with good-looking failures. Three elements make Target an effective brand—careful selection of the market niche, the promise to the customer, and the delivery on the promise in every customer interaction.

At Buck & Pulleyn, they avoid using brand as a verb. Pulleyn says, "Branding is done to a cow. We try to always refer to a brand strategy or to brand management to keep the whole process top of mind."

Pulleyn refers to libraries as "experience brands," meaning the brand is experienced by all the senses. As at a Starbucks or Target, the customer is immersed in the brand, making it easy to break the promise to the customer if a single element provides a conflicting message.

The moral of the story then is to view every interaction with the customer as an opportunity to seal the brand. Conversely, if bobbled or inconsistent, every interaction can be a threat to the brand.

A signature design

But, what about the visuals? We are a people who judge books by their covers, which makes graphic design a powerful tool for your brand.

The elements of a good design strategy are fairly simple. Choose a scheme that fits your brand aspiration and use it consistently. Over time, this will marry one look to the brand image. More importantly, it sends a single message; it tells your customers that all elements of the library are aiming at the same goal.

Consistency of design comes in developing a palette that includes a small variety of typefaces, layouts, and colors. Starbucks' design strategy is a model of consistency in action. Store interiors, packaging, and advertising are variations on a theme—a coffee-driven theme. Customers have been taught to spot Starbucks' packaging and Starbucks' stores at a distance because of their consistent design. More important, of course, is the consistency of the coffee experience that will follow.

Translated to a library environment, that means signs, handouts, brochures, newsletters, library cards, web sites—everything that can carry a physical imprint of the brand, should.

The lure of continually refreshing yourself and your staff with a new look is powerful, but it is also a threat to the brand image you're building. Establish the palette and stick with it. When assigning the creation of a graphic, whether with an outside firm, through a freelancer, or internally, make sure the designer understands the rules—what elements can be varied, what must always be the same.

A good place for design variation is when particular customers are being targeted. When you develop the design palette, consider variations within the scheme that allow you to subbrand the library's departments. For example, a whimsical version of your design scheme can be developed for children's programs.

Calling on the cavalry

A distinctive and consistent look adds the polish that reminds customers they're in the hands of pros, an important consideration in a competitive environment. One look inside a Barnes & Noble shows the kind of design sophistication libraries are up against. Good design comes from good design firms—an investment that can establish both the look and a structure that carries the brand image forward.

Keeping a professional firm on call for all the library's design needs rarely sits well with boards and budgets. If that is out of reach, consider bringing in the pros to develop design templates (for multiple media and multiple uses) that will be used by internal graphics staff or freelancers. Buoy that strategy with an annual design "checkup" to be sure the tone and execution are not straying or substandard.

If that's still too rich, Pulleyn suggests contacting a local design school and running a contest to establish the library's design scheme. Establish contest rules that support the library's needs and be sure the students understand the brand aspiration.

Where to start?

Pulleyn says brand strategy starts with a comprehensive analysis of the environment, including the competition; the library's strengths, weaknesses, opportunities, and vulnerabilities; and, most important, an understanding of the library's current brand—what customers think now. This analysis is the foundation for a successful plan and where to invest if you have limited dollars.

With about $20,000, a library can expect an experienced outside firm to handle this kind of market analysis and deliver the guts of an effective brand strategy: recommendations for brand positioning, key messages, personality, tone and feel (the elements that will drive the design scheme), and a broad-strokes marketing plan. Handled with care, this one-time investment will guide the development of a new identity or secure one that's already working.

Why bother?

Few libraries beg for business. In an industry where institutions are chronically understaffed, why invest in a strategy that will likely bring in more business?

Brand is about identity, about clearly defining value and contribution to the community. Good brand management provides the answers before the questions are raised—whether those questions come from staff or taxpayers or board members. It aligns staff and aims the organization toward a single goal.

More poignantly, consider what's lost without a careful cultivation of a brand that's relevant to the community. The associations with the word library are powerful and emotional—equal access to information as a fundamental democratic value, civic pride in having a solid resource, an intelligent and economical choice. The ability to secure those feelings for at least another generation is within reach.



Author Information
Beth Dempsey (beth@bethdempsey.com) is principal of Dempsey Communications Group, a firm specializing in strategic communications for knowledge organizations.

How To Make a Strong Brand

Life used to be so simple: A brand was a name, like Coke or IBM. But the science of brand management simply defines a process for owning a space in the psyche of the customer.

FIND YOUR BRAND. What is the space you want and are able to own in your community? Don't hang your hat on a service or a product—they're easy to replace. Think big but maintain credibility. Also, it has to be unique and valuable. That requires understanding the community you serve.

DEFINE IT. What does your identity look and feel like? What does it do? Consider the Borders brand of the "Third Place." If Borders wants to be the place its customers go when they aren't at home or work, then they must encourage them to stay and explore. What needs to happen in the store to foster that?

GIVE IT A LOOK. Look at your library's logo, its printed pieces, and signage. What about its display cases? Is there a consistent look and what does it say about the library's identity? The simple, modern lines of the Salt Lake Public Library logo, inspired by books on a shelf, blend traditional library service in a modern context. The lines are repeated in all the library's printed pieces, with a consistent layout and typeface. Customers can quickly identify a mailing from the library by its look. Your library's look is its signature—make it compelling and make it consistent. (For inspiration, take a stroll through Starbucks. Look at the packaging and signage and count the number of times you see the company's logo.)

EXECUTE IT. To build ownership of an identity, the brand messages need to be consistent in the collection, staff, programming, service, web site, advertising and PR, and even buildings. In both Salt Lake and San José, the public libraries' brands of "community gathering place" are translated through the structures, with comfortable areas for gathering and access to food and drink. But before you undertake structural changes, realize that the collection and staff are the primary carriers of the brand. If your library aspires to be a community place but the collection caters to a small core of users, then it's not a community place. (In San José, use of the library tripled over a decade just through changes to the collection in language and media and greater emphasis on youth and children.) Likewise, staff need to be included in brand translation. Talk with them, train them, allow staff to help define the service that makes the brand alive in human interaction.

LIVE IT. Building a brand identity takes discipline and a commitment over time. Don't let your library's focus on its identity fade. David Aaker, author of Managing Brand Equity and Building Strong Brands (Free Pr.), recommends charging someone with brand oversight, to ensure it drives the entire organization.

When the Brand Leaves the Building

Publications, library programs held offsite, brochures, all are opportunities to leverage the brand beyond the walls of the library. Good examples of how to do it abound, but the ultimate traveling brand translation is perhaps the New York Public Library Desk Reference. Now in its fourth edition, the book has introduced the NYPL to millions worldwide. It's a model of effective brand strategy.

It married the prestige of its name to the convenience of a desk reference to address an opening in the market for an NYPL reference service "to go." The product reinforces the library's service to two groups of users: researchers and leisure users. Reviews consistently point to the book's ability to address virtually any reference question quickly, succinctly, and accurately while providing a nearly endless amount of fascinating trivia that can be perused for pleasure. The packaging carries NYPL's signature lions, making it impossible to mistake the ownership of content.

Gardiner's List

Most library publications and programs can be used to elevate or secure the institution's brand. Consider Gardiner Public Library, a one-building system serving an inland Maine community. The library positions itself as the community resource for reading and has gained worldwide attention for a small publication it produces called Who Reads What? It was begun by now-retired director Glenna Nowell as a way to spread reading interest throughout the stacks, with less dependence on the best-sellers table. Nowell noticed that when celebrity interviews happened to include a book recommendation, there was demand. There was obvious interest in the information, yet no one was collecting it. She began to write to celebrities, asking them to recommend a favorite book to her patrons. They wrote back, and she began publishing the recommendations annually. When the Associated Press caught on, Who Reads What? became an annual press opportunity, with the Gardiner list written about worldwide.

This innovation supports and spreads the Gardiner brand of "community reading resource"—what greater evidence of being the center of the reading universe than having celebrities answer your query about their favorite books? Though design changes have occurred throughout the nearly 20 years it's been published, the list continues to carry the library's design signature—a pencil drawing of the library itself—a beautiful, vintage building.

Rules of thumb for letting the brand out of the building: be sure the program or publication takes the brand message with it in both purpose and execution; stamp it with the library's design signature.

8 TOP Ways To Promote Your RSS/XML Feed For MAXIMUM Exposure

RSS (Really Simple Syndication) is the new technology on the block and is taking the Internet by storm as Internet marketers hurry to incorporate this new form of communication and technology into their existing online businesses to maximize their exposure online with NEW and/or existing customers.

I recently received a post on my Blog from a fellow that inspired me to write this article, since I found the topic important to ALL who are serious about getting the most out of their RSS feed and this new technology.

I'm pretty sure you can guess what he asked from reading the headline of this article.

Well... I did some research on what he had asked of me and came up with...

"8 TOP Ways To Promote Your RSS/XML Feed For MAXIMUM Exposure"

So, with that said, let's dive into the first and MOST important step to Maximizing your RSS feed for the exposure it deserves.

Step #1. Build a dedicated webpage for your RSS feed.

This is probably the most important part of getting the most out of your RSS feed: building your own dedicated webpage for your RSS feed subscription.

The KEY here is to give your potential readers many options for adding your RSS feed.

The best way for me to illustrate this is to have you click on the link below, which leads to my dedicated RSS feed subscription page, so you can see first hand what you need to do to get yours started.

Click here: http://www.internetwondersezine.com/rss_feed.html

Did you notice all the different options I give?

That's what you need to do.

Now, for those of you who aren't so web savvy, don't worry, I have something for you that will auto-generate an RSS feed webpage for you within minutes if you already have a Blog or RSS feed.

The service I'm talking about is called FeedBurner.com -- http://www.feedburner.com -- and is a free service for you to sign up for.

Here's what my webpage through FeedBurner.com looks like so you'll have an idea of what yours will look like. http://feeds.feedburner.com/TheInternetWondersBlog

Do you see all the options they give your potential readers to add your RSS feed to their RSS readers?

Once you've accomplished one of the two options above, you're all set to start promoting your RSS feed for MAXIMUM exposure.

Step #2. Add links to your RSS feed webpage on your website.

This is just another way to pull your visitors towards your RSS feed webpage, by simply adding text or graphic links to your existing webpages.

Make sure you put them in highly visible areas where your visitors will see your links.

I would put one at the top, middle and bottom of your website.

This really depends on what kind of website you have, so you'll have to use your own discretion.

Here's what I have done on my website to give you an example. Click here: http://www.internetwondersezine.com/

Step #3. Add this HTML tag to your RSS feed webpage.

Here's something you can add to all of your website's pages that will get the attention of search engine spiders so they come on over and check out your RSS feed further.

Simply add the following HTML tag to the <head> of your document:

<link rel="alternate" type="application/rss+xml" title="YOUR SITE TITLE RSS Feed" href="URL TO RSS FILE" />

Step #4. Some ideas if you have your own newsletter.

If you have your own newsletter like I do, here are a couple of ways to get your visitors to visit your RSS feed webpage.

Add a link on your "Thank You" page that leads to your RSS feed webpage, whether it's a text or graphic link.

The key here is to get your RSS feed link in front of your readers as much as possible to get them to add your RSS feed to their readers.

The next one is to add a link inside the "Welcome" email that's sent out to your new newsletter subscribers after they've subscribed.

This, again, will give you another chance of getting them to add your RSS feed to their readers.

Step #5. Put together a "Signature File".

Here's another great way to get your RSS feed webpage more exposure every time you send out an email or post to an online forum: simply put together a "Signature File".

Now, every time you send out an email to your list and/or a business contact, you can attach your "Signature File" at the end.

The same goes for online forums, every time you make a post or answer someone else's, your "Signature File" will be automatically attached.

Your "Signature File" doesn't have to be a huge, a few enticing lines will do fine with your RSS feed URL.

Step #6. Submit your RSS feed to RSS Directories and SE's.

Another great way to give your RSS feed more exposure is by submitting it to RSS directories and search engines. (If you're comfortable with a little scripting, there's a rough example of pinging an update service right after the list below.)

I've listed a few resources for you below to get you started with.

- Feed Shark http://feedshark.brainbliss.com/

- Ping-O-Matic http://pingomatic.com/

- RSS Top 55 http://www.masternewmedia.org/rss/top55
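
For those who like to automate things, here's a sketch of what a ping looks like under the hood. Many update services accept the common weblogUpdates.ping XML-RPC call; the Ping-O-Matic endpoint URL used below, and the exact shape of its response, are assumptions you should verify against the service's own documentation.

# A minimal sketch of pinging an update service with the common
# weblogUpdates.ping XML-RPC call. The endpoint URL and the response
# format are assumptions -- check the service's documentation first.
import xmlrpc.client

def ping_update_service(site_title, site_url,
                        endpoint="http://rpc.pingomatic.com/"):
    server = xmlrpc.client.ServerProxy(endpoint)
    # Tells the service "this site just updated, come re-read its feed".
    return server.weblogUpdates.ping(site_title, site_url)

if __name__ == "__main__":
    # Placeholder title and URL -- substitute your own site and feed page.
    result = ping_update_service("Your Site Title", "http://www.example.com/")
    print(result)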

Step #7. Write an article, if you write articles.

This is a great way to get your RSS feed webpage in front of thousands of targeted readers absolutely FREE!

Simply write an article on a HOT topic within your niche, and at the end of your article add an enticing, attention grabbing "Resource Box" that points to your RSS feed webpage URL.

Step #8. Set up a PPC (Pay-Per-Click) campaign.

The last step to getting your RSS feed webpage MAXIMUM exposure is to set up a PPC campaign.

By doing this you will be able to send HIGHLY targeted visitors to your RSS feed webpage who are hungry for the information you have to offer.

The only downside to doing the PPC tactic is that it will cost you.

And, this tactic solely depends on whether or not you chose to set up an RSS feed webpage of your own.

Now, I'm sure there are many other ways out there that could draw in more visitors to your RSS feed, but the ones I just outlined in this article are the best ones in my mind and are the ones I use.

Well, this concludes "8 TOP Ways To Promote Your RSS/XML Feed For MAXIMUM Exposure", so the only thing I have left to say to you is... Get Started!

Access Allowed

Sue Bushell, CIO

Thanks to digitization and the Web, institutions like the National Library of Australia, the Australian War Memorial and the National Archives have changed their view of service delivery and are rapidly transforming themselves from providers of collections to providers of access.

For much of this vast country's history the tyranny of distance has meant rural Australians have been pretty much denied access to our largely Canberra-based cultural institutions, and research has traditionally suffered most.

Now, thanks to digitization and a Web presence, institutions like the National Library of Australia, the Australian War Memorial and the National Archives of Australia have found their mission transformed. With a new-found ability to deliver a service beyond their doors, their focus has drastically changed, from being mere providers of collections to providers of access.

"Cultural institutions such as the National Library have dramatically reformed their view of service delivery and access with the online environment," says Roxanne Missingham, assistant director, resource sharing, with the National Library of Australia (NLA). "What it really means is that we can deliver a service that extends beyond our walls and is truly national in reach.

"We are also undertaking extensive digitization of material in our collection, so that all Australians can access our resources and services through the Internet wherever they are, as are the Australian War Memorial and National Archives of Australia. This involves considerable involvement in the development of international standards.

"In addition to using the Internet as a channel, cultural institutions have completely reformed their service models and technological solutions."

It is through collaboration that many of these advances have been achieved. By working with each other, plus their state and overseas counterparts, many of our cultural institutions are making vast strides in ensuring the preservation and future accessibility of the records that document the key government decisions that impact so many spheres of Australian life and culture.

"As for what can be gained by working collaboratively? More and more people know about their national collections and they understand that they can see them and gain knowledge from them," says Anne Lyons, assistant director-general, access and communication, National Archives of Australia (NAA).

For instance, up to a million visitors a year trek to the Australian War Memorial (AWM), many looking for a chance to feel closer to the experience of a family member or close family friend who served in conflict. Each year since it was opened to the public eight years ago some 34,000 of those visitors have spent time in the reading room. Now some two million intrepid visitors are trekking to the memorial annually online.

"The really big story that technology has allowed us to do, which wasn't there 20 years ago, is that we've now got two million visitors that are staying 15 minutes at our Web site who are accessing the stuff that we can provide, and that's what technology has allowed us. So it really was a case that if you build it they will come," says Mal Booth, head of the AWM's Research Centre.

"Not that they won't keep coming through the front door - they certainly do that - but they have now found a new resource there."

Working Together

Where service agencies struggle to collaborate in the face of stovepipes and competing agendas and missions, Australia's cultural institutions have been leading the charge on collaboration.

"Australia has a long tradition of cooperation between libraries," says Missingham. "The national network of libraries has been strongly supported by the National Library of Australia for many decades."

In such a large country, with a network of public, state, university, research and special libraries spread across 7.7 million square kilometres, libraries in Australia have always worked together, their cooperative ethos built on the recognition that the national collection would inevitably be distributed in libraries across the nation.

Our libraries hold rich and diverse collections. In fact the 4850-odd libraries in Australia (excluding primary and secondary school libraries), have built significant collections over the past two hundred years and together have a stock of some 75 million volumes, borrowed at a rate of 193 million volumes a year.

Like other cultural institutions, the NLA must meet long-term objectives like building its collection, while continuing to evolve to meet the ever more sophisticated demands of its customers. Above all that means providing fast and convenient access to library collections and services, and whenever possible letting those users discover and obtain information not only from its own collection, but that of its partners.

Digital technologies and the Internet continue to provide major opportunities to reach new audiences, to streamline and broaden its services to innovate. One response has been PANDORA (Preserving and Accessing Networked Documentary Resources of Australia), an ever-growing collection of copies of Australian online publications, established initially by the NLA in 1996. As it became clear the volume of material being published and the complexity of the task of collecting it would make it impossible for the NLA to build alone an archive of sufficient depth and breadth, other libraries got on board, starting in 1998 with the State Library of Victoria. Now a total of 10 partners identify, select, seek permission from publishers, archive and catalogue publications and Web sites for the archive.

The NLA stores the archive centrally on its server. It takes responsibility for maintaining PANDORA on behalf of partners, backing it up according to standard IT management practices, and taking preservation action over time as required. With the rapid disappearance of many Web sites, the PANDORA archive now holds the only copy of many significant resources, such as the Sydney Olympic Games, state and federal elections, and Centenary of Federation Web sites.

The NLA also provides online access to the Australian National Bibliographic Database. Kinetica is an Internet-based service giving Australian libraries and their users access to the national database of material held in Australian libraries, known as the Australian National Bibliographic Database (ANBD). Kinetica lets users search for any item and locate which library in Australia holds it and also provides gateways to other major library databases.

Some of the Kinetica aids include: cooperative cataloguing, so that Australian libraries can reduce the costs of their cataloguing by using records created by others; interlending, allowing libraries to share resources by borrowing (or receiving copies) of library materials; cooperative collection development, to allow libraries to reduce the amount of duplication in their collections; and access to the collections of Australian libraries for individual researchers, enabling identification of relevant material in Australian libraries and online.

The service is used by some 1100 libraries. With more than 38 million holdings, approximately 14 million bibliographic records including over 574,000 electronic resources, it forms an essential tool for Australian libraries in all sectors - public, special, academic, technical and further education, health, corporate, law, state and national. More than 6.5 million searches were undertaken on the service in 2003-04.

Kinetica also supports cooperation and resource sharing within the Australian library community through the delivery of MARC records and the provision of a document delivery service. In addition, there is an incentive scheme, which offers a search rebate for libraries that contribute records and/or holdings to the ANBD.

The value of the service to Australians was recognized by the Senate Environment, Communications, Information Technology and the Arts Reference Committee's report on "Libraries in the Online Environment", which recommended providing the NLA with additional funding to provide improved access to Kinetica for all Australian libraries and their users.

In response, the NLA is redeveloping Kinetica to provide a more modern, standards-based service, and to find ways to increase access by Australians to the nation's collections. After all, Australia is one of a handful of countries with a national database of this kind, and it is too valuable a resource to be kept a secret from the public.

"We are half-way through a major redevelopment of the Kinetica service where we are taking the significant step of integrating access not only to collections in libraries, but also to put in place interactive links, using APIs (application program interfaces) with Australian and overseas booksellers so that users can truly get material," Missingham says.

The new service, known as Libraries Australia, is set to revolutionize the way Australians can find and get information resources for their research, study, work or leisure. It has the potential to change the way libraries deliver information to all Australians. Libraries Australia gives users access to resources whether they are available online, through libraries or through booksellers or document supply services.

Shrinking Boundaries

The boundaries between libraries and other cultural institutions are becoming increasingly fluid. One result is PictureAustralia, an Internet-based service that allows users to search many significant online pictorial collections at the same time, and which proponents cite as a model for further collaborative work.

PictureAustralia provides access to images that cover all aspects of Australiana, from artworks to photographs and objects like sculpture. It contains approximately 1.12 million images in total, with about 10,000 new images added every month.

Users doing a search in PictureAustralia can transparently search images from 40 museums, galleries, libraries and universities, including those digitized by Scottish and New Zealand institutions. It averages more than 330,000 page views a month, or more than 72,000 searches, with usage growing by a staggering 64 percent last financial year alone. Using this service, a user might search on "St Kilda" to retrieve images from all the agencies that hold relevant material, including the Nolan Gallery, the National Library of New Zealand, the National Archives of Australia, the State Library of Victoria, and so on.

The search results in sets of "thumbnail" or preview images. Clicking on one of those takes the user to the Web site of the relevant agency to view the full-size version and where they can, if desired, order a high-resolution copy. Users move between PictureAustralia and the participating agencies' Web sites using the Back button in their Web browser.

The NLA also launched a companion service, Music Australia, in March 2005. Music Australia provides access to the resources of many different organizations including the National Film and Sound Archive, the Australian Music Centre, Australian Music Online, state libraries and university libraries. With about 140,000 resources, of which over 10,000 are available freely online, the service enables easy access to scores, sound recordings, Web sites, books and theses, archives, pictures, moving images, multimedia and other resources.

Meanwhile the National Archives of Australia has made more than five million items available online and aims to have 10 million by the end of 2005, placing it among the top five professional online repositories in the world.

"It's not so much the 'new' online services the Archives is providing, but rather the extension of the current service that is making our clients happy," reports the NAA's Lyons.

"More and more publications are now available online and together with our cost recovery e-commerce system this means we are able to provide a better service to our clients."

For example, in March some 1121 people downloaded John Curtin, Guide to Archives of Australia's Prime Ministers. "When you stop to think that the guide retails for $19.95 you can see that providing clients with a PDF can save them a considerable amount of money," she says.

The NAA is keen to open up its collection and is integrating its old photos with new technology, continually digitizing images for access online. More than 125,000 images are already on PhotoSearch, accessed via the NAA's Web site at www.naa.gov.au, and each month several thousand more are added.

"Recent thank you letters we've received include a client who said that being able to look at a digital copy of the history of her great uncle brought him alive again for her daughters. Another said that having a picture of the grandfather she never knew helped her understand why her sister had ginger hair! And only last year a client became overwhelmed in our orientation room upon reading for the first time about her father who had passed away before she was born. It's these types of instances that tell me that our online services really do touch people in a personal way," Lyons says.

The NAA has also recently launched the test version of Vrroom. Short for "virtual reading room", Vrroom is a new Web interface for teachers and students to access and interpret the collection of Australian government records. There is a test site at www.vrroom.naa.gov.au.

Vrroom supplies teachers with a growing range of records, ideas for using them and help on using primary sources. For students, Vrroom provides an online research experience. They can choose a question, explore the topic, and then find, annotate and export records. The content in Vrroom is relevant to SOSE (Studies of Society and Environment), Australian history, political science, Australian studies, English, geography, ICT and many more areas of the curriculum.

"IT has enabled us to transform our services from a focus on collections to a focus on access," Lyons says. "In all I think we represent an example of taking technology as a tool to transform our services, thinking always of what the end user wants and how we can liberate access to the nation's resources."

The NAA is also collaborating with state archives and their New Zealand counterparts in an alliance, the Digital Records Initiative (DRI). Lyons says all participants see this collaboration as imperative to ensure the preservation and future accessibility of the records that document the key government decisions that impact so many spheres of Australian life and culture.

In addition, the NAA works collaboratively with many other cultural institutions on exhibitions. For instance, it provided much of the archival material for Old Parliament House's The Petrov Affair exhibition, for events (for example, the National Capital Authority's opening of the Old Parliament House Rose Gardens), and for marketing campaigns like Australian Capital Tourism's "Summer of Silver" display, and it has an ongoing association with the Australian War Memorial with regard to the management of Commonwealth war records. It has also contributed to PictureAustralia.

The AWM offers a range of services on its Web site for family historians and other researchers, including a Research & Family History service providing links to its ReQuest online reference service, its Encyclopaedia and its XML finding aids (written using Encoded Archival Description). There are also biographical databases containing personal data like nominal rolls, records of honours and awards, and what the AWM's Booth describes as "some really touching files" of those wounded and missing in the First World War (contained in Red Cross files), avidly used by family historians.

The AWM also offers three online collections: one for its general museum collection (including all objects, photos, film, sound, art, military technology, and private records), one for books and official records, and another for its growing collection of digitized documents from official war diaries - currently mostly from the Second World War and Korea, though it is now working on Vietnam and the First World War. It will soon have about two million pages online in these databases.

There are also the First World War Official Histories, placed online using OCR (optical character recognition) scanning in image-over-text format, which have been tremendously successful. The AWM says it is nearly finished scanning the Second World War set, which will go online shortly.

"We're using a couple of technologies that big organizations like Google are just now getting into, and American organizations are just getting into," Booth says. This includes using a combination of OCR and Adobe image-over-text files to put documents online that allows users to see the document but also provides for searchability and text capture.

"I think we've really embraced digital technologies to help us provide wider public access to our collections, now in full colour," Booth says. "This has been done here by combining scanning for preservation purposes (to create an archival copy) with the production of a lower resolution file for online access - thus 'killing two birds' with the one process.

"We have scoped up some significant/iconic private records (letters) collections for scanning over the next year or so and will now attack that as best we can in the current, rather complex, copyright environment.

"With all this content online our next technological challenge is to provide simpler pathways into it for all users and more context and meaning around it. Currently, there are vast resources online, but the users have to go through too many gates, so federated search is a priority for us. And as we are now creating vast volumes of digital assets, we need a system to help us with our workflows, management and storage of these assets," he says.

Streets Ahead

In the services they have provided, and the collaboration they have achieved, Booth says Australian cultural institutions are streets ahead of much of the rest of the world. In both the extent and quality of their offerings and solutions, and in the effectiveness of their collaborative efforts between cultural agencies, they are well ahead of their counterparts overseas.

"Google has just made a big thing about going into the major libraries and scanning the great volumes and such," Booth says. "Now there are not a lot of people who put text up online, and we're one of them. New Zealand has done its Second World War histories recently and Prime Minister Helen Clark has been heavily behind it, but they have done it in a very resource-hungry way using the best technologies: I think you can download their versions in HTML and XML, but that sort of markup is very, very time consuming and resource-hungry.

"We've found that certainly we didn't have those resources so we did it in a much cheaper fashion."

The NLA's Missingham points out that like many of Australia's other cultural institutions, the NLA will often hold the only copy in the world of an item like a painting, a letter from a prime minister or an original map, many of them extremely fragile. By digitizing such items, the cultural institutions can provide all Australians with access to their precious heritage.

"The Library is currently working with state and university library sectors to secure funding for a major digitization project to put historic Australian newspapers online. Newspapers are a great untapped research resource and if we create a full digitized database of out of copyright Australian newspapers all Australians will be able to connect with their history," she says.

"And I think we are probably leading on collaboration, and it's probably because we've got a strong history of collaboration because we've always known no library industry could buy everything they needed right from the very early days."

The secret of that collaboration? It has been underpinned, Missingham says, by developments like the Open Archives Initiative and the protocol for metadata harvesting developed by the NLA. But its main basis has been the subtle recognition that the cultural institutions' major clients are the many individuals who now have access to the Internet - authors, journalists, historians and academics, as well as family historians and those with personal and recreational interests. If the AWM, the NAA and the NLA can work together to provide these clients with desired services, they can much better fulfil their mission.

"There's always been a bit of a culture amongst librarians and archivists that they're fairly cooperative people and they're now starting to realize the benefits of cross agency collaboration," Missingham says.

Wednesday, June 01, 2005

The Importance of Open Access, Open Source, and Open Standards for Libraries

Edward M. Corrado, Systems Librarian, The College of New Jersey, corrado@tcnj.edu
Abstract
The open access, open source software, and open standards concepts have been garnering increased attention in the field of librarianship and elsewhere. These concepts and their benefits and importance to libraries are examined. Benefits include lower costs, greater accessibility, and better prospects for long-term preservation of scholarly works.
Introduction
Open access, open source software, and open standards are three concepts that have been receiving increased attention lately in the library world. Open access is seen by some as a possible solution to the increasing price of serials and as a way for governmental funding agencies to receive a better return on investment. Open source software can benefit libraries by lowering initial and ongoing costs, eliminating vendor lock-in, and allowing for greater flexibility. Open standards allow for interoperability between diverse library resources and ease data migration between systems. All three of these concepts are important to libraries individually, and they can be even more beneficial when leveraged simultaneously.

Open Access
Open access to scholarly information has been a hot topic for debate among librarians, scholars, and publishers over the last few years. Recent proposals by the National Institutes of Health (NIH) in the United States (requiring scholarly works that come out of NIH-funded research to be made available via NIH's PubMed Central open access database), by the government in the United Kingdom (requiring that all UK government-funded research be available via open access), and by others have expanded this debate. Various different, though similar, definitions of open access exist, with the Budapest Open Access Initiative definition being the most widely used (Goodman 2004). Other definitions include the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, the Bethesda Statement on Open Access Publishing, and the Washington DC Principles for Free Access to Science. While there are multiple definitions and flavors of open access, open access basically calls for scholarly publications to be made freely available to libraries and end users.
Willinsky (2003) identified nine flavors of open access: 1) e-print archive (authors self-archive pre- or post-prints), 2) unqualified (immediate and full open access publication of a journal), 3) dual mode (both print subscription and open access versions of a journal are offered), 4) delayed open access (open access is available after a certain period of time), 5) author fee (authors pay a fee to support open access), 6) partial open access (some articles from a journal are available via open access), 7) per-capita (open access is made available to countries based on per-capita income), 8) abstract (open access is available to tables of contents/abstracts), and 9) co-op (institutional members support open access journals).

The growth of the open access movement is partially in response to the enormous costs of many scholarly journals. With traditional journal publication methods it is not uncommon for an institution to have to pay for an article twice. First they pay scholars to produce the work and then the institution's library pays to purchase the work back from the journal publisher. Anderson (2004) is correct that there is no such thing as free information and that there are costs involved in producing scholarly information. However, with the advent of new technologies and software programs, it is becoming increasingly less expensive to compile and distribute scholarly information. By using different funding methods and electronic delivery of journals, the costs can be absorbed by alternative means to subscription fees. One of the great benefits to open access is that libraries in smaller institutions or in economically disadvantaged areas around the world can have greater access to these scholarly resources.

Open access helps to ensure long-term access to scholarly articles. Unlike articles that are licensed in traditional article databases, libraries and others can create local copies and repositories of these resources. Libraries, by working together to make repositories of open access literature, can ensure continued access to these scholarly publications into the distant future.
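
One widely used mechanism for building such local copies is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), under which a repository exposes its metadata for other systems to collect. The following is a minimal, illustrative sketch of harvesting Dublin Core titles from an open access repository; the endpoint URL is a placeholder, and a production harvester would also follow resumption tokens, handle protocol errors, and harvest incrementally by date.

# A minimal sketch of harvesting Dublin Core records over OAI-PMH.
# The repository URL below is a placeholder, not a real endpoint.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used in standard OAI-PMH responses with oai_dc metadata.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url):
    """Yield Dublin Core titles from one ListRecords response."""
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(base_url + "?" + query) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        title = record.find(".//" + DC + "title")
        if title is not None and title.text:
            yield title.text

if __name__ == "__main__":
    # Placeholder endpoint -- substitute a real repository's OAI-PMH base URL.
    for title in harvest_titles("http://repository.example.org/oai"):
        print(title)

Because every OAI-compliant archive answers the same requests in the same way, a library can aggregate records from many institutional archives with essentially the same small amount of code.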


Open Source
Open source software is software that includes source code and is usually available at no charge. There are additional requirements besides the availability of source code that a program must meet before it is considered open source, including: the software must be free to redistribute; derivative works must be allowed; the license cannot discriminate against any persons; and the license cannot discriminate against any fields of endeavor. Software that is licensed under an open source license allows a community of developers from around the world to improve the software by providing enhancements and bug fixes.
Libraries can realize many advantages by using open source software. One of the most obvious advantages is the initial cost. Open source software is generally available for free (or at a minimal cost) and it is not necessary to purchase additional licenses for every computer that the program is to be installed on or for every person who is going to use the software. Open source software not only has a lower acquisition cost than proprietary software, it often has lower implementation and support costs as well.

It is easier to evaluate open source software than proprietary software. Since open source software is typically freely available to download, librarians and systems administrators can install complete production-ready versions of software and evaluate competing packages. This can be done not only without any license fees, but also without having to stick to a vendor's trial period, evaluate a limited version of the software, or deal with the vendor's sales personnel. If the library likes an overall open source package but would like a few added features, they can add these features themselves. This is possible because the source code is available. Even if a library does not have in-house expertise, they can benefit from source code availability because another library may be able to provide them the fix, or they can hire a consultant to make the changes that they desire. Fuchs (2004) points out that if a proprietary program "is deficient in some way [the user] must wait until the vendor decides it is financially viable to develop the enhancement -- an event that may never occur." With open source software the user can develop the enhancement themselves.

Open source software allows for more support options. Proprietary software vendors often package service with the product. This is particularly true of proprietary library-specific software. When support from a vendor is inadequate it is an additional expense to purchase another tier of support, assuming that it is even available. Open source software allows for different vendors to compete for support contracts based on quality of service and on price. Access to the source code also allows for self-support when practical and desired.

The amount of vendor lock-in is dramatically reduced with open source software. The large initial costs often associated with proprietary software make it difficult to reevaluate the choice of software when it does not live up to expectations. Proprietary software can also lead to a single point of failure: if a vendor goes out of business or decides to stop supporting a program, there is often nothing a user can do. If the program were available as open source software, organizations using it could support it themselves, or other vendors could step in and fill the void left by the previous vendor.


Open Standards
Pountain (2003) defines an open standard as "a standard that is independent of any single institution or manufacturer, and to which users may propose amendments." This definition is a good starting point, but in reality the term "open standard" means different things to different people. Three key characteristics of open standards identified by Coyle (2002) are 1) anyone can use the standard to develop software, 2) anyone can acquire the standard for free or without a significant cost, and 3) the standard has been developed in a way in which anyone can participate. When a standard has the first two of these characteristics (the ability to use the standard and to obtain it without significant cost) it can be said to be open in a utility sense. That is to say, an open standard is one that is not encumbered by a patent, does not require proprietary software, and can be used by anyone without cost. Proprietary standards can be expensive, and it may be cost prohibitive to purchase access to one if it is ever needed. Many people consider a standard to be sufficiently open as long as it is open in this utility sense. Others take this a step further and consider a standard open only if it is also created and modified through an open process. XHTML is an example of a standard that is open in utility but not in process: in order to help develop the XHTML specification one has to be a member of the W3C, and businesses pay between $5,000 and $50,000 per year to become members (Coyle 2002). Conversely, Dublin Core is open both in utility and in process: all one has to do is show up and participate in order to contribute to its development.
It is important for libraries and other cultural institutions to ensure long-term access to digital information. The rapid growth of digital technologies has led to new and improved applications for digital preservation, but it has also introduced problems, two of which are obsolescence and dependency. The obsolescence problem is caused by advances in hardware and software making many computers obsolete within three to five years (Vilbrandt et al. 2004). Dependency problems can arise if tools that are needed to communicate between systems or read file formats become unavailable. To deal with obsolescence and dependency problems, organizations must be able to migrate data into new systems, and data migration cannot occur without access to data file formats.

Properly created open standards for file formats are less likely to become obsolete (Vilbrandt et al. 2004) and are more reliable and stable than proprietary formats (Breeding 2002). In the event that an open standard file format does become obsolete, having access to the format specification would allow anyone to easily, and legally, create a data conversion utility. File formats that use open standards can assist in long-term archiving because they allow for software and hardware independence. Open standards help alleviate obsolescence and dependency problems since files created in formats that adhere to open standards are "more likely than proprietary formats to be readable twenty or fifty years from now" (Baker 1999). This allows for greater flexibility and easy migration to different systems in the future.

The use of open standards can help assure interoperability of diverse systems. Various software packages are used to create digital libraries, online library catalogs, and other resources that libraries rely on. These systems need to be able to interact in order to provide the best possible service to patrons. The way to make certain that these diverse systems, and any future systems, can communicate with each other is to use open standards and thereby achieve the "free flow of information through interoperability" (The Open Group 2005).

Many different organizations advocate open standards. One of the most prominent is The Open Group, which created the Developer Declaration of Independence in the hope of pulling the information technology industry together in support of open standards. Some library-centric initiatives, including the Open Archives Initiative (OAI), also support open standards. OAI's mission is to develop and promote "interoperability standards that aim to facilitate the efficient dissemination of content" (Open Archives Initiative 2005). OAI has created the Protocol for Metadata Harvesting (OAI-PMH), which provides an application-independent interoperability framework based on metadata harvesting. Other common open standards for information retrieval relevant to libraries include the Digital Object Identifier (DOI) system, the Dublin Core Metadata Initiative (DCMI), and OpenURL.
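As an illustration of how such a protocol works in practice, here is a minimal Python sketch of an OAI-PMH harvest. The repository base URL is hypothetical; the verb and metadataPrefix parameters and the XML namespaces are part of OAI-PMH itself.

    # Minimal sketch of an OAI-PMH harvest: request Dublin Core records from a
    # repository and print each record's title. The base URL below is hypothetical;
    # a complete harvester would also follow resumptionTokens to page through
    # large result sets.
    import urllib.request
    import xml.etree.ElementTree as ET

    BASE_URL = "http://archive.example.edu/oai"  # hypothetical repository
    query = "?verb=ListRecords&metadataPrefix=oai_dc"

    with urllib.request.urlopen(BASE_URL + query) as response:
        tree = ET.parse(response)

    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    for record in tree.findall(".//oai:record", ns):
        title = record.find(".//dc:title", ns)
        print(title.text if title is not None else "(no title)")

Because every compliant repository answers the same requests in the same way, the same small harvester can be pointed at any OAI-PMH archive.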

While open standards have garnered increased attention in libraries recently, their use in librarianship is not new. It can be traced all the way back to the first American Library Association meeting in 1877, when the dimensions of the catalog card were standardized to 7.5 x 12.5 centimeters (Coyle 2002). A more modern example of an open standard used by libraries is the Machine-Readable Cataloging (MARC) record. Other common open standards for bibliographic data include the Metadata Object Description Schema (MODS), the Metadata Encoding and Transmission Standard (METS), and the XML Organic Bibliographic Information Schema (XOBIS).


Putting Open Access, Open Source, and Open Standards Together
Open access, open source software, and open standards each individually offer a number of significant benefits to libraries. When they are combined, the results can be even greater. Open source and open standards can help libraries provide patrons with easier access to open access materials and other resources. There are literally thousands of open access titles available, and without open standards it would be very difficult to find what one is looking for or to view the various articles. Imagine the difficulty, and costs involved, in maintaining a library's information technology infrastructure if each electronic journal required a separate, proprietary piece of software to read or search it. Open standards make it possible to create interoperable systems that can access the literature in various open access journals seamlessly.
Open standards and open source can help preserve long-term access to open access journals and other types of electronic journals. Libraries working together can use open source software such as LOCKSS (short for "Lots Of Copies Keep Stuff Safe"), a system that caches copies of digital collections around the world, to ensure continued access to these scholarly publications long into the future. As current computers, software, storage media, file formats, and other types of information technology become obsolete, it will be necessary to migrate open access articles and other data to new systems. Without the assistance of the software manufacturer (who may or may not still be in business, let alone willing to help), proprietary software and file formats may make migration practically impossible. By utilizing open source software and open standards from the beginning, libraries can ensure that this type of system migration will be possible years down the road.

Not only has the growing cost of serials caused libraries to drop journal subscriptions, it has also factored into a 26% decrease in monograph acquisitions by the typical research library between 1986 and 1999 (Create Change 2002). Because of the lower costs typically involved with open access, open source, and open standards, library budgets can be reallocated to monographs and other areas.


Conclusion
The benefits of open access, open source, and open standards are numerous. They include lower costs, greater accessibility, and better prospects for long-term preservation of scholarly works. Libraries should embrace all three of these concepts now and in the future. By supporting open access, open source, and open standards, libraries not only help ensure that their current and future patrons will have easier and more comprehensive access to scholarly research, they also help other libraries around the world, including those in disadvantaged areas, gain access to important scholarly research.

References
Anderson, R. 2004. Open access in the real world: confronting economic and legal reality. College & Research Libraries News 64(4). [Online]. Available: http://dlist.sir.arizona.edu/archive/00000351/ [Accessed: March 25, 2005].
Baker, T. 1999. TIAC White Paper on Appropriate Technology for Digital Libraries. Bangkok: Technical Information Access Center.

Breeding, M. 2002. Preserving digital information. Information Today 19(5): 48-49.

Coyle, K. 2002. Open source, open standards. Information Technology and Libraries 21(1): 33-36.

Create Change. 2002. Coping Strategies. [Online]. Available: http://www.createchange.org/librarians/issues/coping.html [Accessed May 11, 2005].

Fuchs, I. 2004. Learning management systems: are we there yet? Syllabus. [Online]. Available: http://www.campus-technology.com/article.asp?id=9675 [Accessed: April 1, 2005].

Goodman, D. 2004. The criteria for open access. Serials Review 30(4). [Online]. Available: http://dlist.sir.arizona.edu/archive/00000798/ [Accessed: March 11, 2005].

Open Archives Initiative. 2005. Organization. [Online]. Available: http://www.openarchives.org/organization/index.html [Accessed: March 17, 2005].

Pountain, D. 2003. The Penguin Dictionary of Computing. New York: Penguin Putnam.

The Open Group. 2005. Developer Declaration of Independence. [Online]. Available: http://www.opengroup.org/declaration/declaration.htm [Accessed: March 29, 2005].

Vilbrandt, T., et al. 2004. Cultural heritage preservation using constructive shape modeling. Computer Graphics Forum 23(1): 25-41.

Willinsky, J. 2003. The nine flavors of open access scholarly publishing. Journal of Postgraduate Medicine 49: 263-267.

On the Theory of Library Catalogs and Search Engines

Supplementing the talk on "Principles and Goals of Cataloging", German Librarians' Annual Conference Augsburg 2002.

Nothing is more practical than a good theory. A banal statement, considering that a theory should always enable its users to easily derive the statements they need for practice.
But a theory for catalogs or cataloging? Is that really necessary? A question likely to be asked by anyone who has never been confronted with the matter or considered it with any seriousness.

Using Internet search engines, and knowing their operation is fully automated, people tend to view with skepticism all practical and theoretical effort invested in catalogs. Any good search engine, however, has to be based on a good theory - though that theory may differ quite a bit from a catalog theory.

What do libraries and the Internet have in common?

Both provide access to collections of recordings. One need not use the difficult-to-define concepts of information and knowledge here. We may leave it open whether or not an "information society" exists, or a "knowledge economy", and whether everything squeezed between book covers or onto Web pages is information or knowledge. The PISA studies have reminded everybody that knowledge does not come without learning. Possessing printed matter does not mean possessing knowledge; printed text turns into living knowledge only through reading and understanding, and that knowledge then sits in the head of the reader, not on the paper or screen. Nobody will doubt that ours is a learning society, and recordings of experience and insight are of central importance for learning. One learns from direct interaction between humans, by one's own doing, by observation, or through study - which mostly consists of taking in what others have recorded.

In many cases, suitable recordings have to be found first. Millions of humans, over millennia, have recorded their experience and encounters, their findings, their insights, and their inspiration. When this started with the Greeks, Plato saw in it a symptom of decline: people would no longer exercise their memories because they would now rely on inferior surrogates. But people did not stop at making use of their own recordings; they started using those of others as well. Collecting began. Libraries were created. Once a collection grew beyond a few hundred papers or papyri, a system of ordered shelving had to be invented, or the usefulness of the collection would have suffered.

How did cataloging come about?

Once several thousand items have been collected, their physical arrangement, whatever the system, becomes tedious. One will need finding aids, i.e., secondary recordings (or meta-recordings), which will reveal where in the collection a particular item is located. This is the birth of cataloging: it shifts the process of ordering from the shelf to paper, to files or, nowadays, to databases. Unless one also invents a nice theory along with this, the usefulness of the catalog will diminish with its size rather than increase.

Once one has millions, the assembling of the finding-aids in itself becomes quite a considerable effort. No wonder there are attempts at automating the process, at least for collections that exist in digital formats. The metaphoric term "search engine" suggests, misleadingly, that a machine peruses the documents as such, focusing on their content. The actual searching is, however, always performed on surrogate files the system constructs for this purpose. Software can only match character strings, not concepts or ideas. This cannot be done in just some arbitrary way, but there has to be a systematic way, an algorithm, which means a theory.

Contents of libraries and Internet

Combine libraries, archives, and the Internet, and they comprise nothing less than the accumulated intellectual and artistic recordings of humankind, inasmuch as these survive, from all periods, all countries and cultures, in all languages and scripts and about all subjects, by all individuals who ever wished to make a contribution. The size and complexity of this is staggering. It is naive to expect that navigating this multidimensional universe might be easy or might be made a simple matter. One may try to simplify the description of the world, but the world itself will not become any simpler that way. Note that the initial enthusiasm of the metadata movements has softened a bit...
A catalog attempts to help with finding documents and with orientation among documents, and Internet search engines strive to do the same. The question is: in what way, with what principles and methods, by what theory can or should they work in order to help the most people in the largest number of cases in the best possible way? No single method can serve all purposes and all searchers all the time - as everybody knows who has tried to find anything on more than one occasion.

Books or Internet - a matter of taste?

There is not really an either-or situation. Only the combined contents of both worlds constitute the complete universe of recorded knowledge and achievement. Library catalogs on the Internet do not change this, however convenient they may be, because catalogs carry only descriptions, not the publications themselves, which exist only on paper or in microform. To digitize all of these and make them full-text searchable is presently utopian: there are many millions of texts, new ones continue to be produced by the tens of thousands per year, and a great many are not in machine-readable formats, Google's efforts notwithstanding. Catalogs contain only very brief and standardized descriptions of the documents, whereas for Internet content full text is the norm. But the diversity is enormous, and most documents lack a standardized description (a.k.a. "metadata"). From this it follows that there will be a number of differences between catalogs and Internet search engines. In libraries, we not only have to understand this, we should also be able to pass this knowledge on to our readers.
Further down we make an attempt to juxtapose catalogs and search engines in a table.

First, however, let us look at catalogs as such, and at the difference between the contemporary device, the OPAC, and the card catalog (now gathering dust, if not discarded). We also have to ask what consequences should be envisioned for cataloging rules. It goes without saying that the OPAC is here to stay and that card catalogs are history, but one may still learn from a comparison.
For readers who want more detail, there is the introductory chapter of an outstanding book: Martha M. Yee and Sara Shatford Layne's "Improving Online Public Access Catalogs" (ALA, 1998. ISBN 0-8389-0730-X).

What is the principal problem today in searching?
The true problem with OPACs is no longer, as it was for cards, that users have difficulty finding anything at all. Instead, for most queries the OPAC does bring up some results - but then there is no easy way of knowing whether this is all there is and whether the best-suited items have been brought up at all. Users, in other words, cannot know whether they have missed something, perhaps a lot, perhaps the most important things. Their awareness of this is generally low, and catalog use studies have shown that it is very difficult to entice users into making several different attempts - or, briefly, to set them thinking. Their confidence in the technology is unduly high. What most of them use is just the standard or default options, making barely more than one attempt. This is probably based on an overall least-effort tendency, or on the unreflective assumption that what is offered as default is also the best possible approach and the others are inferior. The catalog itself cannot overcome this. The catalog may be as good as it gets; that is not the point. Users have to think and judge for themselves, today as much as yesterday, and this is not going to change with any new generation of technology. And they should even be happy about this, for otherwise they themselves might be replaced by machines... Be that as it may: there certainly have to be easy ways of searching for simple questions, but ambitious and knowledgeable users should also be provided with, and invited to use, sophisticated techniques.

What is a good catalog?

From all we know, we may characterize it like this (formulated originally by LC's Thomas Mann, as quoted in M. Yee's book):
  • Reliability: Starting from a citation, one should be able to ascertain quickly and with certainty whether the item is in the collection or not. (In an unreliable catalog, establishing absence in particular may require many attempts before one can be sure. Catalogs need this feature for acquisitions checking, for example, or to find out whether an interlibrary loan order is necessary.)
  • Serendipity: Browsing functions are essential, firstly because one does not always have precise search criteria, and secondly because chance findings are sometimes valuable. That is one reason why users tend to go to the stacks first when they know the arrangement. Catalogs should therefore make related materials browsable - the question of course being, what exactly is "related"? OPACs can, for example, support browsing in these ways:
    1. provide alphabetical indexes of names, terms, titles etc., browsable up and down,
    2. present result sets in more than one arrangement for the user to choose, and
    3. make related publications accessible via hyperlinks (for subject terms, classification codes, names).
  • Depth: This covers two aspects that are not exactly part of cataloging:
    1. a policy saying what materials or objects are subject to cataloging. Classically, these are books, meaning self-contained knowledge packages. More often than not, however, a book consists of several or even many packages of recorded knowledge, each of which represents a unit that might become the subject of a bibliographic record itself - because someone might well be searching for it. Just think of proceedings volumes or festschriften, not to mention periodicals. With the exception of belles-lettres, readers will in many cases be interested in, and thus actually be looking for, a chapter or parts of a book rather than the whole volume. If cataloging restricts itself to title page information, the catalog will be completely oblivious to all the constituent parts of books. For economic reasons (labor, space), not many libraries have ever done chapter-level cataloging. One important case is that of "multipart publications" with individually titled volumes: are these to be cataloged as a whole or each volume separately - or both? The focus of European cataloging seems to have been heavily on the parts, whereas American catalogers have more often perceived only the whole.
    2. a concept for subject indexing. Is it enough to assign a few subject terms and/or classification symbols to a document to characterize its subject matter, or should the aim be to index every subject that is actually dealt with in some part of the publication? There are experiments, for example, with tables of contents of books (as in OhioLink). There are also experiments with automatic assignment of additional terms or notations by software.


From one dimension to many

The most decisive difference between conventional and online catalogs is this:
(We are not talking about technical differences here, like availability around the clock around the globe, just catalog theory!)

Card Catalog: a linear sequence of entries, i.e., a one-dimensional space, the ordering principle being the alphabet on the lowest level and names/titles/subjects on an upper level. Some libraries had several catalogs for two or more time periods or for otherwise defined parts of their holdings. Every document can be represented by more than one card in several places of the sequence, one of these being called the "main entry". It served two purposes. The first was collocating related works in one place (like an author's works under an established form of his or her name). The second, and probably more important, function was to provide a predictable location for the item in the catalog: if one knew the principle, one was able to find with certainty what one was looking for in just one attempt. Practicability limited the number of cards per item to an average of well below ten. There are many conceivable ways of arranging a card catalog and, in particular, of determining the entries to be represented in it. The pattern, once chosen, has to be followed consistently in order for the catalog to be reliable. A card catalog is therefore the extreme case of pre-coordination, and very elaborate rules proved necessary to establish that pre-coordination.

OPAC: in principle, it contains an unordered mass of structured records. Software, however, can easily produce a dozen or more different indexes, each being a linear, sorted sequence of certain parts of the records. Logically, these are still quite like card sequences, but software, processing a user query, can extract arbitrary subsequences and merge or intersect them with subsequences from one or more of the other indexes, yielding subsets of the database which can then be presented in one or more meaningful arrangements. Criteria like names, titles, numbers, subject terms, etc. may thus be combined in all conceivable ways. Indexes are thus like the axes of a multi-dimensional space in which software enables the user to navigate. Multi-dimensional spaces are abstract, mathematical entities and are therefore a challenge for many users to comprehend. In contrast to card catalogs, OPACs thus rely heavily on post-coordination.
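A rough Python sketch of post-coordination, with invented records rather than any particular OPAC's data structures, might look like this:

    # Sketch: post-coordination in an OPAC-like database. Each index is an
    # inverted file mapping a normalized term to the set of record ids containing
    # it; a query intersects arbitrary subsets drawn from different indexes.
    from collections import defaultdict

    records = {
        1: {"author": ["twain, mark"], "title": ["adventures", "huckleberry", "finn"]},
        2: {"author": ["twain, mark"], "title": ["adventures", "tom", "sawyer"]},
        3: {"author": ["verne, jules"], "title": ["around", "world", "eighty", "days"]},
    }

    # Build one index per field - each index is one "axis" of the search space.
    indexes = defaultdict(lambda: defaultdict(set))
    for rec_id, fields in records.items():
        for field, terms in fields.items():
            for term in terms:
                indexes[field][term].add(rec_id)

    def search(**criteria):
        """Intersect the id sets for each (field, term) pair given in the query."""
        result = set(records)
        for field, term in criteria.items():
            result &= indexes[field].get(term, set())
        return sorted(result)

    print(search(author="twain, mark", title="adventures"))  # -> [1, 2]

The combinations are not fixed in advance, as they are on cards; they are computed only at query time, which is the essence of post-coordination.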

The actual arrangement of the pre-coordinated card sequence results from two decisions:

  1. Entries: What are the criteria for the selection of entries - which persons or other entities are to be represented by a card for a main or added entry, and which not?
  2. Headings: What is the exact spelling of the card headings for the selected entities?
    The difficulties encountered here gave rise to the whole edifice known today as authority control.
Metadata schemes, as an aside, seem to neglect the second question more often than not, at least when it comes to names and titles. This relates to the assumption that OPACs no longer require the elaborate edifice of rules that had been necessary for cards, because now every detail can be made searchable, so if one access point fails one can try another.
This is, however, a premature conclusion, as becomes apparent when one looks at the situations in which a catalog is consulted:


Standard situations of catalog use

The situation most frequently encountered is probably the factual search. For this, catalogs are not very helpful because they contain descriptions of reference works only, not their contents. Search engines, however, index the available documents directly and in their entirety and can thus lead immediately to the facts contained therein. When looking for facts, search engines are therefore the first stop for almost anybody these days: the engines serve as directory, dictionary, encyclopedia, atlas, calendar, timetable, picture book, etc. Catalogs can only point users to all those reference tools, which makes the search for facts more cumbersome and time-consuming.
If, however, we turn to document searching, we can observe at least three broad categories of situations frequently encountered when people use a catalog or search engine:

(a) Known item search ("I know exactly what I need"): looking for something cited or referred to in some other place, like a bibliography (before the advent of hyperlinks).

The user then has to know what data elements are likely to yield results. Rules for the selection of these search criteria are called "entry rules".
For cards, these rules had to be very restrictive because, for economic reasons, one could only ever produce and file a very limited number of cards for any given item. In contrast, OPACs produce and arrange their indexes automatically. Index entries, and thus access points, can therefore be very numerous. If one attempt fails, for whatever reason, another and yet another can be tried in rapid succession. Even so, a lack of reliability is soon perceived, leading to the desire to have more things standardized (or under authority control) than ever before, such as publishers' names or place names.
In addition, there have to be rules governing the description of items. Descriptions have to be brief but to the point: they have to enable the database user to tell apart items that differ, like different versions or editions of a document. The important principle is: meticulous transcription from the item at hand.
(b) Collocation search ("I want everything written by XYZ"): What the user knows is, for example, little more than a name or title, or one single document. Starting from this, they want to find all logically related items, like other editions or versions, translations and so on, or all of the output of one author. This objective calls for rules that bring together what belongs together. Such rules are traditionally called "rules for headings" because it was the card headings that eventually brought together all the cards describing one author's works and the like. Roughly, headings rules prescribe that a name or title be spelled in exactly the same way all the time. Related items do not come together all by themselves when names or titles differ. Many a name and title therefore has to be spelled differently from what is printed on the title page or equivalent - which may be at odds with situation (a), requiring precise transcription. Sometimes, because of this, a name or title has to be recorded in both the standardized form and the form found in the piece itself. For card catalogs, this led to the invention of reference cards (like Samuel Langhorne Clemens: see Mark Twain). For databases, references are collected in "authority files". An authority record for a person contains all the different forms of a name encountered. With an OPAC properly set up, a query using any of the different forms should then lead to the same result. For every single document, then, only the authority form or its id number has to be recorded, plus the form found in the piece itself for proper identification and distinction. Some authority records contain as many as 30 or more forms, for example for names like Chekhov or Tchaikovsky.
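A toy Python sketch (with invented forms and id numbers, not drawn from any real authority file) of how an authority record lets any recorded variant lead to the same bibliographic records:

    # Sketch: an authority record collects all encountered forms of a name under
    # one id; bibliographic records store only that id plus the form found in the
    # piece itself, so a query under any recorded variant yields the same result.
    authority = {
        "n001": {"heading": "Twain, Mark",
                 "variants": ["Twain, Mark", "Clemens, Samuel Langhorne"]},
        "n002": {"heading": "Chekhov, Anton",
                 "variants": ["Chekhov, Anton", "Tchekhov, Anton", "Cechov, Anton"]},
    }
    bib = {
        10: {"author_id": "n002", "form_in_piece": "Tchekhov, Anton", "title": "Selected stories"},
        11: {"author_id": "n001", "form_in_piece": "Mark Twain", "title": "Life on the Mississippi"},
    }

    # Map every variant form back to its authority id.
    variant_to_id = {form.lower(): auth_id
                     for auth_id, rec in authority.items()
                     for form in rec["variants"]}

    def search_author(name):
        auth_id = variant_to_id.get(name.lower())
        return [rec for rec in bib.values() if rec["author_id"] == auth_id]

    # Any of the recorded variant forms leads to the same record:
    print(search_author("Cechov, Anton"))   # finds record 10
    print(search_author("Chekhov, Anton"))  # finds the same record 10
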
The only authoritative authority file in the AACR world is that of the Library of Congress, covering names of persons and corporate bodies. For persons, this file also contains the titles ("uniform titles") of many works that have been issued in numerous editions and translations.
In Germany, the Deutsche Bibliothek is running similar files, based on German cataloging rules (RAK = Regeln für Alphabetische Katalogisierung).
(c) Subject search ("I'm looking for material on xyz"): Very often, someone embarks on a search without prior knowledge of any specific title or any author related to the subject. This situation is, in principle, much more problematic than (a) or (b). "What is this book about?" is a question that very often cannot be answered with a brief list of terms (see above, remarks on "depth"). Books are normally not full-text searchable for lack of access to the source file. Situation (c) is, however, likely the most frequent and important one for many end-users, who tend to perceive (a) and (b) as rather unproblematic. There are authority files for subject terms just as for names and titles: the Library of Congress Subject Headings (LCSH) for English-speaking countries, the SWD maintained by the Deutsche Bibliothek for German libraries.

Situation (b) and its aspect of "editions of a work" often gets overlooked or is not given much attention. It may occur less frequently than the others - how many works, after all, run into two or more editions? One gets more of a sense for it when considering the following search situations, all of which can only be successful if the catalog does indeed "bring together what belongs together":
  • Some users don't know there is a newer (better, more complete) edition than the one they have been referred to.
  • A citation may be imprecise but still good enough to find at least one edition - this one should then lead to the others.
  • Users are sometimes happy with any edition of a cited work, no matter the real title.
  • Users may enjoy the serendipity resulting from being presented with more than one edition.
And something else: the mere fact that a translation exists or that several editions have been produced may be viewed as a quality indicator. The card catalog made this readily apparent when all editions were filed under the "uniform title" (and referenced from the various real titles). For OPACs, one might consider using the presence of edition statements and uniform titles for ranking result sets. If this has already been done somewhere, not much is known about it. OPACs can (and should), of course, provide a link to "related editions/versions" based on the presence of a uniform title.
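One possible way to derive such a "related editions/versions" link from uniform titles, sketched in Python with invented records (not a description of any existing OPAC):

    # Sketch: group bibliographic records by uniform title so that a record
    # display can link to all other editions or versions of the same work.
    from collections import defaultdict

    bib = [
        {"id": 1, "title": "Huckleberry Finns Abenteuer",
         "uniform_title": "Adventures of Huckleberry Finn"},
        {"id": 2, "title": "The annotated Huckleberry Finn",
         "uniform_title": "Adventures of Huckleberry Finn"},
        {"id": 3, "title": "Around the world in eighty days", "uniform_title": None},
    ]

    by_work = defaultdict(list)
    for rec in bib:
        if rec["uniform_title"]:
            by_work[rec["uniform_title"]].append(rec["id"])

    def related_editions(rec):
        """Ids of other records sharing the same uniform title (empty if none)."""
        if not rec["uniform_title"]:
            return []
        return [i for i in by_work[rec["uniform_title"]] if i != rec["id"]]

    print(related_editions(bib[0]))  # -> [2]

The same grouping could also feed a ranking signal: the more editions a work has, the more weight it might be given in a result set.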

Perfection, however, is out of reach: very often a library has only one edition of a work and the cataloger is unaware of the existence of others (especially when other editions have yet to be published). Then only this edition can be found in the catalog, and not under any other title by which the work may be known to a searcher. Such cases are less frequent in large, shared databases.

Plus ça change - the more things change, the less they stay the same ...

With technology enabling proliferation like never before, it is now very common to encounter diverse "manifestations" of a text: the same content can be presented in different versions or file formats and with all sorts of modifications. This can aggravate the difficulties with collocation searches (situation (b)). And titles, though they are the most important element identifying a document or work, are not handled with much care on the Internet.
Classically, the manifestation problem varies from one discipline to another. It is probably least virulent in the sciences and in technical disciplines, where it is rather the exception for a document to live through more than one edition. In belles lettres it is more common, but music arguably has the most and the worst examples: many pieces can be found in dozens of interpretations, with titles changing all the time, as well as the forms of names (Tchaikovsky!). Nowhere is the "uniform title" more important than in music for bringing all editions or versions together.

AACR are concerned with the formal level, not the subject level!

The AACR code of cataloging rules, like the German RAK, deals with situations (a) and (b) only. These pose problems that can be solved by purely formal or descriptive means, whereas (c) requires attention to the content of things cataloged.
In the world of cards, there were sometimes (in Germany, nearly always) separate catalogs for situation (c). OPACs, however, always combine formal and subject access points in the same database, if not generally in the same index. They can differ in having or not having an "anyword" index that actually combines all words (but not phrases) occurring in bibliographic records. In any case, it seems important to have uniform access forms for personal and corporate names, serving both kinds of access. German rules are not yet fully unified in this regard.

The problems described here have been known at least since Antonio Panizzi's work at the British Museum in the 19th century (his "Ninety-One Rules" were published in 1841). He had set himself the task of setting up the first complete catalog for the library. His employers found his ideas somewhat overly complicated and were reluctant to support him. This situation keeps repeating itself...

Attempts at formulating international principles for cataloging began only in the mid-20th century, the all-time highlight being the IFLA Conference of 1961 in Paris. The "Statement of Principles" promulgated there became the foundation for AACR as well as for RAK. Only as late as 1999 did IFLA come up with a new milestone paper, entitled "Functional Requirements for Bibliographic Records", which is gaining ground not just in library circles but also in metadata projects. Some of its main points are presented in a separate paper, "What should catalogs do?", for the German Annual Conference in May 2002, Augsburg.

Is AACR2 inextricably intertwined with MARC21 (and RAK with MAB)?

The MARC21 and MAB exchange formats were created to serve the exchange of library data. The Deutsche Bibliothek creates RAK records in the MAB2 format; the Library of Congress produces AACR2 records in MARC21. However, the Deutsche Bibliothek can and does deliver the same data cast into the MARC mold. Format and rules are not inextricably intertwined: a data format is nothing more than a container. With a bit of goodwill, wrinkles can be ironed out. A worldwide, unified exchange format can be envisioned even if the rules remain different. UNIMARC was created for this purpose, but it has not caught on. Some samples have been set up for demonstration.
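To illustrate the point that a format is only a container, here is a deliberately simplified Python sketch; the field tags and serializations are invented for illustration and are not real MARC21 or MAB syntax:

    # Sketch: the same cataloging content carried by two different "containers".
    # The field tags are invented for illustration, not real MARC21 or MAB tags.
    record = {"author": "Twain, Mark", "title": "Life on the Mississippi", "year": "1883"}

    def to_container_a(rec):
        # one line per field: tag, then value
        tags = {"author": "100", "title": "245", "year": "260"}
        return "\n".join(f"{tags[field]} {value}" for field, value in rec.items())

    def to_container_b(rec):
        # the same content as semicolon-separated key:value pairs
        return "; ".join(f"{field}:{value}" for field, value in rec.items())

    # The cataloging rules decide what is recorded and in which form;
    # the container only decides how it is packaged for exchange.
    print(to_container_a(record))
    print(to_container_b(record))
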

Catalogs and search engines

Time and again, catalogs and search engines are juxtaposed in an apples-and-oranges comparison.

The intention here is not to find out which is the better gadget but to show what differences exist. Not only librarians may be interested in getting a clearer picture of the strengths and weaknesses of each.

There is actually no competition, for catalogs and search engines cover different ground. Most print material remains offline and thus inaccessible for harvesters, and on the other hand, many online resources have unprintable characteristics and thus could not be published in print.

There are, however, widening "grey" areas: genuine Internet resources are being cataloged to enrich catalogs, and search engines index files that contain book reviews, abstracts, whole chapters, descriptions, etc. Some categories of publications, like preprints and dissertations, which used to appear in print are now mounted on web servers. Important older books no longer subject to copyright are digitized and made freely available. The works of "classics" in many languages are freely available as text files, the most prominent example being Project Gutenberg. Reference works that used to be published in book form are increasingly made available online and turned into databases or (in library cataloging parlance) "continuing integrating resources". And then, last but not least, there is Google's effort to digitize books on a grand scale. At the time of this writing, one cannot do much more than speculate about the potential of this project.



Document base, Coverage
  • Catalog: Describes a particular collection, predominantly books, located in one or several buildings.
  • Search engine: Indexes documents distributed all over the planet. The majority of these "resources" are not very much like books.

Size
  • Catalog: The collection is a selection from a much larger number of existing documents. The selection is mostly made by objective and quality criteria, but it can also be subjective. However, lack of funds can cause important materials to be missing. Union catalogs describe many more items than individual catalogs, but not everything is easily accessible.
  • Search engine: The intention is comprehensive and global coverage, but in reality no more than some 30% of accessible materials are indexed by any one search engine. Selection for quality is generally not possible. Size and currency of coverage are not obvious to the user; selection is an automatic process. Many documents covered have never been published conventionally, and most conventionally published material is not on the web.

Objectives
  • Catalog: A catalog has clearly defined goals (RAK §101), one of which is to ensure reliable access for some types of queries. "Known item searches" and "collocation searches" are deemed particularly important. In many cases, one has to know the right search terms with some accuracy in order to ascertain the presence or absence of an item in the collection.
  • Search engine: Guiding principles for search engines would be difficult to work out, at least in the sense that one could know with a high degree of certainty how the presence or absence of something can be ascertained. In particular, "subject searches" and "collocation searches" cannot technically be made reliable. For "known item searches", the situation is better: knowing two or three characteristic and not-too-common words the text must contain, an AND search is very reliable. The dominant use, however, may well be the factual search: with some luck, nowhere else can one so swiftly find an address, a statistical figure, a historic date, a word's meaning, or a picture.

Expectations of users
  • Catalog: Holdings of a library are usually smaller than users expect for their fields of interest, though libraries usually try to build balanced collections of quality materials of long-term value. Union catalogs may be viewed as catalogs for a much larger yet virtual collection.
  • Search engine: The number of "documents" indexed may be much larger than any user would imagine, but valuable resources sit side by side with utter ephemera and all sorts of useless matter. There are various attempts to use formal criteria for "ranking".

Nature of data
  • Catalog: Data consist of highly standardized brief descriptions, following elaborate codes of rules; the most widely used codes are AACR2 and RAK. Every item is represented by a structured record containing well-defined data fields. The data formats have been designed to accommodate all elements prescribed by the rules; the most widely used formats are MARC21 and MAB. Some examples are provided to illustrate how code and format complement each other.
  • Search engine: There are no standardized descriptions of the documents indexed. The database consists of nothing but large inverted files, derived directly from the documents but never shown as such. Standardization in the sense of authority control is not possible because of a general lack of standardized metadata. Even where metadata exist, they are not always helpful: they are insufficiently standardized, too simple and meager. The most widely advocated semantic standard is the "Dublin Core", but this is a container, like MARC, and what matters is its content. For content, any standard like AACR2 is mostly absent.

Creation and content of the database
  • Catalog: Full texts are not available for direct access or automatic indexing. Catalog records are just very brief and artificial surrogates. Descriptions are based on title pages or equivalents and little else. Record structure is still related to traditional catalog card structure in terms of content and layout. Automatic cataloging (scanning title pages etc.) is not feasible; descriptions have to be prepared by manual and intellectual input.
  • Search engine: Some search engines index the entire text of web documents. Things like title pages either do not exist or are not detectable by software. Programs can, however, evaluate the proximity of words and whether they are highlighted or specifically tagged (headlines, image tags).

Search criteria
  • Catalog: Searches can be restricted to certain fields and Boolean combinations thereof: names, title words, title phrases, subjects, etc. Some OPACs have an "anyword" index allowing keyword searches in the entire text of the records. With regard to books and similar documents, search criteria relate to a book as a whole, not to any of its parts, like chapters or contributions (the "depth" of indexing, in other words, is rather limited).
  • Search engine: Full-text searching is the default. There are mostly no fields for titles, names, or subjects, so these do not exist as search criteria. If a title search is possible, it operates on the titles "as is", and not all web sources have proper titles. Searches for URL components can be a useful complement. Because of the full-text searching (which means more "depth"), using combinations of not-too-common words can often yield good results where no library catalog would turn up anything, but one can just as well get scores of irrelevant items. There may be additional functions, for example image searching based on image tags in HTML text. Some engines do a kind of ranking that attributes more weight to words in the opening section.

Browsing
  • Catalog: In addition to direct queries, most OPACs offer index browsing (up and down, in sorted lists of terms). Browsable indexes can assist in finding words and names whose exact spelling is not known. It can also be useful to see which inflected forms exist (plural, genitive, etc.), for an untruncated word search will find only that particular spelling, but titles can contain other forms; English may be the least afflicted language in this regard. Serendipity can also be helped a lot by browsable indexes.
  • Search engine: Search engines generally do not feature browsable indexes. Although rarely noticed, this would be very helpful because of the total lack of authority control. The enormous amount of data may make the production of browsable indexes unfeasible. Because of full-text indexing, the inflection problem is less serious: the important words will usually occur in several inflected forms in any given text. But there are prominent search engines not yet featuring truncation...

Result set arrangement
  • Catalog: Result sets are usually shown sorted by author, by title, or in reverse chronological order; some systems offer a choice. For ranking, an OPAC might employ word proximity, language, number of pages, or facts like the existence of a uniform title or edition statement. Not many OPACs presently apply any ranking technique. This may be because the very brief textual content of catalog records severely limits the applicability of techniques developed for search engines.
  • Search engine: Some engines present results in no predictable order. Some talk of relevance ranking, employing various formal techniques. Strictly speaking, relevance can be judged only by the person searching, not by a machine; the word is used only as a metaphor, like so many in the computing field. One ought to make users aware of this. Search engines can, however, use criteria like link evaluation that have no parallel in catalog data. Ordering by date or alphabet is not possible because there are no corresponding data fields. Standard HTML files do not even contain a creation date, and the