My university has a mandate to increase our international reach through research collaborations, courses offered, and support for international students.
From the technical services side, this means our catalogers must provide metadata for resources in unfamiliar languages, including some that don’t use the Roman alphabet. A few of the challenges we face include:
- Identifying the language of an item (is that Spanish or Catalan?)
- Cataloging an item in a language you don’t speak or read (what is this book even about?)
- Transliterating from non-Roman scripts (e.g. Cyrillic, Chinese, Thai)
- Diacritic codes in copy cataloging that don’t match your system’s encoding scheme
I’d like to share a few free tools that our catalogers have found helpful. I’ve used some of these in other areas of librarianship as well, including acquisitions and reference.
Sometimes I open a book or article and have no idea where to start, because the language isn’t anything I’ve seen before.
I turn to the Open Xerox Language Identifier, which covers over 80 different languages. Type or paste in text of the mysterious language, and give it a try. The more text you provide, the more accurate it is.
Web translation tools aren’t perfect, but they’re a great way to get the gist of a piece of writing (don’t use them for sending sensitive emails to bilingual coworkers, however).
Google Translate includes over 75 languages, and also a language identification tool. Enter the title, a few chapter names, or back cover blurb, and you’ll get the general idea of the content.
If you catalog in Roman script and wind up with a resource in Cyrillic or Chinese, how do you transliterate it so the record is searchable in your ILS? Transliteration tables match up characters between scripts.
The ALA-LC Romanization Tables for non-Roman scripts are approved by the American Library Association and the Library of Congress. They cover over 70 different scripts.
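To give a rough sense of how a transliteration table works in practice, here is a minimal Python sketch. The mapping is a small, hand-picked subset of Russian Cyrillic letters chosen for this example, not the full ALA-LC table, and the function name `romanize` is hypothetical. Always check real records against the official tables.

```python
# Illustrative subset only: a few Russian Cyrillic letters and their
# usual Roman equivalents. This is NOT the complete ALA-LC table.
CYRILLIC_TO_ROMAN = {
    "а": "a", "в": "v", "д": "d", "к": "k", "м": "m",
    "н": "n", "о": "o", "р": "r", "с": "s", "т": "t",
}

def romanize(text):
    """Replace each mapped Cyrillic letter; leave anything else unchanged."""
    result = []
    for ch in text:
        roman = CYRILLIC_TO_ROMAN.get(ch.lower())
        if roman is None:
            result.append(ch)  # unmapped: keep it so gaps are easy to spot
        elif ch.isupper():
            result.append(roman.capitalize())
        else:
            result.append(roman)
    return "".join(result)

print(romanize("Москва"))
```

Leaving unmapped characters in place is a deliberate choice here: any Cyrillic letter still visible in the output shows exactly where the table needs another entry.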
We’re fortunate that librarians love to share: there are quite a few sites produced by libraries that look at common bibliographic terms you’d find on title pages: numbers, dates, editions, statements of responsibility, price, etc.
To share two Canadian examples, Memorial University maintains a Glossary of Bibliographic Information by Language and Queen’s University has a page of Foreign Language Equivalents for Bibliographic Terms.
If you’ve ever seen the phrase “bibliographic knowledge of [language]” in a job posting, this is what it’s referring to: you’ve cataloged enough material in a language to know these terms, but can’t carry on a conversation about daily life. I have bibliographic knowledge of Spanish, Italian, and German, but don’t ask me to go to a restaurant in Hamburg and order a hamburger.
Subject-specific dictionaries are similar to bibliographic dictionaries, but they cover terms common to a particular subject.
My university has significant music and map collections, so I often consult the language tools at Music Cataloging at Yale (…and I once thought music was the universal language) and the European Environment Agency’s Terminology and Discovery Service.
In order to ensure that accented characters and special symbols display properly in the catalog, it’s important to have the correct diacritic code.
Our system uses Unicode, and we often rely on the Unicode Character Code Chart or Unicode Character Table. Which interface you use is a matter of personal preference.
It may also be worth coming up with a cheat sheet of the codes you use most frequently – for example, common French accents if you’re cataloging Canadian government documents, which are bilingual.
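If you are comfortable with a little scripting, a cheat sheet like this can be generated rather than typed by hand. The sketch below uses Python's standard `unicodedata` module; the list of French accented characters and the function name `cheat_sheet` are my own choices for illustration, so adjust them to the languages you actually catalog.

```python
import unicodedata

# Characters chosen for illustration: accented letters common in French text.
FRENCH_ACCENTS = "àâçéèêëîïôùûü"

def cheat_sheet(chars):
    """Return (character, code point, official Unicode name) for each character."""
    # Normalize to precomposed form so each accented letter is one character.
    chars = unicodedata.normalize("NFC", chars)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in chars]

for ch, code, name in cheat_sheet(FRENCH_ACCENTS):
    print(f"{ch}\t{code}\t{name}")
```

Each printed row gives the character, its Unicode number, and its official name, which is usually enough to find the same character in an ILS diacritic chart.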
Many Integrated Library Systems also have diacritic charts built in, where you can select the symbol you need and click it to place it in the record.
Diacritic charts can be long and involved (the Unicode example above is a bit of a nightmare), so if you’re working with a new language, browsing through them searching for a specific code can be time-consuming. You can see the symbol in front of you, but have no idea what it’s called.
This is where Shapecatcher comes in. This utility allows you to draw a character using your mouse or tablet. It identifies possible matches for the symbol and gives you the symbol’s name and Unicode number.
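Once you have a symbol's name or Unicode number, turning it back into the character itself takes only a few lines. This is a small Python sketch using the standard `unicodedata` module; the helper names `from_name` and `from_code` are hypothetical, and the code assumes the number is written in the usual `U+XXXX` style.

```python
import unicodedata

def from_name(name):
    """Look up a character by its official Unicode name."""
    return unicodedata.lookup(name)

def from_code(code):
    """Convert a code written like 'U+00F1' into the character itself."""
    hex_digits = code.upper().replace("U+", "")
    return chr(int(hex_digits, 16))

print(from_name("LATIN SMALL LETTER N WITH TILDE"))
print(from_code("U+00F1"))
```

Both calls print the same character, so either piece of information is enough to paste the right symbol into a record.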
Have you encountered issues handling different languages when cataloging? Is there a free language tool you’d like to share? Tell us about it in the comments!
Credits: Image of Pieter Bruegel the Elder’s painting The Tower of Babel courtesy of the Google Art Project. Many thanks also to my colleagues Judy Harris and Vivian Zhang for sharing their language challenges and tools.