Leslie Owens' Forrester Report: How To Build A High-Octane Taxonomy For ECM And Enterprise Search Systems

We all know that building controlled vocabularies in enterprise settings is not simply a nice (fun?) intellectual effort. The point is to deliver value to the enterprise, whether it’s aiding in realizing revenue or ensuring full utilization of knowledge assets. In “How to Build a High-Octane Taxonomy for ECM and Enterprise Search Systems” (Forrester, Nov 2008, free with registration), Leslie Owens presents a strong discussion of exactly how to go about building that “taxonomy” for your enterprise.

First and foremost, you should never engage in a “taxonomy” project in a vacuum. Ultimately, it’s all about context and knowing that context! Owens echoes this belief. We work closely with our clients to first clearly define and document the business objectives and ensure the project’s objectives are aligned with the enterprise’s objectives. The work must be done within the context of a specific set of business objectives and goals. Then we work with our clients to identify the systems which will utilize the controlled vocabularies. What are their constraints? How will they process the information? How might they display the information and surface the controlled vocabularies for the users? Speaking of users, they are the third aspect of context which we have front of mind as we work with our clients to initiate these long-term efforts. Who are they, what is their level of knowledge about the content and its use, why are they using the systems to be enhanced, how do they do their work and how do they think about the content base(s), etc.

Owens presents a good, but academic, discussion regarding the techniques for building controlled vocabularies. Ultimately, it doesn’t matter which techniques are employed or in what combination you use them as long as you’re effective and the outcome is a controlled vocabulary which can be deployed, maintained and re-used for long-term business value. In our experience, most engagements involve utilizing a hybrid of techniques in an iterative process. It’s not worth getting caught up in labeling the technique (even for those of us often referred to as “the word police”).

If you’re considering initiating a “taxonomy” program within your enterprise, or you’re already well down the road, we strongly recommend reading Owens’s report. You will glean good information to guide you in your process. One last thought I would leave you with: no matter how “simple” you think this effort is, it should not be undertaken in the belief that it is a “once and done” project. You are initiating a long-term program which should always remain an active component of your ECM, search, knowledge management, or other larger program.

Enterprise Finding with the Semantic Web

[Note: this post was originally posted on my personal blog]

This post, titled Semantic Web in Education, by Jason Ohler, a professor of Educational Technology and Distance Learning at the University of Alaska, is one of the most illustrative examples of the value of the Semantic Web I have seen in a while. It paints a clear picture of some of the value end users and consumers can derive from the Semantic Web, and the benefits it describes are very similar to those possible in what I like to call the research 'finding' environment in corporate enterprises. Flip it from the education research environment that Ohler is addressing to an enterprise finding environment and you have an interesting use case for why enterprises should look closely at the promise of the Semantic Web.

"One vision of a well-developed semantic web includes a search feature that would return a multimedia report rather than a list of hits. The report would draw from many sources, including websites, articles from scientific repositories, chapters in textbooks, blog dialogue, speeches posted on YouTube, information stored on cell phones, gaming scenarios played out in virtual realities-anything appropriate that is accessible by the rules of Web 3.0. The report would consist of short sections that coalesce around knowledge areas that emerged naturally from your research, with keywords identified and listed conveniently off to one side as links.

The information in the report would be compared, contrasted, and collated in a basic way, presenting points of agreement and disagreement, and perhaps associating these with political positions or contrasting research. Because the web knows something about you, it also alerts you to local lectures on related topics, books you might want to read, TV programs available through your cable service, blog discussions you might find relevant, and even local groups you can contact that are also focused on this issue. Unlike a standard report, what you receive changes as the available information changes, and you might have wiki-like access to add to or edit it. And because you told your agent that this topic is a high priority, your cell phone will beep when a significant development occurs. After all, the semantic web will be highly inclusive, providing a common language for many kinds of media and technologies, including cell phones. The net result, ideally, is that you spend less time searching and sifting and more time absorbing, thinking, and participating."

Starting in February, Christine Connors and I will be conducting a three-part webinar titled 'Discover the Semantic Web' that will address some of these enterprise-specific opportunities to leverage the Semantic Web.

Image|Flickr|AlexBarros

Taxonomy Development Discussion at ALA

Our global taxonomy consultants are always ready and willing to participate in conferences, local meetings and any other get-togethers they can reach, and they are often asked to participate in and lead discussions and presentations on different topics (when they are not working on client engagements, of course!).
At this year's ALA (American Library Association) Midwinter 2009 meeting, as part of the Networked Resources and Metadata Interest Group (NRMIG), there will be a discussion on taxonomy development on Sunday, January 25, from 8-10 a.m. The discussion will be led by our very own Laura Dorricott.
Laura is a Project Delivery Manager in our Dow Jones Taxonomy Services group and has many years of experience on taxonomy and indexing projects, including taxonomy/thesaurus assessment, development and design.

Over on the Metadata Blog, the official blog of the Networked Resources and Metadata Interest Group (NRMIG), they are requesting input in advance on topics that should be covered, so if you are attending or have some 'remote' thoughts for Laura, feel free to add your comments either here or there!

A request was already added by Diane Hillmann that she would "like to hear Laura discuss the differences she sees in the way libraries view vocabularies and vocabulary development and the way commercial entities view these issues. Is there something we can learn from those differences? I'd also like to hear a bit about how the commercial sector evaluates return on investment for this kind of development."

Sounds like a great discussion, and I can't wait for Laura's report from the field!

Global Report on the State of Blogging Librarians

Many readers of the SynapticaCentral blog are not professional librarians, but some of us are, so I am sharing something I just published on my personal blog that you might find interesting: the state of blogging librarians around the world.

Librarians are no strangers to the blogging world, and some of the earliest blogs I personally started to follow were indeed written by librarians. The Librarian in Black blog by Sarah Houghton-Jan, a Digital Futures Manager for the San José Public Library, is definitely one that I have kept an eye on in a Library topic folder in my RSS reader over the years, and I always find interesting and valuable information there.

Recently Sarah wrote a post on her blog that caught my attention about a new book published by LibWorld titled "Library Blogs Worldwide" in which she has written a chapter covering the United States Library blogging community (page 187).

The book was published via the Infobib LibWorld project and is available as a free download or a paperback from Lulu online publishers.

It offers thirty commentaries by local librarians on the state of library-related blogs in 29 countries! The foreword by Walt Crawford provides a good summary of the volume and diversity that global library blogs offer.

In Sarah's chapter she asks the question "which came first, the librarian blog or the library blog?" She writes that the librarian did because they "began sharing information with each other and as more of us saw the power of the blogging medium, we began adopting the same techniques at work." I agree with that answer, and it probably mirrors a lot of other industries and professions that have adopted blogging as part of their work 'duties' in the same manner.

Another good resource for comprehensive lists of blogs is the Blogging Libraries Wiki and an output of the LibWorld report is available from the LibWorld delicious account where they have tagged all the blogs mentioned in the book by country.

We should think about putting together a comprehensive book that covers global bloggers who are taxonomists, ontologists and linguists!

Library of Congress Reports on Flickr Pilot

[This post is cross-posted on my personal blog]

Last month the Library of Congress released the report on their ongoing Flickr project, which I have been very interested in and have written about as the project progressed. From their blog post on the report:

"Only nine months into the Library of Congress’ pilot project placing Library photos on the Web site Flickr, the photos have drawn more than 10 million views, 7,166 comments and more than 67,000 tags, according to a new report from the project team overseeing the lively project."

“The popularity and impact of the pilot have been remarkable,” said Michelle Springer, project manager for digital initiatives in the Office of Strategic Initiatives, who said total views reached 10 million in October. The site is averaging 500,000 views a month, she said, adding that Flickr members have marked 79 percent of the photos as “favorites.”

A summary of some of the outcomes:

  • Increased awareness of the Library of Congress (LC) digital photograph collection, which has been available for years on the Library's website; the pilot brought not only an engaged audience but also a lot of referral traffic to the Library's website. "Feedback of this nature suggests that as a result of this project the Library is reaching new audiences—people who did not or could not find this material on our own site, and people who never thought to look here."
  • A better understanding of social tagging and community input (see below for more details)
  • Experience for LC staff with Web 2.0 online interactions with 'patrons'

Since the beginning of the project I have been very interested in learning what it would reveal about user tagging versus controlled vocabulary applied through traditional bibliographic cataloging. In the report they share that they used the Flickr API to do a deeper analysis of the tagging done by the community (see pages 19-24 of the full report), based on nine categories that provide some interesting insight into issues commonly cited in comparisons of social tagging vs. assignment of controlled vocabulary terms (page 28). The categories analyzed were:

I. LC description-based (words copied from the Library-provided record): e.g., titles, names, subjects, etc.

II. New descriptive words (words not present in the Library-provided description):

  • Place: e.g., cities, counties, countries, natural feature names
  • Format (physical characteristics of the original photos). Sample tags: LF, large format, black and white, bw, transparencies, glass plate
  • Photographic technique. Sample tags: shallow depth of field
  • Time period. Sample tags: wartime, WWII, 1912
  • Creator name: e.g., photographer’s name

III. New subject words (words not present in the Library-provided description):

  • Image (items seen in the image itself). Sample tags: cables, trees, apples, windows, hat, yellow
  • Associations/symbolism (phrases and slogans evoked by the image). Sample tags: Rosie the riveter, Norman Rockwell, We can do it!
  • Commentary (revealing the tagger’s value judgments). Sample tags: Sunday best, proud, dapper, vintage
  • Transcription (transcribing words found in items such as signs, posters, etc., within the photo)
  • Topic (terms that convey the topic of the photo). Sample tags: architecture, navy, baseball, story
  • Humor (tags intended to be humorous rather than descriptive) Sample tags: UFO, flying saucer

IV. Emotional/aesthetic responses: (personal reactions of the tagger). Sample tags: wow, pretty, ugly, controversial

V. Personal knowledge/research (tags that could only have been added based on knowledge or research by the tagger, and that could not have been gleaned solely from the description provided or examination of the photo): For example, the tag murder used on a portrait of someone who was later murdered or tags added for the specific county when that information was not part of the description.

VI. Machine tags (added by the community not Library-supplied): e.g., geotags and Iconclass tags

VII. Variant forms (representing terms already tagged but in a different form, such as synonyms (e.g., WW2, WWII, World War II, worldwarii) or plural/singular differences (e.g., transparency/transparencies))

VIII. Foreign language (tags in foreign languages/scripts, whether they are translations of English-language tags, or new tags)

IX. Miscellaneous (tags that are not readily understood, that provide corrections to LC descriptions or to other taggers (e.g., not peaches), or tags later removed)
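
Category VII (variant forms) is the one that matters most when reconciling community tags with a controlled vocabulary. As a rough illustration only, here is a minimal Python sketch of how such variants might be collapsed. The normalization rule (lowercase, strip everything but ASCII letters and digits), the small synonym table, the naive plural folding and all function names are my own assumptions for the example; none of this is taken from the LC report.

```python
import re
from collections import defaultdict

# Hypothetical synonym table (assumption): normalized variant -> preferred form.
PREFERRED = {
    "ww2": "worldwarii",
    "wwii": "worldwarii",
}


def normalize(tag: str) -> str:
    """Lowercase a tag and drop everything except ASCII letters and digits.

    Tags in other scripts (category VIII) would need different handling.
    """
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def canonical(tag: str) -> str:
    """Map a raw tag to a single key used for grouping variants."""
    key = normalize(tag)
    key = PREFERRED.get(key, key)
    # Naive plural folding (assumption): "-ies" -> "-y", else drop a trailing "s".
    if key.endswith("ies") and len(key) > 4:
        key = key[:-3] + "y"
    elif key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        key = key[:-1]
    return key


def group_variants(tags):
    """Group raw tags under their canonical key, keeping first-seen order."""
    groups = defaultdict(list)
    for tag in tags:
        groups[canonical(tag)].append(tag)
    return dict(groups)
```

Calling `group_variants` on the sample tags from category VII puts WW2, WWII, World War II and worldwarii in one group and transparency/transparencies in another. A real vocabulary team would replace the plural rule and the synonym table with entries from their own thesaurus.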

Some of the Future Tag Analysis Interests (page 29) are also quite interesting, such as incorporating popular concepts or variants into the LC's own controlled vocabularies (yes, something I advocate in the hybrid approach!), bringing the tags into the LC's search environment, and populating bibliographic records with tags. They have already added the Flickr URL to the "additional version available" field (MARC field 530) in some catalog records, which leads users to the appropriate Flickr page, where they may find valuable historical and other information on the image (see the sample on page 36).

In the report they also share some of what the staff learned from using Web 2.0 tools to interact with patrons, and how that differs from traditional reference desk exchanges (page 37).

The good news? Skip to page 38 of the full report to see the recommendations and conclusions, including details of the headcount necessary for the program to continue and expand. The report ends on the following positive note:

"It should come as no surprise, then, that the Flickr team recommends that this experiment in Web 2.0 cease to be characterized as a pilot and evolve to an expanded involvement in this growing community (and other appropriate social networking opportunities that may arise) as resources permit. The benefits appear to far outweigh the costs and risks. "

The entire set of tags that have been applied can be seen alphabetically or as a tag cloud of the 150 most popular tags.

Many thanks to the Library of Congress staff for taking on this project and continuously sharing their progress through their blog as well as other resources (see Appendix C) and to the authors of the project report: Michelle Springer, Beth Dulabahn, Phil Michel, Barbara Natanson, David Reser, David Woodward, and Helena Zinkham!

Hybrides à la Barbosa: ebook on Taxonomies and Folksonomies now Available in French

2008 treated my ebook on Folksonomies and Taxonomies extremely well and led to some great conversations with colleagues and clients about the 'advantages' of user tagging when approached via hybrid routes in the enterprise, which I will be sharing here with you in future posts.

In addition to the ebook being included in many publications, including DMReview, I was interviewed for a ReadWriteTalk Podcast about why I wrote the book. A lot of great feedback was received, not only about the content and message of the ebook but also about the gorgeous layout and format that our design team put together, and as a treat our marketing department also had these great aprons made for the Taxonomy Bootcamp sessions.

This year a French translation of the ebook is out, titled Le Livre De Cuisine De La Taxonomie Et De La Folksonomie, which I am extremely excited about because it reaches out to a whole new market for my European colleagues (although I admit I do not speak French!).

Hope you enjoy it- Merci!

A CMS is not a Taxonomy Management Tool but a CMS Needs a Good Taxonomy

Today on a phone call I made a point that I often make: "you can build the most 'beautiful' taxonomy ever, but if you have nothing to use it for, it is not going to do you any good". One of the common uses we see for a taxonomy is in conjunction with a Content Management System (CMS), and many of our existing clients have our Synaptica tool integrated into their CMS.

Recently at Taxonomy Bootcamp, Stephanie Lemieux from Earley & Associates and Charlie Gray from Motorola presented a great session on 'Integrating Taxonomy with a CMS for Dynamic Content' in which on slide 12 Stephanie pointed out:

~~~~~~~
Important note....
A CMS is not a taxonomy management tool
-Most requirements will not be met by the CMS, even the big players
-External tool needed to manage taxonomy versioning, scope notes, associative relationships, and more
-CMS taxonomy management is very SLOW…
---1 term with 5 synonyms & 5 translations = 3 minutes
-If the taxonomy is more than 1000 terms, an excel spreadsheet will quickly become unmanageable
---Worse if you are doing multi-lingual

~~~~~~~

The presentation went on to discuss other key aspects of taxonomy development for content management that I would encourage you to review. The reasons presented above as an 'important note' are just some of the reasons that many customers with robust CMS implementations use Synaptica to centrally manage their taxonomies.

In addition to the obvious core requirements in taxonomy creation and management that Synaptica covers, we also make available a little-known add-on to the core product named the "Synaptica Indexing System" (IMS).
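
To put the slide's "3 minutes per term" figure in perspective, here is a back-of-the-envelope estimate in Python. It assumes, purely for illustration, that every term in the taxonomy looks like the slide's example (5 synonyms and 5 translations) and that the entry rate stays constant; the function name is my own.

```python
MINUTES_PER_TERM = 3  # slide figure: 1 term with 5 synonyms and 5 translations


def entry_hours(term_count: int, minutes_per_term: float = MINUTES_PER_TERM) -> float:
    """Estimated hours of manual entry at a constant per-term rate."""
    return term_count * minutes_per_term / 60


# A 1000-term taxonomy: 1000 terms * 3 minutes / 60 = 50 hours of data entry.
hours_for_1000_terms = entry_hours(1000)
```

Under those assumptions, the initial load of a 1000-term taxonomy alone is more than a full working week of data entry in the CMS, before any ongoing maintenance.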

IMS is an add-on component designed to be used with the core Synaptica taxonomy and metadata management tool – and enables the human indexing of content against vocabularies stored and managed in the Synaptica system.

The Indexing Management System (IMS) can quickly be integrated with any content authoring/management tool that is already in place within your enterprise. IMS allows the content manager/indexer to search and browse the vocabularies that are stored and managed in Synaptica, dynamically building a “pick list” of indexing terms that are relevant to that piece of content.

Once the indexer completes the selection of indexing terms the IMS system passes those terms from Synaptica to the CMS to be stored as metadata. IMS can also simultaneously capture summary information about the piece of content and send it back to Synaptica to build a record within the Synaptica system itself. When IMS posts terms to the CMS it can also automatically expand the user-selected terms using related terms from the Synaptica vocabulary system. [Please see Workflow on the second page of this Spec sheet.]
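
The expansion step described above can be sketched in a few lines of Python. To be clear, this is not the Synaptica or IMS API: the `Term` class, the in-memory `vocabulary` dictionary, the function names and the shape of the metadata record are all hypothetical, chosen only to illustrate the idea of adding related terms to the indexer's pick list before it is stored in the CMS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Hypothetical vocabulary entry: a label plus its related-term labels."""
    label: str
    related: List[str] = field(default_factory=list)


def expand_terms(selected: List[str], vocabulary: Dict[str, Term]) -> List[str]:
    """Return the selected labels plus their related terms, without duplicates.

    Order is preserved: each selected term is followed by its related terms.
    Labels missing from the vocabulary are kept as-is and not expanded.
    """
    result: List[str] = []
    seen = set()
    for label in selected:
        term = vocabulary.get(label)
        candidates = [label] + (term.related if term else [])
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


def to_cms_metadata(content_id: str, selected: List[str],
                    vocabulary: Dict[str, Term]) -> dict:
    """Build the (hypothetical) metadata record that would be stored in the CMS."""
    return {
        "content_id": content_id,
        "index_terms": expand_terms(selected, vocabulary),
    }
```

For example, with a vocabulary in which "Mergers" lists "Acquisitions" as a related term, an indexer who picks only "Mergers" would end up with both terms in the stored metadata. Whether expansion should follow one hop of related terms, as here, or something broader is a design choice for your own governance rules.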

In addition, editors can submit candidate terms directly from the CMS, which kicks off the established governance workflow for candidate terms. This essentially gives your key editorial staff a user tagging process without their having to log into the Synaptica system directly to submit candidate terms.

So back to my point: the best taxonomy in the world is useless without a purpose, and by having your content managers/indexers use a corporate-wide taxonomy that is stored in a centralized place like Synaptica, you ensure consistency and accuracy in indexing and identifying content across the enterprise.

I am always surprised when customers are blown away by the IMS add-on and say they had never heard of that type of functionality. Just today the client pointed out why: we do not have any marketing material on our product sites for this very valuable feature... so we need to fix that!

If you would like a demo of the IMS module or would like to learn more about how our other clients are using it to integrate into their CMS systems, please drop me a line daniela.barbosa@dowjones.com

Happy anniversary to me, happy anniversary to me...!

I have recently celebrated my one year anniversary with Dow Jones. It has been quite a year! I wear several hats here and that has given me the opportunity to meet a great many people in all areas of the business. How does it feel after a year? It feels great - I continue to be impressed with the caliber of talent exhibited by my colleagues. The domain knowledge, the business savvy, the passion for their work - it is very exciting and motivating to be surrounded by these people.

Yes - I love working with the core Dow Jones teams: the product champions, the technical staff, the marketing, sales and strategy teams for Factiva, Newswires, the Wall Street Journal. Yes, I also think it's pretty cool to talk to folks in other parts of NewsCorp: MySpace, Slingshot, Fox Interactive.

Today though I want to highlight some of the people I work most closely with. I'll start with some you haven't met yet on our blog - my internally focused team of Metadata Managers who, with their teams, keep our content organized: Frances, Annika and Bouriana. Three very bright and talented women who have a significant impact on the structure of Dow Jones' Intelligent Indexing, they quietly and diligently work to improve the quality of our content indexing to ensure the most relevant documents are returned in Search and Discovery. They are the champions of new branches of our taxonomies, builders of our ontologies, curators of our primary intellectual assets. And everyone here wants to build on their work - it's a significant part of our metadata platform. Huzzah ladies - and thank you for your dedication!

Then there's Marti Heyman, whom you'd have met by now if Daniela had her way! (Only partially teasing here Marti!) Marti and I joined Dow Jones at the same time to fill the shoes of two incredible folks - Dave Clarke and Trish Yancey - who were moving on after seeing to the smooth integration of their company, Synapse, after its acquisition by Dow Jones. I got the product side, Marti got the consulting side of Taxonomy Services. It's been my pleasure to have known Marti for several years. For a long time it's been a small world, this group of corporate taxonomists, and we've had the pleasure of speaking together, chatting on TaxoCoP calls, and now working together to take this organization to the next level, taxonomically speaking. Marti's depth of knowledge, experience, and willingness to roll up her sleeves continue to impress me. I also love that she gets a few bees in her bonnet! (Perhaps someday we'll have her tell you about why you can't use ROI as a success metric for taxonomies!)

Marti's team has been a great joy to work with too - Ian and Dan have some of the most sophisticated knowledge of practical applications of cataloging and classification I've encountered outside the academic and library world. They are a phenomenal resource for our consulting clients. And how can you not love someone who puts up thousands of Christmas trees - as Laura and her family do each and every year - with every ornament cataloged?! Now, that's a true taxonomy geek!

Of course, this being a blog for Synaptica, I cannot overlook a team that practically runs itself: Jim S., Jim D., Sean and Daniela are the folks who make Synaptica what it is. Jim and Sean are the core of our technical team, and have the ability to deliver excellent code and great customer service. Mostly, I love that they don't groan too much when Daniela and I dream up some crazy new idea! They are usually right there with us, and I appreciate their creativity and willingness to try new things with the product. Jim S. is the pillar of the team, our Product Manager, Customer Champion, Pre-Sales Support, Trainer, Chief Cook and Bottle Washer! He takes great pride in his work and is one of the best PMs it has been my pleasure to work with. What can I say about Daniela? I daresay most of you know her already. One of the next Robert Scobles, Data Portability advocate, Super Librarian, She Geek. Daniela is our Business Development Manager, and in the last year she has done more good for Synaptica and Taxonomy Services than I ever could have hoped for. She is a true customer advocate, true Dow Jones advocate, and isn't afraid to do what it takes to get the job done. I've said how glad I am to work with her before, and I'll say it again: she is a force to be reckoned with - work with her if you can!

There are so many other wonderful people here, I'm looking forward to getting to know them better. We have an incredible team, and I encourage you to reach out to them to talk shop, to talk tech, to talk business. We are one of the few companies with capabilities that run the full spectrum of content management: indexing & classification, taxonomy management, ontologies, content creation, integration, processing & delivery, archiving and user interaction; and we enjoy our work immensely. We look forward to hearing from you!

Flickr image by daintytime