Top 10 Reasons the Semantic Web is A Lot Like Love

This post is one day late, since yesterday was Valentine's Day, but it is an entertaining one that I had to share. Not only is it clever, I think it also holds a lot of truths that outline some of the current challenges the Semantic Web community is working with. Via Scott Brinker's Chief Marketing Technologist blog, "Top 10 reasons the semantic web is a lot like love":

10. It means different things to different people.

9. Those in it can bore everyone else to death talking about it.

8. Cynics insist there's no such thing.

7. It's straightforward in theory, messy in practice.

6. A few misinterpreted words can really screw things up.

5. You can invest a lot and not get any return.

4. Some people claim it, but don't really have it.

3. Some people have it, but don't want to disclose it.

2. The more people involved, the more complicated it gets.

1. When it works, beautiful relationships are established.

Looks like Brinker will be giving a talk on semantic marketing and moderating a roundtable on semantic advertising at the 2009 Semantic Technology Conference, June 14-18 in San Jose, so we look forward to meeting him there!

Powerhouse Museum

As promised during the webinar, the correct link for the museum! You can search and browse a portion of the collection held in the Powerhouse Museum in Sydney, Australia. Browse around! Search, read, zoom - add tags! This is a site at which to have fun, learn new things, spark new thoughts.

The team at the museum have made great use of readily available technologies and user generated content to augment the records and formal taxonomies that existed at the museum. Sites such as Flickr, tools such as OpenSearch, and licensing and use models from Creative Commons are combined to provide access to the data by as wide a variety of users as possible - users who become potential visitors; potential patrons. The design and interaction models are simple and elegant; the search and browse mechanisms are powerful; the images and image manipulation tools allow you to see great detail and experience the pieces wonderfully from afar.

If you are so inclined, I highly recommend you read through the "About" tab, and follow the links from there to the papers and blogs about how they've built the site. It's an interesting read, and a great way to be inspired - yet again - by those serving in institutions that preserve and promote our global heritage.

View of Low Waterfall Over Rocks, Powerhouse Museum Collection, The Commons on Flickr

Notes from A Session on Taxonomy Development and Digital Projects

20 degrees, light snow, 8:00 on a Sunday morning... and I’m about to do a presentation and hopefully lead a discussion on taxonomy development and digital projects for the Networked Resources and Metadata Interest Group at the ALA Midwinter conference. As I entered the area, there was only one other person, and I thought that perhaps my worst fears had come true: that only the group chair and I would be present! Soon, however, additional people began to arrive, and by the time we wrapped up the presentation and discussion there were about 40 people present. My presentation covered definitions and examples of controlled vocabularies, from simple lists up to and including ontologies. Examples and pros and cons of each type were presented and discussed, as were the uses of controlled vocabularies in search and navigation. The presentation concluded with more in-depth information on term structure, term relationships, notation and other general considerations when developing taxonomies and controlled vocabularies. The slide deck I used is embedded below and it is also available from the ALA conference wiki.

Following the presentation we had about a 30-minute discussion on aspects of the presentation as well as how we use and develop controlled vocabularies for clients in the Dow Jones Taxonomy Services group. There was a good discussion on the differences between the way libraries use and develop vocabularies and the way commercial enterprises use controlled vocabularies and taxonomies. Thanks to all the attendees for your participation! Please contact me with questions at laura.dorricott@dowjones.com

Upcoming Webinar: Part 1 Discover the Semantic Web

Please join us for Part 1 of a three-part Webinar titled Discover the Semantic Web.

Date : Thursday, February 12, 2009
Times: 10:00 a.m. EST / 7:00 a.m. PST or 2:00 p.m. EST / 11:00 a.m. PST
Duration: 60 minutes

This webinar will focus on how organizations consume, digest, and share news and information. What we are seeing in the corporate space is that the Semantic Web is no longer 'ahead of its time' and therefore it has the potential to rapidly change how your organization manages, delivers, consumes, shares and produces content.

During Part 1 of this series you can learn how Semantic Web technologies enable you to:

* Re-use valuable information to save costs in today's budget-cutting environment
* Facilitate easier collaboration and sharing of critical information across your business
* Increase search relevancy and surface the most valuable information needed to remain competitive

Christine Connors and I will be conducting this free webinar, and we welcome everyone who is interested in the subject to join us.

Who are we?

Christine Connors is the global director, semantic technology solutions for the Enterprise Media Group at Dow Jones and Company. In this position, she manages a worldwide team that is responsible for the development of taxonomies and metadata that are used to add value to Dow Jones news and financial information products, and a US-based team of software developers who support the Synaptica® software application. She also supports Dow Jones consulting practices, based in the Americas, Europe and Asia, which deliver end-to-end information access solutions based on taxonomy, metadata and semantic technologies.
You can learn more about Christine Connors or you can read some of her other blog posts on Synaptica Central.

I am the business development manager responsible for Dow Jones Taxonomy Services and Synaptica here at Dow Jones. I work with both small and large companies to deploy information strategies, including helping customers develop and manage their controlled vocabularies, from developing strategies and sharing best practices to doing custom taxonomy development.

More about me here, or you can read some of my other blog posts on Synaptica Central or on my personal blog.

Leslie Owens' Forrester Report: How To Build A High-Octane Taxonomy For ECM And Enterprise Search Systems

We all know that building controlled vocabularies in enterprise settings is not simply a nice (fun?) intellectual effort. The point is to deliver value to the enterprise, whether it’s aiding in realizing revenue or ensuring full utilization of knowledge assets. In “How to Build a High-Octane Taxonomy for ECM and Enterprise Search Systems” (Forrester, Nov 2008, free with registration), Leslie Owens presents a strong discussion of exactly how to go about building that “taxonomy” for your enterprise.

First and foremost, you should never engage in a “taxonomy” project in a vacuum. Ultimately, it’s all about context and knowing that context! Owens echoes this belief. We work closely with our clients to first clearly define and document the business objectives and ensure the project’s objectives are aligned with the enterprise’s objectives. The work must be done within the context of a specific set of business objectives and goals. Then we work with our clients to identify the systems which will utilize the controlled vocabularies. What are their constraints? How will they process the information? How might they display the information and surface the controlled vocabularies for the users? Speaking of users, they are the third aspect of context which we keep front of mind as we work with our clients to initiate these long-term efforts. Who are they? What is their level of knowledge about the content and its use? Why are they using the systems to be enhanced? How do they do their work, and how do they think about the content base(s)?

Owens presents a good, but academic, discussion regarding the techniques for building controlled vocabularies. Ultimately, it doesn’t matter which techniques are employed or in what combination you use them as long as you’re effective and the outcome is a controlled vocabulary which can be deployed, maintained and re-used for long-term business value. In our experience, most engagements involve utilizing a hybrid of techniques in an iterative process. It’s not worth getting caught up in labeling the technique (even for those of us often referred to as “the word police”).

If you’re considering initiating a “taxonomy” program within your enterprise, or you’re already well down the road, we strongly recommend reading Owens’s report. You will glean good information to guide you in your process. One last thought I would leave you with: no matter how “simple” you think this effort is, it should not be undertaken in the belief that it is a “once and done” project. You are initiating a long-term program which should always remain an active component of your ECM, search, knowledge management, or whatever larger program is appropriate.

Enterprise Finding with the Semantic Web

[Note: this post was originally posted on my personal blog]

One of the most illustrative examples of the value of the Semantic Web I have seen in a while is the post titled Semantic Web in Education by Jason Ohler, a professor of Educational Technology and Distance Learning at the University of Alaska. It paints a clear picture of some of the value end users and consumers can derive from the Semantic Web, and that value is very similar to the possible benefits in what I like to call the research 'finding' environment in corporate enterprises. Flip it from the education research environment that Ohler is addressing to an enterprise finding environment and you have an interesting use case for why enterprises should look closely at the promise of the Semantic Web.

"One vision of a well-developed semantic web includes a search feature that would return a multimedia report rather than a list of hits. The report would draw from many sources, including websites, articles from scientific repositories, chapters in textbooks, blog dialogue, speeches posted on YouTube, information stored on cell phones, gaming scenarios played out in virtual realities-anything appropriate that is accessible by the rules of Web 3.0. The report would consist of short sections that coalesce around knowledge areas that emerged naturally from your research, with keywords identified and listed conveniently off to one side as links.

The information in the report would be compared, contrasted, and collated in a basic way, presenting points of agreement and disagreement, and perhaps associating these with political positions or contrasting research. Because the web knows something about you, it also alerts you to local lectures on related topics, books you might want to read, TV programs available through your cable service, blog discussions you might find relevant, and even local groups you can contact that are also focused on this issue. Unlike a standard report, what you receive changes as the available information changes, and you might have wiki-like access to add to or edit it. And because you told your agent that this topic is a high priority, your cell phone will beep when a significant development occurs. After all, the semantic web will be highly inclusive, providing a common language for many kinds of media and technologies, including cell phones. The net result, ideally, is that you spend less time searching and sifting and more time absorbing, thinking, and participating."

Starting in February, Christine Connors and I will be conducting a three-part Webinar titled 'Discover the Semantic Web' that will address some of these enterprise-specific opportunities to leverage the Semantic Web.

Image|Flickr|AlexBarros

Taxonomy Development Discussion at ALA

Our global taxonomy consultants are always ready and willing to participate in conferences, local meetings and any other get-togethers they can reach, and they are often asked to lead discussions and presentations on different topics (when they are not working on client engagements, of course!).
At this year's ALA (American Library Association) Midwinter 2009 meeting, the Networked Resources and Metadata Interest Group (NRMIG) will host a discussion on taxonomy development on Sunday, January 25, from 8-10 a.m. The discussion will be led by our very own Laura Dorricott.
Laura is a Project Delivery Manager in our Dow Jones Taxonomy Services group and has many years of experience doing taxonomy and indexing projects, including taxonomy/thesaurus assessment, development and design.

Over on the Metadata Blog, the official blog of the Networked Resources and Metadata Interest Group (NRMIG), they are requesting input in advance on topics that should be covered, so if you are attending or have some 'remote' thoughts for Laura, feel free to add your comments either here or there!

A request was already added by Diane Hillmann that she would "like to hear Laura discuss the differences she sees in the way libraries view vocabularies and vocabulary development and the way commercial entities view these issues. Is there something we can learn from those differences? I'd also like to hear a bit about how the commercial sector evaluates return on investment for this kind of development."

Sounds like a great discussion and I can't wait for Laura's report from the field!

Global Report on the State of Blogging Librarians

Many readers of the Synaptica Central blog are not professional librarians, but some of us are, so I am sharing something I just published on my personal blog that you might find interesting: the state of blogging librarians around the world.

Librarians are no strangers to the blogging world, and some of the earliest blogs I personally started to follow were indeed written by librarians. The Librarian in Black blog by Sarah Houghton-Jan, a Digital Futures Manager for the San José Public Library, is definitely one that I have kept an eye on over the years in a Library topic folder in my RSS reader, and I always find interesting and valuable information there.

Recently Sarah wrote a post on her blog that caught my attention about a new book published by LibWorld titled "Library Blogs Worldwide" in which she has written a chapter covering the United States Library blogging community (page 187).

The book was published via the Infobib LibWorld project and is available as a free download or a paperback from Lulu online publishers.

It offers thirty commentaries by local librarians on the state of library-related blogs in 29 countries! The foreword by Walt Crawford provides a good summary of the volume and diversity that global library blogs offer.

In Sarah's chapter she asks the question "which came first, the librarian blog or the library blog?" She writes that the librarian blog did, because librarians "began sharing information with each other and as more of us saw the power of the blogging medium, we began adopting the same techniques at work." I agree with that answer, and it probably mirrors a lot of other industries and professions that have adopted blogging as part of their work 'duties' in the same manner.

Another good resource for comprehensive lists of blogs is the Blogging Libraries Wiki, and an output of the LibWorld report is available from the LibWorld delicious account, where they have tagged all the blogs mentioned in the book by country.

We should think about putting together a comprehensive book that covers global bloggers who are taxonomists, ontologists and linguists!

Library of Congress Reports on Flickr Pilot

[This post is cross-posted on my personal blog]

Last month the Library of Congress released the report on its ongoing Flickr project, which I have been very interested in and have written about as the project progressed. From their blog post on the report:

"Only nine months into the Library of Congress’ pilot project placing Library photos on the Web site Flickr, the photos have drawn more than 10 million views, 7,166 comments and more than 67,000 tags, according to a new report from the project team overseeing the lively project."

“The popularity and impact of the pilot have been remarkable,” said Michelle Springer, project manager for digital initiatives in the Office of Strategic Initiatives, who said total views reached 10 million in October. The site is averaging 500,000 views a month, she said, adding that Flickr members have marked 79 percent of the photos as “favorites.”

A summary of some of the outcomes:

  • Increased awareness of the Library of Congress (LC) digital photograph collection, which has been available for years on the Library's website. The pilot brought not only an engaged audience but also a lot of referral traffic to the Library's website. "Feedback of this nature suggests that as a result of this project the Library is reaching new audiences - people who did not or could not find this material on our own site, and people who never thought to look here."
  • A better understanding of social tagging and community input (see below for more details)
  • Experience for the LC staff with Web 2.0 online interactions with 'patrons'

Since the beginning of the project I have been very interested in what it would reveal about user tagging versus controlled vocabulary applied through traditional bibliographic cataloging. In the report they share that they used the Flickr API to do a deeper analysis of the tagging done by the community (see pages 19-24 of the full report). The analysis is based on nine categories that provide some interesting insight into issues commonly cited in comparisons of social tagging vs. assignment of controlled vocabulary terms (page 28). The categories analyzed were:

I. LC description-based (words copied from the Library-provided record): e.g., titles, names, subjects, etc.

II. New descriptive words (words not present in the Library-provided description):

  • Place: e.g., cities, counties, countries, natural feature names
  • Format (physical characteristics of the original photos). Sample tags: LF, large format, black and white, bw, transparencies, glass plate
  • Photographic technique. Sample tags: shallow depth of field
  • Time period. Sample tags: wartime, WWII, 1912
  • Creator name: e.g., photographer’s name

III. New subject words (words not present in the Library-provided description):

  • Image (items seen in the image itself). Sample tags: cables, trees, apples, windows, hat, yellow
  • Associations/symbolism (phrases and slogans evoked by the image). Sample tags: Rosie the riveter, Norman Rockwell, We can do it!
  • Commentary (revealing the tagger’s value judgments). Sample tags: Sunday best, proud, dapper, vintage.
  • Transcription (transcribing words found in items such as signs, posters, etc., within the photo)
  • Topic (terms that convey the topic of the photo). Sample tags: architecture, navy, baseball, story
  • Humor (tags intended to be humorous rather than descriptive) Sample tags: UFO, flying saucer

IV. Emotional/aesthetic responses: (personal reactions of the tagger). Sample tags: wow, pretty, ugly, controversial

V. Personal knowledge/research (tags that could only have been added based on knowledge or research by the tagger, and that could not have been gleaned solely from the description provided or examination of the photo): For example, the tag murder used on a portrait of someone who was later murdered or tags added for the specific county when that information was not part of the description.

VI. Machine tags (added by the community not Library-supplied): e.g., geotags and Iconclass tags

VII. Variant forms (representing terms already tagged but in a different form, such as synonyms (e.g., WW2, WWII, World War II, worldwarii) or plural/singular differences (e.g., transparency/transparencies))

VIII. Foreign language (tags in foreign languages/scripts, whether they are translations of English-language tags, or new tags)

IX. Miscellaneous (tags that are not readily understood, that provide corrections to LC descriptions or to other taggers (e.g., not peaches), or tags later removed)
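
To make a couple of these categories concrete, here is a minimal Python sketch of how tags might be sorted automatically. It is not the Library's method, which the report describes on pages 19-24. The function names, the sample description and the sample tags are all hypothetical, and the matching rule (lowercase, then strip everything except letters and digits) is an assumption chosen for simplicity. It covers only category I (words copied from the record) and the easiest part of category VII (variants that differ only in case, spacing or punctuation).

```python
import re
from collections import defaultdict


def normalize(tag: str) -> str:
    """Lowercase a tag and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def split_by_source(tags, description):
    """Separate tags whose words all appear in the record description
    (roughly category I) from tags that add new words."""
    description_words = {normalize(word) for word in description.split()}
    description_words.discard("")
    copied, new = [], []
    for tag in tags:
        words = [w for w in (normalize(part) for part in tag.split()) if w]
        if words and all(w in description_words for w in words):
            copied.append(tag)
        else:
            new.append(tag)
    return copied, new


def group_variants(tags):
    """Group tags that share a normalized form (a small slice of category VII).
    Only groups with more than one member are returned."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize(tag)].append(tag)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hypothetical record description and tags, for illustration only.
    sample_description = "Woman working on an airplane motor"
    sample_tags = ["woman", "airplane", "Rosie the riveter",
                   "WWII", "World War II", "worldwarii", "wow"]

    copied, new = split_by_source(sample_tags, sample_description)
    print("Copied from description:", copied)
    print("New words:", new)
    print("Variant groups:", group_variants(sample_tags))
```

Note the limits of this kind of string matching: it groups "World War II" with "worldwarii", but it cannot tell that "WW2" or "WWII" mean the same thing, or that "transparency" and "transparencies" are singular and plural of one term. Those cases need a curated synonym list or a stemmer, which is exactly where a controlled vocabulary earns its keep.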

Some of the Future Tag Analysis Interests (page 29) are also quite interesting, such as incorporating popular concepts or variants into the LC's own controlled vocabularies (yeah, something I advocate in the hybrid approach!), bringing the tags into the LC's search environment, and populating bibliographic records with tags. They have already added the Flickr URL to the "additional version available" field (MARC field 530) in some catalog records, which leads users to the appropriate Flickr page that might provide valuable historical information on the image (see the sample on page 36).

In the report they also share some of what the staff learned from using Web 2.0 tools to interact with patrons, which can differ from traditional reference desk exchanges (page 37).

The good news? Skip to page 38 of the full report to see the recommendations and conclusions, including details of the headcount necessary for the program to continue and expand. The report ends with the following good news:

"It should come as no surprise, then, that the Flickr team recommends that this experiment in Web 2.0 cease to be characterized as a pilot and evolve to an expanded involvement in this growing community (and other appropriate social networking opportunities that may arise) as resources permit. The benefits appear to far outweigh the costs and risks. "

The entire set of tags that have been applied can be seen alphabetically or as a tag cloud of the 150 most popular tags.

Many thanks to the Library of Congress staff for taking on this project and continuously sharing their progress through their blog as well as other resources (see Appendix C) and to the authors of the project report: Michelle Springer, Beth Dulabahn, Phil Michel, Barbara Natanson, David Reser, David Woodward, and Helena Zinkham!

Hybrides à la Barbosa: Ebook on Taxonomies and Folksonomies Now Available in French

2008 treated my ebook on Folksonomies and Taxonomies extremely well and led to some great conversations with colleagues and clients about the 'advantages' of user tagging when approached via hybrid routes in the enterprise, which I will be sharing here with you in future posts.

In addition to its inclusion in many publications, including DMReview, I was also interviewed for a ReadWriteTalk Podcast about why I wrote the book. A lot of great feedback was received, not only about the content and the message of the ebook but also about the gorgeous layout and format that our design team put together. As a treat, our marketing department also had these great aprons made for the Taxonomy Bootcamp sessions.

This year a translated version of the ebook is out in French, titled Le Livre De Cuisine De La Taxonomie Et De La Folksonomie, which I am extremely excited about because it reaches out to a whole new market for my European colleagues (although I admit I do not speak French!).

Hope you enjoy it. Merci!