SLA Tech Zone: Taxonomy and SharePoint -- A Powerful Combination

If you are planning to attend the upcoming SLA Annual Conference in Washington, DC, then you won't want to miss the SLA Tech Zone workshop Taxonomy and SharePoint--A Powerful Combination.


SharePoint helps your organization connect people to business-critical information and expertise in order to increase productivity and reduce information overload. It achieves this by giving your employees the ability to find relevant content in a wide range of repositories and formats. Understanding and using taxonomies within a SharePoint implementation to help users find content is an essential part of ensuring a successful SharePoint deployment. Taxonomies can range from quite simple to very complex. In this session, we will cover the basics of evaluating what you can do to create a simple taxonomy that will yield the most benefits for your SharePoint implementations. You will have a chance to learn a range of best practices, from the basics of building a taxonomy to the hands-on skills of deploying that taxonomy within a SharePoint site.


This workshop is suitable as either a quick-start or a refresher in taxonomy management for SharePoint. There are three sessions:

  • Monday, 15 June 2009 9:00AM - 10:30AM (Ticketed Event #640)
  • Monday, 15 June 2009 3:30PM - 5:00PM (Ticketed Event #660)
  • Tuesday, 16 June 2009 11:30AM - 1:00PM (Ticketed Event #805)

Price: US $35 member / US $35 non-member / US $35 student member


For details and registration information, see the SLA 2009 site.

Classifying Images Part 3: Depicted Content

Welcome back to my occasional image classification series.

The last time I raised the topic of image classification I discussed the basic attributes of images. This time I want to focus on the thornier issue of the content, or concepts, depicted in them.

There is a danger of treating an image like a piece of text and classifying its attributes: Who created it? When? What techniques were used? Then writing a title or caption and leaving it at that. Sometimes little more need be done to a document than record this kind of information, especially with free text searching, but lots more needs to be done to most images.

Image findability

Image findability is the process of using search and browse to access the images required. A major aspect of image findability relates to the things depicted in them. Image users often search for images based on the generic things in them and also the proper names of these things. Classifying images based on depicted content means considering anything and everything that is and can be depicted in an image. When considering this I like to focus my efforts on understanding the images I'm dealing with, the users who are trying to find and work with the images, and the ways in which these people need to search and browse for the images they need. After an assessment of these areas I then tailor my approach.

Broadly speaking people searching for depicted content are looking for a number of types:

  • Places: cities, towns, villages, streets...
  • Built works: parks, skyscrapers, cottages, walls, doors, windows...
  • Topography: mountains, valleys...
  • Groups and organisations: air forces, choirs, police departments...
  • People: roles, occupations, ethnicity and nationality: mothers, doctors, Caucasians, French, Germans...
  • Actions, activities and events: running, writing, laughing, smiling, birthdays, parties, book signings, meetings...
  • Objects: a myriad of items...
  • Animals and plants: common and scientific names...
  • Anatomy and attributes of people, animals and plants: arms, legs, adults, leaves, trunks, paws, tails...
  • Depicted text shown in images - often signs or writing shown in images...

Many of these generic types can also have proper named instances:

  • Proper names of people, places, buildings, topography, organisations, animals etc

When dealing with depicted content I've found some of the biggest issues to be:

  • Identification - knowing what is in an image
  • Focus and specificity - knowing what to include and what to exclude
  • Consistency - applying the same term in the same way for the same depicted content

Identification - knowing what is in an image

Depicted content seems a relatively black and white area: a dog is depicted, so a dog is tagged. It might sound a little odd, but working out what is actually in an image can be a lot harder than you think.

Take a look at the image "Do You Know What This Is?" by Sister72

This depicted content is fairly simple to see, but understanding what you're looking at is not that easy. Even if you know roughly what you're looking at, do you know what it's actually called?

One tip is to group similar images together when you're classifying them. Also, always start by assembling as much information as possible before you begin to classify images. It is especially important to gather together the information you have from the creator or custodians of the images.

Also important, when you have the luxury, is to get the image creator to add key metadata about the image at the point of creation, or soon after.

Focus and specificity

Knowing what to include and what to exclude, what to mention and what to ignore, is also much harder than it sounds.

Firstly, some image users will want a piece of depicted content tagged whenever it appears in an image, others will only want it tagged when the image shows a very good representation of that content, and of course many people will want something in between the two extremes.

Different users have different requirements. You need to understand the domain in which you're working and see the classification of depicted image content as supporting the needs of your users.

For example, would you tag everything in this 'Messy Room' image?

What would you miss out and why?

Looking at the image of "Mountain Goats", from Thorne Enterprises

Would you tag this with goats as well as mountains? Would this be helpful?

Let's look at four images depicting windows:

'Window to the World',

'Portuguese Window',

'What Light Through Yonder Window Breaks'



Looking at these, it soon becomes clear that even deciding to apply a simple term like 'Windows' is not always easy.

Would you apply 'Windows' to the image of the cat looking out of the window? Is a window actually depicted in that image? If the image wasn't tagged with 'Windows' how else would anyone find an image of a cat looking out of a window?

The other three images show windows as parts of buildings, but is a building always depicted? Deciding when to apply a building type or the name of a building can be hard. Should you do this every time a part of a building is shown? Only when the whole building is shown? When enough of the building is visible? Or when a section of the building that to most people would represent the building is visible? For example, what part of the Empire State Building would you consider to depict that building? Rarely does anyone see it all - how much is enough? Would you treat the images of windows in a similar way and classify them all with a building type of 'Houses', or would you ignore the structure and focus on the parts - the window, the roof?


Consistency

Achieving consistent application of terms to images revolves partly around clear term definitions, well defined application rules and guidelines, and a robust quality assurance process.

Term definitions are very important. Defining the meaning of a term, and ensuring the people choosing which term to assign understand that meaning, can be crucial to term application. For example, creating a term such as 'Bow' without defining its meaning is not going to make it easy to apply.

Application rules that are well considered, thorough and clear are also very useful. Even a simple concept often needs some form of guidance linked to it. I remember a while ago needing two terms, 'Indoors' and 'Outdoors', to allow users to find images of people who were inside and outside - a simple concept you might think, one that people often need, and one that's easy to apply - who'd need guidelines for that? However, it soon became clear that guidelines were needed after I received a series of interesting questions: Is being on a train indoors? Should studio shots always be considered indoors? Does every shot of a person have to have indoors or outdoors assigned to it? If not, when should this term be used and when not? Is this a focus issue? If so, how much of a location needs to be seen before Indoors or Outdoors is used? A clear set of application guidelines followed an interesting meeting!
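To make the point about definitions and application rules concrete, here is a minimal sketch of how a vocabulary term might carry its own guidance. The `Term` class, the `ready_to_apply` check, and the sample definition and rules are all hypothetical, written only for illustration; they are not the guidelines that came out of the meeting described above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One controlled-vocabulary term with the guidance an indexer needs."""
    label: str
    definition: str = ""
    # Application rules: short answers to questions indexers actually ask.
    rules: list[str] = field(default_factory=list)


def ready_to_apply(term: Term) -> bool:
    """Treat a term as usable only once it has a definition and a rule."""
    return bool(term.definition.strip()) and len(term.rules) > 0


# Hypothetical entries; the definition and rules are examples only.
bow = Term(label="Bow")  # Ambiguous: weapon, knot, part of a ship, gesture?
indoors = Term(
    label="Indoors",
    definition="The main subject is inside an enclosed building or vehicle.",
    rules=[
        "Apply to interiors of trains, cars and aircraft.",
        "Apply to studio shots only when the setting is visible.",
    ],
)

for term in (bow, indoors):
    status = "ready" if ready_to_apply(term) else "needs definition and rules"
    print(f"{term.label}: {status}")
```

Keeping the definition and rules on the term itself means anyone assigning 'Bow' or 'Indoors' sees the guidance at the moment they need it, instead of hunting for it in a separate document.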

Strong quality assurance processes are very valuable. People make mistakes and images generate interesting issues. Appointing staff to review a percentage of classification work based on clear guidelines, and then sharing findings with the people who assigned the terms to the images, is an important way of assessing how well the image classification is progressing and keeping a classification team synchronised.
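Reviewing a percentage of classification work can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 10% review rate, the image IDs and the "same set of terms" agreement measure are all hypothetical choices, not a description of any particular team's process.

```python
import random


def pick_review_sample(image_ids, percent, seed=None):
    """Return a random subset of image IDs for a reviewer to re-check."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ids = list(image_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    # Always review at least one image, even for very small batches.
    count = max(1, round(len(ids) * percent / 100))
    return rng.sample(ids, count)


def agreement_rate(indexer_tags, reviewer_tags):
    """Share of reviewed images where both people chose the same term set."""
    shared = [i for i in reviewer_tags if i in indexer_tags]
    if not shared:
        return 0.0
    same = sum(
        1 for i in shared if set(indexer_tags[i]) == set(reviewer_tags[i])
    )
    return same / len(shared)


# Hypothetical batch of 200 images and a 10% review rate.
batch = [f"img-{n:03d}" for n in range(200)]
sample = pick_review_sample(batch, percent=10, seed=42)
print(len(sample), "images selected for review")
```

The images where indexer and reviewer disagree are the useful output: they are the cases to take back to the team and, where needed, turn into new application rules.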

Today I’ve talked a lot about content depicted in images. Next time I’ll focus on abstract concepts, which are related to an image’s ‘aboutness’.

Content Based Image Retrieval - Google and Similar Image Search

I was very interested to see Google experimenting with visual similarity in still images, what I usually call Content Based Image Retrieval or CBIR.

Google Labs have just launched an image search function based on visual similarity - Google Similar Images. This new offering allows searchers to start with an initial image and then find other images that look like their example picture.

I've been reviewing these types of systems on and off since the early '90s. They've always offered much, but I never saw any evidence that the delivery matched the hype.

I've always found that using pictures instead of text to find images works best on simple 2D images: carpet patterns, trademarks, simple shapes, colours and textures. Finding objects in images was always a struggle, and looking for abstract concepts (fear, excitement, gloom, isolation, solitude...) has never been more than a vague possibility. Over the years a lot of work has been done in this area, and the search results I've seen have started to improve, but this technology is still young, and in my personal opinion still rarely delivers what most users want, need and expect.

Looking at Google Similar Images, I wonder how much of the back-end is pure content based image retrieval (CBIR), how much is using metadata in some way, and how the two are interacting. One thing that often appears to help produce a tight first page of results is simply pulling the same image from different sites. I also noticed that the 'similar images' option is not available for all images, which makes me wonder why. Have some images been processed in ways that others haven't?

Google Similar Image - 5

Diving right into the experience, I entered a query for a place in the UK and didn't see any image results with the 'Similar Images' option. I wonder whether this is to do with the presence of the results on UK websites?




I persevered, and found some interesting images and got some interesting results.

Google Similar Images 1 - beach

I started with a fairly standard image of a beach scene, always a favourite with testers. As you can see I got a pretty good first screen back. However, the 5th and 6th images on the top row show no sea or beach, and neither do the first three images on the second row.


Google Image Search 2 - Pole

I moved on to an image of what looks like equipment at the top of a pole.

The results were much more mixed: studio shots of objects, fighting people, trucks etc. No images were returned that I would consider similar to the example picture.

Google Similar Images 3 - clock face

Interesting results came from a similarity query on a clock face.  A couple of the first results hit the mark, then the results set degenerated into image similarity based more on the colour and the black background than anything else.


Google Similar Images 4 - road

My last attempt, before morning coffee called, was an image of a country road. I was hoping that the clear roadway might produce a pretty precise results set. However, I was a little disappointed by what I saw.

The first results page only produced one vague road on the bottom row, with most of the similarity seemingly related to colours instead of objects.
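To show why colour alone can make unrelated pictures look 'similar', here is a minimal sketch of one classic CBIR measure, colour-histogram intersection. It is emphatically not a description of how Google Similar Images works, which I don't know; the function names and the toy pixel lists are assumptions made purely for illustration.

```python
from collections import Counter


def colour_histogram(pixels, levels=4):
    """Normalised histogram over coarse RGB bins (levels**3 bins in total)."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 means identical colour distributions."""
    return sum(min(share, h2.get(bin_, 0.0)) for bin_, share in h1.items())


# Hypothetical toy "images" as flat lists of RGB pixels.
GREEN, GREY, BLUE = (40, 160, 40), (120, 120, 120), (90, 150, 230)
country_road = [GREEN] * 60 + [GREY] * 30 + [BLUE] * 10
golf_course = [GREEN] * 65 + [GREY] * 25 + [BLUE] * 10  # no road at all
night_street = [(10, 10, 10)] * 80 + [GREY] * 20        # a road, but dark

query = colour_histogram(country_road)
print(histogram_intersection(query, colour_histogram(golf_course)))
print(histogram_intersection(query, colour_histogram(night_street)))
```

With these toy values the golf course, which contains no road, scores far higher against the country road than the night-time street does. A measure like this knows about colours, not objects, which is consistent with the kind of results described above.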

From my less than scientific dip into this Google Labs offering, it looks like the highlighted images on the Google Similar Images home page produce good results - better results than I've seen other systems come up with. Many other image queries are sure to also produce results which may well impress. However, many of the results I saw did not match the initial level of accuracy I saw from the highlighted home page pictures.

I don't want to be picky, this is still a prototype after all, and well done to Google for introducing a wider audience to this type of image search. Hopefully, after more work, the results will increasingly make more sense to people, the access points offered to depicted content and conceptual aboutness will improve and more images will be more findable for more people.

Until that time, visual search without text will help with image findability, but text, metadata, and controlled vocabulary applied to images by people is for me still king, and will continue to offer the widest and deepest access to images for a long time to come.




Taxonomy is key to Effective ECM

I recently attended a seminar on the 10 Steps to Business Efficiency with Content, Collaboration and Process, given by the good people at AIIM and all about ECM strategies and best practices. This was a free seminar, well organized and well attended by a broad spectrum of representatives from all types of organizations, large and small, new and old industries. The topics of discussion ranged from the most effective way to digitize archival assets, to applications that better allow for federated search across various data repositories, and there was certainly a lot of discussion around what has become the most ubiquitous of ECM-type applications, Microsoft SharePoint.

There were of course the usual quotes and statistics from AIIM, Forrester and Gartner regarding information proliferation and management today: The amount of data being produced is doubling every 18 months; 80% of this data is unstructured and 90% of that is entirely un-managed.

An interesting quote that I will paraphrase here was attributed to Thomas Washington: "The pursuit of knowledge in an age of information overload is less about the process of acquisition than it is about a proficiency of tossing things out." And regarding the storage of all of this information, another interesting fact was thrown out: while 1 GB of storage may now cost an average of 20 cents, it costs $3,500 to review that same 1 GB of data and start to make sense of it in the context of your business. (AIIM)

As I listened to the various presentations and vendors I was struck by one thing: none seemed to offer a unified solution for using taxonomy more effectively to structure, classify and categorize the content that was going into these vast data repositories. Certainly it was agreed that there was value to such a process, but it is something that many organizations have still not recognized as absolutely necessary to fundamentally improve the tagging, organization and discovery of information within these huge libraries of data, documents, and other media.

It is our opinion that the integrated use of taxonomy applied to ECM applications, as well as across the rest of the enterprise, using a centralized and standardized set of vocabularies for navigation, search, discovery, meta-tagging and many other applications is a necessity in moving towards a unified means of data normalization and discoverability. To achieve this we offer services to get companies started as well as tools like Synaptica with out-of-the-box integrations to tools like SharePoint, but also more generic means of integrating with external applications via simple APIs and Web Services.

As the proliferation of data only increases over time and the means of digitizing archival records or utilizing native electronic formats becomes more efficient, storage becomes less a matter of cost and more a matter of management. The efficient means of identifying, tagging, categorizing and sorting information will be key to the effective operation of any organization.

A couple of months back, after attending an AIIM seminar, my colleague also wrote up the 10 Rules of Successful ECM Implementation, which we have found quite useful in talking to business and technology owners about content access strategies.

We see many of our customers at the forefront of addressing these issues and, working with them, we continue to work towards providing better and easier ways for data managers and end users alike to find what they are looking for. We look forward to sharing some of these use cases as well as hearing from you about your successes and struggles!

Image| Flickr | ul Marqa

Working on the Problems of Information Overload

Our colleagues at Factiva have decided to start their own blog, Let's Talk Knowledge - congrats gang! Might I humbly say that they were inspired by Synaptica Central, and thank YOU, our readers, for helping make this enough of a success as to inspire others here at Dow Jones!

Ken kicks off this new endeavor with some interesting thoughts on "Information Overload." The problem, he posits, is not that we have too much information, but that we have not invested enough in the tools to manage and analyze the information. IT investments have focused on creating, storing and distributing information (and, I would argue, retrieving it), but NOT on how we analyze or synthesize it. As Daniela and I have argued here on Synaptica Central in posts on data visualization and our Semantic Webinar, that is truly an area that needs work!

Let's be clear though: it's not about the latest widgets! It's not about the pretty, sparkling thing that just flashed in the browser window! It's about data that can be used and re-used in any form - be it sparkly, conservative, mobile or consumed by machines. Putting time and effort into crafting the models for this data can reduce confusion, reduce time spent on rules-writing or custom query building, and allow a greater diversity in a product portfolio from the same data set: not just in delivery channels, but in customer-focused channels as well. Employees are more productive, prouder of their work and motivated to innovate. Customers get what they need - they reduce information overload as well as relevancy overload - and are much happier and more loyal for it.

We're working hard here at Dow Jones to integrate and evolve our data models to take full advantage of the semantic web. Taxonomies and thesauri are becoming ontologies; lightweight ontologies are being transformed to take advantage of the full power of RDF, OWL and SKOS. Welcome to the blogosphere Ken, thanks for your thoughts, and we look forward to continuing to innovate with you and your team!
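For readers who haven't seen SKOS, here is a minimal sketch of what expressing one taxonomy term as a SKOS concept can look like in Turtle syntax. The `to_skos_turtle` helper, the `example.org` base URI and the 'Windows' / 'built-works' term IDs are hypothetical and say nothing about Dow Jones' actual data models; only the SKOS namespace and the `skos:Concept`, `skos:prefLabel`, `skos:altLabel` and `skos:broader` names come from the SKOS vocabulary itself.

```python
# Namespace declaration that would sit once at the top of a Turtle file.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def to_skos_turtle(term_id, pref_label, alt_labels=(), broader_id=None,
                   base="http://example.org/vocab/"):
    """Serialise one taxonomy term as a SKOS concept in Turtle syntax.

    Labels are assumed to contain no double quotes or backslashes;
    a real serialiser would need to escape them.
    """
    subject = f"<{base}{term_id}>"
    predicates = ["a skos:Concept", f'skos:prefLabel "{pref_label}"@en']
    # Synonyms and variant spellings become alternative labels.
    predicates += [f'skos:altLabel "{alt}"@en' for alt in alt_labels]
    # The parent term in the hierarchy becomes a skos:broader link.
    if broader_id is not None:
        predicates.append(f"skos:broader <{base}{broader_id}>")
    return subject + " " + " ;\n    ".join(predicates) + " ."


# Hypothetical term: 'Windows' with one synonym, under 'built-works'.
print(SKOS_PREFIX)
print(to_skos_turtle("windows", "Windows",
                     alt_labels=["Window panes"],
                     broader_id="built-works"))
```

The appeal is that the preferred label, synonyms and hierarchy a taxonomist already maintains map directly onto standard properties that other semantic web tools can read.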

Why Taxonomy Matters to Sales Organizations

Here within the Dow Jones Enterprise Media group we deal with almost every kind of enterprise employee, catering to their specific information delivery and management needs: PR Communications teams who monitor what people are saying about their brands across various media; Sales and Marketing teams that need to keep on top of their clients, competitors and industries; Researchers and Knowledge Managers who need depth and breadth for their research; and of course Investment Bankers and Traders who rely on timely and accurate content. These are just a few of them, and if you are not aware of all the diverse tools and solutions we offer, our new Dow Jones Solutions site now gives you a comprehensive look by different roles, business needs and products.

Luckily for us in the Synaptica and Taxonomy Services team, we typically get to work across all different types of clients, whose needs can vary drastically, specifically in how they are leveraging controlled vocabularies. In future posts I will be highlighting some of those other segments, but for this post I am going to highlight "Why Taxonomy Matters" to Sales Professionals and Relationship Managers.

Have you ever tried to have a conversation about taxonomy with a sales or sales operations person in your company? Well, if the conversation didn't turn to taxidermy you are probably lucky! So here are some things you can bring up next time you are in a meeting with your Sales teams and are trying to convince them that Taxonomy Matters to them. Try these on for size:

Product Alignment – To better target the right customer, a taxonomy solution provides the backbone to build a faster pipeline

Territory Alignment – Segmenting companies based on region, industry, company size as well as other available categories will increase the likelihood of finding the right audience

Field Sales – Combining the accurate information from product and territory alignment with the proper event triggers, sales forces will be able to pinpoint when to discuss a particular product with a potential customer

Want some more use cases for why taxonomy matters to sales? Contact us or use the contact form from this blog.

Taxonomy and SharePoint : Exercises and Video From Session

Last night we conducted our Taxonomy and SharePoint: A Powerful Combination session for our clients in Asia Pacific. The first time we ran this session was as a sold-out Tech Zone at the SLA (Special Libraries Association) Annual Conference. Then last September we did a webinar version that was very well attended and received great feedback. The feedback and requests for more sessions have been so great that this June at SLA, I will be doing another advanced on-site session on the topic.

Our colleagues in Asia Pacific tell us that SharePoint is taking off in their region, which has prompted a lot of questions on taxonomy. Based on that feedback, we coordinated another live session last night specifically for that local audience with my colleague K Biju.

We are now pleased to also provide you with a recorded version of the learning session (view full screen below). It covers the live exercises that I went through during yesterday's webinar. We also recommend you download the exercise sheet and follow along to learn how to implement taxonomies in SharePoint in your own environment.

We hope you find this video and exercises useful and look forward to your feedback. Please feel free to leave your comments below or contact me at

PLEASE: view in full screen by clicking full screen button below.

Dow Jones Webinar Taxonomy and SharePoint - A Powerful Combination : Asia Pacific

Last fall our Taxonomy Services group conducted a very successful Webinar focused on Taxonomy and SharePoint. Based on requests from our customers, next week we are conducting a similar session for our customers in Asia Pacific. Much like what my North American and European colleagues share with me about their regions, SharePoint adoption seems to have reached a tipping point in Asia Pacific and thus, this session will be timely and very relevant to our Asia Pacific customers.

So if you are located in the Asia Pacific region and would like an introduction to Dow Jones’ taxonomy services and a better understanding of the use and benefits of taxonomies within SharePoint, you can attend our upcoming Webinar. This practical session will demonstrate ways in which you can combine the simplicity of SharePoint and richness of taxonomy to solve your complex information challenges.

During this session you will learn some of the basic ways to manage controlled vocabularies using standard out-of-the-box features that you can use immediately, as well as learn about our Synaptica integration into SharePoint.

Date: Wednesday 25th March

Time: 10.00am Singapore, Hong Kong

11.00am Japan
1.00pm Sydney
2.00pm New Zealand

We welcome everyone who has implemented or is looking to implement a SharePoint solution or in fact anyone who is keen on this subject to join us by registering for this session by sending an email to

I look forward to 'seeing' you there!

Semantic Webinar: Part 1 Discover : The Attendees

Over 1,300 people registered for Part 1 of the Semantic Web Webinar Series: Discover the Semantic Web, and Christine Connors and I were both delighted with the attendance and the feedback we received!

As Christine explained during our introduction, as attendee registrations were coming through we regularly reviewed the list, wanting to make sure that we were writing for our actual audience. What we found were strong numbers in three types of business role in particular – and happily for us, the three categories we most wanted to engage as part of this series, as described below.

To further validate the attendees and get a feeling for where our enterprise customers are, the first poll that we conducted during the webinar was a simple one to record where attendees were with regard to their experience with the semantic web. The following results are the aggregate of the two sessions:

When you think about Semantic Web, what is the first thought that pops into your head?
61.2% > I am new to the Semantic Web and Interested in Learning More
26.4% > I am currently knowledgeable on the Semantic Web and would like to expand this knowledge
12.4% > I am currently involved in Semantics-related project in my company

Given that Part 1 of the series was titled Discover, the target level of expertise was perfect!

Looking through the final list of registered attendees, I took their titles and made the following Wordle to provide a visualization of the titles of the attendees. The word 'Manager' certainly appeared a lot (many times with additional information like "Manager of Information Center") as well as Director hence the larger presence of those terms, but here is what it looks like:

webinar attendees

The majority of the attendees could be bucketed into the three categories we most wanted to engage as part of this series:

Library and Information Science Professionals
We had a good number of folks joining with library and information science backgrounds. Christine and I share that background, and Dow Jones is a great supporter of info pros. We believe that Librarians have a critical role to play in the future of the semantic web. The ability to provide access to comprehensive, annotated, carefully curated data in the highly collaborative environment that is the web has never been more vital. We can’t think of a profession more suited to the task. Nor can we think of anyone in our semweb social network who doesn’t love librarians!

Information Architects
Next we had information architects – user experience designers, interaction designers, taxonomists – Big IA, Little IA – however you choose to define them, people who are passionate about delivering easy-to-use interfaces between humans and information. And the bottom line? WE NEED YOU! The semantic web presents a new paradigm for data and tools. We need creative, practical people to think about new patterns for interacting with data on the web.

Business Champions
Finally, we had the business champions – executives, analysts, program managers. You may have heard that the semantic web is a solution in search of a problem. But as we will be covering in this series of webinars, the semantic web is a potential solution for existing problems. We’ve seen this shift before: we didn’t need email – we had fax machines to move documents quickly while maintaining a paper trail. But we are willing to bet that you wouldn’t like it if they took away your inbox (well, permanently anyway!). Business Champions are doing their organizations a disservice if they simply dismiss a semantic solution. As we discussed in the webinar, put it in the mix of options to consider; occasionally it won’t yet work, sometimes it will be a stretch, but often it will be a perfect fit.

Missed Part 1: Discover? The slide deck is embedded below and the recorded Webinar is also available on demand.

What's next?

We are preparing for Part II [registration info coming soon].

Send us your requirements! Your use cases! Your challenges! We’re ready and waiting!

All Posts related to this 3 part series are available under the Semantic Web Webinar Series Topic of Interest.

A Map for the Semantic Web

A very interesting Flickr image came across my desktop this morning. It's a cool visual posted by Steve Jurvetson showing many of the different parts of the semantic web. I missed it the first time around, but the tiny copyright statement at the bottom of the original size indicates Tim Berners-Lee as the IP holder. Should be especially interesting to Tolkien fans!

If you're interested in learning more about the semantic web, this is an interesting place to start - especially if you are a visual thinker. Do some searching on the topics mapped out here - Vannevar Bush, Logic (First and Second order logic), protocols and markup languages, RDF, OWL, GRDDL - whatever strikes your fancy. Let me know what you think!

Map to the Semantic Web on Flickr