November Equals Pumpkin, Taxonomy Bootcamp and Enterprise Search Summit

Note: This post was originally published on my personal blog: "chitchatting about information delivery"

I looked at the calendar yesterday and the realization that November is upon us hit me pretty hard. With November comes a lot of pumpkin (here is my new pumpkin addiction recipe for this year) and a few speaking spots before I go out on maternity leave (yeah yeah, I can't believe it's already time either!).

Just like last year and the year before, I will be attending and presenting at Enterprise Search Summit and Taxonomy Bootcamp, which I am really looking forward to. At Enterprise Search Summit West I have been asked to participate on a panel titled "Is Semantic Technology Real?" moderated by Rob Gonzalez from Endeca.

Is Semantic Technology Real?
10:45 am – 11:45 am  

Moderator: Rob Gonzalez, Platform Product Manager, Endeca Technologies

  • Michael J. Cataldo, CEO, Cambridge Semantics
  • Daniela Barbosa, Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
  • Lorenzo Thione, Founder / Principal Program Manager, Powerset / Bing Microsoft, Inc.

Semantic technology is all the rage, sometimes even dubbed “Web 3.0.” However, many people—especially those making technology decisions for enterprises—wonder whether semantic technology has meaningful applications in the enterprise. Based on hands-on experience working with semantic tools, this panel of experts will establish the boundaries between reality and hype and help you understand what enterprises can gain from semantic technology in the here and now.

At Taxonomy Bootcamp, I have been asked to be part of a panel that Wendi Pohs is running titled "From the Lighthouse: Visioneering Taxonomies’ Future", which promises to be an engaging panel on the future of taxonomies. I have been in forward-looking mode for the last few months, so I have been thinking about this subject quite a bit.

From the Lighthouse: Visioneering Taxonomies’ Future
4:00 pm – 5:00 pm

Moderator: Wendi Pohs, Chief Technology Officer, InfoClear Consulting

  • Daniela Barbosa, Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
  • Jenny Benevento, Information Architect/Taxonomist, Sears Holding Company
  • Gia Lyons, Social Business Software Consultant, Strategic Consulting, Jive Software
  • Steve Ardire, VP Strategy & Business Development, Early Stage Semantic Technology Startups

Join Wendi Pohs and this panel of experts as they peer into the future of taxonomies. Each panelist concentrates on a specific area, including semantic management tools, consumer-driven taxonomies, social networking software, and emerging semantic technologies. We’ve asked these speakers to both enlighten and challenge you, so bring your thinking caps and questions.

It is not too late to register, and you can use these codes for some discounts: Enterprise Search Registration with SPK5 code and Taxonomy Bootcamp registration with SPK5 code.

My sessions last year featured a Synaptica Case Study at Taxonomy Bootcamp titled "Proquest: Finding a Common Language: Bringing Complex and Disparate Vocabularies" (video available here) and, at Enterprise Search Summit, a presentation on Centralized Taxonomy Management for Enterprise Information Systems (video available here).

Looking forward to seeing everyone there, including the Dow Jones Taxonomy Team (Dow Jones is a KMWorld sponsor; note that my speaking spots are not sponsored spots but by invitation of the moderators) and the Synaptica team, who are exhibiting and sponsoring Taxonomy Bootcamp!


Image|Flickr| minipixel

ProQuest Celebrates Synaptica version 7.1!

Longtime customer ProQuest held a little celebration in honor of their installation of Synaptica's version 7.1.

ProQuest Taxonomy Team

Let's get a better look at that cake!

Synaptica v7.1 cake

Thank you Paula McCoy (not pictured) for providing these great shots. And be sure to catch Paula and Dave Clarke's presentation on Autoclassification vs. Human Indexing at Taxonomy Boot Camp next month.


Synaptica to be a sponsor of Taxonomy Boot Camp 2009

Synaptica LLC is proud to announce that it will be a Platinum sponsor of Taxonomy Boot Camp 2009 in San Jose, California. Synaptica's new CEO, Dave Clarke, and Paula McCoy of ProQuest will also be giving a joint presentation on the morning of Friday, November 20th at 8:30 AM. The presentation will compare and contrast auto-classification with manual content indexing and classification, and show how Synaptica can be a valuable part of both methods. If you haven't already, sign up today and join us in San Jose. And if you are coming, look for our table outside the presentation area, where we would love to fill you in on the recent changes at Synaptica and new developments to the software and company alike. We look forward to seeing you there!



Patrick Lambe's Survey on the Future of Taxonomy Work

Patrick Lambe has been analyzing the knowledge, skills and experience needs of the taxonomy profession for a while, and as part of this work he is conducting a survey on the present and future of taxonomy work and the needs of taxonomy professionals. Patrick is the author of a great taxonomy development book titled "Organising Knowledge: Taxonomies, Knowledge and Organizational Effectiveness" and an active writer on the topic of taxonomists and taxonomy development on his Green Chameleon blog.

Patrick, like some members of our own Dow Jones taxonomy team, will also be at Taxonomy Bootcamp in San Jose this November. According to his original request for survey responses, participants in the survey will also get a report of the results (which will include additional research beyond the survey). Patrick writes: "For those of you who believe that taxonomies still have a future, this might make interesting reading, and for those of you who believe a la Theresa Regli that “taxonomies are dead”, we’d like to hear from you on why!"

You can take the survey at


Need to Create Good Work Fast? Simple - Get a New Computer

I have a problem. I have six pieces of work to write in a couple of weeks and I'm under pressure. I need the work to be spot on, of the highest quality and created in the shortest space of time.

The answer to my problem? Buy a new computer.

Does this sound strange to you? Can you see how improved output comes from a new computer?

I was sceptical, but the Sales guy said a new computer was the answer. I asked him to explain and he told me how the time I was wasting messing with my old computer was at the heart of my problem. All those lost minutes fixing crashes, worrying about blue screens, battling with slow performance, scanning for adware, spyware and worse. Forget all that, was the message I was getting: move to the promised land of a newer, faster computer and your problems are solved. After a bit more chat I was sold. My new computer would save me time, and that extra time would be devoted to my key tasks, which in turn would lead to better quality work and faster work at that. Saving time was even money in the bank for me to set against the cost of the computer - so it wasn't even as expensive as I'd thought.

At this point I excused myself, had a coffee, and thought it through one more time. Did it make sense that a new computer was my solution? The light quickly dawned: of course it didn't. A new computer wasn't the solution and time saving was not my key issue. How did the Sales guy know that time saved would be time I'd actually spend on my document tasks? How did he know the processes and tasks I'd been performing with my current computer were not valuable experiences - not to be lightly ignored? Why did he make no attempt to understand me and my circumstances and simply sell me the one-size-fits-all Sales line that so many people still hear today?

I soon realised that I'm better off assessing my goals and objectives. What is it I need to do? For whom? Why? And when? Then I need to ensure I'm prepared and enabled to achieve them. Is my broadband connection operating? Is it fast enough? Is the right software up and running? Can I access the libraries I need?

I would also benefit from improving my time planning and management skills. I need to focus on my key tasks. What is it I need to do? What problems am I having here? I also should not forget my deliverables. What do I need to produce and how do I get there?

All these areas, when addressed in the right way, will enable my tasks and improve my outcomes. Granted, this is a little harder to sell than a new computer equals better work and a wonderful life, but surely I'm worth that extra effort and it's certainly what I need to hear.

Many of us encounter this scenario frequently. How many times have you watched a Sales presentation built around saving time? Usually a calculator is involved and sometimes members of the audience are asked to volunteer key pieces of information - "How much time do you spend searching for information in a day?", "What's your hourly rate?", "How hard do you find tracking down the information you need?" "Could you be more productive if you saved some of this time?" Very often 'time saved' is then calculated and that 'time saved' directly equated to business advantage. Very often there is little or no thought put into the needs or objectives of individual businesses or any injection of common sense into the Sales pitch.

A Dow Jones information assessment looks for the real issues and pain points our clients experience, and works with them to solve their problems and enable improved outcomes. If you have an information management issue you need assistance with, speak to us and let us work with you to get to the heart of your needs. You never know, you might even save enough money to afford that new computer you've always wanted!

Passionate Geographers

I noticed a very interesting initiative recently: Project Geograph: Photograph Every Grid Square.

This project is working towards collecting and making available images depicting the geography of every square kilometre of the British Isles. This ambitious project seems to be progressing very well, with many good quality images loaded to the website.

Already over 8,900 contributors have submitted nearly 1,500,000 images, with an average of 5 images associated to each geographic square across England, Wales, Scotland and Ireland.  This is a great resource, preserving in amazing detail what the British Isles looked like at the start of the 21st Century.  This is also a wonderful way to learn about the geography of these amazing islands and to dig deeply into their hills, valleys, towns and villages.  This is also a superb source for genealogists looking at how a particular part of the British Isles looks today.

Back in 2007 I attended the Blogs and Social Media Conference 2.0 in London.  One presentation which has stayed in my mind since then was Lee Bryant's "Engaging with Passionates". In his exceptional presentation Lee described a ground-breaking social networking case study and talked about the energy that can be released when organisations successfully tap into a group of people who are truly passionate about a given topic.

I think you'd be hard pressed to find a better example of the power of passionates than the Geograph Project.  Looking at the number of contributors, the amount of the British Isles covered, and the quality of the photography and metadata created, makes a clear point - find people who are passionate about a topic, people who are committed to a hobby or interest, engage them in the right way and they will deliver time and again.

I wish everyone associated with the Geograph Project all the luck in the world, may they stay passionate and committed to what they do, and may their project benefit from their commitment.

Oh, and if you like what you see, submit a photograph, or start a similar initiative.



Report from Digital Asset Management (DAM) Conference - London, 1 July

I spent Wednesday 1st July at the Henry Stewart DAM Conference in London.

In my slot I talked about "Tagging Images for Findability - Making Your DAM System Work for You."  I used my 30 minutes to raise the issue of organising images using metadata and controlled vocabulary to connect the images to the people who want to use them.  I spent a little time looking at the ways to use text to categorise images and the advantages and disadvantages that brings.  I devoted a lot of the presentation to raising issues to watch out for when tagging images, in particular specificity and focus in image depictions, abstract concepts and image 'aboutness', and the deceptive simplicity of visually simple images.

A far braver presentation than mine was given by Madi Solomon. Madi ditched the PowerPoint presentation to facilitate a refreshing debate on metadata.  Questions from the floor came thick and fast.  Madi did a great job of presenting 'on the edge' and drew out the experiences of many of the attendees and the challenges they were facing.

Also of note at the conference was a very informative presentation from Theresa Regli on 'Evaluating and Selecting Technologies' and a stimulating piece from Mark Davey on the old chestnut of ROI and Digital Asset Management Systems.  Mark took a pretty dry subject and a slot directly after a good lunch and succeeded brilliantly in making it entertaining, informative and practical. Take a look at his excellent presentation Digital Asset Management ROI - the basics. I think this is a key resource for anyone interested in return on investment in the DAM space and it's fun to watch too.

I had a great day at DAM London and I hope my fellow delegates found the presentations as helpful and enlightening as I did.




Report from the ISKO Content Architecture Conference - 22-23 June, London, UK

I spent Monday and Tuesday of this week at the fascinating ISKO Content Architecture Conference.

On Monday I gave a presentation on "Still Digital Images - the hardest things to classify and find."
My presentation looked at the image market and the ways in which images can be annotated - or is that processed, classified, categorized, tagged, keyworded… We need a controlled vocabulary to control the vocabulary of controlled vocabulary!

SLA Tech Zone: Taxonomy and SharePoint -- A Powerful Combination

If you are planning to attend the upcoming SLA Annual Conference in Washington, DC, then you won't want to miss the SLA Tech Zone workshop Taxonomy and SharePoint--A Powerful Combination.


SharePoint helps your organization connect people to business critical information and expertise in order to increase productivity and reduce information overload. It achieves this by providing your employees with the ability to find relevant content in a wide range of repositories and formats. Understanding and using taxonomies within a SharePoint implementation to help users find content is an essential part of ensuring a successful SharePoint deployment. Taxonomies can range from quite simple to very complex. In this session, we will cover the basics of evaluating what you can do to create a simple taxonomy that will yield the most benefits for your SharePoint implementations. You will have a chance to learn a range of Best Practices, from the basics of building a taxonomy to the hands-on skills of deploying that taxonomy within a SharePoint site.


This workshop is suitable as either a quick-start or refresher in taxonomy management for SharePoint. There are three sessions:

  • Monday, 15 June 2009 9:00AM - 10:30AM (Ticketed Event #640)
  • Monday, 15 June 2009 3:30PM - 5:00PM (Ticketed Event #660)
  • Tuesday, 16 June 2009 11:30AM - 1:00PM (Ticketed Event #805)

Price: US $35 member / US $35 non-member / US $35 student member


For details and registration information, see the SLA 2009 site.

Classifying Images Part 3: Depicted Content

Welcome back to my occasional image classification series.

The last time I raised the topic of image classification I discussed the basic attributes of images. This time I want to focus on the thornier issue of the content, or concepts, depicted in them.

There is a danger of treating an image like a piece of text and classifying its attributes: Who created it? When? What techniques were used? Then writing a title or caption and leaving it at that. Sometimes little more need be done to a document than record this kind of information, especially with free text searching, but lots more needs to be done to most images.

Image findability

Image findability is the process of using search and browse to access the images required. A major aspect of image findability relates to the things depicted in them. Image users often search for images based on the generic things in them and also the proper names of these things. Classifying images based on depicted content means considering anything and everything that is and can be depicted in an image. When considering this I like to focus my efforts on understanding the images I'm dealing with, the users who are trying to find and work with the images, and the ways in which these people need to search and browse for the images they need. After an assessment of these areas I then tailor my approach.

Broadly speaking people searching for depicted content are looking for a number of types:

  • Places: cities, towns, villages, streets...
  • Built works: parks, skyscrapers, cottages, walls, doors, windows...
  • Topography: mountains, valleys...
  • Groups and organisations: air forces, choirs, police departments...
  • People: roles, occupations, ethnicity and nationality: mothers, doctors, Caucasians, French, Germans...
  • Actions, activities and events: running, writing, laughing, smiling, birthdays, parties, book signings, meetings...
  • Objects: a myriad of items...
  • Animals and plants: common and scientific names...
  • Anatomy and attributes of people, animals and plants: arms, legs, adults, leaves, trunks, paws, tails...
  • Depicted text shown in images - often signs or writing shown in images...

Many of these generic types can also have proper named instances:

  • Proper names of people, places, buildings, topography, organisations, animals etc
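As a minimal sketch of how these types might be recorded, assume a simple in-house tagging structure in which each tag carries its type and, where known, a proper name. The names `DepictedType`, `Tag` and `ImageRecord`, and the sample image, are hypothetical and not taken from any particular DAM system:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class DepictedType(Enum):
    # Facets taken from the list above; the identifiers are illustrative.
    PLACE = "Places"
    BUILT_WORK = "Built works"
    TOPOGRAPHY = "Topography"
    GROUP = "Groups and organisations"
    PERSON = "People"
    ACTION = "Actions, activities and events"
    OBJECT = "Objects"
    ANIMAL_OR_PLANT = "Animals and plants"
    ANATOMY = "Anatomy and attributes"
    DEPICTED_TEXT = "Depicted text"


@dataclass(frozen=True)
class Tag:
    term: str                          # generic term, e.g. "Skyscrapers"
    depicted_type: DepictedType        # which facet the term belongs to
    proper_name: Optional[str] = None  # named instance, if known


@dataclass
class ImageRecord:
    image_id: str
    tags: List[Tag] = field(default_factory=list)

    def terms_of_type(self, depicted_type: DepictedType) -> List[str]:
        """Return the generic terms on this image for one facet."""
        return [t.term for t in self.tags if t.depicted_type == depicted_type]

    def proper_names(self) -> List[str]:
        """Return every proper name recorded on this image."""
        return [t.proper_name for t in self.tags if t.proper_name]


# Hypothetical record: one image tagged with generic terms and proper names.
record = ImageRecord(
    image_id="img-001",
    tags=[
        Tag("Skyscrapers", DepictedType.BUILT_WORK, "Empire State Building"),
        Tag("Cities", DepictedType.PLACE, "New York"),
        Tag("Windows", DepictedType.BUILT_WORK),
    ],
)
```

Keeping the generic term and the proper name on the same tag lets a user browse by type ("Built works") and still search by name.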

When dealing with depicted content I've found some of the biggest issues to be:

  • Identification - knowing what is in an image
  • Focus and specificity - knowing what to include and what to exclude
  • Consistency - applying the same term in the same way for the same depicted content

Identification - knowing what is in an image

Depicted content is a relatively black and white area - a dog is depicted so a dog is tagged. It might sound a little weird, but working out what is actually in an image can be a lot harder than you think.

Take a look at the image "Do You Know What This Is?" by Sister72.

This depicted content is fairly simple to see, but understanding what you're looking at is not that easy. Even if you know roughly what you're looking at, do you know what it's actually called?

One tip is to group similar images together when you're classifying them. Also, always start by assembling as much information as possible before you begin to classify images. It is especially important to gather together the information you have from the creator or custodians of the images.

Also important, when you have the luxury, is to get the image creator to add key metadata about the image at the point of creation, or soon after.

Focus and specificity

Knowing what to include and what to exclude, what to mention and what to ignore, is also much harder than it sounds.

Firstly, some image users will want a piece of depicted content tagged whenever it appears in an image, others will only want it tagged when the image shows a very good representation of that content, and of course many people will want something in between the two extremes.

Different users have different requirements. You need to understand the domain in which you're working and see the classification of depicted image content as supporting the needs of your users.
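One way to picture this range of requirements is as a prominence threshold. The sketch below assumes that whoever reviews an image gives each candidate term a prominence score between 0 and 1. The score, the `select_terms` function, the sample terms and the threshold values are all hypothetical illustrations rather than an established method:

```python
from typing import Dict, List


def select_terms(candidates: Dict[str, float], min_prominence: float) -> List[str]:
    """Keep the terms whose prominence meets the threshold.

    `candidates` maps a term to an assumed prominence score in [0, 1],
    where 1.0 means the content dominates the image.
    """
    if not 0.0 <= min_prominence <= 1.0:
        raise ValueError("min_prominence must be between 0 and 1")
    return sorted(t for t, score in candidates.items() if score >= min_prominence)


# Hypothetical scores for an image of a dog in a park.
candidates = {"Dogs": 0.9, "Parks": 0.6, "Benches": 0.1}

tag_everything = select_terms(candidates, 0.0)  # tag content whenever it appears
best_examples = select_terms(candidates, 0.8)   # only very good representations
in_between = select_terms(candidates, 0.5)      # somewhere between the extremes
```

The point of the sketch is that the threshold is a decision about your users and your domain, not a property of the image.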

For example, would you tag everything in this 'Messy Room' image?

What would you miss out and why?

Look at the image "Mountain Goats" from Thorne Enterprises.

Would you tag this with goats as well as mountains? Would this be helpful?

Let's look at four images depicting windows:

'Window to the World',

'Portuguese Window',

'What Light Through Yonder Window Breaks'



Looking at these, it soon becomes clear that even deciding to apply a simple term like 'Windows' is not always easy.

Would you apply 'Windows' to the image of the cat looking out of the window? Is a window actually depicted in that image? If the image wasn't tagged with 'Windows' how else would anyone find an image of a cat looking out of a window?

The other three images show windows as parts of buildings, but is a building always depicted? Deciding when to apply a building type or the name of a building can be hard. Should you do this every time a part of a building is shown? Only when the whole building is shown? When enough of the building is visible? Or when a section of the building that to most people would represent the building is visible? For example, what part of the Empire State Building would you consider to depict that building? Rarely does anyone see it all - how much is enough? Would you treat the images of windows in a similar way and classify them all with a building type of 'Houses', or would you ignore the structure and focus on the parts - the window, the roof?


Consistency

Achieving consistent application of terms to images revolves partly around clear term definitions, well defined application rules and guidelines, and a robust quality assurance process.

Term definitions are very important. Defining the meaning of a term, and ensuring the people choosing which term to assign understand that meaning, can be crucial to term application. For example, creating a term such as 'Bow' without defining its meaning is not going to make it easy to apply.
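To make the 'Bow' problem concrete, here is a minimal sketch of a vocabulary that refuses terms without a definition and shows an indexer every meaning of an ambiguous word. The `Term` and `Vocabulary` classes, the parenthetical qualifiers and the definitions are assumptions made for illustration, not the structure of any particular taxonomy tool:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Term:
    label: str       # what the indexer sees, with a qualifier if needed
    definition: str  # scope note explaining when to apply the term


class Vocabulary:
    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        """Add a term, insisting on a definition and a unique label."""
        if not term.definition.strip():
            raise ValueError(f"Term {term.label!r} needs a definition")
        if term.label in self._terms:
            raise ValueError(f"Duplicate label {term.label!r}")
        self._terms[term.label] = term

    def lookup(self, word: str) -> List[Term]:
        """Return every term whose unqualified label matches the word."""
        wanted = word.lower()
        return [
            t for t in self._terms.values()
            if t.label.lower().split(" (")[0] == wanted
        ]


vocab = Vocabulary()
# Illustrative definitions only.
vocab.add(Term("Bow (weapon)", "A weapon for shooting arrows."))
vocab.add(Term("Bow (knot)", "A decorative knot tied with loops, e.g. in ribbon."))
vocab.add(Term("Bow (ship)", "The front part of a boat or ship."))

for term in vocab.lookup("bow"):
    print(f"{term.label}: {term.definition}")
```

An indexer searching for "bow" is shown three distinct terms, each with its own definition, instead of one undefined term that could mean any of them.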

Application rules that are well considered, thorough and clear are also very useful. Even a simple concept often needs some form of guidance linked to it. I remember a while ago needing two terms, 'Indoors' and 'Outdoors', to allow users to find images of people who were outside and inside - a simple concept you might think, one that people often need, and one that's easy to apply - who'd need guidelines for that? However, it soon became clear that guidelines were needed after I received a series of interesting questions: Is being on a train indoors? Should studio shots always be considered indoors? Does every shot of a person have to have indoors or outdoors assigned to it? If not, when should this term be used and when not? Is this a focus issue? If so, how much of a location needs to be seen before Indoors or Outdoors is used? A clear set of application guidelines followed an interesting meeting!

Strong quality assurance processes are very valuable. People make mistakes and images generate interesting issues. Appointing staff to review a percentage of classification work based on clear guidelines, and then sharing findings with the people who assigned the terms to the images, is an important way of assessing how well the image classification is progressing and keeping a classification team synchronised.
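Reviewing a percentage of the work can be as simple as drawing a random sample from each batch of classified images. The sketch below is one possible approach under stated assumptions: the `sample_for_review` function, the 10% rate and the image identifiers are hypothetical, and the right sampling rate depends on your own team and collection:

```python
import math
import random
from typing import List, Optional, Sequence


def sample_for_review(
    image_ids: Sequence[str],
    percentage: float,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a random percentage of classified images for a reviewer to check."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be greater than 0 and at most 100")
    if not image_ids:
        return []
    # Round up, and always review at least one image from a non-empty batch.
    count = max(1, math.ceil(len(image_ids) * percentage / 100))
    count = min(count, len(image_ids))
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(list(image_ids), count)


# Hypothetical batch of 200 classified images, reviewed at an assumed 10% rate.
batch = [f"img-{i:03d}" for i in range(200)]
to_review = sample_for_review(batch, 10, seed=1)
print(f"Reviewing {len(to_review)} of {len(batch)} images")
```

Sampling at random, rather than letting people choose what gets checked, gives a fairer picture of how consistently the guidelines are being applied.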

Today I’ve talked a lot about content depicted in images. Next time I’ll focus on abstract concepts, which are related to an image’s ‘aboutness’.