VideoSurf - a new way to search for video?

If you have been keeping up with my posts on this blog, you won't be surprised to learn that I spent today's lunch hour exploring a video search offering that's new to me, called VideoSurf. I was so interested in this new search tool that I interrupted my usual run of image indexing articles to do some research and write up this post.

In a September press release, VideoSurf claimed its computers can now "see inside videos to understand and analyze the content." I would encourage anyone with an interest in this area to take a look at the company's website, give it a whirl and see what they think.


In my experience, video search engines have relied on a combination of the metadata linked to video clips, scene and key frame analysis, and automatic indexing of sound tracks synched with the video.

For example, sound tracks synchronised to video content can be transcribed to text and indexed; the resulting text can then be linked to sections of video by looking for gaps that identify scene boundaries, with various techniques used to create key frames that attempt to represent each scene. These techniques are backed up by the metadata accompanying a video clip.

If you have worked in the industry, you know that video metadata is expensive to create. Most of what people see online is either harvested for free from other sources or limited in size and scope. Such metadata may cover the title of a clip, text describing it, the clip length, etc. It may even include information about the content depicted in the video, or abstract concepts that try to specify what a clip is about. This last level of video metadata is the most time-consuming and complex to create, but it also offers the fullest level of access for users.
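To make the layers concrete, here is a minimal sketch of what such a metadata record might look like. All field names and values are my own invention, not taken from any particular system:

```python
# Illustrative sketch of the layers of video metadata discussed above.
# Field names and values are hypothetical, not from any real system.
clip_metadata = {
    # Basic attributes: cheap to capture, often harvested automatically
    "title": "Interview with the director",
    "description": "A short studio interview recorded in 2008.",
    "duration_seconds": 312,
    # Depicted content: what can literally be seen in the frames
    "depicted": ["man", "studio", "microphone"],
    # Abstract concepts: what the clip is *about* -- the most expensive
    # layer to create, but the one offering the fullest access
    "about": ["film-making", "creative process"],
}

def indexable_text(record):
    """Flatten a metadata record into text a search engine could index."""
    parts = [record["title"], record["description"]]
    parts += record["depicted"] + record["about"]
    return " ".join(parts)
```

Note that a search over `indexable_text` can only ever be as good as the layers someone has paid to create: if the "about" concepts are missing, conceptual queries fail.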

Audio tracks can also be of great use, and many information needs can be met by searching on the audio in a video. There are limitations, however: many VERY SCARY scenes have little dialogue in them and depend heavily on camera work and music to create the feeling of fear. How easy is it to find those scenes based on dialogue alone, or even by 'seeing inside a video'? How can you look for 'fear' as a concept?

Content-based image retrieval, which looks at textures, basic shapes, and colours in still images, has yet to deliver the promised revolution in image indexing and retrieval. In some contexts it works quite well; in many others, end-users don't really see how it works at all. So adding a layer to video search that tries to analyse the actual content, pixel by pixel, is an interesting development.

To my mind, a full set of access paths to all the layers of a video still demands fairly extensive metadata, especially for depicted content and abstract concepts. Up to now, metadata has always been the way to find what an image, still or moving, is conceptually about, and what can be seen in individual images and videos. That holds even when the metadata is actually sound, turned into text and stored in a database.

Is VideoSurf's offering really any different from what's gone before?

Is this system, which seems to be using Content-Based Image Retrieval (CBIR) technology to some extent, a significant advance?

Reviewing some of the blog posts people have published, it seems many others are interested in VideoSurf's offering as well.

For an initial idea of how VideoSurf works, take a look at James McQuivey's OmniVideo blog post, "Video search, are we there yet?". As James describes in the article, one pretty neat aspect of VideoSurf is that it can match faces, enabling you to look for the same face in different videos and reducing the need to have the depicted person mentioned in the metadata. This clearly isn't much help if the person you're looking for is mentioned but not depicted (in which case indexed audio would help), or if the person is not well depicted, for example only from the side or the back. Quibbles aside, though, if this works it is a pretty useful function in itself.

Several other bloggers have also been writing up their thoughts on VideoSurf.

Clearly, we're on the right track, and there is a lot of interest in the opportunities and technologies around video search. However, I think there is a long way to go before detailed, automatic object recognition is of any meaningful use to people. As far as I can see, it's still not there for still or moving digital images. Metadata, for me, is still the 'king' of visual search. There are, however, a growing number of needs that automatic solutions can already meet, and a growing case for solutions that combine automatic recognition of image elements, metadata schemes, and controlled vocabulary search and browse support.

I'd love to know what people think about VideoSurf and other services that provide video search.

How the Semantic Web Will Change Information Management: Three Predictions from fumsi

fumsi is a digital and print publication that provides resources and tools for people who "find, use, manage & share information". It is part of the FreePint family of resources for professionals in the information management field. If you watch or subscribe to the Synaptica Central RSS feeds (right menu), you probably saw the recent pointer to James Kelway's rich write-ups, also published on fumsi, on Creating User Centred Taxonomies. James's personal blog, User Pathways, is another must-read if you want to learn about information management from an information architecture, interaction design, and user experience perspective, which I believe is extremely important in today's user-driven information experiences.

This Sunday morning's reading led me to catch up on my many feeds, and one article that caught my attention was this piece in fumsi by Silver Oliver, who has a background in library science and is currently an Information Architect at the BBC, titled "How the Semantic Web Will Change Information Management: Three Predictions".

Prediction number 1: a move from the pull to the push search paradigm, or more ‘context-aware’ applications

Today's information consumption still starts mostly with information seeking and retrieval, processes that in today's fast-moving, information-overloaded and cost-conscious enterprises are simply not sustainable if organisations are to stay competitive. If you happened to be at Defrag this year and listened to my presentation on Pulling the Threads on User Data, you heard me speak about the need for context-aware applications and standards to make data portable, which leads to Silver's first prediction: "The Semantic Web could assist in this area, by publishing data in a way that smart applications can take advantage of and so improve smart context aware recommendations. The right thing, at the right place and at the right time".

Prediction number 2: the battle of the identifiers or the age of pointing at things

Recently here on Synaptica Central, Christine Connors, Director of Semantic Technologies at Dow Jones, published a post that touched on this subject, titled "Taxonomies are a Commodity", in which she concluded:
"I actually like the fact that taxonomies have become commoditized. Why? Competition drives improvement - in quality, in focus, in security and in usability. These are areas that the semantic web community needs to focus on - in my experience, security and usability need attention NOW. Good fences make good neighbors, and when we've got good fences, we can make more links and learn to trust. Icing on the cake!"

Prediction number 3: the changing role of the information professional

Silver ends this prediction with the following statement: "The skills of information professionals will be essential in populating and managing the Web of data and, to make this happen, we must make the shift from thinking repository-scale to thinking Web-scale."
Back in January 2008, I wrote a post on my personal blog titled "Sexy Hot Trends for 2008 and Beyond- Librarians", where I highlighted some of the opportunities I saw for people with library science degrees (and no, you don't need to be female and wear purple tights! I just love that Super Librarian image!). So I obviously agree with Silver's prediction: the skill sets and experience that information professionals can bring to the Semantic Web are huge, and I certainly hope that the Semantic Web community continues to cross-populate even more with the InfoPro communities. Here at Dow Jones we are committed to doing our part to make sure that happens. Working with our InfoPro Alliance Group (headed by Anne Caputo, the new SLA president), we are looking to provide some webinars in the new year addressing the Semantic Web issues that need to be tackled in the enterprise, by information professionals as well as other parts of the organization. So watch this space for more info as we finalize those sessions!

Image|Flickr|Leo Reynolds

The Super Librarian image above is from the NJ State Library, which includes the great Super Librarian Comic Book. You can also buy Super Librarian gear if you are so inclined.

Synaptica Announces SharePoint Integration

Our Synaptica product enhancement strategy is to continuously develop useful and innovative ways for our clients to use Synaptica for their taxonomy and metadata management needs. So it wasn't a surprise when some of our clients asked us to provide an 'out-of-the-box' integration point into SharePoint: we know first hand about the issues with managing taxonomies in SharePoint, from our own internal experience as well as from multiple client engagements over the last few years.

Microsoft SharePoint has over 100 million licenses in place, and its adoption continues to grow globally. In 2007, an IDC survey of 300 companies found that 61% were deploying SharePoint enterprise-wide, and that 28% of those using SharePoint in specific departments expected to expand usage to the enterprise within the next 12 months. A year later, things don't seem to be slowing down.

With that kind of adoption and penetration across so many industries, it is impossible to ignore the impact that SharePoint is having as a portal for information and document sharing both internally and externally to the enterprise. As a result, Synaptica is proud to announce an integration with SharePoint that addresses some of the known pain points that users have when trying to successfully use taxonomies within SharePoint to tag, search and discover documents and other content.

In this short video overview we take you through the core elements of our Synaptica: SharePoint Integration:

View Video directly

With this Synaptica integration you can:

1) Import a complete vocabulary into SharePoint as a list: This feature provides for the import and update of a vocabulary (taxonomy, thesaurus, authority file, etc.) into SharePoint, creating a new list which may then be applied as a column linked to content within a document library. As the vocabulary is updated within Synaptica, the list stored in SharePoint can be updated so that the most current information is stored and applied as metadata to documents and content.

2) Provide dynamic access to Synaptica, allowing users to tag content: Employing Web Services, this feature allows SharePoint to access a Synaptica system through either a keyword search or a navigable "tree browse", letting users locate specific terms and apply them as metadata. This dynamic access ensures that SharePoint users employ standardized terminology to tag content, while the same vocabularies may be used across the enterprise and in other applications.

3) Provide dynamic access to Synaptica for search and discovery: A SharePoint Web Part allows users to search or browse Synaptica vocabularies in real time, using the same terms that have been applied to tag the content. This feature can also "direct" users to the proper terminology, rather than leaving them to guess at how a piece of content might have been tagged using an uncontrolled, free-text method.
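To give a feel for the two access patterns described above, here is a toy sketch of keyword search and tree browse over a small controlled vocabulary. This is not the Synaptica Web Services API (whose actual calls are not shown here); the vocabulary and function names are invented purely for illustration:

```python
# Toy illustration of keyword search and "tree browse" over a
# controlled vocabulary. NOT the real Synaptica Web Services API.
# term -> list of its narrower terms (invented example data)
vocabulary = {
    "Finance": ["Banking", "Insurance"],
    "Banking": ["Retail Banking", "Investment Banking"],
    "Insurance": [],
    "Retail Banking": [],
    "Investment Banking": [],
}

def keyword_search(query):
    """Find vocabulary terms containing the query string (case-insensitive)."""
    q = query.lower()
    return [term for term in vocabulary if q in term.lower()]

def tree_browse(term):
    """Return the narrower terms beneath a term, for navigation."""
    return vocabulary.get(term, [])
```

Either route ends with the user applying an approved term as metadata, which is what keeps tagging consistent across the enterprise.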

With this initial iteration of the Synaptica: SharePoint integration, we hope to solve some of the biggest problems we hear about from users trying to better organize, tag and discover content within a SharePoint portal. We will look at expanding the integration over time, adding improved features as we learn more about how we can assist our customers and SharePoint users with these integral tasks.

For more information about this new integration, and to see if you qualify for a free two-week trial of Synaptica with the SharePoint Integration, please contact daniela.barbosa@dowjones.com or use this Contact us form to submit your details.

10 Rules of Successful ECM Implementation

Last week I attended AIIM’s ECM seminar on Automating Document-centric Processes – Is SharePoint Enough? It was a really interesting and informative event, with a few general sessions, several case study presentations, and product demonstrations from various vendors in the ECM realm.

AIIM President John Mancini closed the seminar with his 10 rules for successful ECM implementation:

  1. Build a strategy.
    When implementing an ECM solution, winging it is a bad idea. Especially if you are implementing a solution as viral as SharePoint, you should have a well-defined strategy. You should define business requirements, think about governance, analyze content systems, and identify points of integration. Formulating a strategy will save money and increase the likelihood of a successful project.
  2. Not all content is alike.
    You should think about the nature of the content you are trying to manage. Is it office-based content, transactional content, or persuasive/creative content? You need to pick a solution that matches your content.
  3. Prepare for eDiscovery.
    Sector-based regulations aren’t just a flash in the pan. Just because your business hasn’t had to deal with eDiscovery yet doesn’t mean you won’t have to in the future.
  4. Good enough is better than nothing.
    Doing something to get your content under control is better than doing nothing at all. You don’t have to start with the perfect solution.
  5. Ripping out and replacing is not usually a good starting point.
    This is especially true for more mature ECM organizations. If you have multiple repositories, you have to deal with them and think about policy structure around the information. Think about how you can provide access to information in those various repositories. Look for a vendor who will help with the integration challenge.
  6. Acknowledge the reality that this is a hybrid world.
    Paper is still part of the equation. Although we would like for everything to be digital, that is not the reality. Don’t get hung up on wanting everything to be digital—sometimes digitizing information can be too resource-intensive and unnecessary. Evaluate your strategy.
  7. Be militant about ROI and deployment times when thinking about projects.
  8. Consider alternate delivery models in your ECM approach.
    There will possibly be fewer IT people in the near future because of the economy. Consider hosted solutions as a way to lower risk for management.
  9. Spend some time on standardizing the front-end of your processes.
    Ask yourself, for example, whether you are digitizing things that should have been digital to begin with.
  10. Once you have something digital, keep it that way.
    Why have a digital process all the way until you have to sign a document? Rather than moving from digital to analog and back to digital, consider processes that will keep content digital.

I found this list to be very relevant to some of the work I've been doing lately. Often I talk to clients who are implementing an ECM solution but haven't formulated a clear strategy yet. Organizations usually have content stored in several repositories, and employees don't know how to access that information, assuming they even know it exists. That's why we suggest an assessment prior to implementing a new solution. An assessment can be conducted internally if the resources are available, or our Taxonomy Services team can perform one for you. An assessment will help you identify your various content repositories and develop a strategy to access that siloed information.

Synaptica Has Got Its Head in the Clouds

The way companies use software has been shifting, and unless your head has been in the clouds over the last few years, I am sure you have noticed the move to SaaS (Software as a Service) offerings and to services in the 'cloud'. From The Economist's recent 14-page special report on corporate IT, 'Let it Rise', focused on cloud computing, to Microsoft's recent Azure announcement indicating an even bigger investment in moving services to the cloud, to the discussions around Tim O'Reilly's post Web 2.0 and Cloud Computing, and of course the discussions about the economics of cloud computing in today's world, it is evident that these models, which are not really 'new', are here to stay.

It is a little-known fact, and one I am trying hard to make the marketplace aware of: Synaptica is available as a hosted application, with access to nearly all of the available features (including robust Web Services). And in line with the recent buzz in the marketplace, access to Synaptica as a 'service' is something we have been getting more and more requests about.

Who is interested in a taxonomy and metadata management tool as a hosted model? It is not for everyone who needs a tool like ours, but the interested parties really vary. For example:

  • Small to medium corporate libraries, or product manager/marketing groups, who manage various taxonomies and do not have a lot of IT resources for bringing a tool in-house, but can really benefit from a centralized taxonomy management tool that their global colleagues can access securely via the internet to work on the vocabularies collaboratively
  • Companies that have an urgent need for a tool but don't have the resources to bring it in-house quickly, and choose a hosted model as a first phase to get their taxonomy development and deployment done
  • Companies whose technology architecture is based on the LAMP stack, which Synaptica at this point cannot fit nicely into
  • Start-ups who are building a consumer service that requires a tool to manage their controlled vocabularies (e.g. product categories, navigation taxonomy, etc.) but who do not have the IT infrastructure to host an application like Synaptica (e.g. most of their stack is already in the 'cloud')

So with our hosted model we can provide, at whatever tier a company is at, an affordable and secure way to manage an important part of their business.

And the best part? Coming in at the low end, for the price of a Synaptica hosted annual license (with full access to all editorial and administrative features, including Web Services), you can choose one of the premier taxonomy management tools in the marketplace. Or, if you are so inclined, you can instead spruce up your office by buying a Hyacinth Macaw Parrot, buy one of your employees a nice baby shower gift like this blinged-out Baby Pram, or update your outside picnic patio area with the Kalamazoo Bread Breaker Two Dual-Fuel grill. Yes, it really is your choice.

A Project Taxonomy Can Avoid Hours of Frustration

Here at the Synaptica Central Blog, most of our posts focus on developing and managing complex taxonomies, which is what our taxonomy consultants are usually doing at client sites during the week... unless, of course, they are busy blogging here ;-).

There are certainly different levels of complexity depending on the client, but the business needs are typically robust enough that at some point the customer also looks for a tool to manage those vocabularies, and Synaptica fits the bill. Typically this is because of the need to maintain relationships between terms in a thesaurus, like Broader Term (BT) and Narrower Term (NT), that are hard to manage in a spreadsheet, or relationships between different vocabularies, which many of the thesaurus management tools in the marketplace do not allow. They may also need to integrate these vocabularies into other systems like search engines, CMS/DMS, DAMs, etc., beyond sending Excel sheets around the company, which can be quite painful.
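A minimal sketch shows why BT/NT relationships outgrow a spreadsheet: every link must be kept reciprocal in two places at once, which dedicated tools enforce and spreadsheets do not. This is hypothetical illustration code, not Synaptica's actual data model:

```python
# Sketch of reciprocal BT/NT maintenance in a thesaurus.
# Hypothetical code, not Synaptica's data model.
from collections import defaultdict

class Thesaurus:
    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)

    def add_bt(self, term, broader_term):
        """Record a BT link and its reciprocal NT link together."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def is_consistent(self):
        """Every BT link must have its reciprocal NT link."""
        return all(t in self.narrower[b]
                   for t, bts in self.broader.items() for b in bts)

t = Thesaurus()
t.add_bt("Narrower Term", "Broader Term")
```

In a spreadsheet, the two halves of each link live in separate rows that drift apart as the vocabulary is edited; a tool that writes both sides in one operation keeps `is_consistent` trivially true.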

We have also seen some pretty cool uses of the tool, like Jim's recent post, Thinking Outside of the Synaptica Box, about our own in-house usage. Our clients see the power of the tool and adopt it for their own needs, often bringing into the fold users who never thought they would be creating and maintaining a "taxonomy"!

This post on Project Management from the Developer's Perspective: Project Taxonomy, by Stacey Mulcahy on the O'Reilly InsideRIA blog, reminds me of some of the unique ways customers are using the Synaptica tool.

In her post, Stacey does an awesome job of explaining how "Adopting a project taxonomy is one of the simplest pro-active ways to avoid hours of frustration caused by miscommunication. Once team members, regardless of discipline and role, utilize a shared vocabulary, interactions become more meaningful and ultimately more productive as more time is spent in communicating the message and less time clarifying its context."

Features like Synaptica's "MyWeb Views" allow admins to quickly create 'Read Only' views for the whole organization, for example with links to images as Stacey suggests in her post, so everyone agrees on what a specific term means, whether for the organization as a whole or only for the specific project the team is working on.

It is a must-read post. If you are thinking about the different ways controlled vocabularies are being used in your enterprise, already have Synaptica in house, and just want others in your organization to benefit from the tool, look to your project managers and let them know you have a tool in house that can simplify the way they manage taxonomies with their project teams, avoiding hours of possible frustration.

Image|Flickr|RACINGMIX

E-commerce, Comercio Electrónico, Commerce en Ligne, Elektronischer Handel ...

According to a recent global survey conducted by The Nielsen Company about trends in online shopping, over 85 percent of the world’s online population has used the Internet to make a purchase.

Finding, or not finding, products and services on e-commerce sites is key to success, regardless of what language an online shop operates in. The conversion rate of search, i.e. the rate at which searches actually lead to purchases, is one of the central measures of how successful an e-commerce site is.

The end-user expects an interface that is intuitive and easy to use, as well as navigation and search that direct them to relevant products and services. How the user's search terms are actually associated with the "right" search results is of no interest to the online shopper, but it is a complex issue that all e-commerce sites and online shops have to deal with.

Having worked with many e-commerce customers in Europe, I have come across many of the complexities that optimizing a site's search capabilities can bring, of which an end-user will literally only ever see the tip of the iceberg.

Of the content, controlled vocabulary, search metrics, and process questions that need to be addressed, having the right tools to optimize search is probably the simplest, but it is no less important.

Often, search engines focus on what they were made for: searching. Managing vocabularies for search improvement is usually not an area that vendors specialize in or focus on. The most relevant features we encounter that search engines often do not cover are:

  • Central management of vocabularies (products, services, colours, materials, and other filters) to ensure there is one version in place from which extensions can be built if needed
  • Allowing different users to contribute to a controlled vocabulary through different levels of access rights, for example working directly with content editors to share input
  • The possibility of adding comments to terms (why has x been introduced as a synonym of y?)
  • Being able to monitor the progress and changes that have been made
  • Being able to retrieve historical information
  • Creating audience-centric views
  • ... to name but a few!
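The first two points above can be illustrated with a toy example: one centrally managed synonym table mapping the words shoppers actually type onto the canonical terms of the product vocabulary. The terms and synonyms here are invented for the example:

```python
# Toy illustration of centrally managed synonyms in an e-commerce
# vocabulary. Terms and mappings are invented for the example.
synonyms = {
    "sofa": "couch",
    "settee": "couch",
    "sneakers": "trainers",
}

def normalise_query(query):
    """Map each word of a shopper's query onto its canonical vocabulary term."""
    return [synonyms.get(word, word) for word in query.lower().split()]
```

Because the table is maintained in one place, every channel (search, navigation, filters) resolves "settee" and "sofa" to the same products, instead of each system keeping its own drifting copy.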

Alongside many other aspects, being able to manage controlled vocabularies efficiently and effectively is one of the prerequisites for optimizing the search capabilities of an e-commerce site. Not only will it help drive online sales, because users will find the most relevant products and services, but it will also contribute to a positive shopping experience, so that new shoppers will return.

Image|Flickr|isriya

An Overview of Semantic Technologies at Dow Jones

This is an overview of how the Dow Jones Enterprise Media Group uses semantic technologies and solutions, both for our own organization and for customers. This brief presentation was given at MIT to the Cambridge Semantic Web meetup on October 14, 2008.

Classifying Images Part 2: Basic Attributes

Last month I asked the question "What is the Hardest Content to Classify?" and promised additional posts on the subject, based on my 13 years of developing taxonomy and indexing solutions for still image libraries. I am continuing my thoughts in this post, focusing on the basic attributes of image classification.

In my opinion, images are the hardest content items to classify, but luckily, for sanity's sake, not all image classification is equally demanding.

The easiest elements of image classification relate to what I'm going to call image attribute metadata. This area, for me, covers all the metadata about the image files themselves, rather than information describing what is depicted in images or what images are about.

Metadata aspects in this area cover many things, and there are also layers to consider:

1. The original object
-- This could be a statue, an oil painting, a glass plate negative, a digital original, or a photographic print

2. The second generation images
-- The archive image taken of the original object, plus any further images: cut-down image files, screen sizes, thumbnails, and images in different formats (JPEG, TIFF, etc.)

The first thing to think about is the need to create a full and useful metadata scheme, capturing everything you need to know to support what you need to do, whether that is archiving and/or search and retrieval.

Then look at what data you may already have or can obtain. Analyse the data for accuracy and completeness, and use whatever you can. Look to the new generation of digital cameras to obtain metadata automatically, and ask image creators to record basic attribute data at the time of creation.

You'll be interested in the following metadata types:

- Scanner types
- Image processing activities
- Creator names
- Creator dates
- Last modified names
- Last modified dates
- Image sizes and formats
- Creator roles - photographers, artists, sculptors
- Locations of original objects
- Locations at which second generation images were created
- Unique image id numbers and batch numbers
- Secondary image codes that may come from various legacy systems
- Techniques used in the images - grain, blur etc
- Whether the images are part of a series and where they fit in that series
- The type of image - photographic print, glass plate negative, colour images, black and white images

This data gives you a lot of background on the original and on the various second generation images created during production. Much of it can be obtained freely or cheaply, and lots of it will be quick and easy to capture and enter into your systems. It should also be objective and easy to check.
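As a sketch of how the attribute list above might translate into a metadata scheme, here is a simple record type. The field names are my own, chosen for illustration; in practice you would align them with your own scheme or with standards like EXIF and IPTC, which already cover many of these fields:

```python
# Sketch of a basic image-attributes record covering the fields listed
# above. Field names are hypothetical; align with your own scheme.
from dataclasses import dataclass, field

@dataclass
class ImageAttributes:
    image_id: str                 # unique image id
    batch_number: str
    creator_name: str
    creator_role: str             # photographer, artist, sculptor...
    created_date: str
    last_modified_by: str = ""
    last_modified_date: str = ""
    image_format: str = "TIFF"    # JPEG, TIFF, etc.
    original_type: str = ""       # photographic print, glass plate negative...
    techniques: list = field(default_factory=list)  # grain, blur, etc.
    series: str = ""              # series membership and position, if any

rec = ImageAttributes(
    image_id="IMG-000123", batch_number="B-42",
    creator_name="Jane Doe", creator_role="photographer",
    created_date="2008-11-01",
)
```

Because these fields are objective, populating a record like this can often be automated or delegated to the image creator, unlike the depicted-content layer covered in the next post.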

My next post will cover dealing with depicted content in images. Please feel free to leave comments or questions on the subject.

Image|Flickr|Daniel Y. Go

Synaptica Central : Dow Jones Video Library


Video might have killed the radio star, but in today's video streaming world it is certainly helping distribute knowledge, and that is why we are publishing a video page to augment our blog postings.

Very often I talk to clients who need information to learn about key concepts, or who just want to share a third-party view with their colleagues on specific topics around controlled vocabularies that I know someone on the team has presented or written about. It could be, for example, a white paper about Audience Centric Views, a video overview of taxonomy management tools and how to use them to collaborate on developing controlled vocabularies, or a real-life case study of an existing client using Synaptica. In the past, I have kept these references in a .txt file on my desktop that I consult when I need to, but since this blog is being used as a resource both for us internally here at Dow Jones and for the community, I figured it was a good time to start a Video Library of our Dow Jones public resources.

So without any further ado: our Dow Jones Video Library has been published.

This is just the start of turning Synaptica Central into a go-to resource for our community, so please watch this space for additional resource pages, from recommended white papers and industry standards references to must-see videos, must-listen podcasts, and must-read books!

Have suggestions for things we should add to our resource pages? Please leave them in the comments or drop me a note at daniela.barbosa@dowjones.com.

Image|Flickr|traed mawr