Can You Grab the Long Tail of Search with Text Analytics?

What Is the Long Tail?

The long tail of search is a concept typically used in retail and search engine marketing, represented by a graph of how search queries are distributed. In a long tail of search graph, the highest-volume searches tend to be single-word terms. Each of these terms is repeated more often than any other query, yet together they make up only a small percentage of overall search volume. At this head end of the graph, searches account for the smallest share of total traffic and have the lowest rate of conversion to actual purchases. At the other end of the graph are more detailed, multi-word phrases, which typically make up about 70% of all search volume. Although each of these queries is repeated far less often in exactly the same form, they tend to convert to purchases at a much higher rate and make up the bulk of overall purchase volume.
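The head-versus-tail split described above can be roughly approximated from a query log by bucketing queries by word count. A minimal sketch, assuming query counts are already available as a dictionary; the function name `volume_share_by_length` is hypothetical, and many analyses define the head by rank rather than by word count.

```python
from collections import Counter


def volume_share_by_length(query_counts: dict[str, int]) -> dict[str, float]:
    """Return the share of total search volume from single-word vs multi-word queries."""
    totals: Counter = Counter()
    for query, count in query_counts.items():
        # Whitespace-separated tokens count as "words", so "4-person" is one word.
        bucket = "single_word" if len(query.split()) == 1 else "multi_word"
        totals[bucket] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Empty log: avoid dividing by zero.
        return {"single_word": 0.0, "multi_word": 0.0}
    return {bucket: totals[bucket] / grand_total for bucket in ("single_word", "multi_word")}
```

Passing a dictionary such as `{"tent": 120, "4p three season tent": 6}` returns the two shares, which sum to 1.0 for any non-empty input.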


Why is the long tail of search important? For marketers, targeting both single terms and multi-word phrases lets them choose the search keywords that drive consumers to the right pages on their sites. A consumer who lands on the right page is more likely to make a purchase. For example, if a user is looking for a tent, the search keyword “tent” will, at best, take the consumer to a top-level landing page about tents. A more specific query, however, should bring the user to a more specific product page, which more often leads to a purchase.

Search Complexity

The interesting thing about the long tail of search is its inherent complexity. As search queries grow longer, they become more specific and unique, but also more complex. Using the same tent example, typing “tent” into a search engine returns an overwhelming number of results, most of which don’t match my specific expectations. However, if I type in “4-person 3 season tent”, I get a much narrower set of results made up of more specific product landing pages. This query is not only more specific, it is also more complex and more prone to variation. For example, the same search could be typed as “4p 3 season tent”, “Four person three season tent”, “4p three season tent”, and so on.
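One way to see how many of these variants collapse into the same request is to normalize each query before counting it. A minimal sketch, assuming a small hand-built synonym table; the table entries and the `normalize_query` name are hypothetical, and a production system would more likely draw on a maintained thesaurus or the taxonomy itself.

```python
import re

# Hypothetical synonym table: maps a variant token to its canonical token(s).
SYNONYMS: dict[str, list[str]] = {
    "four": ["4"],
    "three": ["3"],
    "4p": ["4", "person"],
    "people": ["person"],
    "persons": ["person"],
    "seasons": ["season"],
    "tents": ["tent"],
}


def normalize_query(query: str) -> str:
    """Lowercase, split on non-alphanumeric characters, and map tokens to canonical forms."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    normalized: list[str] = []
    for token in tokens:
        # Tokens not in the table pass through unchanged.
        normalized.extend(SYNONYMS.get(token, [token]))
    return " ".join(normalized)


variants = [
    "4-person 3 season tent",
    "4p 3 season tent",
    "Four person three season tent",
    "4p three season tent",
]
print({normalize_query(v) for v in variants})  # → {'4 person 3 season tent'}
```

Token-level normalization like this catches abbreviations and spelled-out numbers, but not rephrasings such as “tent for more than three people and two seasons”; matching those calls for concept-level analysis of the kind text analytics tools provide.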

That’s just one search query. Now consider all the other ways it could be expressed that may be close enough to direct a consumer to the same place, even if the wording differs. If I type in “tent for more than three people and two seasons”, I’ve changed the exact wording, but I have asked for the same, or nearly the same, search parameters. Why does this matter? As online retailers have known for quite some time, there is money in quantity, even when that quantity is made up of millions of unique requests rather than a large number of identical, vague ones.

Text Analytics and the Long Tail

The long tail of search also applies to internal organizational search. Unlike the Internet, the content available inside a corporation is finite, and internal search does not have the benefit of algorithms such as page ranking that you would find on an Internet search engine. This is just one of the inherent differences that make internal organizational search a challenge.


To address this challenge, on one project I used text mining and text analytics software to analyze the search queries on an organization’s Intranet and determine what users were searching for across all searches. Analyzing the long tail of search within an organization can show what employees are actually looking for and help make sure their queries match the available content. Much like the Internet search long tail, graphing the frequency of search terms used inside the organization revealed a very small number of often-repeated searches. It also revealed a much larger number of more complex and varied searches, many of them looking for the same information.
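Building the frequency distribution itself needs little more than a counter over the raw query log. A minimal sketch, assuming the log has already been exported as one query string per search; `query_frequencies` and `head_share` are hypothetical helper names, and the rank-ordered counts are what you would plot to see the long tail.

```python
from collections import Counter


def query_frequencies(raw_queries: list[str]) -> list[tuple[str, int]]:
    """Count identical queries (trimmed and lowercased), most frequent first."""
    counts = Counter(q.strip().lower() for q in raw_queries if q.strip())
    return counts.most_common()


def head_share(ranked: list[tuple[str, int]], top_n: int) -> float:
    """Fraction of all searches covered by the top_n most frequent distinct queries."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    return sum(count for _, count in ranked[:top_n]) / total


# Hypothetical log entries, for illustration only.
log = [
    "benefits", "Benefits", "payroll", "benefits ",
    "how do I change my dental plan", "dental plan enrollment deadline",
    "change dental coverage", "payroll",
]
ranked = query_frequencies(log)
for rank, (query, count) in enumerate(ranked, start=1):
    print(rank, count, query)
print(f"Top 2 queries cover {head_share(ranked, 2):.1%} of searches")
```

Exact-match counting is deliberately naive here: the three dental-plan queries each appear once, even though they ask for closely related information.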


Applying text analytics to all search queries and clustering the concepts that appear across the individual keywords produced a very different picture of searcher needs. Once clustered, the queries revealed a distinct set of topics important to users, many of which did not appear among the top search queries. While a list of the top searches is interesting, it rarely captures the broad spectrum of queries present across the long tail.
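The text analytics software used in the project is not named, so this sketch swaps in a deliberately simple technique to illustrate the idea: single-pass (leader) clustering of queries by word overlap (Jaccard similarity), followed by totaling search volume per cluster. The stopword list, the 0.3 threshold, and all function names are hypothetical assumptions; commercial text analytics tools use far richer concept extraction.

```python
import re

# Hypothetical stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "the", "how", "do", "i", "my", "to", "for", "of", "is", "what", "where"}


def concept_tokens(query: str) -> frozenset[str]:
    """Lowercased alphanumeric tokens of a query, minus stopwords."""
    return frozenset(t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOPWORDS)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Intersection size divided by union size (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(query_counts: dict[str, int], threshold: float = 0.3) -> list[dict]:
    """Leader clustering: each query joins the most similar existing cluster seed
    if similarity >= threshold, otherwise it seeds a new cluster.
    Returns clusters sorted by total search volume, largest first."""
    clusters: list[dict] = []
    # Visit frequent queries first so they become cluster seeds.
    for query, count in sorted(query_counts.items(), key=lambda kv: -kv[1]):
        tokens = concept_tokens(query)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(tokens, cluster["seed"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["queries"].append(query)
            best["volume"] += count
        else:
            clusters.append({"seed": tokens, "queries": [query], "volume": count})
    return sorted(clusters, key=lambda c: -c["volume"])


# Hypothetical query counts, for illustration only.
counts = {
    "benefits": 40,
    "payroll": 30,
    "change dental plan": 10,
    "dental plan enrollment": 9,
    "dental coverage change": 8,
    "dental plan deadline": 7,
    "vision plan": 4,
}
for cluster in cluster_queries(counts):
    print(cluster["volume"], cluster["queries"])
```

With these made-up counts, the four dental-plan queries, none of which is among the top two individual searches, together form the second-largest cluster. Results are sensitive to the threshold and to the order in which queries are visited.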


The results of this analysis lead to some specific actions. First, cross-check the results against the existing tagging terminology in the corporate taxonomy to make sure the concepts exist. Second, make sure content is tagged with the most important concepts. Finally, select important content that matches users’ needs and boost it in the search results, so that the most frequently queried subjects return relevant and important content.
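Parts of these three actions can be scripted. A minimal sketch, assuming the taxonomy is available as a set of term strings, each document’s tags as a set of strings, and search results as (document id, relevance score) pairs; all function names and the multiplicative boost factor are hypothetical. Most search platforms provide their own boosting configuration, so `boost_results` only illustrates the idea.

```python
def taxonomy_gaps(search_concepts: set[str], taxonomy_terms: set[str]) -> set[str]:
    """Step 1: concepts users search for that have no matching taxonomy term (case-insensitive)."""
    known = {term.lower() for term in taxonomy_terms}
    return {concept for concept in search_concepts if concept.lower() not in known}


def tag_coverage(doc_tags: dict[str, set[str]], concepts: list[str]) -> dict[str, int]:
    """Step 2: how many documents carry each important concept as a tag (case-insensitive)."""
    lowered = [{tag.lower() for tag in tags} for tags in doc_tags.values()]
    return {concept: sum(concept.lower() in tags for tags in lowered) for concept in concepts}


def boost_results(results: list[tuple[str, float]], boosted_ids: set[str],
                  boost: float = 2.0) -> list[tuple[str, float]]:
    """Step 3: multiply the score of hand-picked documents, then re-sort by score."""
    rescored = [(doc_id, score * boost if doc_id in boosted_ids else score)
                for doc_id, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs, for illustration only.
print(taxonomy_gaps({"Dental Plan", "Payroll"}, {"payroll", "benefits"}))
print(tag_coverage({"doc1": {"Payroll"}, "doc2": {"payroll", "benefits"}}, ["payroll", "dental plan"]))
print(boost_results([("doc1", 1.0), ("doc2", 0.8), ("doc3", 0.5)], {"doc3"}, boost=3.0))
```

Step 2 still needs editorial judgment about which documents should carry a tag; the coverage count only flags concepts that few or no documents are tagged with.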


Although the needs of online retailers and internal employees differ, the goal is the same: getting users directly to the information they want. Treating the search long tail as important in your organization and applying text analytics to search queries can lead to actionable results.