The Evolution of Speech Analytics

1 Nov, 2007

By: Jeff Gallino

With each succeeding generation, speech analytics technology has evolved into a more effective means of collecting actionable intelligence from calls – information that enterprises can use to effect business improvements in critical areas such as customer satisfaction, agent quality, sales performance and marketing effectiveness.

Despite the rapid emergence of a powerful new generation of speech analytics technology, there are still solutions on the market today that employ earlier, less-effective methods for analyzing call content.

Or perhaps it is because of that rapid change: enhancements and improvements have emerged so quickly that some solutions have simply failed to keep up. For this reason, any enterprise considering an investment in speech analytics needs a basic understanding of the evolution of the technology and the methods driving the solutions – specifically in the context of what no longer works and what has superseded those early and ineffective efforts.

The bottom line: given the sophistication of the technology available today, every call that is not fully analyzed represents a completely avoidable information drain. The enterprise will inevitably bear the cost of this drain in the form of lost opportunities for business improvement. Preventing such an outcome requires implementing the most advanced speech analytics solution available – and identifying that solution means knowing what characterizes both more- and less-advanced options.

The Evolution of Speech Analytics at a Glance

The accompanying diagram traces the evolution of speech analytics from the early manual monitoring methods that preceded the first speech analytics tools to the newest generation of the technology.

The evolution has been characterized by increasing levels of process automation, which have enabled a growing ability to dig deeper into more call content and extract more – and more meaningful – information.

Before Speech Analytics: Manual Monitoring

Speech analytics was originally developed as an automated alternative to manually monitoring calls in the contact center – a process with several obvious shortcomings.

Manual monitoring is entirely random. If 10 percent of callers to an airline, for example, are mentioning a competitive ticket cost, but a contact center manager randomly monitors calls that don’t happen to be among them, an opportunity to quickly zero in on a competitor’s price and match it or beat it could be lost.

Manual monitoring is subjective. What an enterprise learns from the calls it monitors depends on who’s doing the monitoring and what they deem important to note – or not.

Manual monitoring is a poor use of management resources. Assigning contact center staff to listen to calls takes them away from more productive work.

Speech analytics emerged as a response to these problems. The early solutions certainly represented an improvement over manual monitoring – but, by today’s standards, not much of one.

The First Generation: Word Spotting

While unquestionably an improvement over manual monitoring in that it speeds and automates the monitoring process, word spotting has significant limitations as a tool for collecting intelligence from contact center calls. As indicated in Figure 1, word spotting is exactly what the term suggests: a search for specific words or phrases, in a sampling of calls, to identify relevant content.

There are three major problems with word spotting as a means of gathering intelligence from contact center calls.

You only find what you’re looking for. Because word spotting is search-based, it is inherently biased, delivering results based only on the words you are looking for. For example, you may be looking for content that will help you find out more about a problem with agent performance – when you are, in fact, having much bigger problems with a billing error that’s generating calls from angry customers.

Just because you find something doesn’t mean it was what you were looking for. Another pitfall of word spotting is that you are looking for words without being able to define the context for them. For example, suppose an airline searches for the word “reservation.” There would be no way of knowing whether calls with this word in them are to make reservations or cancel reservations or even just check to see if a flight delay is going to make a caller miss a restaurant reservation!

As with manual monitoring, the process is random and the results, therefore, incomplete. Processing limitations make it impractical to conduct word spotting on more than a small sampling of calls. So even if you could tell what calls were about based on a word or phrase, you would still have no way of knowing if your findings were representative of all the calls coming in.

The following example shows word spotting applied to a call, and it fails on several counts. A financial institution uses the search terms “interest rate” and “money market account” in an attempt to discern whether a promotion it has launched for a high rate on money market accounts is reaching customers. Here’s what happens.

AGENT: How may I help you?

CUSTOMER: Hi. Can I check my balance?

AGENT: Sure, what’s your account number?

CUSTOMER: I don’t know, but my phone number is 555-1212.

AGENT: Okay…, your balance is $750.25. Did you know that you can get a better interest rate with our money market account?

CUSTOMER: No… how much does that cost?

AGENT: Only an extra $5.95 a month.

CUSTOMER: Nah, I can’t afford that.

AGENT: It is free if you maintain a balance of over $1,000.

CUSTOMER: Clearly, that isn’t how much I have right now!

AGENT: Okay, is there anything else I can do for you today?

CUSTOMER: No, thanks.

AGENT: Have a great day. Goodbye.

Word spotting will find the references to “interest rate” and “money market account,” but it would be wrong to assume that the caller was asking about opening an account in response to the rate promotion. What’s worse, word spotting offers no way to detect the blunder the agent made by trying to sell the customer on the free-with-minimum-balance aspect of the money market account. There is information in this call that could reveal a problem with agent performance. But, using word spotting, there is no way to extract that relevant information from the call. An opportunity to remediate the problem by providing additional agent training and doing some outreach to the dissatisfied customer is, therefore, irretrievably lost.
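To make the limitation concrete, here is a minimal sketch of the word-spotting approach – hypothetical illustrative code, not any vendor’s implementation. It scans a transcript for target phrases and reports hits, with no notion of who said them, why, or what happened next.

```python
# Hypothetical word-spotting sketch: flags target phrases in a transcript,
# but cannot tell WHY a phrase occurred or how the exchange turned out.
TARGET_PHRASES = ["interest rate", "money market account"]

transcript = (
    "agent: did you know that you can get a better interest rate "
    "with our money market account? "
    "customer: no... how much does that cost? "
    "customer: nah, i can't afford that."
)

hits = [p for p in TARGET_PHRASES if p in transcript.lower()]
print(hits)  # both phrases are "spotted"...
# ...yet the call was a failed up-sell, which this approach cannot detect.
```

The match succeeds, but the failed up-sell and the customer’s frustration are invisible to the search.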

The Second Generation: Categorizing Content

Although some companies still use word spotting, others have progressed to the second generation of speech analytics shown in Figure 1. This involves categorizing the content from calls into specific relevant topics (sometimes also known as topic identification). The problem is that this technology still relies on searching for particular words and phrases, which means you only find what you are looking for. That may be helpful, but it still eliminates the possibility of uncovering unexpected information. Not only that, it generally limits you to searching for exact phrases and doesn’t allow for variations on a theme.

Here’s what I mean. Let’s return to the financial institution that was used as an example of a company employing a word-spotting solution, and look at the pitfalls of an approach driven by topic identification. In this example, instead of finding random words, they’re actually finding phrases that signify calls relating to, in this case, the topic of balance queries.

AGENT: How may I help you?

CUSTOMER: Hi. Can I check my balance?

AGENT: Sure, what is your account number?

CUSTOMER: I don’t know, but my phone number is 555-1212.

AGENT: Great! Okay… your balance is $750.25.

It seems like a success, doesn’t it? The search for the phrase “check my balance” picked up on the reference, signifying a balance query, just as intended. But what if the caller had said “balance check” instead of “check my balance”? The reference would have been missed. The problem is that users of topic identification have only a limited number of topics and phrases available to them. Solutions based on this method typically allow only a few dozen categories to be defined, with a dictionary function that permits only a limited number of words and phrases.
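A short sketch makes the brittleness of exact-phrase matching plain. This is hypothetical illustrative code under the assumption that topic identification works as a phrase-to-topic dictionary lookup, as described above; the topic names and phrases are invented.

```python
# Hypothetical topic-identification sketch: exact-phrase dictionary lookup.
TOPIC_PHRASES = {
    "balance inquiry": ["check my balance", "what is my balance"],
    "rate promotion": ["money market account", "interest rate"],
}

def identify_topics(utterance: str) -> list[str]:
    text = utterance.lower()
    return [topic for topic, phrases in TOPIC_PHRASES.items()
            if any(p in text for p in phrases)]

print(identify_topics("Hi. Can I check my balance?"))   # ['balance inquiry']
print(identify_topics("Could I get a balance check?"))  # [] -- same intent, missed
```

Any phrasing not anticipated in the dictionary falls straight through, even when the caller’s intent is identical.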

Finally, topic identification is no better than word spotting or even manual monitoring in one important way, and that’s the use of only a small sampling of calls. Again, monitoring and processing all the calls that come into a contact center would cost too much and take too long to make it a viable approach. But by looking at only a relatively small number of random calls, the company is basically throwing away valuable intelligence from the rest of the content it has recorded.

The Next Generation: Discovering True Meaning

One simple and obvious fact underlies the inadequacy of speech analytics methods that are based on searching for or categorizing words and phrases: Conversations consist of more than words. As long as a solution ignores the context of the words – the stress and tempo that characterize the way they’re spoken, for example, or even the silence in between them – it can never accurately and completely convey the meaning of a call.

The basic realization that conversation is not just words has driven the next generation of speech analytics solutions – those that take into account all the dimensions of a recorded call. That includes all the words that are spoken, not just the ones you think you need to find, and the context in which they are spoken. This makes it possible to discover what people really mean, not just what they say. This process of discovery distinguishes next-generation speech analytics from previous generations in several important ways.

The solution analyzes every aspect of a call, including speech, acoustics such as stress and tempo, and metadata (such as relevant customer information from CRM or ERP systems).

The solution constructs patterns of meaning based on various indicators of call qualities, such as competitive language on the customer side and up-sell language on the agent’s side.

The solution aggregates patterns of meaning against specific areas of business improvement – such as customer satisfaction, agent quality, sales performance and marketing effectiveness – which can be charted over time.
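The three steps above – analyzing multiple dimensions of a call, constructing patterns of meaning, and aggregating them against business areas – can be sketched roughly as follows. This is a hypothetical simplification, not CallMiner’s implementation; every field name, threshold, and pattern rule here is an invented assumption for illustration.

```python
# Hypothetical sketch of next-generation analysis: each call contributes
# indicators from words, acoustics and metadata; indicators are combined
# into patterns of meaning and aggregated by business-improvement area.
from collections import defaultdict

# One analyzed call, with illustrative (assumed) indicator fields.
call = {
    "phrases": {"up_sell_attempt": True, "offer_declined": True},
    "acoustics": {"customer_stress": 0.8, "tempo_shift": 0.3},
    "metadata": {"balance": 750.25, "product_offered_minimum": 1000.0},
}

patterns = []
# Pattern: agent pitched a product the customer cannot qualify for.
if (call["phrases"]["up_sell_attempt"]
        and call["metadata"]["balance"] < call["metadata"]["product_offered_minimum"]):
    patterns.append(("agent quality", "mismatched up-sell"))
# Pattern: a declined offer plus high vocal stress suggests dissatisfaction.
if call["phrases"]["offer_declined"] and call["acoustics"]["customer_stress"] > 0.7:
    patterns.append(("customer satisfaction", "frustrated decline"))

# Aggregate patterns against business-improvement areas.
by_area = defaultdict(list)
for area, pattern in patterns:
    by_area[area].append(pattern)
print(dict(by_area))
```

Run over every call rather than a sample, aggregates like these can be charted over time – which is what makes the patterns actionable at the business level.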

Remember the call to the financial institution that I described earlier? In Figures 2 and 3, you’ll see examples of how a next-generation solution can characterize that call in words, acoustic indicators and conceptual tags – and what they mean or indicate.

One final point about the next generation of speech analytics that sets it apart from earlier solutions: It is capable of analyzing all calls, not just a select few. Today’s distributed computing technology makes this possible by shifting processing loads based on demand, to overcome the limits imposed by having a limited number of servers or a limited amount of processing power.

Increased Technology Sophistication = Increased Business Value

As speech analytics has evolved over time, it has become increasingly sophisticated technologically – and increasingly valuable as a tool for business improvement. The latest generation of speech analytics, such as CallMiner Eureka!, is the only one that makes it possible to discover exactly what’s going on in all contact center conversations and to apply that information to making improvements in key areas ranging from customer satisfaction to marketing effectiveness. Companies that continue to use solutions based on word spotting or topic identification are missing an unprecedented opportunity to garner valuable information about customers’ wants, needs, problems and issues – in the customers’ own words.

About the Author

Jeff Gallino