The problem with open-ended survey questions: to answer or not to answer? Who will read it? Will it make a difference? And if I ask an open-ended question, how will I read and respond to all the answers?  The Analyst Toolbox can provide the solution you need for your next VOC (voice of the customer) survey and, with it, a taxonomy for customer feedback across all your collection points.

We have all taken surveys with multiple choice, yes or no, good or bad and the ever popular “open-ended” questions.  Questions which ask, “What else do you want to tell us?” “How can we do better?” “Tell us why you couldn’t do what you wanted to do on our website?”

Sometimes we take the time to answer.  Sometimes we don’t.

Aside from the time it takes us to answer, maybe we don’t answer these open-ended questions because we think “Is anyone really going to read my comments?” “Will the company act on them or do they just want credit for asking?”

For those seeking customer feedback, open-ended questions offer a real opportunity, in part because most companies fail to analyze, act on and respond to what customers are telling them in these questions.

Surveys limited to “close-ended” questions (e.g. “select all that apply” and “do you agree with the following statements”), while providing useful information, miss the opportunity to uncover unknown and unanticipated problems.  Giving customers the opportunity to also relate in their own words what is good or bad can lead to real insights.

But uncovering the nuggets buried in hundreds if not thousands of open-ended responses, each of which can run from a few words to hundreds, has been a challenge.

Until now.

Unstructured Data

Open-ended responses are usually delivered by survey vendors in “raw” unstructured form, either in documents or spreadsheets, looking something like this:

To make sense of these responses, the analyst or team of analysts must read and then classify them into relevant buckets or “themes.”  Then they summarize these themes in charts and typically report out in Excel or PowerPoint.

Maybe they have tied the summarized open-ended responses back to the rest of the survey.  Or maybe not.

In any event, this is a time-consuming, inefficient and inconsistent process.  It becomes even more onerous with an ongoing customer satisfaction/feedback survey that brings in hundreds if not thousands of responses every month.

Fortunately, advances in text analytics are allowing for quicker, more efficient and more accurate processing of text, including open-ended survey responses.  By quantifying this unstructured data, these advances have also made it easier to discover actionable insights in BI tools and have sped the delivery of results to the operations people who can act on them.

State of Text Analytics

While many companies still rely on manual coding, and some on crowdsourced services, machine processing of unstructured text data started with simple keyword/Boolean search. You still see published “insight” that simply aggregates word counts or uses a “bag of words” technique to approximate themes. This remains by far the most common methodology in use today, and it is a good start, certainly more scalable than an army of human coders.
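As a rough illustration of the word-count style of “insight” described above, here is a minimal sketch in Python. The responses and the stop-word list are invented for the example, and the function names are hypothetical; this is not the output of any particular tool.

```python
import re
from collections import Counter

# Hypothetical responses, invented for illustration only.
responses = [
    "Delivery was late and the box was damaged.",
    "Late delivery again. Very disappointed.",
    "Checkout on the website kept failing.",
]

# A tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "was", "on", "a", "very", "again", "kept"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_counts(texts):
    """Count non-stop-words across all responses, ignoring word order."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts

counts = word_counts(responses)
# The most frequent words stand in for "themes" in this approach.
print(counts.most_common(3))
```

The counts say that “delivery” and “late” come up often, but nothing about whether a given response is a complaint, which is the limitation the rest of this section works through.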

Boolean operators are very precise in that they apply a set of math/logic operations on words.  If a keyword appears in a piece of text, the expression returns a “hit”.  As you work with the expression, you can eliminate false positives with exclusions (e.g. if you are looking for delivery problems, “delivered” counts as a hit, unless the text also contains “on time”).

For the analyst, this is a time-consuming and iterative process, as the operators need to be finely tuned over time to exclude noise.  One analyst told us that a set of 60+ Boolean operators on a news feed took 2-3 months to tune (example below).

Boolean operators also don’t match misspelled words, so relevant responses can be missed.  The approach is also subject to the analyst’s bias, in that the Boolean string is built from what the analyst thinks is in the open-ended responses.  If a word isn’t in the string, any response that relies on it will be missed completely.  And since the result is 0 or 1, there is no “almost” capability.
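The behavior described above can be sketched in a few lines of Python. This is an assumed, simplified rule (one keyword list plus exclusion phrases) built around the “delivered” / “on time” example; the function name and the sample sentences are hypothetical.

```python
import re

def boolean_hit(text, include, exclude_phrases=()):
    """Return 1 if any include keyword is present and no exclusion phrase is, else 0."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    has_keyword = bool(words & set(include))
    is_excluded = any(phrase in lowered for phrase in exclude_phrases)
    return int(has_keyword and not is_excluded)

# Looking for delivery problems: "delivered" is a hit unless "on time" also appears.
rule = {"include": ["delivered"], "exclude_phrases": ["on time"]}

late = boolean_hit("My order was delivered a week late.", **rule)
on_time = boolean_hit("Package was delivered on time, thanks.", **rule)
# The misspelling "deliverd" is not an exact match, so this complaint is missed.
misspelled = boolean_hit("My order was deliverd a week late.", **rule)

print(late, on_time, misspelled)
```

Note that the result is strictly 0 or 1: the misspelled complaint is not flagged as a near miss, it simply disappears.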

A newer and common automated approach today uses NLP (natural language processing). But it has many of the same issues as above in that it attempts to reverse engineer language. If only people wrote like engineers!

With large data sets for training it has become quite good in many applications.  However, it has the same problems as Boolean with text quality.  So, it works best on well written or professionally written documents.

But extend it to the fragmented text, run-on paragraphs and wide variety of writing styles and quality found in open-ended responses, and it will likely underperform.  Also, like Boolean search, and despite its greater flexibility, it doesn’t tell you when a response contains a concept close to what you wanted but just missed the cut.

Machine learning/artificial intelligence (AI) algorithms have taken text analytics to a new level.

Algorithms like ours process each paragraph and line of text much the way our brains do, learning the patterns of language, the “key” words, their importance and the words most closely associated with them. The AI provides commands to extract those key words and associations, along with their direction and values (strengths), as an array.

This is not a totally hands-off approach, however.  Operationally, the analyst trains a set of “agents” based on the themes he/she wishes to uncover.  For example, an online seller will focus on certain themes critical to customer satisfaction, such as performance of the online application, product delivery, customer service and the like.

Once trained or “fingerprinted”, these agents “read” and score the thousands of open-ended responses for how similar they are to the theme on which the agent was trained.  Training is based on much more than keywords.  Examples of text (phrases, sentences, even whole paragraphs) reflective of the targeted theme are essential to a strong performing agent.  The greater the similarity between an open-ended response and the training theme, the higher the response’s score.

An additional advantage of this approach is that the analyst can use the score, since it is continuous, to determine how closely the open-ended responses are matching the targeted theme.  This allows the analyst to finely adjust the “cut-off” score used to classify a given open-ended response as a match to the targeted theme.
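To make the idea of a continuous score and an adjustable cut-off concrete, here is a minimal sketch. It is not our algorithm: as a stand-in it uses plain bag-of-words cosine similarity, and the training examples, function names and the cut-off value of 0.3 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train_agent(examples):
    """Build a stand-in theme 'fingerprint' by summing word counts of example texts."""
    fingerprint = Counter()
    for example in examples:
        fingerprint.update(vectorize(example))
    return fingerprint

def score_response(agent, response):
    """Continuous similarity score between a response and the agent's theme."""
    return cosine(agent, vectorize(response))

def matches_theme(agent, response, cutoff):
    """Classify a response as on-theme when its score reaches the cut-off."""
    return score_response(agent, response) >= cutoff

# Hypothetical training examples for a "delivery" theme.
delivery_agent = train_agent([
    "my package arrived late",
    "delivery took two weeks",
    "the shipment never arrived",
])

CUTOFF = 0.3  # assumed value; the analyst would tune this against real responses
for response in ["package arrived late", "checkout page crashed"]:
    s = score_response(delivery_agent, response)
    print(f"{s:.2f}", matches_theme(delivery_agent, response, CUTOFF), response)
```

Because the score is a number rather than a yes/no, raising or lowering `CUTOFF` lets the analyst trade missed matches against false positives, and responses just under the line remain visible instead of vanishing.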

Additionally, once trained, agents are available for use on multiple surveys (e.g. tracking surveys) or similar feedback in emails or customer reviews.  The agents can also be grouped based on functional categories such as product performance, e-commerce website, customer service, shipping, returns, etc.

So, if you think of the groupings of agents as clusters of concepts (fingerprints) in your brain, you can apply them in different combinations depending on the task/context in front of you.  In other words, you can create your own context and not be forced to adapt to a rigid taxonomic structure.

Ok, but now what?

Each open-ended response has been scored by 10, 20 or even hundreds of agents.  How do we make sense of these scores and use them to extract actionable insights?

From Themes to Insights

What we have done is structure and score a ton of unstructured data.  Since the open-ended responses are tied to a survey respondent id, the agent scores can be tied in with the rest of the survey data and accessed with powerful reporting tools.
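Tying the scores back to the survey is essentially a join on the respondent id. A minimal sketch, assuming the survey rows and agent scores arrive as simple Python structures (the field names, ids and values here are hypothetical):

```python
# Hypothetical closed-ended survey data, one row per respondent.
survey_rows = [
    {"respondent_id": 101, "nps": 9, "region": "West"},
    {"respondent_id": 102, "nps": 3, "region": "East"},
    {"respondent_id": 103, "nps": 7, "region": "East"},  # left the open-ended question blank
]

# Hypothetical agent scores for each respondent's open-ended answer.
agent_scores = {
    101: {"delivery": 0.12, "website": 0.71},
    102: {"delivery": 0.64, "website": 0.05},
}

def merge_scores(rows, scores):
    """Attach each respondent's theme scores to their survey row as score_<theme> columns."""
    merged = []
    for row in rows:
        theme_scores = scores.get(row["respondent_id"], {})
        score_columns = {f"score_{theme}": s for theme, s in theme_scores.items()}
        merged.append({**row, **score_columns})
    return merged

merged = merge_scores(survey_rows, agent_scores)
for row in merged:
    print(row)
```

The merged rows form a flat table, which is the shape most BI tools expect when loading data.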

So, rather than building reports in PowerPoint, let’s load it into our favorite BI tool and develop some visuals.

The primary advantage of presenting the “themed” open-ended response data in a dashboard is that we can now visually see and track what customers are saying in their own words.  Which themes are the most prevalent?  What is the sentiment of the open-ended responses associated with each theme?  Are they all negative? How many are positive? Neutral? If we are running continuous customer satisfaction/feedback surveys, how do these themes track over time?

Filtering on specific themes allows us to see and read the specific open-ended responses that have been classified as belonging to that theme (as shown above). Built correctly, the system should give the analyst full drill down capability from high-level charts to the text of a relevant open-ended response.

Moreover, since we have now linked the themed open-ended responses back to the rest of the survey, we can use the rest of the survey questions as filters.  For example, what themes are expressed by “net promoters”? By “net detractors”? Business users vs. non-business users?  Do the themes expressed vary geographically (e.g. shipping/delivery issues)?
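A BI tool does this kind of filtering interactively, but the underlying calculation is simple. Here is a minimal sketch that counts themes by promoter/detractor segment, using the standard Net Promoter grouping (9-10 promoter, 7-8 passive, 0-6 detractor). The rows, scores and cut-off are invented for the example.

```python
from collections import Counter, defaultdict

def nps_segment(rating):
    """Map a 0-10 likelihood-to-recommend rating to its standard NPS segment."""
    if rating >= 9:
        return "promoter"
    if rating >= 7:
        return "passive"
    return "detractor"

def themes_by_segment(rows, cutoff):
    """Count, per NPS segment, how many responses matched each theme."""
    counts = defaultdict(Counter)
    for row in rows:
        segment = nps_segment(row["nps"])
        for theme, score in row["scores"].items():
            if score >= cutoff:
                counts[segment][theme] += 1
    return counts

# Hypothetical merged survey rows with agent scores attached.
rows = [
    {"respondent_id": 1, "nps": 10, "scores": {"delivery": 0.1, "website": 0.8}},
    {"respondent_id": 2, "nps": 4, "scores": {"delivery": 0.7, "website": 0.6}},
    {"respondent_id": 3, "nps": 2, "scores": {"delivery": 0.9, "website": 0.2}},
]

segment_counts = themes_by_segment(rows, cutoff=0.5)
for segment, themes in segment_counts.items():
    print(segment, dict(themes))
```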

Accelerating the Call to Action

Segmenting the open-ended responses in this manner now enables the relevant responses to be sent to the appropriate departments for follow-up.  All responses having to do with website performance can be sent to the IT department, for example.  Responses having to do with shipping can be peeled off and sent to logistics.  Filtering by state or region allows for even finer segmentation and issue follow-up/resolution so reporting can be mapped to your organization.
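The routing step can be sketched as a lookup from theme to department. The theme-to-department mapping below follows the examples in the text (website to IT, shipping to logistics); the data, cut-off and function name are assumptions for illustration.

```python
from collections import defaultdict

# Assumed mapping from theme to the department that should follow up.
ROUTING = {"website": "IT", "shipping": "Logistics"}

def route(responses, cutoff, routing=ROUTING):
    """Group respondent ids by department for every theme scoring at or above the cut-off."""
    queues = defaultdict(list)
    for response in responses:
        for theme, score in response["scores"].items():
            if score >= cutoff and theme in routing:
                queues[routing[theme]].append(response["respondent_id"])
    return dict(queues)

# Hypothetical scored responses.
responses = [
    {"respondent_id": 1, "scores": {"website": 0.8, "shipping": 0.1}},
    {"respondent_id": 2, "scores": {"website": 0.2, "shipping": 0.9}},
    {"respondent_id": 3, "scores": {"website": 0.7, "shipping": 0.6}},
]

queues = route(responses, cutoff=0.5)
print(queues)
```

A response that touches more than one theme, like respondent 3 here, lands in more than one department's queue, so no team loses sight of an issue that spans functions.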

While some of this could have been done without AI technology, the power is in the integration of multiple capabilities in an agile, responsive solution.  Now this is accomplished in a matter of seconds rather than days or weeks.  Open-ended responses are fully integrated with the rest of the survey.  And the process can be repeated quickly and efficiently as new survey responses are received.

The 50,000-foot view

We have progressed from a situation where analysts spend countless hours reading and classifying thousands of open-ended survey responses, pounding Excel and PowerPoint each month to create and distribute reports, to a new paradigm where the machine does the classification, freeing the analyst to focus on the insight and the corrective action that benefits the organization.

Survey creators can now ask customers what they really think without concern for the processing, coding and reporting challenges.

So next time you are hesitating to answer an open-ended survey question, tell them what you really think!

Give the machines a chance to read it.

If it’s up to us, it will be “read.”

So a human who cares about your issue will now have time and better insight on how to improve your customer experience.

Kevin Duffy-Deno

Tom Marsh