Topic Modeling (1)

Topic modeling is an unsupervised machine learning technique that scans a set of documents, detects word and phrase patterns within them, and automatically clusters the groups of words and similar expressions that best characterize those documents.

Examples of Topic Modeling

Topic modeling could be used to identify the topics in a set of customer reviews by detecting patterns and recurring words. For example, let’s look at how an ‘unsupervised’ technique would group the following review of Eventbrite:

“The nice thing about Eventbrite is that it's free to use as long as you're not charging for the event. There is a fee if you are charging for the event – 2.5% plus a $0.99 transaction fee.”

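By identifying words and expressions such as ‘free to use’, ‘fee’, ‘charging’, and ‘2.5% plus a $0.99 transaction fee’, topic modeling can group this review with other reviews that talk about similar things (these may or may not turn out to be about pricing, since the model works from word patterns alone and does not assign labels such as “Pricing” itself).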
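To make this first step concrete, here is a minimal Python sketch that pulls single words and two-word expressions out of the review above and counts them. The stop-word list, the tokenizing regex, and the function name `words_and_expressions` are illustrative assumptions, not part of any particular topic modeling library; real pipelines usually use a fuller stop-word list and a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "about", "that", "it's", "to", "as",
              "long", "you", "you're", "not", "for", "there", "if", "and", "of"}

# Words, plus amounts such as 2.5% or $0.99.
TOKEN = re.compile(r"\$?\d+(?:\.\d+)?%?|[a-z']+")

def words_and_expressions(text):
    """Count single words and two-word expressions, skipping stop words."""
    words, pairs = Counter(), Counter()
    # Split on clause punctuation followed by whitespace, so two-word
    # expressions never straddle a boundary (and "2.5" stays intact).
    for clause in re.split(r"[.!?;:\u2013]\s+", text.lower()):
        tokens = TOKEN.findall(clause)
        words.update(t for t in tokens if t not in STOP_WORDS)
        pairs.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:])
                     if a not in STOP_WORDS and b not in STOP_WORDS)
    return words, pairs

review = ("The nice thing about Eventbrite is that it's free to use as long as "
          "you're not charging for the event. There is a fee if you are charging "
          "for the event –  2.5% plus a $0.99 transaction fee.")
words, pairs = words_and_expressions(review)
print(words.most_common(3))      # → [('charging', 2), ('event', 2), ('fee', 2)]
print(pairs["transaction fee"])  # → 1
```

Recurring terms such as ‘charging’, ‘event’ and ‘fee’ are the kind of signal a topic model relies on to place this review next to other reviews that share them.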

How Does Topic Modeling Work?

Topic modeling involves counting words and grouping similar word patterns to infer topics within unstructured data. Let’s say you’re a software company and you want to know what customers are saying about particular features of your product. Instead of spending hours reading through heaps of feedback to work out which texts discuss your topics of interest, you could analyze them with a topic modeling algorithm.

By detecting patterns such as word frequency and the distance between words, a topic model clusters similar feedback together and surfaces the words and expressions that appear most often in each cluster. With this information, you can quickly deduce what each set of texts is talking about. Remember, this approach is ‘unsupervised’, meaning that no labeled training data is required.
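The sketch below illustrates the idea of clustering similar feedback and reporting each cluster’s most frequent words. It is a deliberately simplified stand-in, not a real topic model such as LDA: each text becomes a bag-of-words vector, texts are compared with cosine similarity, and a single-pass ‘leader’ clustering adds each text to its most similar existing cluster, or starts a new cluster when no similarity reaches a threshold. It only uses word frequency, not the distance between words. The feedback strings, the stop-word list, the 0.3 threshold, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "i", "my", "was", "on", "when", "and", "to", "it", "is", "for", "of"}

def bag_of_words(text):
    """Word-frequency vector of a text, ignoring stop words."""
    return Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.3):
    """Single-pass clustering: add each text to its most similar cluster,
    or start a new cluster if no similarity reaches the threshold."""
    clusters = []  # each cluster: {"words": summed word counts, "members": text indices}
    for i, text in enumerate(texts):
        vec = bag_of_words(text)
        best = max(clusters, key=lambda c: cosine(vec, c["words"]), default=None)
        if best is not None and cosine(vec, best["words"]) >= threshold:
            best["members"].append(i)
            best["words"].update(vec)
        else:
            clusters.append({"words": Counter(vec), "members": [i]})
    return clusters

feedback = [
    "The app crashes when I upload a photo",
    "Upload keeps failing and the app crashes",
    "I was charged twice on my credit card",
    "Refund the double charge on my credit card",
]
for c in cluster(feedback):
    print(c["members"], [w for w, _ in c["words"].most_common(2)])
# → [0, 1] ['app', 'crashes']
# → [2, 3] ['credit', 'card']
```

Note that there is no stemming here, so ‘charged’ and ‘charge’ count as different words; real pipelines usually normalize word forms before counting.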

Topic modeling takes a corpus of documents and produces two outputs:

  1. A list of the topics covered by the documents in the corpus

  2. Several sets of documents from the corpus grouped by the topics they cover
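One widely used algorithm that produces both outputs is Latent Dirichlet Allocation (LDA). The sketch below is a compact, standard-library-only version of LDA fitted with collapsed Gibbs sampling, offered as an illustration rather than as the method this series will necessarily use. It returns (1) the top words of each topic and (2) the documents grouped by their dominant topic. The toy documents, the hyperparameters (`alpha`, `beta`, the iteration count, the seed), and the function name `lda_gibbs` are assumptions chosen for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0, top_n=3):
    """Fit LDA with collapsed Gibbs sampling on tokenized docs (lists of words).

    Returns (topics, groups): the top_n words of each topic, and a dict mapping
    each topic to the indices of the documents whose dominant topic it is.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic t
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic t
    topic_total = [0] * num_topics                       # tokens assigned to topic t

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t | all other assignments) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Output 1: the most frequent words of each topic (unary + drops zero counts).
    topics = [[w for w, _ in (+topic_word[k]).most_common(top_n)] for k in range(num_topics)]
    # Output 2: documents grouped by their dominant topic.
    groups = {k: [] for k in range(num_topics)}
    for d in range(len(docs)):
        groups[max(range(num_topics), key=lambda t: doc_topic[d][t])].append(d)
    return topics, groups

docs = [
    "fee price charge refund fee".split(),
    "price fee discount charge".split(),
    "crash bug login error crash".split(),
    "bug error crash update".split(),
]
topics, groups = lda_gibbs(docs, num_topics=2)
print(topics)
print(groups)
```

With this few documents the result depends on the random seed. Libraries such as scikit-learn (`LatentDirichletAllocation`) and gensim (`LdaModel`) provide production-ready LDA implementations.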

Use Cases & Applications

From sales and marketing to customer support and product teams, topic modeling and topic classification (its supervised counterpart) can help eliminate manual, repetitive tasks and speed up processes in a simple, cost-effective way.

1. Customer Service

  • Automatically tagging customer support tickets according to topic, or recognizing patterns and delivering results in the form of frequently occurring words and expressions

  • Automatically triaging and routing conversations to the most appropriate team. For example, tickets tagged Billing Issues or Refunds, or containing expressions such as ‘credit card transaction’, ‘subscription error’, and so on, would be sent to the accounts department. Likewise, queries tagged with Bug Issues and Software, or containing expressions such as ‘strange glitch’ and ‘app isn’t working’ would be sent to the dev team
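The triage rule described above can be sketched as a simple lookup: send a ticket to a team if it carries one of that team’s tags or contains one of its expressions. This is rule-based matching applied after tagging, not topic modeling itself; the `ROUTES` table, the team names, the fallback ‘general support’ queue, and the `route` function are hypothetical.

```python
# Hypothetical routing table: team -> (tags that route there, expressions that route there).
ROUTES = {
    "accounts": ({"Billing Issues", "Refunds"},
                 ("credit card transaction", "subscription error")),
    "dev": ({"Bug Issues", "Software"},
            ("strange glitch", "app isn't working")),
}

def route(ticket_text, tags=(), default="general support"):
    """Return the first team whose tags or expressions match the ticket."""
    # Lowercase and normalize curly apostrophes so "isn’t" matches "isn't".
    text = ticket_text.lower().replace("\u2019", "'")
    for team, (team_tags, expressions) in ROUTES.items():
        if team_tags & set(tags) or any(e in text for e in expressions):
            return team
    return default

print(route("Got a subscription error after renewing"))     # → accounts
print(route("Please look into this", tags=["Bug Issues"]))  # → dev
print(route("The app isn’t working since the update"))      # → dev
```

Because the first matching team wins, the order of entries in the routing table decides where a ticket that matches several teams ends up.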

  • Getting insights from customer support conversations

2. Customer Feedback

  • Automatically analyzing open-ended responses (unstructured data) such as NPS responses, customer surveys, and product reviews, all valuable sources of information that can help shape your product or service and encourage business growth

In the next post, we will go through various approaches to topic modeling to understand the current state of the art.