Topic modeling is an unsupervised machine learning technique that's capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.
Labelwise is a data annotation tool that provides templatized workflows to streamline the data labelling process and create quality datasets. In this blog, we will learn how to build a topic modelling dataset using Labelwise. We take the case of customer queries, where each query can be classified along the following two categories.
Department: transactional_account, customer_support, card, fraud, loan
Inquiry Category: question, feedback
Topic modeling can be used to identify the topics of these customer queries by detecting patterns and recurring words. We will create a collaboration job in Labelwise in which collaborators label customer queries with the above categories. The resulting dataset can then be used to build Machine Learning models for topic modeling.
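To make the target labels concrete, the two category lists above can be written down as a small schema. This is only a sketch in Python: the `LabelledQuery` record and its field names are hypothetical and are not part of Labelwise. Only the label values themselves come from the lists above.

```python
from dataclasses import dataclass

# Label values copied from the two category lists above.
DEPARTMENTS = {"transactional_account", "customer_support", "card", "fraud", "loan"}
INQUIRY_CATEGORIES = {"question", "feedback"}


@dataclass(frozen=True)
class LabelledQuery:
    """One customer query with its two labels (hypothetical record shape)."""

    text: str
    department: str
    inquiry_category: str

    def __post_init__(self):
        # Reject any label that is not in the agreed label set.
        if self.department not in DEPARTMENTS:
            raise ValueError(f"unknown department: {self.department!r}")
        if self.inquiry_category not in INQUIRY_CATEGORIES:
            raise ValueError(f"unknown inquiry category: {self.inquiry_category!r}")


# An illustrative, made-up query with one label from each category.
example = LabelledQuery(
    text="My card was charged twice for the same purchase.",
    department="card",
    inquiry_category="question",
)
```

Every query ends up with exactly one Department and one Inquiry Category, which is what the labelling job built in the following steps asks collaborators to provide.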
Building a topic modelling dataset with Labelwise
1. Project Creation
Click the ‘Add Project’ button on the homepage, choose the Text Classification template from the template options, and enter a name and description for the project.
2. Job Creation
a. Job Overview
Click the ‘Create Data Labelling Job’ button, enter the job name, description and instructions, and select the duration of the job.
b. Attach Assets
Drag and drop the CSV asset file to be labelled and submit it to create the asset for the job. Then click the ‘Attach’ button to attach this asset to the job.
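If the customer queries are not yet in a CSV file, one can be produced with a few lines of Python. The sketch below assumes one query per row under a single `text` header. That column name, the `write_asset_file` helper and the sample queries are illustrative assumptions, so check the layout Labelwise expects before uploading.

```python
import csv

# Hypothetical sample queries; replace these with your own data.
queries = [
    "I was charged twice on my credit card.",
    "How do I apply for a home loan?",
    "Your support team resolved my issue quickly, thank you.",
]


def write_asset_file(path, rows):
    """Write one query per row under a single 'text' header (assumed column name)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        for query in rows:
            # The writer quotes fields that contain commas or quotes.
            writer.writerow([query])
```

Calling `write_asset_file("customer_queries.csv", queries)` creates a file that can then be dragged into the upload area.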
c. Configure Labels
Click the ‘Label Editor’ button to add the ‘Department’ and ‘Inquiry Category’ classifications.
d. Add Collaborators
Click the ‘Add Collaborator’ button to add collaborators (labelers) to the job. Enter the name and email of each collaborator and click save.
e. Add Reviewer
In the last step, you can optionally use ‘Add Reviewer’ to add a reviewer who will review the data labelled by the collaborators. Click the ‘Submit’ button to create the Data Labelling job.
3. Data Labelling
Once the job has been created, collaborators can log in to the collaborator app and start labelling.
As the collaborators label the data, the dataset is built up and can be viewed in the dataset tab.
4. Dataset Download
To download the dataset, click the ‘Download Dataset’ button in the top right corner. Select the download format, review your selection and click the ‘Download’ button to download the labelled dataset file.
The dataset can be downloaded in either JSON or CSV format. This labelled dataset file can be used to create Machine Learning models for topic modeling.
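Before training a model, it is worth checking how the labels are distributed in the downloaded file. The sketch below does this for a CSV export using only the Python standard library. The column names `text`, `department` and `inquiry_category` are assumptions about the export layout, and the in-memory sample stands in for the real file, so adjust both to match what you actually download.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_file, column):
    """Count how often each label appears in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Stand-in for the downloaded file; a real export may use different headers.
sample_export = io.StringIO(
    "text,department,inquiry_category\n"
    "I was charged twice on my card.,card,question\n"
    "How do I apply for a loan?,loan,question\n"
    "The support agent was very helpful.,customer_support,feedback\n"
)

counts = label_distribution(sample_export, "department")
print(counts)
```

With a real file, pass an open file handle instead, for example `open("dataset.csv", newline="", encoding="utf-8")`, where `dataset.csv` is a placeholder for the downloaded file name. A heavily skewed distribution suggests collecting or labelling more queries for the rare categories.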
Labelwise gives you full control over your data while following security considerations and standards. Its simple guided workflow helps you create quality labelled datasets faster. Labelwise also runs on scalable, automatically managed infrastructure that supports collaboration among team members and stakeholders, which makes it well suited to data annotation work.