Introduction to Sentiment Analysis
Sentiment analysis is the process of determining whether a piece of writing is positive, negative, or neutral. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media content, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.
What is sentiment analysis used for?
Here are a few areas:
- Sentiment analysis for customer service
- Sentiment analysis for market research and analysis
Let’s understand with an example:
For instance, let’s imagine we have a product that we are planning to sell in the market. We have identified the competitors and want to use the textual feedback of their users to understand the competitors’ weaknesses and strengths.
Let’s assume the feedback for the product reads:
- This product is simply great
- This product is awesome.
- I do not recommend this product at all.
- Stay away from this product
- I absolutely love this product. Buy this product.
- This product is simply great. Buy this product
- This product is awesome. It’s simply great.
Imagine there are thousands of pieces of feedback collected from different sources (campaigns, Twitter, e-commerce websites, etc.) and we want to classify them and evaluate the overall sentiment about this product.
Using sentiment analysis, we can use the text of the feedback to understand whether each piece is neutral, positive, or negative. We can apply an algorithm that assigns a score to each piece of feedback. From the results, we can easily determine what the public likes and what they want changed.
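As a toy illustration of such scoring, here is a minimal lexicon-based scorer. The word lists and weights are hypothetical, hand-picked for the example feedback above; a real model would learn them from data:

```python
# Toy sentiment scorer: sums hand-assigned word weights.
# The word lists below are hypothetical, not from any real lexicon.
POSITIVE = {"great": 1, "awesome": 1, "love": 2}
NEGATIVE = {"not": -1, "away": -1}

def score(feedback: str) -> int:
    """Sum the weights of all known words in the feedback."""
    words = feedback.lower().replace(".", "").split()
    return sum(POSITIVE.get(w, 0) + NEGATIVE.get(w, 0) for w in words)

def label(feedback: str) -> str:
    """Map a score to a sentiment label."""
    s = score(feedback)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For example, `label("This product is simply great")` returns `"positive"`, while `label("Stay away from this product")` returns `"negative"`. The deep learning models below replace this hand-made lexicon with learned representations.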
Sentiment Analysis using Deep Learning
As with many other fields, advances in deep learning have pushed sentiment analysis to the cutting edge. I will introduce you to three types of deep neural networks that can be used for sentiment analysis:
Basic Neural Network with embeddings
Before diving into neural network architecture, let’s understand some basic concepts of NLP data preprocessing and mechanisms of converting words into vector space.
Machine learning algorithms are not inherently smart; they become intelligent only when trained on clean data.
Always remember: “Your model will only ever be as good as your data.”
A clean dataset will allow a model to learn meaningful features and not overfit on irrelevant noise.
In the real world, text is very noisy. As data science engineers, we have to check data quality and clean the data, if necessary, using different data cleansing techniques:
Lowercasing
Lowercasing is one of the simplest and most effective forms of text preprocessing: convert all characters to lowercase, in order to treat words such as “skyl”, “SKYL”, and “SkyL” as the same.
Stemming
Stemming is the process of reducing inflected words to their root form. It is useful for standardizing vocabulary, which is a crucial step when mapping words to their numerical representations.
Stemming uses a crude heuristic that chops off the ends of words in the hope of correctly transforming them into their root form. So the words “trouble”, “troubled”, and “troubles” might actually be converted to “troubl” instead of “trouble”, because the endings were simply chopped off.
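The chopping heuristic can be sketched in a few lines. The suffix list here is a tiny illustrative subset; a real stemmer such as Porter’s applies many more rules:

```python
# Naive suffix-stripping stemmer illustrating the crude heuristic:
# chop a known ending off the word and hope the remainder is the root.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def crude_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Only strip if a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

This reproduces the behavior described above: “trouble”, “troubled”, and “troubles” all reduce to “troubl”, because the endings are simply chopped off.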
Lemmatization
Lemmatization is very similar to stemming: the goal is to remove inflections and map a word to its root form. The only difference is that lemmatization tries to do it the proper way. It doesn’t just chop endings off; it transforms words to their actual root (lemma).
For example, the word “better” would map to “good”.
Stop-word removal
Stop words are a set of commonly used words in a language; examples in English are “a”, “the”, “is”, and “are”. The intuition behind removing them is that, with low-information words gone, we can focus on the important words instead. Stop-word removal is commonly applied in search systems, text classification applications, topic modeling, topic extraction, and others.
Noise removal
Noise removal is about stripping characters, digits, and pieces of text that can interfere with your text analysis; it is one of the most essential text preprocessing steps. It includes removing punctuation, special characters, numbers, HTML formatting, and domain-specific keywords (e.g. ‘RT’ for retweet).
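A minimal cleaning pipeline combining lowercasing, noise removal, and stop-word removal might look like this. The stop-word list is a tiny illustrative subset (including the domain keyword ‘rt’):

```python
import re

# Minimal cleaning pipeline: lowercase, strip noise, drop stop words.
# The stop-word list is a tiny illustrative subset, not a complete one.
STOP_WORDS = {"a", "the", "is", "are", "this", "rt"}

def clean(text: str) -> str:
    text = text.lower()                    # lowercasing
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML formatting
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and digits
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)
```

For example, `clean("This product is GREAT!!!")` returns `"product great"`: the sentence is lowercased, the punctuation is stripped, and the stop words are dropped.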
Text normalization
A highly overlooked preprocessing step is text normalization: the process of transforming text into a canonical (standard) form.
For example, the word “gooood” and “gud” can be transformed to “good”, its canonical form. Another example is mapping of near-identical words such as “stopwords”, “stop-words” and “stop words” to just “stopwords”.
Text normalization is important for noisy texts such as social media comments, text messages, and comments on blog posts, where abbreviations, misspellings, and out-of-vocabulary (OOV) words are prevalent.
Embedding Selection (pre-trained or learn to embed)
Now we have cleaned the data using different cleansing techniques. But we cannot directly feed this human-readable text to algorithms that find patterns in data.
Machine Learning models take numerical values as input.
Models working on images, for example, take in a matrix representing the intensity of each pixel in each color channel.
Our dataset is a list of sentences, so in order for our algorithm to extract patterns from the data, we first need to find a way to represent it in a way that our algorithm can understand, i.e. as a list of numbers.
To solve this problem, word embeddings come to the rescue. A word embedding is a type of word representation that allows words with similar meaning to have a similar representation.
A text embedding converts text (words or sentences) into a numerical vector.
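As a sketch of the first step of this conversion, a minimal vocabulary-based encoder maps each word to an integer id; those ids can later index rows of an embedding matrix. This is an illustrative toy, not the tokenizer of any particular library:

```python
# Minimal word-to-index mapping: build a vocabulary from the corpus and
# encode each sentence as a list of integer ids (0 reserved for unknown).
def build_vocab(sentences):
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def encode(sentence, vocab):
    return [vocab.get(w, 0) for w in sentence.split()]
```

For example, after `vocab = build_vocab(["this product is great", "great product"])`, the call `encode("great product", vocab)` returns `[4, 2]`, and unseen words encode to `0`.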
Using Word Embeddings
You have some options when it comes to using word embeddings in your natural language processing project.
1. Learn an Embedding
You may choose to learn a word embedding for your problem.
This will require a large amount of text data to ensure that useful embeddings are learned, such as millions or billions of words.
2. Reuse an Embedding
It is common for researchers to make pre-trained word embeddings available for free, often under a permissive license so that anyone can use them.
For example, both word2vec and GloVe word embeddings are available for free download. These can be used on your project instead of training your own embeddings from scratch.
You have two main options when it comes to using pre-trained embeddings:
- Static, where the embedding is kept static and is used as a component of your model. This is a suitable approach if the embedding is a good fit for your problem and gives good results.
- Updated, where the pre-trained embedding is used to seed the model, but the embedding is updated jointly during the training of the model. This may be a good option if you are looking to get the most out of the model and embedding on your task.
Skyl.ai provides both the mechanisms to learn the embeddings from scratch or use the best pre-trained embeddings to train the model.
Basic Neural Network
The first deep learning model that we are going to develop is a simple deep neural network.
We will use the embedding layer as the primary layer, followed by one or more dense layers. Since we connect the embedding layer directly to a densely connected layer, we first flatten the embedding output. Finally, we add a dense layer with a softmax activation function.
To compile our model, we can use the Adam optimizer, categorical cross-entropy as our loss function, and accuracy, precision, recall, and F1 as evaluation metrics.
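Such a model might be sketched in Keras as follows. This assumes TensorFlow 2.x; the vocabulary size, sequence length, and layer sizes are illustrative placeholders, and only accuracy is tracked here for brevity (precision, recall, and F1 can be added as extra metrics):

```python
import tensorflow as tf

VOCAB_SIZE = 10_000   # assumed vocabulary size
MAX_LEN = 50          # assumed (padded) sentence length
NUM_CLASSES = 3       # negative / neutral / positive

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 100),   # words -> 100-dim vectors
    tf.keras.layers.Flatten(),                    # connect embedding to dense
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training would then be a call to `model.fit(padded_sequences, one_hot_labels, ...)` on the encoded, padded feedback.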
Text classification with Convolutional Neural Network (CNN)
A convolutional neural network is a type of network primarily used for classifying 2D data such as images. In its first layer, a convolutional network tries to find specific features in an image. In the next layers, the initially detected features are joined together to form bigger features. In this way, the whole image is recognized.
Convolutional neural networks have been found to work well with text data as well. Though text data is one-dimensional, we can use 1D convolutional neural networks to extract features from our data.
The first layer (embedding layer) embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors using multiple filter sizes. For example, sliding over 3, 4 or 5 words at a time. Next, we max-pool the result of the convolutional layer into a long feature vector, add dropout regularization, and classify the result using a softmax layer.
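The stack above might be sketched in Keras as follows (assuming TensorFlow 2.x; sizes are illustrative). Unlike the original paper, this simplified sketch uses a single filter size rather than several in parallel:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 10_000, 50, 3  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 100),         # words -> vectors
    tf.keras.layers.Conv1D(128, 5, activation="relu"),  # slide over 5 words
    tf.keras.layers.GlobalMaxPooling1D(),               # max-pool to one vector
    tf.keras.layers.Dropout(0.5),                       # regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

To use multiple filter sizes (3, 4, and 5 words) as in the original paper, you would build parallel `Conv1D` branches with the functional API and concatenate their pooled outputs.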
Because this is an educational post I decided to simplify the model from the original paper to just give you the idea how CNN can be used for text classification tasks.
Text Classification with Recurrent Neural Network (bi-LSTM)
A recurrent neural network (RNN) is a type of neural network proven to work well with sequence data. Since text is actually a sequence of words, a recurrent neural network is a natural choice for text-related problems. In this section, we will use a bi-LSTM (bidirectional Long Short-Term Memory network), a variant of the RNN, to solve the sentiment classification problem.
Text is passed to the RNN as a sequence. The embedding matrix is passed to the embedding layer, and the output of the embedding layer is passed to the LSTM layer. This model has n LSTM cells; here we ignore the hidden states of all the cells and take only the output of the last LSTM cell. That output is passed to a dense layer, then dropout, and a final dense layer classifies the result using a softmax layer.
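This architecture might be sketched in Keras as follows (assuming TensorFlow 2.x; the vocabulary size, sequence length, and cell count are illustrative placeholders):

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 10_000, 50, 3  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 100),
    # Bidirectional LSTM; only the last output is returned
    # (return_sequences defaults to False).
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(100)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The bidirectional wrapper runs the LSTM over the sequence in both directions and concatenates the two final outputs, so the layer emits a 200-dimensional vector for 100 cells.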
Image Source: researchgate.com
Build a Twitter Sentiment analysis model using Skyl.ai
Skyl.ai is a no-code platform to train and deploy AI and machine learning models. In the remaining sections, we will build a Twitter sentiment analysis model.
Feature set selection
Using the Skyl.ai platform, a feature set for model training can be created in no time. Skyl.ai provides two options for defining train and test sets: a user can either split the dataset or explicitly extract subsets from it. Skyl.ai also provides a summary of the feature set being created, so you can check whether it is properly balanced and ensure there are no biases or bad data.
Below is the summary of the train and test set on a tweets dataset.
Now that we have a feature set ready, we can select it for training while configuring the training run.
Skyl.ai provides relevant, optimized algorithms for training ML models depending on the ML template. The algorithms come out of the box; the user just has to select one from the drop-down. These are the algorithms available for training the sentiment analysis model.
I will choose the Bidirectional LSTM architecture for our model. Depending on the algorithm selected, we can further customize the architecture and tune the hyperparameters: embedding selection, number of LSTM cells, loss function, optimizer, dropout, activation function, etc.
Skyl.ai provides many pre-trained embeddings to choose from (Universal Sentence Encoder, ELMo, BERT, GloVe, etc.) and easy-to-configure options to make them trainable or not: simply set the ‘trainable’ toggle to true to update the embedding jointly during the training of the model.
Configure hyperparameters for training
Skyl.ai makes it very easy to experiment with different batch sizes, epochs, LSTM units, and learning rates.
We run the training with the following hyperparameters:
- Batch size: auto
- Learning rate: 0.001
- Number of epochs: 25
- LSTM cell: 100
Loss Function and Optimizer
We use the categorical cross-entropy function as our loss function, which is well suited to classification tasks. With this loss, the model learns to assign a high probability to the correct sentiment and low probabilities to the other sentiments.
Picking the right optimizer with the right parameters can help you squeeze the last bit of accuracy out of your neural network model. Skyl.ai makes it easy to select optimizers and loss functions. We will use the Adam optimizer to minimize the loss, with categorical cross-entropy as the loss function.
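To see why categorical cross-entropy rewards confident correct predictions, here is a small worked example in NumPy; the probability vectors are made up for illustration:

```python
import numpy as np

# Categorical cross-entropy: -sum(y_true * log(y_pred)).
# With one-hot labels this is just -log(probability of the correct class).
def categorical_cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # one-hot label: class 2 is correct
confident = np.array([0.05, 0.90, 0.05])  # high prob on the correct class
uncertain = np.array([0.40, 0.30, 0.30])  # low prob on the correct class
```

Here `categorical_cross_entropy(y_true, confident)` is about 0.105 (−ln 0.9) while the uncertain prediction costs about 1.204 (−ln 0.3), so gradient descent pushes the model toward confident, correct probabilities.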
Real metrics are needed to assess the model's performance. We will use precision, recall, and accuracy to evaluate the model.
After training completes, the platform generates reports on the test set based on the performance metrics selected while configuring the training.
By observing these performance metrics, we can easily conclude whether the model is well trained, underfit, or overfit.
If the evaluation metrics are not satisfactory, the platform provides many options to get the best model out. Here are a few things we can do:
- Transfer learning using base models such as NNLM, GloVe, Universal Sentence Encoder, etc.
- If the training set is large, set the trainable parameter to true.
- Use a different pre-trained embedding (ELMo, BERT, etc.) for word representation if the model overfits.
- Reduce the number of weights. Overfitting occurs mainly when the model is too complex; we can reduce the model's complexity by reducing the LSTM dimensions.
- Further fine-tune the model (epochs, layer dimensions, learning rate, etc.).
Now that we have fine-tuned the model and the performance metrics are satisfactory on the test set, the Skyl.ai platform provides one-click deployment to production: to deploy the model, go to the ‘Deployment’ tab and select the ‘Deploy’ button.
After deploying the model, an inference API will be available that can be integrated into your application.
Sentiment analysis is one of the most common natural language processing tasks. In this article, we saw how to train models without coding using our ready-made, fine-tuned, state-of-the-art neural network architecture.