
How NLP Works? A Simple to Understand Guide


What is Natural Language Processing (NLP), how does it work, and how can businesses use NLP? Natural Language Processing (also known as NLP) is a branch of artificial intelligence which aims at teaching computers to understand human language.

Natural language processing combines linguistics (the study of form, meaning and context in language), computer science, artificial intelligence and information engineering.


A brief history of natural language processing

The first attempts at NLP were efforts at machine translation, which began in the 1930s.

In 1950, Alan Turing published the paper "Computing Machinery and Intelligence", in which he proposed what is now called the Turing test as a way to judge whether a machine can be told apart from a human. In the Turing test, a human judge holds a text conversation with a set of respondents without knowing who is on the other end. If the judge cannot reliably tell the computer from the humans, the computer passes the test and would be declared "intelligent". Turing predicted that by about the year 2000, a computer with roughly 10^9 bits of storage (around 120 MB) would be able to fool an average judge about 30% of the time after five minutes of questioning. Even though today's computers have far more memory, few programs have been claimed to pass the Turing test, and those that have relied on smarter techniques than sheer memory.

Figure: A. M. Turing

When work on NLP began in the 1950s, researchers expected that machines would soon be able to converse without human intervention. One of the best-known early NLP programs was SHRDLU, which Terry Winograd developed at MIT between 1968 and 1970. It could take input in English and carry out commands such as "pick up a red cube" or "place it on top of the green cube".

There have been many advances in NLP since then. Yet even though machines have come a long way in guiding spaceships and assisting in complex surgeries, they still struggle to understand everyday human language.

Why is natural language processing so important?

Humans communicate with each other mainly through:

  1. Voice.
  2. Text.

Many products and applications are built around these two modes of communication, such as phone calls and email. Imagine a computer that could respond to your emails or phone calls just as you would. It would save a lot of time and effort. If you run a business that handles thousands of calls and emails every day, NLP systems can save your organization a lot of money.

In the medical field, NLP could let doctors direct several surgical machines through spoken or written instructions, helping them operate more accurately and effectively and so save lives.


How NLP works?

An NLP system needs training data for the computer to "learn from". For example, if you are building an NLP chatbot, an archive of actual conversations between your chat agents and customers can serve as the training set. The chatbot tries to make sense of these conversations, find patterns in them, and use those patterns to answer future questions on its own.
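To make the idea of "learning from past conversations" concrete, here is a minimal sketch, assuming the simplest possible approach: the bot answers a new question by reusing the reply to the most similar archived question, measured by word overlap. The archive, the function names and the similarity measure are all illustrative assumptions; a production chatbot would train a statistical model instead.

```python
import re

# Hypothetical archive of (customer question, agent reply) pairs.
ARCHIVE = [
    ("How do I reset my password?",
     "Use the 'Forgot password' link on the login page."),
    ("What are your opening hours?",
     "We are open 9am to 5pm, Monday to Friday."),
]

def word_set(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"\w+", text.lower()))

def best_reply(question, archive):
    """Return the reply whose archived question shares the most words with the new one."""
    asked = word_set(question)

    def similarity(pair):
        past = word_set(pair[0])
        union = asked | past
        # Jaccard similarity: shared words divided by all distinct words.
        return len(asked & past) / len(union) if union else 0.0

    past_question, past_answer = max(archive, key=similarity)
    return past_answer

reply = best_reply("I forgot my password, how can I reset it?", ARCHIVE)
print(reply)
```

The sketch only memorizes and matches; it shows why the quality and coverage of the archive matter, since the bot can never say anything that is not already in it.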

Here are some of the techniques that power natural language processing:

1. Pre-processing

Not all languages are written the same way. In English, words are separated by spaces, but languages such as Chinese, Thai and Japanese do not delimit words with spaces. For such languages, a text segmentation step has to be applied first to find the word boundaries.
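One classic way to segment text that has no spaces is forward maximum matching: at each position, take the longest word found in a dictionary. The sketch below assumes a tiny, made-up vocabulary; real segmenters use large dictionaries or statistical models, so treat this as an illustration of the idea only.

```python
# Hypothetical toy vocabulary; a real system would use a large dictionary.
VOCABULARY = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}

def max_match(text, vocabulary):
    """Greedy forward maximum matching: always take the longest known word."""
    longest = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a known word is found.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback for unknown text.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# "I like natural language processing"
segments = max_match("我喜欢自然语言处理", VOCABULARY)
print(segments)  # → ['我', '喜欢', '自然语言', '处理']
```

Because the approach is greedy, it prefers "自然语言" (natural language) over the shorter "自然" (natural), which is usually but not always the right choice.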

2. Sentence segmentation

Let's take the following text: "Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. Turing was highly influential in the development of theoretical computer science."

Sentence segmentation splits it into two sentences:

  1. Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.
  2. Turing was highly influential in the development of theoretical computer science.

Each of these sentences carries its own meaning. In NLP, sentence boundary detection algorithms are used to find these boundaries, and some can do so even when the input text has no punctuation.
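As a rough illustration, the split above can be reproduced with a single regular expression. This is a minimal sketch that assumes a sentence ends with ".", "!" or "?" and that the next sentence starts with a capital letter. It relies on punctuation, so it will not handle unpunctuated text, and it would wrongly split abbreviations such as "A. M. Turing"; trained boundary detectors exist to cover those cases.

```python
import re

# Split after ., ! or ? when the next non-space character is an uppercase letter.
# Splitting on a possibly empty match requires Python 3.7 or later.
BOUNDARY = re.compile(r"(?<=[.!?])\s*(?=[A-Z])")

def split_sentences(text):
    """Return the sentences in text, using the naive boundary rule above."""
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]

sample = (
    "Alan Mathison Turing was an English mathematician, computer scientist, "
    "logician, cryptanalyst, philosopher and theoretical biologist. "
    "Turing was highly influential in the development of theoretical "
    "computer science."
)

sentences = split_sentences(sample)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

The `\s*` in the pattern means the rule still works when the space after the full stop is missing, as in "biologist.Turing".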

3. Tokenization

After breaking the input into sentences, the next step is to break each sentence into tokens. Tokenization is the process of splitting a sentence into separate words or tokens. Here is the result of tokenizing our test sentence.

Figure: How NLP works, tokenization
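A simple tokenizer can be sketched with one regular expression. The version below assumes that a token is either a run of word characters or a single punctuation mark; real tokenizers add rules for contractions, URLs, numbers and so on.

```python
import re

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize(
    "Turing was highly influential in the development of "
    "theoretical computer science."
)
print(tokens)
```

Note that the final full stop becomes its own token, which later steps such as POS tagging can label as punctuation.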

4. Parts of speech tagging (POS Tagging)

In this step, each token is analyzed and assigned a part of speech (noun, verb, adjective and so on).

Figure: How NLP works, POS tagging
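To give a feel for what a tagger produces, here is a deliberately naive sketch that combines a small lookup table with a few suffix rules. The lexicon, the rules and the function name are assumptions made for illustration, and the tag names follow the Universal POS tag set. Real taggers are trained on annotated text and use the surrounding words as context.

```python
# Hypothetical mini-lexicon of common function words.
LEXICON = {"was": "AUX", "is": "AUX", "in": "ADP", "of": "ADP",
           "the": "DET", "an": "DET"}

def tag_tokens(tokens):
    """Assign a part-of-speech tag to each token using simple rules."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            label = LEXICON[word]
        elif not any(ch.isalnum() for ch in token):
            label = "PUNCT"   # no letters or digits at all
        elif token[0].isupper():
            label = "PROPN"   # crude guess: capitalized means proper noun
        elif word.endswith("ly"):
            label = "ADV"
        elif word.endswith(("al", "ous", "ful")):
            label = "ADJ"
        else:
            label = "NOUN"    # default when no rule applies
        tagged.append((token, label))
    return tagged

tokens = ["Turing", "was", "highly", "influential", "in", "the",
          "development", "of", "theoretical", "computer", "science", "."]
tagged = tag_tokens(tokens)
for token, label in tagged:
    print(f"{token:12} {label}")
```

Rules like these break quickly (every sentence-initial word looks like a proper noun, for instance), which is why statistical taggers are the norm.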

5. Lemmatization

Lemmatization is the process of finding the base form (the lemma) of a given word. For example, the lemma of "play", "playing" and "played" is "play". While a stemmer simply strips letters from the end of a word to approximate its root, a lemmatizer tries to find the base form while keeping the meaning of the word in context. For example, for the words "am", "are" and "is", the lemmatizer would return "be".

Figure: How NLP works, lemmatization
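The difference between the two can be sketched in a few lines. The stemmer below only strips a handful of suffixes, while the toy lemmatizer first consults a small table of irregular forms. Both the suffix list and the table are illustrative assumptions; real lemmatizers rely on full dictionaries and the word's part of speech.

```python
# Hypothetical table of irregular forms and their lemmas.
IRREGULAR = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}

def stem(word):
    """Crude stemmer: strip a common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def lemmatize(word):
    """Toy lemmatizer: look up irregular forms, otherwise fall back to stemming."""
    word = word.lower()
    return IRREGULAR.get(word, stem(word))

for form in ("play", "playing", "played", "am", "are", "is"):
    print(form, "->", stem(form), "(stem)", lemmatize(form), "(lemma)")
```

The stemmer leaves "am", "are" and "is" unchanged because no amount of suffix stripping turns them into "be"; only the lookup does.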

6. Stop Words

Stop words are words such as "the", "for" and "is", which add little to the meaning of a sentence. TF-IDF (term frequency-inverse document frequency) scores can help identify stop words, because words that appear in almost every document receive a very low score. Here is our sample text without stop words.

Figure: Stop words in NLP
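Both ideas fit in a short sketch: filtering tokens against a fixed stop list, and computing inverse document frequency (the IDF half of TF-IDF) to see which words are too common to be informative. The stop list and the three toy documents are assumptions made for the example.

```python
import math

# Hypothetical, very short stop list.
STOP_WORDS = {"the", "for", "is", "was", "in", "of", "a", "an", "and"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop list."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]

def inverse_document_frequency(documents):
    """Map each word to log(N / number of documents containing it)."""
    total = len(documents)
    vocabulary = {word for document in documents for word in document}
    return {
        word: math.log(total / sum(1 for document in documents if word in document))
        for word in vocabulary
    }

tokens = ["Turing", "was", "highly", "influential", "in", "the",
          "development", "of", "theoretical", "computer", "science"]
filtered = remove_stop_words(tokens)
print(filtered)

# Toy corpus: "the" occurs in every document, so its IDF is 0.
documents = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "bird", "sang"]]
idf = inverse_document_frequency(documents)
print(sorted(idf.items(), key=lambda item: item[1])[:1])
```

A word with an IDF at or near zero appears almost everywhere in the corpus, which makes it a natural stop word candidate.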

7. Dependency Parsing

Dependency parsing is the task of analyzing a sentence and assigning a syntactic structure to it. It aims to figure out the relations between the words in a sentence. For example, to answer the question “Who is the president of the United States?”, we need to identify its subject, objects and attributes to work out what the user wants.

Figure: Dependency parser
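Producing a parse requires a trained parser, so the sketch below does not parse anything. It only shows, under that assumption, how a parser's output is commonly represented (as head, relation, dependent arcs) and how an application can query it. The sentence, the hand-written arcs and the helper name are illustrative.

```python
from collections import namedtuple

# One arc of a dependency tree: the head word governs the dependent word.
Arc = namedtuple("Arc", ["head", "relation", "dependent"])

# Hand-written parse of "Turing wrote the paper" (not produced by a parser).
PARSE = [
    Arc("ROOT", "root", "wrote"),
    Arc("wrote", "nsubj", "Turing"),   # nominal subject of "wrote"
    Arc("wrote", "obj", "paper"),      # direct object of "wrote"
    Arc("paper", "det", "the"),        # determiner of "paper"
]

def dependents(parse, head, relation):
    """Return the words attached to head by the given relation."""
    return [arc.dependent for arc in parse
            if arc.head == head and arc.relation == relation]

main_verb = dependents(PARSE, "ROOT", "root")[0]
print("verb:   ", main_verb)
print("subject:", dependents(PARSE, main_verb, "nsubj"))
print("object: ", dependents(PARSE, main_verb, "obj"))
```

Once the arcs are available, questions such as "who did what to what" become simple lookups over the tree.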

8. Named Entity recognition

Named entity recognition, also known as entity extraction, classifies the named entities present in a text into pre-defined categories such as “person”, “company”, “place”, “city”, “date” and “time”.

Figure: Named entity recognition
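The simplest form of entity extraction is a lookup list (a gazetteer) combined with patterns for structured items such as dates. The sketch below assumes a tiny hand-made gazetteer and one date format; real systems use trained models that can recognize names they have never seen.

```python
import re

# Hypothetical gazetteer mapping known names to categories.
GAZETTEER = {"Alan Turing": "PERSON", "London": "PLACE", "IBM": "COMPANY"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written like "23 June 1912".
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")

def find_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by position in the text, then drop the position.
    return [(entity, label) for _, entity, label in sorted(found)]

entities = find_entities("Alan Turing was born in London on 23 June 1912.")
print(entities)
```

A gazetteer cannot tell "Turing" the person from "Turing" the award, which is where context-aware models come in.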

9. Co-reference resolution

Co-reference resolution is the task of finding all expressions in a text that refer to the same entity. Any English text contains many words such as "he", "she" and "it". It helps the NLP system to know which entity each of these pronouns refers to. For example, in the following sentence, you can see that "I" and "he" refer to Nader. This is especially helpful when answering questions like "Who is the president of the United States?"

Figure: Coreference resolution
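A very rough approximation of co-reference resolution is to link each third-person pronoun to the most recently mentioned name. The sketch below assumes the names are already known (for instance from named entity recognition) and ignores gender, number and sentence structure, all of which real resolvers take into account.

```python
# Pronouns this toy resolver handles; "it" and "they" are deliberately left out.
PRONOUNS = {"he", "she", "him", "her"}

def resolve_pronouns(tokens, known_names):
    """Replace each handled pronoun with the nearest preceding known name."""
    resolved = []
    last_name = None
    for token in tokens:
        if token in known_names:
            last_name = token
            resolved.append(token)
        elif token.lower() in PRONOUNS and last_name is not None:
            resolved.append(last_name)
        else:
            resolved.append(token)
    return resolved

tokens = "Turing proposed a test in 1950 and he predicted that machines would pass it".split()
resolved = resolve_pronouns(tokens, known_names={"Turing"})
print(" ".join(resolved))
```

After this step, a question-answering system can match "Who predicted that machines would pass the test?" against text that only said "he".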

Visit Skyl.ai to learn more about different natural language processing and computer vision projects.
