What is Natural Language Processing (NLP), how does it work, and how can businesses use NLP?
Natural Language Processing (NLP) is a branch of artificial intelligence that aims to teach computers to understand human language.
Natural language processing combines linguistics (the study of language, which tries to understand form, meaning and context), computer science, artificial intelligence and information engineering.
A brief history of Natural language processing
The first attempts at NLP were machine translation efforts, which began in the 1930s.
In 1950, Alan Turing published the paper "Computing Machinery and Intelligence", in which he proposed what is now called the Turing test as a way to tell a human from a machine. In the Turing test, a human judge holds a text conversation with a set of respondents without knowing who is on the other end. If a computer can pass this test, it would be declared "intelligent". Turing predicted that by about the year 2000, a computer with roughly 100 MB of memory would be able to pass his test. Even though today's computers have far more memory, few programs have been claimed to pass the Turing test, and those that have relied on smarter techniques than sheer memory.
When the first work on NLP began in the 1950s, researchers thought that machines would one day be able to speak independently, without human intervention. One of the earliest NLP programs was SHRDLU, developed by Terry Winograd in the late 1960s. It could take human input in English and perform actions such as "pick up a red cube" and "place it on top of the green cube".
There have been many advances in NLP since then. Yet even though machines have come a long way, guiding spaceships and performing complex surgeries, they still struggle to understand basic human language.
Why is natural language processing so important?
Humans communicate with each other mainly through speech and text.
Many products and applications are built around these communication modes, for example talking over the phone and communicating through email. Imagine a situation where a computer can respond to your emails or phone calls just as you would. Wouldn't it be great? It would save a lot of time and effort. If you run a business that handles thousands of calls and emails every day, NLP systems can save your organization a lot of money.
In the medical field, doctors could use NLP to guide multiple machines performing surgeries more accurately and effectively, thus saving precious lives.
How does NLP work?
NLP needs training data for the computer to "learn from". For example, if you are building an NLP chatbot, an archive of actual conversations between your chat agents and customers can be used as its training set. The chatbot tries to make sense of these conversations, find patterns in them, and use those patterns to answer future questions on its own.
Here are some techniques that power natural language processing:
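To make the idea of learning from an archive concrete, here is a minimal sketch of a retrieval-style chatbot. It is an illustration only, not how production chatbots are built: the archive entries, the `words` and `answer` helpers, and the choice of Jaccard word overlap as the matching rule are all assumptions made for this example.

```python
import re

# Hypothetical archive of past (customer question, agent answer) pairs.
ARCHIVE = [
    ("How do I reset my password?",
     "Use the 'Forgot password' link on the login page."),
    ("What are your opening hours?",
     "We are open 9am to 5pm, Monday to Friday."),
    ("How can I track my order?",
     "You can track your order from the Orders page."),
]

def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def answer(question, archive=ARCHIVE):
    """Return the stored answer whose question overlaps most with the new one."""
    asked = words(question)

    def jaccard(pair):
        past = words(pair[0])
        union = asked | past
        # Jaccard similarity: shared words divided by all distinct words.
        return len(asked & past) / len(union) if union else 0.0

    best_question, best_answer = max(archive, key=jaccard)
    return best_answer

reply = answer("I forgot my password, how do I reset it?")
```

Here `reply` is the stored password answer, because that archived question shares the most words with the new one. A real system would learn statistical patterns from far more data instead of counting shared words.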
1. Text segmentation
Not all languages are alike. Unlike English, languages such as Chinese, Thai and Japanese do not separate words with spaces. Text in such languages needs a segmentation step that splits it into words before further processing.
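One simple way to segment text without spaces is greedy forward maximum matching against a word list. The sketch below assumes a tiny hand-made vocabulary (`VOCAB`) and a hypothetical helper `max_match`; real segmenters use large dictionaries or statistical models.

```python
def max_match(text, vocabulary):
    """Greedy forward maximum matching: at each position take the longest known word."""
    longest = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        # A single character is always accepted as a fallback.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Toy vocabulary, assumed for this example only.
VOCAB = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
segments = max_match("我喜欢自然语言处理", VOCAB)
```

With this vocabulary, `segments` is `["我", "喜欢", "自然语言", "处理"]`: the longer entry "自然语言" wins over the shorter "自然" because longer candidates are tried first.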
2. Sentence segmentation
Let's take the following text: "Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist. Turing was highly influential in the development of theoretical computer science."
This gives us two sentences:
- Alan Mathison Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.
- Turing was highly influential in the development of theoretical computer science.
Each of these sentences carries a different meaning. In NLP, sentence boundary detection algorithms are used to find sentences, even when punctuation in the input text is missing or ambiguous.
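A rule-based splitter gives a feel for the task. The sketch below is a simplification with a hypothetical `split_sentences` helper: it splits after ".", "!" or "?" when the next character is an uppercase letter, so it will wrongly split abbreviations such as "Dr. Smith". Real sentence boundary detectors use trained models to handle such cases.

```python
import re

def split_sentences(text):
    """Split after ., ! or ? when an uppercase letter follows (spaces optional)."""
    parts = re.split(r"(?<=[.!?])\s*(?=[A-Z])", text.strip())
    return [part for part in parts if part]

TEXT = (
    "Alan Mathison Turing was an English mathematician, computer scientist, "
    "logician, cryptanalyst, philosopher and theoretical biologist. "
    "Turing was highly influential in the development of theoretical computer science."
)

sentences = split_sentences(TEXT)
```

For the sample text this yields the two sentences listed above. Because the whitespace is optional in the pattern, it also works when the space after the full stop is missing.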
3. Tokenization
After breaking the input into sentences, the next step is to break the sentences into tokens. Tokenization is the process of breaking a sentence into separate words or tokens. Here is the result of tokenization of our test sentence.
[Figure: tokenization of the sample sentence]
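As a rough illustration, a regular expression can split a sentence into word and punctuation tokens. This is a minimal sketch with a hypothetical `tokenize` helper; production tokenizers handle contractions, hyphens, URLs and many other cases that this pattern ignores.

```python
import re

def tokenize(sentence):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize(
    "Turing was highly influential in the development of theoretical computer science."
)
```

Here `tokens` holds twelve items: the eleven words of the sentence followed by the final full stop as its own token.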
4. Part-of-speech tagging (POS tagging)
In this step, each token is analyzed and assigned a part of speech (noun, verb, adjective and so on).
[Figure: POS tagging of the sample sentence]
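The toy tagger below shows the shape of the task: tokens in, (token, tag) pairs out. Everything in it is an assumption made for illustration, including the small `LEXICON`, the suffix rules and the tag names. Real taggers are statistical models trained on annotated text and use the surrounding words as context.

```python
# Tiny hand-made lexicon of function words (illustrative only).
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "of": "ADP", "in": "ADP", "on": "ADP",
    "was": "AUX", "is": "AUX", "are": "AUX",
}

def toy_pos_tag(tokens):
    """Tag tokens with a lexicon lookup followed by crude spelling heuristics."""
    tagged = []
    for token in tokens:
        lower = token.lower()
        if not token.isalnum():
            tag = "PUNCT"
        elif lower in LEXICON:
            tag = LEXICON[lower]
        elif token[0].isupper():
            tag = "PROPN"   # capitalized word: guess proper noun
        elif lower.endswith("ly"):
            tag = "ADV"
        elif lower.endswith(("al", "ive", "ous")):
            tag = "ADJ"
        else:
            tag = "NOUN"    # default guess
        tagged.append((token, tag))
    return tagged

tagged = toy_pos_tag(["Turing", "was", "highly", "influential", "in", "the",
                      "development", "of", "theoretical", "computer", "science", "."])
```

The heuristics happen to work on this sentence, but they break easily: any capitalized sentence-initial word would be guessed as a proper noun, and every unknown word defaults to a noun.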
5. Lemmatization
Lemmatization is the process of finding the root word of a given word. For example, the lemma (or root word) of play, playing and played is play. While a stemmer simply strips letters from a word to approximate its root, a lemmatizer tries to find the root word while keeping the meaning of the word in context. For example, for the words "am", "are" and "is", the lemmatizer would return "be".
[Figure: lemmatization of the sample sentence]
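The difference between stemming and lemmatization can be sketched in a few lines. Both functions below are simplified stand-ins written for this example: `naive_stem` only strips a few suffixes, and `lemmatize` relies on a tiny hand-made table of irregular forms (`IRREGULAR_LEMMAS`) where a real lemmatizer would use a full dictionary and part-of-speech information.

```python
def naive_stem(word):
    """Strip a common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

# Irregular forms that suffix stripping cannot recover (illustrative subset).
IRREGULAR_LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}

def lemmatize(word):
    """Look up irregular forms first, then fall back to the naive stemmer."""
    lower = word.lower()
    return IRREGULAR_LEMMAS.get(lower, naive_stem(lower))

stems = [naive_stem(w) for w in ["play", "playing", "played"]]
lemmas = [lemmatize(w) for w in ["am", "are", "is"]]
```

All three entries in `stems` are "play" and all three in `lemmas` are "be". The stemmer alone leaves "am", "are" and "is" unchanged, which is exactly the gap a lemmatizer fills.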
6. Stop Words
Stop words are words such as "the", "for" and "is" that add little to the meaning of a sentence. TF-IDF (term frequency-inverse document frequency) scores can help identify stop words. Here is our sample text without stop words.
[Figure: the sample text with stop words removed]
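The sketch below shows one way the inverse document frequency part of TF-IDF can flag stop word candidates: a word that appears in every document has an IDF of log(N/N) = 0. The three-sentence corpus and the helper names are made up for this example, and the unsmoothed formula log(N/df) is one of several common IDF variants.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def zero_idf_words(documents):
    """Return words whose IDF, log(N / document frequency), is zero."""
    n = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(tokenize(doc)):   # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word for word, df in doc_freq.items() if math.log(n / df) == 0.0}

def remove_stop_words(text, stop_words):
    """Drop every token that is in the stop word set."""
    return [word for word in tokenize(text) if word not in stop_words]

DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang for the dog",
]

stop_words = zero_idf_words(DOCS)
filtered = remove_stop_words(DOCS[0], stop_words)
```

In this tiny corpus only "the" occurs in all three documents, so it is the only candidate, and `filtered` becomes `["cat", "sat", "on", "mat"]`. In practice a low IDF threshold or a curated stop word list is used instead of an exact zero.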
7. Dependency Parsing
It is the task of analyzing a sentence and assigning a syntactic structure to it. It aims to figure out the relations between the words in a sentence. For example, to answer the question “Who is the president of the United States?”, we need to identify its subject, objects and attributes to work out what the user wants.
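A dependency parse is usually stored as one head and one relation label per token. The sketch below does not run a parser: the parse of the short sentence "Turing published a paper" is written by hand, and the `Token` type, the `dependents` helper and the relation labels are assumptions chosen for illustration. It shows how the subject and object can be read off the structure once it exists.

```python
from typing import NamedTuple

class Token(NamedTuple):
    index: int      # 1-based position in the sentence
    text: str
    head: int       # index of the governing token, 0 for the root
    relation: str   # label of the arc from the head to this token

# Hand-written parse of "Turing published a paper" (not parser output).
PARSE = [
    Token(1, "Turing", 2, "nsubj"),
    Token(2, "published", 0, "root"),
    Token(3, "a", 4, "det"),
    Token(4, "paper", 2, "obj"),
]

def dependents(parse, head_index, relation):
    """Return the texts of tokens attached to head_index with the given relation."""
    return [t.text for t in parse if t.head == head_index and t.relation == relation]

root = next(t for t in PARSE if t.head == 0)
subject = dependents(PARSE, root.index, "nsubj")
direct_object = dependents(PARSE, root.index, "obj")
```

Here `root.text` is "published", `subject` is `["Turing"]` and `direct_object` is `["paper"]`, which is the kind of information a question answering system needs.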
8. Named Entity recognition
Named Entity Recognition, also known as entity extraction, classifies the named entities present in a text into pre-defined categories such as “person”, “company”, “place”, “city”, “date” and “time”.
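A very small rule-based recognizer illustrates the input and output of this step. It is a sketch under strong assumptions: the `GAZETTEER` of known names and the date pattern are hand-made for this example, whereas real NER systems are trained models that can recognize names they have never seen.

```python
import re

# Hand-made list of known names and their categories (illustrative only).
GAZETTEER = {"Alan Turing": "person", "London": "place"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")

def find_entities(text):
    """Return (entity text, category) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "date"))
    # Sort by position in the text, then drop the position.
    return [(span, label) for _, span, label in sorted(found)]

entities = find_entities("Alan Turing was born in London on 23 June 1912.")
```

For this sentence `entities` contains "Alan Turing" as a person, "London" as a place and "23 June 1912" as a date.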
9. Co-reference resolution
It is the task of finding all expressions in a text that refer to the same entity. Any English text contains many words such as "he", "she" and "it". It helps the NLP system to know which entity these pronouns refer to. For example, in the following sentence, you can see that "I" and "he" refer to Nader. This is especially helpful when answering questions like "Who is the president of the United States?"
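The simplest possible heuristic links each pronoun to the most recently mentioned person. The sketch below implements only that heuristic, with a hypothetical `resolve_pronouns` helper and a caller-supplied set of known person names. Real co-reference systems also weigh gender, number, syntax and meaning, and this shortcut is often wrong.

```python
PRONOUNS = {"he", "she", "him", "her"}

def resolve_pronouns(tokens, known_people):
    """Map each pronoun's token position to the most recent person mentioned before it."""
    links = {}
    last_person = None
    for position, token in enumerate(tokens):
        if token in known_people:
            last_person = token
        elif token.lower() in PRONOUNS and last_person is not None:
            links[position] = last_person
    return links

tokens = ["Turing", "wrote", "the", "paper", ".", "He", "proposed", "a", "test", "."]
links = resolve_pronouns(tokens, {"Turing"})
```

Here `links` maps position 5 (the token "He") to "Turing". A pronoun that appears before any known person is left unresolved.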
Visit Skyl.ai to learn more about different natural language processing and computer vision projects.