In this series on NLP, we introduce the basics of Natural Language Processing and show how to apply it with a Python library, NLTK. By the end of the series, we will also cover a few applications of NLP and explain how they work.
What is Natural Language Processing?
Natural Language Processing (NLP) is a field of computer science that deals with the interaction between computers and human languages. NLP aims to let computers interpret human language at various levels. It plays an important role in many applications, such as virtual assistants, speech-to-text, machine translation, auto-correct, and question answering. Although many NLP-based applications exist, building new ones is still a major task, particularly for non-English languages.
Why is NLP difficult?
- Ambiguity
- Corpora
A natural language has a rich form and structure, which introduces ambiguity into text data. Ambiguity means that the same content can have different meanings. A few examples of ambiguity:
- “dog ate a bone” and “bone dog a ate”: Both sentences contain the same words with the same frequency. However, word order gives the first sentence a meaning that the second lacks.
- “Jack saw Ben with a telescope on a mountain”: Is it Jack who has the telescope, or Ben? Similarly, who is on the mountain?
- “I went to the bank”: The word “bank” may refer to a financial institution or to a riverbank.
Similarly, you can try to find all the meanings of the sentence “I made her duck”; there are 5 known meanings to the sentence!
The major challenge of Natural Language Processing is resolving such ambiguities.
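To see the word-order example in action, here is a minimal sketch in plain Python (no NLTK needed). The helper name `bag_of_words` is our own, chosen for illustration. It counts words while ignoring their position, which is roughly what a model sees if it only looks at word frequencies.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each lowercase word occurs, ignoring word order."""
    return Counter(sentence.lower().split())

first = "dog ate a bone"
second = "bone dog a ate"

# Same words with the same frequencies, so the two bags are equal ...
same_bag = bag_of_words(first) == bag_of_words(second)
# ... even though the word sequences differ.
same_order = first.split() == second.split()

print(same_bag)    # True
print(same_order)  # False
```

A representation that cannot tell these two sentences apart has already lost the information that gives the first one its meaning, which is why word position matters.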
Another major challenge for NLP in languages other than English is the lack of suitable corpora. Deep Learning based NLP solutions rely heavily on the data provided during training and on its quality, and models trained on a limited amount of data rarely give good results. As a result, building NLP models for non-English languages is a cumbersome task and leaves a large scope for future work.
Classification of Natural Language Processing
Natural Language Processing is broadly classified into the following:
A. Natural Language Understanding: NLU deals with understanding a given text and interpreting its meaning. It converts human language into a structured format that is usable by a computer.
B. Natural Language Generation: NLG deals with generating natural-language text from the available textual data. It involves producing sentences that a human can understand.
Stages of NLP
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Discourse
- Pragmatics
The stages of NLP are briefly discussed in the next blog: Stages of Natural Language Processing (NLP)
Real-World Applications of NLP
- Personal Assistants
- Machine Translation
- Text Summarization
- Auto-Correct and Suggestions
- Chatbots
We have all used or heard of Siri, Google Assistant, Alexa and Cortana. All these personal assistants rely extensively on NLP to work and to interact with the user. A personal assistant typically performs the following tasks:
- Speech to Text to convert input voice into a textual format (for example: “remind me to buy groceries at 7 pm”)
- Process the text to interpret its meaning (setting up a reminder)
- Extract required Named Entities (object – groceries, time – 7 pm)
- Perform necessary action (set the reminder)
- Provide feedback to the user (for example: “your reminder has been set”)
Thus, NLP powers a large part of what personal assistants do.
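The text-processing steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: real assistants use trained models, whereas here a single hand-written regular expression handles one command shape. The names `REMINDER_PATTERN`, `parse_reminder` and `respond` are hypothetical, and the speech-to-text step is skipped.

```python
import re

# Hypothetical pattern for one command shape: "remind me to <task> at <time>".
REMINDER_PATTERN = re.compile(
    r"remind me to (?P<task>.+?) at (?P<time>\d{1,2}(?::\d{2})?\s*(?:am|pm))",
    re.IGNORECASE,
)

def parse_reminder(text):
    """Return the intent and entities found in the text, or None if no match."""
    match = REMINDER_PATTERN.search(text)
    if match is None:
        return None
    return {
        "intent": "set_reminder",
        "task": match.group("task"),
        "time": match.group("time"),
    }

def respond(parsed):
    """Build the feedback message for the user."""
    if parsed is None:
        return "Sorry, I did not understand that."
    return f"Your reminder to {parsed['task']} at {parsed['time']} has been set."

# The text would normally come from a speech-to-text step, which is skipped here.
command = "remind me to buy groceries at 7 pm"
parsed = parse_reminder(command)
print(parsed)
print(respond(parsed))
```

For the example command, the sketch extracts the task “buy groceries” and the time “7 pm”. Any sentence phrased differently falls through to the fallback message, which is exactly the brittleness that statistical NLP models are meant to overcome.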
A widely used application of NLP is Machine Translation. It takes input in one natural language and produces output in another natural language such that the input and output mean the same. A Machine Translation application mainly has to handle the differences between languages. For example, English does not mark grammatical gender on the noun in the sentence below, whereas the same sentence in French does.
English: “You are my friend”
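French: “Tu es mon ami” or “Tu es mon amie” (depending on the gender of the friend; “mon” is used before “amie” because the noun starts with a vowel)
Moreover, for the same example, “Vous êtes mon ami” and “Vous êtes mon amie” are also valid (‘vous’ is the formal, and also the plural, form of ‘tu’; here it is the formal singular).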
Although many languages can be translated, it is difficult to build accurate translators for languages that have fewer resources. Beyond basic translation (for example, Google Translate), machine translation can also be applied to literature and other rich textual data that exists in only some languages, making it accessible in many more.
Text Summarization takes a passage as input, detects its main points and generates a summary. It condenses large amounts of text into a short piece while keeping the key information, so a reader can get the gist in less time. Similarly, text summarization can automatically generate headlines or titles for articles and journals.
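One simple way to approximate this is extractive summarization: score each sentence by how frequent its words are in the whole passage and keep the top-scoring sentences. The sketch below is a minimal version of that idea in plain Python. The stop-word list, the naive sentence splitter and the `summarize` function are assumptions made for illustration, and production systems (especially abstractive ones) work quite differently.

```python
import re
from collections import Counter

# A small, hand-picked stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    frequency = Counter(words)

    def score(sentence):
        # Sum of the passage-wide frequencies of the sentence's content words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(frequency[t] for t in tokens if t not in STOP_WORDS)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

passage = (
    "NLP lets computers interpret human language. "
    "Chatbots and translators use NLP to process human language for users. "
    "The weather was pleasant yesterday."
)
print(summarize(passage))
```

Note that summing frequencies favours longer sentences, so a more careful version would normalise the score by sentence length.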
Auto-correct models are trained on large amounts of data for a language. Such models can detect typing errors and automatically correct them or suggest alternatives to the written text. These models are widely used in text editors, email services, search engines, etc. Although an unintended auto-correct can sometimes change what you meant to convey, advanced auto-correct systems adapt to your writing style for a better writing experience.
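As a rough sketch of the suggestion step, the example below uses `difflib.get_close_matches` from the Python standard library to find vocabulary words that resemble a misspelled word. The tiny `VOCABULARY` list and the `suggest` and `autocorrect` helpers are assumptions for illustration; real auto-correct models learn from large corpora and take context into account.

```python
import difflib

# Tiny illustrative vocabulary (an assumption); real systems learn from large corpora.
VOCABULARY = ["language", "natural", "processing", "computer", "translation"]

def suggest(word, n=3, cutoff=0.6):
    """Return up to n vocabulary words that closely resemble the given word."""
    return difflib.get_close_matches(word.lower(), VOCABULARY, n=n, cutoff=cutoff)

def autocorrect(sentence):
    """Replace each unknown word with its closest vocabulary match, if any."""
    corrected = []
    for word in sentence.split():
        if word.lower() in VOCABULARY:
            corrected.append(word)
            continue
        matches = suggest(word, n=1)
        # Keep the original word when nothing in the vocabulary is close enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

print(autocorrect("natral langauge procesing"))  # natural language processing
```

The `cutoff` parameter controls how similar a candidate must be before it is suggested. Raising it reduces unintended corrections at the cost of missing some genuine typos.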
Another major use of NLP is chatbots. Many business platforms provide chatbots to assist users without needing a dedicated customer-care agent on the other end. Chatbots can be built to guide users through an interaction or to automate parts of the business itself. They are widely used in healthcare and e-commerce.
Further in the series: Stages of Natural Language Processing (NLP)