Named Entity Recognition in Natural Language Processing

Named Entity Recognition, NER, is a process in NLP for finding relevant information, called as Named Entities. It classifies these entities into pre-defined classes like name, location, time, organization, etc. NER helps in building applications based on Information Extraction.

NER allows extracting the necessary keywords from the text.

For example, consider a sentence “I have a flight to New York at 5 pm”. Here, we have 2 entities:

New York: Location
5pm: Datetime

We can use this information to answer questions like how, where, with whom, etc. Additionally, NER also performs Chunking. Notice how “New York”, being 2 tokens are considered as one during NER. Although, it is possible to recognize only common classes. If one requires a specific class that is uncommon, one may need to build their own NER model.

Named Entity Recognition

Stanford CoreNLP

Stanford’s CoreNLP provides an online interface for various Natural Language Processing tools. One such tool is Named Entity Recognition.

Enter the input text to annotate
Select the required annotations
Select the language
Submit

This will generate results for the input based on selected annotations. This is a handy tool to work with NER manually.

Stanza

Although CoreNLP provides quick results, it is a web interface. Hence, to automate the task of NER or to integrate with the code, Stanford provides a package named Stanza (stanfordnlp.github.io). This package is available in Python, built on top of PyTorch.

Firstly, install PyTorch based on your system.

Once you have installed PyTorch, you can install Stanza through pip.

pip install stanza

Before using the library, you need to download the English model. This is done through Python shell or saving the code into a python file and running it.

import stanza
stanza.download('en')

Now we can use the library for our Named Entity Recognition.

import stanza
nlp = stanza.Pipelinnlp = stanza.Pipeline(lang='en', processors='tokenize,ner')

text = "I have a flight to New York at 5pm"
doc = nlp(text)
print(doc.entities)

Output

[{
   "text": "New York",
   "type": "GPE",
   "start_char": 20,
   "end_char": 28
 }, {
   "text": "5pm",
   "type": "TIME",
   "start_char": 32,
   "end_char": 35
 }]

Here, GPE refers to Country, State or City. In addition to GPE, some other common tags are PER (person), LOC (Location), TIME (DateTime) and ORG(Organization). The extended Stanza model provides up to 18 classes, however, these 4 classes are enough for most applications. Refer to Stanza documentation for a list of all available classes and additional usage.

With this, we end our series on A Complete Introduction to Natural Language Processing. Further, we will be adding blogs on various applications on NLP.

References

Deep Mehta

Deep Mehta is a Machine Learning Engineer, Web Developer and Technical Blogger, currently pursuing Masters in Computer Science from New York University. In addition to being one of the founders of byteiota.com, he is an enthusiast in the domain of Artificial Intelligence. When he isn't working, he is either reading or writing a blog.