
WordNet in Natural Language Processing

WordNet

WordNet is “a lexical database of semantic relations between words in more than 200 languages” – Wikipedia. It groups words into synsets. A synset is a set of synonyms that express the same concept. In simpler terms, WordNet is similar to a thesaurus that groups words based on their meanings.

WordNet is available as part of the NLTK corpus collection. It records the relations between words, and this knowledge can be used to build Information Retrieval applications.

WordNet Relations

Homonymy

Homonyms are words that are spelt and pronounced the same but have different, unrelated meanings. The context determines which meaning is intended.

For example: right – correct, direction

  1. Right – correct: It turns out that I was right.
  2. Right – direction: Take a right from the next junction to reach a cafe.

Similarly, some other homonymy groups:

  • Pen – writing instrument, to write (verb), holding area for animals
  • Arm – body part, division of a company
  • Bat – an animal, cricket bat
  • Fly – to fly (verb), an insect

Polysemy

Polysemy occurs when one word has several closely related meanings. It is similar to homonymy, but the meanings all share a common underlying concept.

For example, consider the following meanings for the word “bank”:

  1. financial institution
  2. bank of the river
  3. building belonging to a financial institution
  4. to rely upon (verb)

In the list above, meanings 1, 3 and 4 share a common theme and are therefore polysemes, whereas meaning 2 (the edge of a river) is unrelated to them.

Synonymy

Synonyms are words that are spelt and pronounced differently but have similar meanings.

For example:

  1. small – little
  2. big – large
  3. intelligent – smart
  4. positive – optimistic

Hyponymy

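Hyponymy is the relationship between a specific term and a more general one. A hyponym is a word whose meaning falls under a broader term, called its hypernym. Several hyponyms of the same term need not be directly related to one another, but they all belong under that general term.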

For example, red, yellow, black and blue all fall under the general term color, i.e., red is a hyponym of color.

Similar examples:

  1. Apple is a hyponym of Fruit
  2. Tomato is a hyponym of Vegetable

Further in the series: Named Entity Recognition in Natural Language Processing


Deep Mehta
Deep Mehta is a Machine Learning Engineer, Web Developer and Technical Blogger, currently pursuing a Master's in Computer Science at New York University. In addition to being one of the founders of byteiota.com, he is an enthusiast in the domain of Artificial Intelligence. When he isn't working, he is either reading or writing a blog.

