This repository contains an implementation of "Learning Term Embeddings for Hypernymy Identification" [Yu et al., 2015].
A hypernym is a word with a broad meaning that denotes a category into which words with more specific meanings fall. For example, animal is a hypernym of dog; correspondingly, dog is a hyponym of animal.
The main idea of the paper is to represent words with term embeddings and to train an SVM classifier on top of them, using the two embeddings together with the L1 norm of their difference as features.
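As a rough sketch of that classification step (the helper names and feature layout here are illustrative assumptions, not the repository's actual API), the feature vector for a candidate pair might concatenate the two term embeddings with the L1 norm of their difference and feed the result to a linear SVM:

```python
import numpy as np
from sklearn.svm import SVC

def pair_features(hyper_vec, hypo_vec):
    # Illustrative feature map: both embeddings, plus the L1 norm
    # of their difference appended as a final scalar feature.
    l1 = np.sum(np.abs(hyper_vec - hypo_vec))
    return np.concatenate([hyper_vec, hypo_vec, [l1]])

# Toy stand-ins for trained 50-dimensional term embeddings.
rng = np.random.RandomState(0)
pairs = [(rng.randn(50), rng.randn(50)) for _ in range(20)]
labels = np.array([1, 0] * 10)  # 1 = hypernymy, 0 = other relation

X = np.stack([pair_features(h, w) for h, w in pairs])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X[:2]))
```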
Standard word embeddings (such as word2vec and GloVe) are trained to place frequently co-occurring words close together: the embeddings of cat, dog, and paws all end up near one another. They therefore cannot discriminate hypernymy from other, non-hypernymy relations (such as meronymy or co-hyponymy). Moreover, since the hypernymy relation is asymmetric, a single embedding per word does not suffice, so we need to train two sets of embeddings.
We train the embeddings so that the hypernym embedding of the hypernym word is close to the hyponym embedding of the hyponym word. This automatically brings the hypernym embeddings of co-hypernyms, and the hyponym embeddings of co-hyponyms, closer together. Pairs that frequently occur as hypernym and hyponym are given more weight during training.
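As a minimal, illustrative sketch of this idea (written in current PyTorch for readability, although the repo targets PyTorch 0.3; all names and the exact loss are assumptions, not the paper's objective), two embedding tables are trained so that frequent hypernym-hyponym pairs are pulled together:

```python
import torch
import torch.nn as nn

VOCAB, DIM = 10000, 50  # assumed vocabulary size and embedding dimension
hyper_emb = nn.Embedding(VOCAB, DIM)  # used when a word acts as a hypernym
hypo_emb = nn.Embedding(VOCAB, DIM)   # used when a word acts as a hyponym
opt = torch.optim.SGD(
    list(hyper_emb.parameters()) + list(hypo_emb.parameters()), lr=0.1
)

def train_step(hyper_ids, hypo_ids, freqs):
    # Pull the hypernym-role embedding of the hypernym toward the
    # hyponym-role embedding of the hyponym; pairs that occur more
    # often contribute proportionally more to the loss.
    dist = (hyper_emb(hyper_ids) - hypo_emb(hypo_ids)).pow(2).sum(dim=1)
    loss = (freqs * dist).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch: the pair (word 3 is a hypernym of word 7), seen 42 times.
train_step(torch.tensor([3]), torch.tensor([7]), torch.tensor([42.0]))
```

Note that a pull-only loss like this collapses to a trivial solution on its own; a complete objective also needs negative pairs or a margin term, which this sketch omits.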
- PyTorch 0.3
- NumPy
- scikit-learn
```
python3 test.py animal dog
```
- To train embeddings, download the Probase dataset and place it in the `data` folder as `probase` (a minimal loading sketch follows this list).
- The Probase dataset was used for training the embeddings.
- The BLESS dataset was used for training the SVM classifier.
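As referenced in the list above, here is a minimal sketch of loading Probase-style pairs, assuming the commonly distributed tab-separated layout of concept, instance, and co-occurrence frequency per line; the exact layout depends on the release you download:

```python
from collections import Counter

def load_probase(path="data/probase"):
    # Assumed line format: hypernym (concept) \t hyponym (instance) \t frequency.
    pairs = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            concept, instance, freq = line.rstrip("\n").split("\t")
            pairs[(concept, instance)] += int(freq)
    return pairs
```

These frequency counts are what give frequent hypernym-hyponym pairs more weight during embedding training.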
- `train_embeddings.py` contains the code for training the term embeddings.
- `train_model.py` contains the code for training the SVM classifier on top of the term embeddings.
- `models.py` contains the classifier class.
- `test.py` is a script that checks whether two words are in a hypernymy relation.