Relation Extraction and Entity Extraction in Text using NLP
By Nikhil Srihari, Disha Mehra, Mengdi Huang, Tanay Varshney, Abhishek Sawarkar and Davide Onofrio
Introduction
Identifying entities and their relations in text is useful for many NLP tasks, such as creating Knowledge Graphs, Text Summarization, and Question Answering. In this 10-minute read, we will "get our hands dirty with code" and walk through an example of training deep neural networks to perform this task.
Extracting the meaning from a sentence in any language requires thorough semantic analysis. We must analyze the grammatical structure of the sentence as well as identify relationships between individual words in a particular context. Words that refer to the various subjects and objects in a sentence, such as names of places or persons of interest, are referred to as "Entities". Understanding both the entities and the relationships between them is paramount to being fluent in any language.
For instance, consider building a smart Question Answering AI tasked with answering questions about images, using captions describing the images (generating the image captions itself is out of the scope of this blog).
As an example, let us consider Figure 1 and its caption "A person is riding a surfboard on the water". We want our Question Answering system to be able to answer questions about this image using this caption. For it to answer any question, it needs to "understand" the caption: in "a person is riding a surfboard on the water", it needs to first identify "person", "surfboard" and "water" as entities, and then understand that the relationships among them are described by "riding" and "on".
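To make the target concrete, here is a minimal sketch of the kind of structured output we want the system to produce for this caption. The variable names and the triple representation are purely illustrative, not part of the code base we discuss later.

```python
# Hypothetical structured output for the caption in Figure 1.
caption = "a person is riding a surfboard on the water"

# Entities: the subjects and objects mentioned in the caption.
entities = ["person", "surfboard", "water"]

# Relations: phrases from the caption linking pairs of entities.
relations = [
    ("person", "riding", "surfboard"),
    ("surfboard", "on", "water"),
]
```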
This process of identifying entities and their relations in text is simple to describe but complex to implement, and we will be discussing a straightforward implementation of the language understanding models involved.
Here is the demo video showcasing the solution we will be building here today:
To keep this blog concise, we restrict the scope of our exploration as follows:
- Only direct references to entities are captured: We do not capture indirect or inferred references to entities. For example, consider "The boy is next to the dog, and he is holding a pen.". In this text, the pronoun "he" is not detected as a reference to the entity "boy".
- Descriptions of entities (both qualitative and quantitative) aren't captured: In the text "2 boys are sitting on a big red bench", "2" is a quantitative description of the entity "boys", while "big" and "red" are qualitative descriptions of the entity "bench". These descriptions aren't taken into account.
- Relations are limited to exact substrings: We only capture relations that exist as exact substrings of the input text.
- Only direct relations are captured: Inferred relations aren't considered. For example, in "The dog is next to the girl", we capture the fact that the entity "dog" is "next to" the entity "girl", but we don't infer that the girl is also next to the dog and capture that information. As another example, consider the text "The boy is to the right of the girl". We capture the fact that the entity "boy" is to the "right of" the entity "girl", but we don't infer from this that the entity "girl" is to the "left of" the entity "boy".
This blog is organized as follows:
1. Architecture Overview
2. Dataset
3. Entity Detection: Training and Inference
4. Relations Detection: Training and Inference
5. Inference Pipeline
6. Conclusion
Architecture Overview
Let’s now take a look at the inference pipeline architecture diagram as shown in Figure 2.
Our inference pipeline architecture consists of two components: the Entity Detection component and the Relation Detection component.
The input to the inference pipeline is the text string from the user. We take this text from the user and first perform Entity Detection, which identifies all the entities in our input text.
Once we have the entities, we next perform Relation Detection: we first take the list of all detected entities and generate every possible pair of entities. Then we iteratively call the Relation Detection model, which takes the text and a pair of entities as input and identifies the relation between that pair.
We will be going through the training and inference process of these two components in detail.
Dataset
For this work, we use the Microsoft Common Objects in Context (MSCOCO) dataset with 328k images and corresponding text captions. In this example, we choose three super categories: humans, animals, and vehicles, which contain 16,832 images and the corresponding 84,160 captions. Figure 3 shows an example from the MSCOCO dataset.
For both training and inference performance evaluation, we need the user input string and the corresponding ground truth labels for Entity Detection and Relation Detection. We use the captions from the MSCOCO dataset as the user input strings for our use case (we ignore the corresponding images). We then use a custom rule-based parser on the captions (i.e., our user input strings) to obtain the ground truth labels. Figure 4 shows an example of a caption and its corresponding ground truth labels generated by our custom parser.
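To give a feel for what such rule-based labeling can look like, here is a minimal sketch, assuming a fixed vocabulary of entity words drawn from the chosen MSCOCO categories and handling only single-word entities. The actual parser in the repository may use different and richer rules.

```python
# A minimal sketch of a rule-based ground truth labeler.
# The vocabulary below is illustrative; the real parser's rules may differ.
ENTITY_WORDS = {"person", "surfboard", "water", "dog", "boy", "girl"}

def label_caption(caption: str) -> str:
    """Emit one tag per word: B-ENT for a known entity word, O otherwise."""
    labels = []
    for word in caption.lower().split():
        labels.append("B-ENT" if word in ENTITY_WORDS else "O")
    return " ".join(labels)

print(label_caption("a person is riding a surfboard on the water"))
# O B-ENT O O O B-ENT O O B-ENT
```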
Entity Detection Component
The Entity Detection model is a token classification model. We will be implementing it here by fine-tuning a pre-trained BERT-base model. This model takes the input text and classifies every token in it as either "O", "B-ENT" or "I-ENT". "O" denotes that the token is not part of an entity, "B-ENT" marks the first token of an entity, and "I-ENT" marks a token that continues the entity begun by the preceding "B-ENT".
Training
The main task at hand is to train an Entity Detection model. Choosing a framework that lends itself well to the workload at hand is an important first step. For this particular workload, NVIDIA's NeMo is a perfect fit.
NVIDIA NeMo
NVIDIA NeMo is an open-source toolkit with a PyTorch and PyTorch Lightning backend that pushes the abstractions one step further. NeMo makes it possible for you to compose and train complex, state-of-the-art neural network architectures quickly. Not only does NeMo let users easily build, train and manipulate AI models, but models built with the NeMo toolkit can also be trained on multiple GPUs and multiple nodes, with or without mixed precision. In NeMo, a model is composed of reusable, modular (almost Lego-like) components called Neural Modules. A neural module takes a set of inputs and computes a set of outputs. It can be thought of as an abstraction somewhere between a layer and a full neural network. Typically, a module corresponds to a conceptual piece of a neural network, such as an encoder, decoder, or language model.
A neural module's inputs and outputs have a neural type that describes the semantics, axis order, and dimensions of the input/output tensor. This typing enables semantic safety checks between the modules in NeMo models.
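As a quick illustration, here is roughly what such a neural type declaration looks like; this is a sketch assuming NeMo 1.x, where a token classifier's output would be typed as a batch x time x classes logits tensor.

```python
# Illustrative neural type declaration, assuming NeMo 1.x APIs.
from nemo.core.neural_types import LogitsType, NeuralType

# A token classifier's output: logits over classes ('C') for every
# token ('T') in every batch element ('B').
output_types = {"logits": NeuralType(("B", "T", "C"), LogitsType())}
```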
For ease of use, we will be using the Token Classification example provided in NeMo for the entity detection task.
Data Preprocessing for Entity Detection with NeMo
To train our model for entity detection using NeMo, we first preprocess the dataset to generate the data in the format that NeMo expects for Token Classification. The dataset is divided into 4 files: text_train.txt, labels_train.txt, text_dev.txt and labels_dev.txt. Each row of text_*.txt contains the text of a sample, and the corresponding row in labels_*.txt contains the labels for that sample, with one tag per word. For example, for the user input text "a person walking across the street with a traffic light above" and the entities ("person", "street", "traffic light"), the entry in the text_*.txt file is the same as the input text, and the corresponding row in the labels_*.txt file is "O B-ENT O O O B-ENT O O B-ENT I-ENT O". Here, "O" marks a word that is not part of an entity, "B-ENT" marks a word that begins an entity, and "I-ENT" marks a word that continues the entity begun by the preceding "B-ENT", as sketched below.
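Here is a small sketch of writing one such sample to the parallel files. The file names follow the format described above, and the assertion captures the invariant that both rows must have the same number of whitespace-separated tokens.

```python
# One training sample in the parallel-file format NeMo expects.
text_line = "a person walking across the street with a traffic light above"
label_line = "O B-ENT O O O B-ENT O O B-ENT I-ENT O"

# Invariant: one label per whitespace-separated word.
assert len(text_line.split()) == len(label_line.split())

with open("text_train.txt", "a") as f_text, open("labels_train.txt", "a") as f_labels:
    f_text.write(text_line + "\n")
    f_labels.write(label_line + "\n")
```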
Training with NeMo
Once we have the training dataset ready, we can use the NeMo example for training. Be sure to modify the hyperparameters in the configuration yaml file for token classification in NeMo.
Let us look at some of the configuration fields (a short sketch of adjusting a few of them follows the list):
- Pretrained Model: In this section, we provide the configuration needed to use a pre-trained NeMo model.
- Trainer: With NeMo's trainer we can adjust macro-level parameters such as the number of GPUs and nodes, the number of training epochs, the precision setting, etc.
- Experiment Manager: The experiment manager helps log all the information relevant to training. You can tweak these knobs to control what gets logged and where.
- Model: This is the "meat" of the config file, where one can adjust all the model-specific hyperparameters, such as the layer configuration, the pre-trained base language model configuration, the optimizer configuration and the dataset configuration.
- Hydra: In this section, we provide the Hydra configuration.
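As a sketch of what tweaking these fields can look like programmatically, the snippet below loads the example YAML and overrides a few values with OmegaConf. The exact key names may differ slightly across NeMo versions, so treat the fields here as illustrative.

```python
from omegaconf import OmegaConf

# Load the token classification config shipped with the NeMo example.
config = OmegaConf.load("token_classification_config.yaml")

config.trainer.gpus = 1                      # Trainer: number of GPUs
config.trainer.max_epochs = 5                # Trainer: training epochs
config.model.dataset.data_dir = "data/"      # Model: where text_*/labels_* files live
config.exp_manager.exp_dir = "experiments"   # Experiment Manager: output directory

print(OmegaConf.to_yaml(config))             # inspect the final configuration
```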
Running training with the NeMo example:
The code we are covering in this blog is hosted here. The step-by-step instructions to run training are described in the README document here.
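For orientation, here is a minimal training sketch of what the NeMo example script does under the hood, assuming NeMo 1.x and the config object loaded in the previous snippet; the example script itself wires this up via Hydra.

```python
import pytorch_lightning as pl
from nemo.collections.nlp.models import TokenClassificationModel
from nemo.utils.exp_manager import exp_manager

# Build the trainer and experiment manager from the config.
trainer = pl.Trainer(**config.trainer)
exp_manager(trainer, config.get("exp_manager", None))

# Fine-tune the BERT-based token classification model.
model = TokenClassificationModel(cfg=config.model, trainer=trainer)
trainer.fit(model)

# Persist the fine-tuned model for inference.
model.save_to("entity_detection.nemo")
```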
Inference
In this section, we are going to perform inference for Entity Extraction, as detailed in the script /entities-and-relationsintext/entities_extraction/token_classification_inference.py.
We first load the model into GPU memory with the load_model function and keep the model ready for inference.
Next, we perform inference with this loaded model by calling the run_inference function with the list of input queries. run_inference calls model_predictions to perform the inference and then performs post processing on these results using the postprocessing function to make the results more readable.
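For readers who want to try this outside the repository's helpers, here is a sketch of the same flow using NeMo's API directly, assuming NeMo 1.x (where token classification models expose an add_predictions helper) and the entity_detection.nemo file saved in the training sketch above; the repository's load_model and run_inference functions wrap similar logic.

```python
from nemo.collections.nlp.models import TokenClassificationModel

# Load the fine-tuned model and move it to GPU memory.
model = TokenClassificationModel.restore_from("entity_detection.nemo")
model = model.cuda().eval()

queries = ["a person is riding a surfboard on the water"]

# add_predictions() returns each query annotated with its predicted tags.
for result in model.add_predictions(queries):
    print(result)
```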
Relation Detection Component
Relation Detection consists of two steps: first, a preprocessing step to generate entity pairs, followed by a relation extraction step with the Relation Detection model.
The input is a list of all entities identified by the Entity Detection model, as well as the input text. We need to process this list of entities to generate every possible pair of entities, and this is done during Relation Detection preprocessing. The Relation Detection model is then iteratively called for each pair of entities.
The Relation Detection model is a token classification model, which we will be implementing here by fine-tuning the pre-trained BERT-base model. This model takes as input a concatenated string of the input text, entity `i` and entity `j`, separated by the [SEP] token. It classifies every token in the concatenated string as either "O", "B-REL" or "I-REL", where "O" indicates that the token doesn't refer to any relation between entity `i` and entity `j`, "B-REL" marks the first token of the relation phrase between entity `i` and entity `j`, and "I-REL" marks a token that continues it.
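Here is a sketch of this preprocessing step, assuming ordered pairs are needed (relations are directional, as in the "right of" example earlier); the helper name is illustrative rather than taken from the repository.

```python
from itertools import permutations

def build_relation_queries(text: str, entities: list) -> list:
    """Create one model input per ordered entity pair: text [SEP] e_i [SEP] e_j."""
    return [
        f"{text} [SEP] {e_i} [SEP] {e_j}"
        for e_i, e_j in permutations(entities, 2)
    ]

queries = build_relation_queries(
    "a person is riding a surfboard on the water",
    ["person", "surfboard", "water"],
)
# 3 entities -> 6 ordered pairs -> 6 queries for the Relation Detection model
```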
Training
Let us look at the training procedure for the Relation Detection Model. As we did for the Entity Detection Model, we will be using NeMo for our training. Specifically we will be using the Token Classification example provided in NeMo.
Data Preprocessing
To train our model using NeMo, we first need to preprocess the dataset to generate our training data in the format that NeMo expects for Token Classification. Like in the above Entity Detection model, we have 4 files — text_train.txt, labels_train.txt, text_dev.txt and labels_dev.txt.
Each row of text_*.txt contains the text of a sample and the corresponding row in labels_*.txt contains the label for this sample.
For example, for the user input text “That boy is next to the girl”, and for the entity pair (“boy”, “girl”), the entry in the text_*.txt file is “That boy is next to the girl [SEP] boy [SEP] girl” and the corresponding row in the labels_*.txt file is “O O O B-REL I-REL O O O O O O”.
Training with NeMo
As discussed previously, changing training parameters is done via the config file. You can check out the configuration file for more!
Running training with the NeMo example:
The code we are covering in this blog is hosted here. The step-by-step instructions to run training are described in the README document here.
Inference
During inference, this model takes as input the concatenated string generated by the Preprocessing step and gives us the relation.
In this section, we are going to perform inference for Relation Extraction, as detailed in the script /entities-and-relationsintext/relation_extraction/token_classification_inference.py.
As in inference for Entity Detection, we first load the model into GPU memory with the load_model function.
Next, we perform inference with this loaded model by calling the run_inference function with the input text and the list of entities. Within run_inference, we first call the preprocessing function to generate the input queries for the Relation Detection model. Then we call model_predictions to perform the inference, followed by post-processing of these results using the postprocessing function to make them more readable.
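Mirroring the Entity Detection sketch, this flow can be approximated with NeMo's API directly, assuming a model saved as relation_detection.nemo and the build_relation_queries helper sketched earlier.

```python
from nemo.collections.nlp.models import TokenClassificationModel

# Load the fine-tuned Relation Detection model onto the GPU.
rel_model = TokenClassificationModel.restore_from("relation_detection.nemo")
rel_model = rel_model.cuda().eval()

text = "That boy is next to the girl"
entities = ["boy", "girl"]

queries = build_relation_queries(text, entities)   # preprocessing
for result in rel_model.add_predictions(queries):  # model predictions
    print(result)  # tokens tagged B-REL/I-REL mark the relation phrase
```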
Inference Pipeline
In this section, we put together the inference calls to the Entity Detection component and the Relation Detection component to create an inference pipeline, as shown in Figure 2 and as implemented in /entities-and-relationsintext/inferencepipeline_interactive.py.
We execute the main function, which acts as the starting point for our code execution and first calls load_model and then inference_interactive_loop.
Before we start accepting user input for inference, we load both models by calling the load_model function of each component.
We call the inference_interactive_loop method in main to start an interactive inference loop with the user. Here, we accept user input through the command prompt, perform inference by calling the inference_pipeline method, and finally print the results using the print_results method.
The inference_pipeline function first calls the run_inference method of the Entity Detection component with the user input string. The result, along with the user input string, is then passed to an inference call to the run_inference method of the Relation Detection component. We finally return the results from both these calls.
Finally, we call the print_results function to print the inference results in a readable format. The whole loop is condensed in the sketch below.
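In this sketch, the extract_entities and extract_relations helpers are hypothetical stand-ins for the two components' run_inference methods, used only to show the shape of the pipeline.

```python
def inference_pipeline(text: str):
    # Step 1: Entity Detection on the raw user input.
    # (extract_entities is a hypothetical stand-in for the component's run_inference.)
    entities = extract_entities(text)             # e.g. ["person", "surfboard", "water"]
    # Step 2: Relation Detection over every pair of detected entities.
    relations = extract_relations(text, entities)
    return entities, relations

def inference_interactive_loop():
    while True:
        text = input("Enter text (or 'quit' to exit): ")
        if text.strip().lower() == "quit":
            break
        entities, relations = inference_pipeline(text)
        print("Entities:", entities)
        print("Relations:", relations)
```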
Running Inference with NeMo:
The code we are covering in this blog is hosted here. The step-by-step instructions to run inference are described in the README document here.
Conclusion
Understanding the relationships between entities helps us make sense of the structure of any sentence and forms the very basis of understanding any language. In many ways, this is a good first step toward appreciating the nuance and depth of work required to build good Conversational AIs. With the above, a reader can start to appreciate the key concepts behind the development of Natural Language Understanding models. We also hope to leave the reader with the following thought:
While it is certainly good to build things from scratch to understand the key concepts, it is much more efficient and, most importantly, cheaper (in terms of effort, time and resources) to build using modular, customizable tools that are setting industry standards.
In this article, we showcased just how easy it is to use NeMo to train and customize models (in this case, a BERT-based Token Classification model) for a preferred use case. NeMo also provides modules and pre-trained models to help build and refine other NLU, Speech Recognition and Speech Synthesis models to best fit your pipeline.
Stay tuned for more Deep Learning stories!
About the Authors
The authors of this story, Nikhil Srihari, Disha Mehra, Mengdi Huang, Tanay Varshney, Abhishek Sawarkar and Davide Onofrio, all work at NVIDIA in the Technical Marketing Engineer — Deep Learning team.