Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured documents. In most cases, this involves processing human-language texts by means of natural language processing (NLP). IE is a broad category with various sub-tasks, such as entity extraction, event extraction, relation extraction, coreference resolution, and entity linking.
Classical approaches fall into the following types:
- Rule Learning based Extraction Methods
  - Dictionary based method (see the sketch after this list)
  - Rule based method
- Classification based Extraction Methods
- Sequential Labeling based Extraction Methods
  - Non-linear Conditional Random Fields
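To make the oldest of these concrete, here is a minimal sketch of the dictionary (gazetteer) based method: entities are found by string-matching a sentence against a hand-built dictionary. The dictionary entries and type labels below are invented for illustration, not taken from any particular system.

```python
# Toy gazetteer: surface form -> entity type (entries invented for illustration)
DICTIONARY = {
    "United Kingdom": "LOCATION",
    "EU": "ORGANIZATION",
    "2020": "DATE",
}

def dictionary_extract(sentence: str):
    """Return (entity, type, offset) triples for every dictionary hit."""
    hits = []
    for entry, entity_type in DICTIONARY.items():
        start = sentence.find(entry)
        if start != -1:
            hits.append((entry, entity_type, start))
    return sorted(hits, key=lambda h: h[2])  # order by position in the sentence

print(dictionary_extract("United Kingdom will leave the EU in 2020."))
# [('United Kingdom', 'LOCATION', 0), ('EU', 'ORGANIZATION', 30), ('2020', 'DATE', 36)]
```

Dictionary and rule based methods are simple and precise but brittle: every new entity or phrasing requires editing the dictionary or rules by hand, which is what motivated the learning-based approaches below.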
Recent breakthroughs in convolutional neural networks (CNNs) have proved effective at capturing syntactic and semantic relations between words within a sentence (see my blog post here) for NLP tasks. CNNs use a pooling layer, which captures the most useful information in a sentence. However, in event extraction, one sentence may contain two or more events, and these events may share arguments that play different roles.
Consider the following sentence:
United Kingdom will leave the EU in 2020.
There are three entities in this sentence, namely United Kingdom, EU, and 2020. If we use a traditional max-pooling layer and keep only the most important information to represent the sentence, we may miss information about some entities, which is important for predicting the others.
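The toy example below (invented numbers, plain NumPy) illustrates the problem: each row is a feature map over the eight tokens of the sentence above (punctuation dropped), and max-pooling keeps a single value per map, so a map that fires on both EU and 2020 cannot tell us which entity it saw.

```python
import numpy as np

# Feature maps over the tokens:
# United(0) Kingdom(1) will(2) leave(3) the(4) EU(5) in(6) 2020(7)
feature_maps = np.array([
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.2, 0.0, 0.1],  # fires on "United Kingdom"
    [0.0, 0.1, 0.0, 0.1, 0.1, 0.7, 0.0, 0.6],  # fires on "EU" and "2020"
])

sentence_vector = feature_maps.max(axis=1)  # max over the whole sentence
print(sentence_vector)  # [0.9 0.7] -- one number per map; the second map
                        # cannot say whether it saw "EU", "2020", or both
```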
In the work of Chen et al., instead of relying on hand-coded features, the researchers employ word-based representations to capture lexical and sentence-level features, which feed into a modified CNN for better extraction of multiple events from text. They encode word context, position, and event type in their representations and embeddings, which flow into a CNN with multiple feature maps (see the figure below). Instead of a CNN with max pooling, which can miss multiple events happening in one sentence, the researchers employ dynamic multi-pooling, in which each feature map is split into three parts and a maximum is kept for each part.
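A minimal sketch of the dynamic multi-pooling idea, assuming a single feature map and known trigger and argument positions; the real model pools many feature maps and concatenates the results:

```python
import numpy as np

def dynamic_multi_pooling(feature_map, trigger_pos, argument_pos):
    """Split one feature map at the trigger and argument positions and
    max-pool each of the three segments separately (simplified sketch,
    not the authors' code)."""
    a, b = sorted((trigger_pos, argument_pos))
    parts = (feature_map[: a + 1], feature_map[a + 1 : b + 1], feature_map[b + 1 :])
    return np.array([p.max() for p in parts])

fmap = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.1, 0.4])
print(dynamic_multi_pooling(fmap, trigger_pos=1, argument_pos=4))
# [0.9 0.8 0.4] -- three values per map instead of one, so evidence for
# multiple events in the same sentence survives pooling
```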
As discussed in the previous post, Zheng et al. and Santos et al. employ CNNs for relation classification without any hand-coded features. In the work of Vu et al., relation classification uses a combination of CNNs and RNNs. Given a sentence with entities and a relation between them, the researchers split it into a left-plus-middle part and a middle-plus-right part, each flowing into its own word embeddings and CNN layer with max pooling. This design gives special attention to the middle part, an important aspect of relation classification that previous research did not emphasize. Bi-directional RNNs with an additional hidden layer are introduced to capture relation arguments from succeeding words. The combined approach shows significant improvements over traditional feature-based systems and over CNNs and RNNs used independently.
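The following sketch shows the splitting scheme with invented dimensions and random weights (a real system would learn the embeddings and filters): each of the two overlapping contexts runs through its own convolution and max-over-time pooling, and the pooled vectors are concatenated.

```python
import numpy as np

def conv_max_pool(embeddings, filters):
    """1-D convolution over token embeddings, then max-over-time pooling."""
    n_tokens, dim = embeddings.shape
    n_filters, width, _ = filters.shape
    windows = np.stack([embeddings[i : i + width].ravel()
                        for i in range(n_tokens - width + 1)])
    scores = windows @ filters.reshape(n_filters, -1).T  # (windows, filters)
    return scores.max(axis=0)                            # max over time

rng = np.random.default_rng(0)
sentence = rng.normal(size=(9, 50))        # 9 tokens, 50-dim embeddings (toy)
e1, e2 = 0, 5                              # entity positions in the sentence

left_middle = sentence[: e2 + 1]           # left context + middle
middle_right = sentence[e1:]               # middle + right context
filters_a = rng.normal(size=(100, 3, 50))  # 100 width-3 filters per side
filters_b = rng.normal(size=(100, 3, 50))

combined = np.concatenate([conv_max_pool(left_middle, filters_a),
                           conv_max_pool(middle_right, filters_b)])
print(combined.shape)  # (200,) -- pooled representation fed to the classifier
```

Because the two contexts overlap on the middle segment, the words between the entities contribute to both pooled vectors, which is how the design emphasizes them.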
Nguyen and Grishman use a CNN-based framework for relation extraction. They concatenate word embeddings and position embeddings as the input representation of sentences containing entities with relations. They employ a CNN with multiple filter sizes and max pooling. Interestingly, their framework outperforms all handcrafted feature-engineering-based machine learning systems that use many morphological and lexical features.
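A simplified sketch of this input representation and multi-filter-size convolution, with invented dimensions and randomly sampled weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, word_dim, pos_dim = 9, 50, 5
e1, e2 = 0, 5  # positions of the two entity mentions

# Each token = word embedding + two position embeddings, one per entity,
# looked up by the token's (shifted) relative distance to that entity.
word_emb = rng.normal(size=(n_tokens, word_dim))
pos_table = rng.normal(size=(2 * n_tokens, pos_dim))
pos1 = pos_table[[t - e1 + n_tokens for t in range(n_tokens)]]
pos2 = pos_table[[t - e2 + n_tokens for t in range(n_tokens)]]
tokens = np.concatenate([word_emb, pos1, pos2], axis=1)  # (9, 60)

pooled = []
for width in (2, 3, 4, 5):                 # multiple filter sizes
    filters = rng.normal(size=(150, width, tokens.shape[1]))
    windows = np.stack([tokens[i : i + width].ravel()
                        for i in range(n_tokens - width + 1)])
    pooled.append((windows @ filters.reshape(150, -1).T).max(axis=0))
print(np.concatenate(pooled).shape)  # (600,) -- sentence vector for the softmax
```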
Adel et al. compare many techniques, from traditional feature-based machine learning to CNN-based deep learning, for relation classification in the context of slot filling. They break each sentence into three parts to capture the contexts it contains. The architecture is a CNN with k-max pooling that identifies the types of the entities and the relation between them, as shown in the figure below.
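A minimal sketch of k-max pooling, the piece that distinguishes this architecture from the single-max pooling above: the k largest activations of each feature map are kept, in their original order.

```python
import numpy as np

def k_max_pooling(feature_map, k):
    """Keep the k largest activations of a feature map in their original
    order, instead of only the single maximum (illustrative sketch)."""
    top_k = np.sort(np.argsort(feature_map)[-k:])  # indices of k largest, in order
    return feature_map[top_k]

fmap = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])
print(k_max_pooling(fmap, k=3))  # [0.9 0.8 0.7] -- order of occurrence preserved
```

Keeping several ordered maxima per map retains more of the sentence's structure than a single maximum, which helps when one representation must support both entity typing and relation classification.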