Named-entity recognition (NER) is an important part of natural-language processing (NLP). The main problem is that language-specific resources and features are costly and slow to develop for new languages, which makes NER a difficult task. That is why, in this article, we will talk about combining supervised and unsupervised learning. To be more precise, our model is based on a long short-term memory (LSTM) network for unsupervised feature generation and a conditional random field (CRF) as the sequential layer on top of it. Thanks to this, our model preserves both orthographic sensitivity and distributional representations.
An LSTM network is a recurrent neural network (RNN) built from LSTM units. Each unit is usually composed of a cell, an input gate, an output gate, and a forget gate. LSTMs are now widely used to classify, process, and predict time-series data. In an LSTM, information flows through a mechanism known as the “cell state”, which lets the network selectively remember or forget things. The expression “long short-term” refers to the fact that an LSTM models short-term memory that can persist for a long period of time. That is why it is now state-of-the-art for many NLP tasks.
Below you can see the standard scheme of an LSTM unit:
We will not dive deeply into the architecture of the LSTM unit, because it is only one part of our network. There are many online sources where you can get more information on the uses and advantages of this type of network.
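Still, it can help to see the gate equations in action. Here is a minimal sketch of a single LSTM step in plain NumPy (the function and variable names are our own, for illustration only): the input, forget, and output gates are sigmoids, the candidate cell update is a tanh, and the cell state is a gated mix of old and new content.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, cell, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # all four gate pre-activations at once
    i = sigmoid(z[0:H])             # input gate: how much new content to write
    f = sigmoid(z[H:2*H])           # forget gate: how much old state to keep
    g = np.tanh(z[2*H:3*H])         # candidate cell update
    o = sigmoid(z[3*H:4*H])         # output gate: how much state to expose
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state
    return h, c

# tiny example: input dimension D=3, H=2 hidden units
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):   # run over a sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # → (2,)
```

In a real model you would use a framework implementation (e.g. a library LSTM layer) rather than this hand-rolled step, but the data flow is the same.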
Next, we will present conditional random fields (CRFs). Probabilistic graphical models come in two types: generative and discriminative.
In probability and statistics, a generative model is a model for generating all values of a phenomenon, both those that can be observed in the world and "target" variables that can only be computed from those that are observed. By contrast, discriminative models provide a model only for the target variable(s), inferring them from the observed variables. In simple terms, discriminative models infer outputs from inputs, while generative models model both inputs and outputs, typically with some hidden parameters. (Wikipedia)
A CRF is a discriminative Markov random field. CRFs are used to build probabilistic models for segmenting and labeling sequential data. Below is an architectural diagram of a CRF.
Formally, a Markov random field consists of the following components:
- An undirected graph (or factor graph) G = (V, E), where each vertex is a random variable X and each edge represents a dependency between the random variables u and v;
- A set of potential functions (or factors), one for each clique in the graph (a clique is a fully connected subgraph of the undirected graph G). Each function maps every possible state of the subgraph's elements to a non-negative real number.
Vertices that are not adjacent must correspond to conditionally independent random variables. A group of pairwise adjacent vertices forms a clique; the set of states of its vertices is the argument of the corresponding potential function.
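For a linear-chain CRF over a sentence, the cliques are single tokens (emission scores) and adjacent token pairs (transition scores), and the best label sequence is found with the Viterbi algorithm. Here is a minimal NumPy sketch of that decoding step; the function name and the toy scores are our own illustration, not part of any library:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label path for a linear-chain CRF.
    emissions: (T, K) unary scores for T tokens and K labels.
    transitions: (K, K) score of moving from label i to label j."""
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each label
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        # score of (best path ending in label i) followed by label j at step t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow backpointers from the best final label
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy example: 3 tokens, 2 labels {0: "O", 1: "PER"}
emissions = np.array([[0.2, 1.0],
                      [0.1, 1.2],
                      [1.5, 0.0]])
transitions = np.array([[0.5, -0.1],
                        [-0.2, 0.8]])
print(viterbi(emissions, transitions))  # → [1, 1, 0], i.e. PER PER O
```

The transition matrix is what the CRF layer adds on top of the LSTM: it learns, for example, that an I-PER label is likely to follow a B-PER label, which per-token classification alone cannot express.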
In our examples we use the following entity types:
PER = Person
DATE = Date
NORP = Nationalities or religious or political groups
ORG = Organization
GPE = Geopolitical Entity
“Usain Bolt (born 21 August 1986) is a retired Jamaican sprinter and world record holder.”
“Max and Kateryna work together at ARVI.”
"KPI students must pass exams before 25th June 2018."