NER Backend

NER backend for act and property extraction.

This module contains the ActNER class, which has everything needed to extract an act and its properties using a trained NER model.

class dodfminer.extract.polished.backend.ner.ActNER[source]

Act NER Class.

This class encapsulates all functions and attributes related to the NER extraction process.

Note

This class is one of the parent classes of the base act class.

_model

The trained NER model for the act.

_add_base_feat(features, sentence, index, prefix)[source]

Updates a dictionary of features with the features of a word.

Parameters
  • features (dict) – Dictionary with the features already processed.

  • sentence (list) – List of words in the sentence.

  • index (int) – Index of the current word in the sentence.

  • prefix (str) – Prefix to be added to the name of the features of the current word.

classmethod _get_base_feat(word)[source]

Get the base features of a word, for the CRF model.

Parameters

word (str) – Word to be processed.

Returns

Dictionary with the base features of the word.
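As an illustration only, word-level features for a CRF are often simple surface cues. The sketch below is not the library's implementation: the function name `get_base_feat` and the four feature names are assumptions chosen for the example.

```python
def get_base_feat(word):
    """Return a dictionary of simple, word-level features.

    The feature names below are hypothetical; the real feature set
    used by ActNER is not documented in this section.
    """
    return {
        "word": word.lower(),        # normalized surface form
        "is_title": word.istitle(),  # e.g. "Maria"
        "is_upper": word.isupper(),  # e.g. "MARIA"
        "is_digit": word.isdigit(),  # e.g. "2019"
    }
```

Returning a flat dictionary keeps the output easy to merge into a larger per-word feature dictionary.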

_get_features(sentence)[source]

Get the features of a sentence, for the CRF model.

Parameters

sentence (list) – List of words in the sentence.

Returns

List of dictionaries with the features of each word.
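A common way to build such a list is to combine each word's own features with those of its neighbours, using a prefix to tell them apart. The following is a minimal sketch under that assumption; `word_feat`, the `window` parameter, the `bias` entry and the prefix format are all hypothetical and not taken from the library.

```python
def word_feat(word):
    # Hypothetical word-level features, standing in for _get_base_feat.
    return {"word": word.lower(), "is_digit": word.isdigit()}


def get_features(sentence, window=1):
    """Return one feature dictionary per word in `sentence`.

    Each dictionary holds the features of the word itself and of the
    neighbours within `window` positions, keyed by a signed offset prefix.
    """
    all_features = []
    for i in range(len(sentence)):
        features = {"bias": 1.0}
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(sentence):  # skip neighbours outside the sentence
                prefix = f"{offset:+d}:"
                for name, value in word_feat(sentence[j]).items():
                    features[prefix + name] = value
        all_features.append(features)
    return all_features
```

A list of dictionaries in this shape is the input format accepted by CRF libraries such as sklearn-crfsuite, which is why the sketch returns plain dictionaries rather than vectors.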

classmethod _limits(sentence)[source]

Find the limits of words in the sentence.

Parameters

sentence (str) – Target sentence.

Returns

List of the positions in which each word in sentence starts.
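The idea can be sketched with a regular expression, assuming that words are delimited by whitespace. The actual tokenization rule used by the class is not stated here, so treat both the function name `limits` and the pattern as assumptions.

```python
import re


def limits(sentence):
    """Return the start offset of every whitespace-delimited word."""
    # Assumption: a "word" is any maximal run of non-whitespace characters.
    return [match.start() for match in re.finditer(r"\S+", sentence)]
```

For example, `limits("NOMEAR MARIA  SILVA")` returns `[0, 7, 14]`; the double space before `SILVA` shifts its start offset by one.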

_load_model()[source]

Load the model from the models/ folder.

Note

This function needs to be overridden in the child class. If it is not overridden, the backend falls back to regex.
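The override pattern might look like the sketch below. It does not reproduce the library's code: `NERBackendSketch`, `MyAct`, the `_backend` attribute, the model path and the use of `pickle` are all hypothetical, and the way the real backend detects a missing model is an assumption.

```python
import os
import pickle


class NERBackendSketch:
    """Hypothetical stand-in for ActNER; only the loading hook is modelled."""

    def __init__(self):
        self._model = self._load_model()
        # Assumption: a missing model is signalled by None.
        self._backend = "ner" if self._model is not None else "regex"

    def _load_model(self):
        # Default hook: no model is available, so regex is used instead.
        return None


class MyAct(NERBackendSketch):
    MODEL_PATH = os.path.join("models", "my_act.pkl")  # hypothetical path

    def _load_model(self):
        # Return None when the file is absent so the fallback still applies.
        if not os.path.exists(self.MODEL_PATH):
            return None
        with open(self.MODEL_PATH, "rb") as handle:
            return pickle.load(handle)
```

Keeping the default hook harmless means a child class that ships without a trained model still works through the regex path.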

_prediction(act)[source]

Predict classes for a single act.

Parameters

act (str) – Full text of the act.

Returns

A dictionary with the properties and their predicted values.

_predictions_dict(sentence, prediction)[source]

Create a dictionary of properties.

Builds a dictionary, keyed by tag, that stores the predicted entities.

Parameters
  • sentence (list) – List of words and tokens in the act.

  • prediction ([type]) – The corresponding predictions for each word in the sentence.

Returns

A dictionary of the properties found.
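One way to picture this step is the sketch below, which assumes the model emits BIO-style tags (`B-name`, `I-name`, `O`). The tagging scheme, the function name `predictions_dict` and the example tag names are assumptions, not details confirmed by this documentation.

```python
def predictions_dict(sentence, prediction):
    """Group consecutive B-/I- tagged words into {entity_name: text}.

    Assumes BIO tags. If an entity name occurs more than once,
    the last occurrence overwrites the earlier one.
    """
    entities = {}
    current = None  # name of the entity currently being extended
    for word, tag in zip(sentence, prediction):
        if tag.startswith("B-"):
            current = tag[2:]
            entities[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            entities[current] += " " + word
        else:
            current = None  # "O" tag or an inconsistent "I-" tag
    return entities
```

With the words `["NOMEAR", "MARIA", "SILVA", "matricula", "123"]` and the tags `["O", "B-nome", "I-nome", "O", "B-matricula"]`, the function returns `{"nome": "MARIA SILVA", "matricula": "123"}`.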

classmethod _preprocess(text)[source]

Preprocess text for CRF model.

_split_sentence(sentence)[source]

Split a sentence into words.

Parameters

sentence (str) – Sentence to be split.

Returns

List of words in the sentence.