NER Backend

NER backend for act and property extraction.

This module contains the ActNER class, which has everything necessary to extract an act and its properties using a trained NER model.

class dodfminer.extract.polished.backend.ner.ActNER

Act NER Class.

This class encapsulates all functions and attributes related to the process of NER extraction.


This class is one of the parent classes of the Base act class.


The trained NER model for the act

_add_base_feat(features, sentence, index, prefix)

Updates a dictionary of features with the features of a word.

  • features (dict) – Dictionary with the features already processed.

  • sentence (list) – List of words in the sentence.

  • index (int) – Index of the current word in the sentence.

  • prefix (str) – Prefix to be added to the name of the features of the current word.
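
The prefixing described above can be illustrated with a minimal sketch. The names `add_base_feat` and `base_feat`, the two example features, and the out-of-range check are assumptions made for illustration; they are not taken from the library's source.

```python
def base_feat(word):
    # Hypothetical stand-in for the per-word feature extractor.
    return {"word": word.lower(), "is_title": word.istitle()}


def add_base_feat(features, sentence, index, prefix):
    # Assumption: an index outside the sentence is skipped silently,
    # so callers can ask for neighbours near the sentence edges.
    if 0 <= index < len(sentence):
        for name, value in base_feat(sentence[index]).items():
            # The prefix keeps features of different positions apart.
            features[prefix + name] = value


features = {"bias": 1.0}
sentence = ["Nomear", "JOAO", "Silva"]
add_base_feat(features, sentence, 1, "+1:")
add_base_feat(features, sentence, 5, "+5:")  # out of range, nothing added
```

The dictionary is updated in place, which matches the "updates a dictionary" wording; whether the real method also returns it is not stated here.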

classmethod _get_base_feat(word)

Get the base features of a word, for the CRF model.


word (str) – Word to be processed.


Returns a dictionary with the base features of the word.
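
As a rough idea of what "base features" can look like for a CRF, the sketch below maps a word to a small dictionary. The function name `word_base_features` and the specific features are illustrative assumptions, not the features the library necessarily uses.

```python
def word_base_features(word):
    # Assumed feature set: lowercase form, capitalisation flags,
    # and the number of digit characters in the word.
    return {
        "word": word.lower(),
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "num_digits": sum(ch.isdigit() for ch in word),
    }


example = word_base_features("DODF2020")
```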


Get the features of a sentence, for the CRF model.


sentence (list) – List of words in the sentence.


Returns a list of dictionaries with the features of each word.
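
Sentence-level features for a CRF are commonly built by combining each word's own features with those of its neighbours. The sketch below shows that idea under stated assumptions: the name `sentence_features`, the `window` parameter, and the feature keys are all hypothetical.

```python
def sentence_features(sentence, window=1):
    feats = []
    for i, word in enumerate(sentence):
        # Features of the current word.
        word_feats = {"bias": 1.0, "word": word.lower()}
        # Features of neighbouring words, keyed by their signed offset.
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(sentence):
                word_feats[f"{offset:+d}:word"] = sentence[j].lower()
        feats.append(word_feats)
    return feats


result = sentence_features(["Nomear", "Maria"])
```

The output has one dictionary per word, which is the input shape that CRF libraries working with feature dictionaries typically expect.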

classmethod _limits(sentence)

Find the limits of words in the sentence.


sentence (str) – Target sentence.


Returns a list of the positions at which each word in the sentence starts.
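
One way to compute word start positions is sketched below. It assumes that words are separated by whitespace; the name `word_starts` is hypothetical and the library may define word boundaries differently.

```python
import re


def word_starts(sentence):
    # Each run of non-whitespace characters counts as one word;
    # m.start() is the character offset where that run begins.
    return [m.start() for m in re.finditer(r"\S+", sentence)]


starts = word_starts("Nomear Maria Silva")
```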


Load the model from the models/ folder.


This function needs to be overridden in the child class. If it is not overridden, the backend falls back to regex.
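
The override-or-fall-back behaviour can be pictured with the sketch below. The classes `BackendSketch` and `ActWithModel`, the `load_model` method, and the `backend` attribute are hypothetical stand-ins; the sketch also assumes that "no model" is signalled by returning `None`, which this page does not confirm.

```python
class BackendSketch:
    """Hypothetical stand-in for the NER backend base class."""

    def __init__(self):
        self.model = self.load_model()
        # Assumption: a missing model selects the regex backend.
        self.backend = "ner" if self.model is not None else "regex"

    def load_model(self):
        # Default behaviour: no trained model is available.
        return None


class ActWithModel(BackendSketch):
    def load_model(self):
        # A real child class would load a trained model from disk here.
        return object()


default_backend = BackendSketch().backend
custom_backend = ActWithModel().backend
```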


Predict classes for a single act.


act (str) – The full act.


A dictionary with the properties and their predicted values.

_predictions_dict(sentence, prediction)

Create dictionary of properties.

Create a dictionary of tags to save the predicted entities.

  • sentence (list) – List of words and tokens in the act.

  • prediction ([type]) – The corresponding predictions for each word in the sentence.


Returns a dictionary of the properties found.
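
To show how per-word predictions can be collected into such a dictionary, the sketch below groups tagged words by entity. It assumes the predictions follow an IOB scheme (`B-` opens an entity, `I-` continues it, `O` is outside), which this page does not state; the name `predictions_to_dict` and the tag labels are hypothetical.

```python
def predictions_to_dict(words, tags):
    entities = {}
    current = None  # label of the entity being built, if any
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # Start a new entity; a repeated label overwrites the old value.
            current = tag[2:]
            entities[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            # Continue the entity that is currently open.
            entities[current] += " " + word
        else:
            # "O" tags and stray "I-" tags close the current entity.
            current = None
    return entities


words = ["Nomear", "Maria", "Silva", "no", "cargo", "Diretor"]
tags = ["O", "B-nome", "I-nome", "O", "O", "B-cargo"]
found = predictions_to_dict(words, tags)
```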

classmethod _preprocess(text)

Preprocess text for CRF model.


Split a sentence into words.


sentence (str) – Sentence to be split.


Returns a list of the words in the sentence.
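
A simple splitting rule is sketched below. It assumes that runs of word characters form one token and that each punctuation mark is a token of its own; the name `split_sentence` and this rule are illustrative assumptions, not the library's documented behaviour.

```python
import re


def split_sentence(sentence):
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", sentence)


tokens = split_sentence("Nomear Maria, matricula 123.")
```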