Polished Core

Pure extractor core module.

This module contains the ActsExtractor class, which has everything needed to extract a single act or all the acts from a DODF.

Usage Example:

from dodfminer.extract.polished.core import ActsExtractor
ActsExtractor.get_act_obj(ato_id, file, backend)

The Act Extractor Class

class dodfminer.extract.polished.core.ActsExtractor[source]

Polished Extraction main class.

All interactions with the acts need to go through this interface. This class handles all requests for Regex or NER extraction.

Note

This class is static.
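
Because the class is static, its methods are called on the class itself and no instance is created. The sketch below illustrates that calling pattern only. `call_extractor` and `FakeExtractor` are hypothetical names introduced so the example runs without dodfminer installed, and the act name and file path are placeholders.

```python
def call_extractor(extractor_cls, ato_id, file, backend):
    """Call get_act_obj on the class itself; no instance is created."""
    return extractor_cls.get_act_obj(ato_id, file, backend)


class FakeExtractor:
    """Hypothetical stand-in that mirrors the documented static signature."""

    @staticmethod
    def get_act_obj(ato_id, file, backend=None, pipeline=None):
        # The real method returns an act object; this stub only echoes its inputs.
        return {"ato_id": ato_id, "file": file, "backend": backend}


# Placeholder arguments; a real call would name an actual act type and DODF file.
result = call_extractor(FakeExtractor, "some_act", "path/to/dodf.pdf", "regex")
print(result["backend"])
```

With dodfminer installed, passing ActsExtractor in place of FakeExtractor should follow the same pattern.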

static get_act_df(ato_id, file, backend)[source]

Extract a single act type from a single DODF.

Dataframe format.

Parameters
  • ato_id (string) – The name of the act to extract.

  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

A dataframe with extracted information, for the desired act.

static get_act_obj(ato_id, file, backend=None, pipeline=None)[source]

Extract a single act type from a single DODF.

Object format.

Parameters
  • ato_id (string) – The name of the act to extract.

  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

An object of the desired act, already with extracted information.

static get_all_df(file, backend)[source]

Extract all act types from a single DODF file.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

Returns

A vector of dataframes with extracted information for all acts.

static get_all_df_highlight(file, backend)[source]

Extract all act types from a single DODF file.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

Returns

A vector of dataframes with extracted information for all acts.

static get_all_df_parallel(file, backend, processes=4) → Dict[source]

Extract all act types from a single DODF file in parallel.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

  • processes (int) – Number of processes to run in parallel. Defaults to 4.

Returns

A vector of dataframes with extracted information for all acts.
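
The `processes` argument suggests that the work is split across a pool of workers. One plausible way to do this is to run one extraction task per act type and collect the results by name. The sketch below shows that general pattern under stated assumptions; it is not dodfminer's implementation. `ACT_TYPES`, `extract_one` and `extract_all_parallel` are hypothetical, the dictionary return shape is a guess, and a thread pool stands in for a process pool so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical act names; the real list of act types lives in dodfminer.
ACT_TYPES = ["act_a", "act_b", "act_c"]


def extract_one(ato_id, file, backend):
    """Hypothetical per-act extraction; returns a marker string, not a dataframe."""
    return f"{ato_id}:{file}:{backend}"


def extract_all_parallel(file, backend, processes=4):
    """Run one task per act type on a pool and map act name to result."""
    # Assumption: threads replace processes here to keep the sketch simple.
    with ThreadPoolExecutor(max_workers=processes) as pool:
        futures = {
            ato_id: pool.submit(extract_one, ato_id, file, backend)
            for ato_id in ACT_TYPES
        }
        # result() blocks until each task finishes and re-raises any worker error.
        return {ato_id: future.result() for ato_id, future in futures.items()}


results = extract_all_parallel("path/to/dodf.pdf", "regex", processes=2)
print(sorted(results))
```

Keying the results by act name keeps each output tied to its act type regardless of the order in which the workers finish.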

static get_all_obj(file, backend=None, pipeline=None)[source]

Extract all act types from a single DODF object.

Object format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

A vector of objects of all the acts with extracted information.

static get_all_obj_highlight(file, backend=None, pipeline=None)[source]

Extract all act types from a single DODF object.

Object format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

A vector of objects of all the acts with extracted information.

static get_all_obj_parallel(file, backend, processes=4)[source]

Extract all act types from a single DODF object in parallel.

Object format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

  • processes (int) – Number of processes to run in parallel. Defaults to 4.

Returns

A vector of objects of all the acts with extracted information.

static get_xml(file, _, i)[source]

Extract all act types from a single DODF in xml.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

Returns

A vector of dataframes with extracted information for all acts.

Returning Objects

The methods in this section return objects or vectors of objects.

static ActsExtractor.get_act_obj(ato_id, file, backend=None, pipeline=None)[source]

Extract a single act type from a single DODF.

Object format.

Parameters
  • ato_id (string) – The name of the act to extract.

  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

An object of the desired act, already with extracted information.

static ActsExtractor.get_all_obj(file, backend=None, pipeline=None)[source]

Extract all act types from a single DODF object.

Object format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

A vector of objects of all the acts with extracted information.

Returning Dataframes

The methods in this section return dataframes or vectors of dataframes.

static ActsExtractor.get_act_df(ato_id, file, backend)[source]

Extract a single act type from a single DODF.

Dataframe format.

Parameters
  • ato_id (string) – The name of the act to extract.

  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either Regex or NER.

Returns

A dataframe with extracted information, for the desired act.

static ActsExtractor.get_all_df(file, backend)[source]

Extract all act types from a single DODF file.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

Returns

A vector of dataframes with extracted information for all acts.

static ActsExtractor.get_xml(file, _, i)[source]

Extract all act types from a single DODF in xml.

Dataframe format.

Parameters
  • file (string) – Path of the file.

  • backend (string) – Backend of act extraction, either regex or ner.

Returns

A vector of dataframes with extracted information for all acts.