Polished Core
Pure extractor core module.
This module contains the ActsExtractor class, which has everything needed to extract a single act or all the acts from a DODF.
Usage Example:
from dodfminer.extract.polished.core import ActsExtractor
ActsExtractor.get_act_obj(ato_id, file, backend)
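The call above can be wrapped in a small helper, sketched here under stated assumptions. The helper name `extract_single_act` and its `extractor` parameter are hypothetical and not part of DODFMiner; the default backend string `"regex"` assumes the lowercase spelling is accepted by the library.

```python
def extract_single_act(ato_id, file, backend="regex", extractor=None):
    """Return the act object for one act type found in one DODF file.

    `extractor` is a hypothetical hook: any object exposing
    get_act_obj(ato_id, file, backend) can be passed in, which makes the
    helper usable without DODFMiner installed (for example in tests).
    """
    if extractor is None:
        # Imported lazily so that defining this helper does not require
        # the dodfminer package to be importable.
        from dodfminer.extract.polished.core import ActsExtractor
        extractor = ActsExtractor
    return extractor.get_act_obj(ato_id, file, backend)
```

With DODFMiner installed, a call would look like `extract_single_act("aposentadoria", "path/to/dodf.pdf")`, where both the act name and the path are placeholder values.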
The Act Extractor Class
- class dodfminer.extract.polished.core.ActsExtractor
Polished Extraction main class.
All interactions with the acts need to go through this interface. This class handles all requests to the Regex or NER extraction backends.
Note
All methods of this class are static.
- static get_act_df(ato_id, file, backend)
Extract a single act type from a single DODF.
Dataframe format.
- Parameters
ato_id (string) – The name of the act to extract.
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
A dataframe with extracted information, for the desired act.
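The backend parameter is described as "either Regex or NER", with capitalization that varies across this page. A minimal sketch of a guard that normalizes the value before it is passed on is shown below. The function `normalize_backend` is hypothetical, and the assumption that the library expects the lowercase names `"regex"` and `"ner"` is not confirmed by this reference.

```python
# Assumption: the two backend names, in lowercase, are what the library expects.
KNOWN_BACKENDS = ("regex", "ner")


def normalize_backend(backend):
    """Lowercase and validate a backend name such as 'Regex' or 'NER'."""
    name = backend.strip().lower()
    if name not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {KNOWN_BACKENDS}"
        )
    return name
```

Validating early gives a clear error at the call site instead of a failure somewhere inside the extraction.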
- static get_act_obj(ato_id, file, backend=None, pipeline=None)
Extract a single act type from a single DODF.
Object format.
- Parameters
ato_id (string) – The name of the act to extract.
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
An object of the desired act, already with extracted information.
- static get_all_df(file, backend)
Extract all act types from a single DODF file.
Dataframe format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either regex or ner.
- Returns
A vector of dataframes with extracted information for all acts.
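This reference calls the return value a "vector of dataframes" without saying whether it is a list or a mapping keyed by act name. The sketch below summarizes such a result by row count and accepts either shape. The helper `count_rows` is hypothetical; it relies only on `len()`, which a pandas DataFrame supports.

```python
def count_rows(frames):
    """Map each entry of a get_all_df-style result to its number of rows.

    Accepts either a dict (keys are kept) or a sequence (positions are
    used as keys), since the exact container type is an assumption here.
    """
    if isinstance(frames, dict):
        items = frames.items()
    else:
        items = enumerate(frames)
    return {key: len(frame) for key, frame in items}
```

A summary like this is a quick way to see which act types were actually found in a given DODF.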
- static get_all_df_highlight(file, backend)
Extract all act types from a single DODF file.
Dataframe format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either regex or ner.
- Returns
A vector of dataframes with extracted information for all acts.
- static get_all_df_parallel(file, backend, processes=4) → Dict
Extract all act types from a single DODF file in parallel.
Dataframe format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either regex or ner.
processes (int) – Number of worker processes to use. Defaults to 4.
- Returns
A vector of dataframes with extracted information for all acts.
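As a sketch of how the serial and parallel variants might be selected from one entry point, the helper below forwards to `get_all_df` or `get_all_df_parallel` depending on the requested number of processes. The helper name `extract_all_frames` and its `extractor` parameter are hypothetical; the method names and argument order are the ones listed in this reference.

```python
def extract_all_frames(extractor, file, backend="regex", processes=1):
    """Extract every act type from one DODF, optionally in parallel.

    `extractor` is expected to expose get_all_df and get_all_df_parallel
    with the signatures documented on this page (e.g. ActsExtractor).
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    if processes == 1:
        # Serial path: no worker processes are requested.
        return extractor.get_all_df(file, backend)
    # Parallel path: forward the worker count to the library.
    return extractor.get_all_df_parallel(file, backend, processes=processes)
```

Passing the extractor in as an argument keeps the helper independent of the import path and easy to exercise with a stand-in object.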
- static get_all_obj(file, backend=None, pipeline=None)
Extract all act types from a single DODF object.
Object format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
A vector of objects of all the acts with extracted information.
- static get_all_obj_highlight(file, backend=None, pipeline=None)
Extract all act types from a single DODF object.
Object format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
A vector of objects of all the acts with extracted information.
- static get_all_obj_parallel(file, backend, processes=4)
Extract all act types from a single DODF object in parallel.
Object format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
processes (int) – Number of worker processes to use. Defaults to 4.
- Returns
A vector of objects of all the acts with extracted information.
Returning Objects
The methods in this section return objects or vectors of objects.
- static ActsExtractor.get_act_obj(ato_id, file, backend=None, pipeline=None)
Extract a single act type from a single DODF.
Object format.
- Parameters
ato_id (string) – The name of the act to extract.
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
An object of the desired act, already with extracted information.
- static ActsExtractor.get_all_obj(file, backend=None, pipeline=None)
Extract all act types from a single DODF object.
Object format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
A vector of objects of all the acts with extracted information.
Returning Dataframes
The methods in this section return dataframes or vectors of dataframes.
- static ActsExtractor.get_act_df(ato_id, file, backend)
Extract a single act type from a single DODF.
Dataframe format.
- Parameters
ato_id (string) – The name of the act to extract.
file (string) – Path of the file.
backend (string) – Backend of act extraction, either Regex or NER.
- Returns
A dataframe with extracted information, for the desired act.
- static ActsExtractor.get_all_df(file, backend)
Extract all act types from a single DODF file.
Dataframe format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either regex or ner.
- Returns
A vector of dataframes with extracted information for all acts.
- static ActsExtractor.get_xml(file, _, i)
Extract all act types from a single DODF in xml.
Dataframe format.
- Parameters
file (string) – Path of the file.
backend (string) – Backend of act extraction, either regex or ner.
- Returns
A vector of dataframes with extracted information for all acts.