Polished Helper
Polished extraction helper functions.
Functions in this file can be used inside or outside the ActsExtractor class. Their purpose is to make some tasks easier for the user, such as creating text files, searching through files, and printing dataframes.
Usage Example:
from dodfminer.extract.polished import helper
helper.print_dataframe(df)
Functions
- dodfminer.extract.polished.helper.build_act_txt(acts, name, save_path='./results/')[source]
Create a text file on disk for an act type.
Note
This function might save data to disk in text format.
- Parameters
acts ([str]) – List of all acts to save in the text file.
name (str) – Name of the output file.
save_path (str) – Path to save the text file.
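To make the kind of output concrete, here is a minimal sketch of writing a list of act strings to a single text file. It is not dodfminer's implementation: `write_acts_txt` is a hypothetical stand-in, and the one-act-per-line layout and the `.txt` suffix are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def write_acts_txt(acts, name, save_path="./results/"):
    """Hypothetical stand-in: write each act on its own line to <save_path>/<name>.txt."""
    out_dir = Path(save_path)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    out_file = out_dir / f"{name}.txt"
    # Assumed layout: one act per line, UTF-8 encoded.
    out_file.write_text("\n".join(acts) + "\n", encoding="utf-8")
    return out_file


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    written_path = write_acts_txt(["act one", "act two"], "aposentadoria", save_path=tmp)
    written_name = written_path.name
    written_text = written_path.read_text(encoding="utf-8")
```

With the real helper, the equivalent call would take the same three arguments shown in the signature above: `acts`, `name`, and `save_path`.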
- dodfminer.extract.polished.helper.committee_classification(all_acts, path, types, backend)[source]
Uses committee classification to find act types.
- Parameters
all_acts (DataFrame) – Dataframe with acts text and regex type.
path (str) – Folder where the Dodfs are.
types ([str]) – Types of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
- Returns
None
- dodfminer.extract.polished.helper.extract_multiple(files, act_type, backend, txt_out=False, txt_path='./results')[source]
Extract one act type from multiple DODFs into a single DataFrame.
Note
This function might save data to disk in text format if txt_out is True.
- Parameters
files ([str]) – List of DODF file paths.
act_type (str) – Type of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
txt_out (bool) – Boolean indicating if acts should be saved on text files.
txt_path (str) – Path to save the text files.
- Returns
A dataframe containing all instances of the desired act in the files set.
- dodfminer.extract.polished.helper.extract_multiple_acts(path, types, backend)[source]
Extract multiple act types from multiple DODFs into CSV files named after each act type.
- Parameters
path (str) – Folder where the Dodfs are.
types ([str]) – Types of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
- Returns
None
- dodfminer.extract.polished.helper.extract_multiple_acts_parallel(path: str, types: List[str], backend: str, processes=4)[source]
Extract multiple act types from multiple DODFs into CSV files named after each act type, in parallel.
- Parameters
path (str) – Folder where the Dodfs are.
types ([str]) – Types of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
processes (int) – Number of processes to use (defaults to 4).
- Returns
None
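The idea of spreading per-file extraction across several workers can be sketched with the standard library. This is an illustration only, not the library's code: `fake_extract` and `extract_in_parallel` are hypothetical names, `fake_extract` is a placeholder for the real per-file extraction, and a thread pool is used here in place of separate processes to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_extract(file_path):
    """Hypothetical placeholder for a per-file extraction; returns (path, result)."""
    return file_path, len(file_path)


def extract_in_parallel(files, workers=4):
    """Run fake_extract over all files with a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in the same order as the inputs.
        return list(pool.map(fake_extract, files))


results = extract_in_parallel(["a.pdf", "dir/b.pdf"])
```

Because each file is handled independently, the work divides cleanly across workers; `workers` plays the role that `processes` plays in the signature above.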
- dodfminer.extract.polished.helper.extract_multiple_acts_with_committee(path, types, backend)[source]
Extract multiple act types from multiple DODFs into CSV files named after each act type. Uses committee_classification to find act types.
- Parameters
path (str) – Folder where the Dodfs are.
types ([str]) – Types of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
- Returns
None
- dodfminer.extract.polished.helper.extract_single(file, act_type, backend)[source]
Extract one act type from a single DODF into a single DataFrame.
Note
This function might save data to disk in text format, if txt_out is True.
- Parameters
file (str) – DODF file path.
act_type (str) – Type of the act; see the core class for the available types.
backend (str) – Backend used to extract the acts: {regex, ner}.
- Returns
A tuple containing, respectively, a dataframe with all instances of the desired act, including the texts found, and a list of the segmented text blocks.
- Return type
tuple
- dodfminer.extract.polished.helper.get_files_path(path, file_type)[source]
Get the paths of all files inside a folder.
Works with nested folders.
- Parameters
path – Folder to look into for files.
file_type – Type of the files to look for.
- Returns
A list of strings with the file paths.
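A recursive file search of this kind can be sketched with `os.walk`. This is a minimal illustration under stated assumptions, not the helper's actual code: `list_files` is a hypothetical name, and treating `file_type` as a file extension (matched case-insensitively) is an assumption.

```python
import os
import tempfile


def list_files(path, file_type):
    """Hypothetical stand-in: collect paths under `path` whose names end with .<file_type>."""
    matches = []
    for root, _dirs, names in os.walk(path):  # os.walk descends into nested folders
        for name in names:
            if name.lower().endswith("." + file_type.lower()):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "2020", "01"))
    for rel in ("top.pdf", os.path.join("2020", "01", "nested.pdf"), "notes.txt"):
        open(os.path.join(base, rel), "w").close()
    found = [os.path.relpath(p, base) for p in list_files(base, "pdf")]
```

The demonstration finds both the top-level and the nested PDF and skips the text file, which is the behavior the "works with nested folders" remark describes.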
- dodfminer.extract.polished.helper.print_dataframe(data_frame)[source]
Style a Dataframe.
- Parameters
data_frame – The dataframe to be styled.
- Returns
The styled dataframe
- dodfminer.extract.polished.helper.run_extract_simple_wrap(file: str, act_type: str, backend: str) -> Tuple[str, pandas.DataFrame] [source]
Run a single extraction.
- dodfminer.extract.polished.helper.run_thread_wrap(files: list, act_type: str, backend: str, all_acts: Queue) -> None [source]
Run multiple extractions.
- dodfminer.extract.polished.helper.run_thread_wrap_multiple(files: list, act_type: str, backend: str) -> Tuple[str, pandas.DataFrame] [source]
Run multiple extractions.