Regex Backend

Regex backend for act and property extraction.

This module contains the ActRegex class, which has everything needed to extract an act and its properties using regex rules.

class dodfminer.extract.polished.backend.regex.ActRegex

Act Regex Class.

This class encapsulates all functions and attributes related to the regex extraction process.

Note

This class is one of the parent classes of the Base act class.

_flags

All the regex flags which will be used in extraction.

_rules

The regex rules for property extraction.

_inst_rule

The regex rule for act extraction.

_find_prop_value(rule, act)

Find a single property in a single act.

Parameters
  • rule (str) – The regex rule to search for.

  • act (str) – The act to apply the rule to.

Returns

The property found, or NaN if nothing is found.
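A minimal sketch of the lookup described above, written as a standalone function rather than a method. It assumes the rule contains one capturing group holding the value and that "nan" means a float NaN; the function name `find_prop_value`, its `flags` parameter and the sample rules are hypothetical, not dodfminer's actual code.

```python
import math
import re


def find_prop_value(rule, act, flags=re.IGNORECASE):
    """Return the first capturing group of `rule` in `act`, or NaN if absent.

    Hypothetical standalone version of ActRegex._find_prop_value; the
    single-capturing-group convention is an assumption.
    """
    match = re.search(rule, act, flags)
    if match is None:
        # Nothing found: signal a missing property with NaN.
        return math.nan
    return match.group(1).strip()


act = "EXONERAR JOSE DA SILVA, matrícula 123.456-7, do cargo de Analista."
print(find_prop_value(r"matr[íi]cula\s+([\d.\-]+)", act))  # 123.456-7
print(find_prop_value(r"CPF\s+([\d.\-]+)", act))           # nan
```

Returning NaN instead of `None` keeps missing values compatible with tabular tools that treat NaN as "missing", which is presumably why the documented method uses it.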

_prop_rules()

Rules for extracting the properties.

Must return a dictionary of regex rules, where the key is the property type and the value is the rule.

Raises

NotImplementedError – Child classes must override this method.
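The contract can be sketched as an abstract hook on a base class plus an override in a child class. `RegexActBase`, `ExoneracaoAct` and their rules are hypothetical illustrations, not classes from dodfminer; only the dictionary shape (property type to regex rule) comes from the description above.

```python
import re


class RegexActBase:
    """Hypothetical stand-in for the hook portion of ActRegex."""

    def _prop_rules(self):
        # Child classes must provide {property_type: regex_rule}.
        raise NotImplementedError("Child classes must override _prop_rules")


class ExoneracaoAct(RegexActBase):
    """Hypothetical act type with two illustrative property rules."""

    def _prop_rules(self):
        return {
            "nome": r"EXONERAR\s+([^,]+)",
            "matricula": r"matr[íi]cula\s+([\d.\-]+)",
        }


rules = ExoneracaoAct()._prop_rules()
for prop, rule in rules.items():
    re.compile(rule)  # every value should be a valid pattern
print(sorted(rules))  # ['matricula', 'nome']

try:
    RegexActBase()._prop_rules()
except NotImplementedError as exc:
    print("base class:", exc)
```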

classmethod _regex_flags()

Flags used in the regex search.
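Because `_regex_flags` is a classmethod, a subclass can override it to change matching behaviour for a whole act type. A small sketch, assuming the base combines `re.IGNORECASE` and `re.MULTILINE`; these particular flags and the class names `FlagsBase` and `StrictCaseAct` are hypothetical.

```python
import re


class FlagsBase:
    @classmethod
    def _regex_flags(cls):
        # Hypothetical defaults: case-insensitive, ^/$ match at each line.
        return re.IGNORECASE | re.MULTILINE


class StrictCaseAct(FlagsBase):
    @classmethod
    def _regex_flags(cls):
        # Keep MULTILINE but match keywords case-sensitively.
        return re.MULTILINE


text = "nomear fulano\nNOMEAR BELTRANO"
print(re.findall(r"^nomear\s+(\w+)", text, FlagsBase._regex_flags()))
# ['fulano', 'BELTRANO']
print(re.findall(r"^nomear\s+(\w+)", text, StrictCaseAct._regex_flags()))
# ['fulano']
```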

_regex_instances()

Search for all instances of the act using the defined rule.

Returns

List of all act instances in the text.
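A sketch of the instance search, assuming the instance rule has exactly two capturing groups (head and body), as the `_rule_for_inst` warning requires, and that it is run over the whole document text with `re.findall`. The function name `regex_instances`, its parameters and the sample rule are hypothetical.

```python
import re


def regex_instances(text, inst_rule, flags=re.IGNORECASE):
    """Return every (head, body) pair matched by `inst_rule` in `text`.

    With two capturing groups, re.findall yields a list of 2-tuples.
    """
    return re.findall(inst_rule, text, flags)


document = (
    "EXONERAR, a pedido, ANA LIMA do cargo de Assessora. "
    "Publicado em 2020. "
    "EXONERAR JOAO COSTA do cargo de Diretor."
)
# Head: the act keyword; body: everything up to the next period.
rule = r"(EXONERAR)([^.]*)\."
instances = regex_instances(document, rule)
print(len(instances))  # 2
print(instances[1])    # ('EXONERAR', ' JOAO COSTA do cargo de Diretor')
```

Text between acts (here "Publicado em 2020.") is simply skipped, since it does not start with the head keyword.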

_regex_props(act_raw)

Create an act dictionary with all its properties.

Parameters

act_raw (str) – The raw text of a single act.

Returns

The act and its properties in dictionary format.
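Building the property dictionary then amounts to applying every rule from `_prop_rules` to the act's raw text. The sketch below is standalone; it assumes each rule has one capturing group and that missing properties become NaN, mirroring `_find_prop_value`. The names `regex_props` and `RULES` and the sample rules are hypothetical.

```python
import math
import re

# Hypothetical property rules, shaped like the output of _prop_rules().
RULES = {
    "nome": r"EXONERAR\s+([^,]+)",
    "matricula": r"matr[íi]cula\s+([\d.\-]+)",
    "cargo": r"cargo de\s+([^,.]+)",
}


def regex_props(act_raw, rules=RULES, flags=re.IGNORECASE):
    """Map each property type to its value in `act_raw`, or NaN if absent."""
    props = {}
    for prop, rule in rules.items():
        match = re.search(rule, act_raw, flags)
        props[prop] = match.group(1).strip() if match else math.nan
    return props


act = "EXONERAR MARIA SOUZA, matrícula 98.765-4, a pedido."
print(regex_props(act))
# {'nome': 'MARIA SOUZA', 'matricula': '98.765-4', 'cargo': nan}
```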

_rule_for_inst()

Rule for extracting the act.

Warning

Must return a regex rule that matches an act in two parts, a head and a body; only the body is used to search for properties.

Raises

NotImplementedError – Child classes must override this method.
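A sketch of an instance rule in the required two-part shape, with only the body passed on to property search. The pattern `INST_RULE`, the name rule and the sample text are hypothetical; the library's real rules are likely more elaborate.

```python
import re

# Group 1 = head (the act keyword), group 2 = body (the act's content).
# The body runs lazily until the next head keyword or the end of the text.
INST_RULE = r"(NOMEAR)\s+(.*?)(?=NOMEAR|$)"

text = (
    "NOMEAR ANA LIMA para o cargo de Assessora. "
    "NOMEAR JOAO COSTA para o cargo de Diretor."
)
acts = []
for match in re.finditer(INST_RULE, text, re.DOTALL):
    head, body = match.groups()
    # Only the body is searched for properties; the head just locates the act.
    name = re.search(r"^(.+?)\s+para o cargo", body).group(1)
    acts.append((head, name))
print(acts)  # [('NOMEAR', 'ANA LIMA'), ('NOMEAR', 'JOAO COSTA')]
```

Keeping the head out of the property search means a keyword such as "NOMEAR" can never be mistaken for a property value.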