Pattern_Match Documentation

This page contains the Pattern_Match Package documentation.

The sym_cnt_constr Module

class netbench.pattern_match.sym_cnt_constr.b_Sym_cnt_constr(new_text, symbol, m, n, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a char symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
  • symbol (set(string of length 1) or string of length 1) – Char or char class accepted.
  • m (int) – Minimal count of symbol repetitions
  • n (int) – Maximal count of symbol repetitions
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised. Behavior of this method is controled by attributes greedy and limit. The limit attribute has bigger priority than the greedy one. Limit sets exact number of accepted characters. If -1 is set, the greedy attribute will be used. If greedy is True, the accept() method will consume as much characters from input string as posible. If greedy is False, the accept() mathod will consume only minimal number of characters from input string with respect to the m parameter.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

This symbol is used only for representing PCRE constraint repetition block. This method implements prefix collision.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(compSymbol)

Compute collision between self and compSymbol.

NOTE: This method is not implemented.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

NOTE: This method is not implemented.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars (list(set(char))) – List of Set of local_chars. Set of all posible chars. used to have striding deterministic. Number of sets in the list is equal to len(self.kchar).
Returns:

New strided symbol.

Return type:

sym_kchar

compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_cnt_constr”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_greedy()

Return value of the greedy attribute.

Returns:Value of the greedy attribute.
Return type:boolean
get_limit()

Return value of the limit attribute.

Returns:Value of the limit attribute.
Return type:int
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean
set_greedy(greedy)

Set greedy attribute of this symbol. This attribute is used to controle behavior of the accept method.

Parameter:greedy (boolean) –

New value of the attribute:

If greedy is True, the accept() method will consume as much characters from input string as posible.

If greedy is False, the accept() mathod will consume only minimal number of characters from input string with respect to the m parameter.

set_limit(limit)

Set limit attribute of this symbol. This attribute is used to controle behavior of the accept() method and has bigger priority than greedy attribute.

Parameter:limit (int) – Sets exact number of accepted characters. If -1 is set, the greedy attribute will be used.

The nfa_reductions Module

class netbench.pattern_match.nfa_reductions.nfa_reductions

Bases: netbench.pattern_match.b_nfa.b_nfa

Class for NFA reductions.

share_prefixes()

This method reduce states of NFA by prefix sharing. Only static prefixes are shared.

This method sets _compute to False, and get_compute() will return False until compute() is called.

The b_dfa Module

class netbench.pattern_match.b_dfa.b_dfa

Bases: netbench.pattern_match.b_automaton.b_Automaton

A base class for DFA automata.

compute()
Determinise and minimise object. Implentation of inherited method from b_automaton.
determinise(create_table=False, states_limit=0)

Determinisation of automaton.

Parameters:
  • create_table (boolean) – If create_table = false than state representation table is not created and less memory is consumed.
  • states_limit (int) – If num of states exceeds this limit, during determinization, then flag “Deterministic” is set to False and determinize stops; if nfa exceeds limit and is already deterministic, then nothing happens (this is because speed, not because logic); safe use is only if you want to stop algorithm if it exceeds limit; zero means no limit.
Flags:

Set Deterministic, Epsilon Free and Alphabet collision free.

This method sets _compute to False, and get_compute() will return False until compute() is called.

isomorphic(nfa_data)

Check isomorfism of automaton with another nfa_data object. Both automatons must be deterministic and without unreachable states.

Parameter:nfa_data (nfa_data) – nfa_data object containing second automaton.
Returns:True if automata are isomorphic, else otherwise.
Return type:boolean
minimise()

Minimalization of DFA automaton.

Raises:ALPHABET_COLLISION_ERROR() if alphabet is not collision free.
Raises:DETERMINISTIC_ERROR() if automaton is not deterministic.
Flags:Sets Minimal flag to true.

This method sets _compute to False, and get_compute() will return False until compute() is called.

report_memory_naive()

Report consumed memory in bytes. Naive mapping algorithm is used (2D array). Basic algorithm for this variant of mapping is: M = |states| * |alphabet| * ceil(log(|states| + 1, 2) / 8)

Returns:Returns number of bytes.
Return type:int
report_memory_optimal()

Report consumed memory in bytes. Optimal mapping algorithm is used (with oracle). Basic algorithm for this variant of mapping is: M = |transitions| * ceil(log(|states|, 2) / 8)

Returns:Returns number of bytes.
Return type:int

The parser Module

class netbench.pattern_match.parser.parser(selected_parser='pcre_parser', *args)

Bases: netbench.pattern_match.nfa_parser.nfa_parser

A mata class wrapping under single interface any class for parsing of regular expressions based on base class nfa_parser.

Parameters:
  • selected_parser (string or nfa_parser) – Which class is used for parsing of regular expressions. Defaults to “pcre_parser”. This parameter can be either name of parser class (eg. pcre_parser) or object of class based on nfa_parser class.
  • args (list(Any type)) – any parser parameters. NOTE: Caller is suppossed to assign corect parametrs of corect types. If parameters excess the number of accepted parameters, then they are discarded.
get_nfa()

Parse a current line and returns parsed nfa.

Returns:Created automaton in nfa_data format. Returns None if failure happens.
Return type:nfa_data or None
get_position()

Returns position in ruleset.

Returns:Position in ruleset.
Return type:int
load_file(filename)

This function is used to specify input file and load the whole file into the input text atribute.

Parameter:filename (string) – Name of file.
move_to_line(line)

Move to the specified line

Parameter:line (int) – Line number.
Returns:True if move was performed, Otherwise False is returned.
Return type:boolean
next_line()

Move to the next line (next regular expression)

Returns:True if move was performed, Otherwise False is returned.
Return type:boolean
num_lines()

Returns number of lines.

Returns:Number of lines. Each line corespond to single regular expression.
Return type:int
reset()
Reset the position counter to 0. Parsing will continue from the begining.
set_text(input_text)

Set text to parse - can have multiple text lines

Parameter:input_text (string) – Regular expressions.

The nfa_data Module

class netbench.pattern_match.nfa_data.nfa_data

A class to specify NFA automaton (Q,T,q0,Delta,F).

Attribute Type Description
states dict(int, b_State) Finite set of states.
alphabet dict(int, b_Symbol) Symbols of alphabet.
start int ID of Start state.
transitions set(tupple(int, int, int)) Transitions. Format (Start state ID, Symbol ID, End state ID)
final set(int) Final states.
Flags dict(string, any type) Flags for specified properties.

Flags dictionary stores various properties of automaton. The list of all supported flags is in the next table.

Flag Type Description
Deterministic boolean Is automaton deterministic?
Strided boolean Is automaton strided (accept multiple characters)?
Stride int Number of characters accepted at once by the automaton.
Epsilon Free boolean Is automaton without epsilon transitions?
Alphabet collision free boolean Is alphabet without collisions?
Minimal boolean Is DFA minimal?
Delay DFA boolean Is automaton Delay DFA?
Extend_FA boolean Is automaton Extend DFA?
History FA boolean Is automaton History DFA?
Hybrid FA - one NFA part boolean Is automaton NFA part of Hybrid DFA?
Hybrid FA - DFA part boolean Is automaton DFA part of Hybrid DFA?
ExportToFsm(FileName='automaton.fsm', SymbolFileName='automaton.sym')

Save automaton to file in FSM format. Based on FSM man page: http://www2.research.att.com/~fsmtools/fsm/man4.html

Parameters:
  • FileName (string) – File name to which the fsm part will be exported.
  • SymbolFileName (string) – File name to which the sym part will be exported.

NOTE: This method is deprecated. Use export_to_fsm().

ImportFromFsm(FileName='automaton.fsm', SymbolFileName='automaton.sym')

Load automaton from file in FSM format. Based on FSM man page: http://www2.research.att.com/~fsmtools/fsm/man4.html . This method must be updated if new symbol is added to Netbench. Raises Exception if unknown symbol string type is found and coresponding class can not be determinated.

Parameters:
  • FileName (string) – File name from which the fsm part will be imported.
  • SymbolFileName (string) – File name from which the sym part will be imported.
Raises:

nfa_data_import_exception if unknown symbol string type is found and coresponding class can not be determinated.

NOTE: This method is deprecated. Use import_from_fsm().

LoadFromFile(FileName)

Load nfa_data from file (serialisation).

Parameter:FileName (string) – Name of file from which nfa_data will be loaded.
Returns:Object created from file is returned.
Return type:nfa_data

NOTE: This method is deprecated. Use load_from_file().

SaveToFile(FileName)

Save nfa_data to file (serialisation).

Parameter:FileName (string) – Name of file in which nfa_data will be saved.
Returns:True if success, False otherwise.
Return type:boolean

NOTE: This method is deprecated. Use save_to_file().

Show(FileName, sizeStr=' size="8, 5"n')

Save graphviz dot file, representing graphical structure of nfa_data.

Parameters:
  • FileName (string) – Name of file into which nfa_data graphical representation will be saved.
  • sizeStr (string) – Size of resulting image. Set to ” ” to set unbounded dimensions. Format of this parameter is / size=”x,y”n/ to set width to x inches and height to y inches.
Returns:

True if success, False otherwise.

Return type:

boolean

NOTE: This method is deprecated. Use show().

add_states(states)

Add states to automaton. States can be list, tuple, set and frozen set of states objects. States can also be an an state object (object of b_State class or class derived from b_State class). If state is final its added to the final set of automaton. If state can not be added due an id collision exception is raised and rollback on states and final is performed.

Parameter:states (list(b_State), tuple(b_State), set(b_State), frozenset(b_State) or b_State) – States id to add.
Raises:general_unsupported_type if type of states is not supported.
add_symbols(symbols)

Add symbols to automaton. Symbols can be list, tuple, set and frozen set of symbol objects (objects of classes derived from b_Symbol). Symbols can also be a symbol object. If symbol can not be added due an id collision exception is raised and rollback on symbols is performed. Before calling this method, the symbols must be checked by check_symbols() or has_symbol() method and all symbol must be unique (result of this methods must be False).

Parameter:symbols (list(b_Symbol), tuple(b_Symbol), set(b_Symbol), frozenset(b_Symbol) or b_Symbol) – Symbols id to add.
Raises:general_unsupported_type if type of symbols is not supported.
add_transitions(transitions)

Add transitions to automaton. Transitions can be list, tuple, set and frozen set of tuple(int, int, int). Transitions can also be a tuple(int, int, int).

Parameter:transitions (list(tuple(int, int, int)), tuple(tuple(int, int, int)), set(tuple(int, int, int)), frozenset(tuple(int, int, int)) or tuple(int, int, int)) – Transitions to add.
Raises:general_unsupported_type if type of transitions is not supported.
check_symbols(symbols)

This method checks if symbols in symbols are in automaton alphabet. Symbols can be list, tuple, set and frozen set of symbolobjects. Returns list of boolean values - True if symbol is in alphabet, otherwise False.

Parameter:symbols (list(b_Symbol), tuple(b_Symbol), set(b_Symbol), or frozenset(b_Symbol)) – Symbols id to check.
Returns:Mapping of symbols to their presence in automaton.
Return type:list(tuple(b_Symbol, boolean))
Raises:general_unsupported_type if type of symbols is not supported.
export_to_fsm(FileName='automaton.fsm', SymbolFileName='automaton.sym')

Save automaton to file in FSM format. Based on FSM man page: http://www2.research.att.com/~fsmtools/fsm/man4.html

Parameters:
  • FileName (string) – File name to which the fsm part will be exported.
  • SymbolFileName (string) – File name to which the sym part will be exported.
get_max_alphabet_id()

Returns max id of symbols

Returns:Max id of symbols.
Return type:int
get_max_state_id()

Returns max id of states.

Returns:Max id of states.
Return type:int
get_symbol_id(symbol)

This method returns id of equivalent symbol is in alphabet, otherwise throws exception unknown_symbol.

Parameter:symbol (b_Symbol) – Symbol to be checked.
Returns:Id of equivalent symbol in alphabet.
Return type:int
Raises:symbol_not_found if symbol is not in alphabet.
has_symbol(symbol)

This method returns True if symbol is in alphabet, otherwise returns False.

Parameter:symbol (b_Symbol) – Symbol to be checked.
Returns:True if symbol is in alphabet, otherwise returns False.
Return type:boolean
import_from_fsm(FileName='automaton.fsm', SymbolFileName='automaton.sym')

Load automaton from file in FSM format. Based on FSM man page: http://www2.research.att.com/~fsmtools/fsm/man4.html . This method must be updated if new symbol is added to Netbench. Raises Exception if unknown symbol string type is found and coresponding class can not be determinated.

Parameters:
  • FileName (string) – File name from which the fsm part will be imported.
  • SymbolFileName (string) – File name from which the sym part will be imported.
Raises:

nfa_data_import_exception if unknown symbol string type is found and coresponding class can not be determinated.

is_consistent(syndrome=None)

Return True if nfa_data structure is consistent, otherwise returns False. This method is intended for debuging purposes.

Parameter:syndrome (dict(int->int)) –

Dict of inconsistency syndromes. The dict have to be set to empty dict (If the dict is nonempty, it will be cleared). If syndrome is None no syndrome is returned. Syndrome keys are ints describing kind of inconsitency, the values are lists of appropriate objects.

Key Description
0 Check if values of the states dict are states failed.
1 Check if values of the alphabet dict are symbols failed.
2 Check if values of the transitions set are tuples with at least 3 items failed.
3 Check if transitions are valid failed.
4 Check if final set is valid.
5 Check if start is valid.
6 Check if id in the state object is same as key in the states dict.
7 Check if id in the symbol object is same as key in the alphabet dict.
Key Values
0 List of failed states’ ids.
1 List of failed symbol’s ids.
2 List of failed transitions.
3 List of failed transitions (set of tuples).
4 List of failed states’ ids.
5 List of failed states’ ids.
6 List of failed states’ ids.
7 List of failed symbol’s ids.
Returns:True if nfa_data structure is consistent, otherwise returns False.
Return type:boolean
is_empty()

Return True if nfa_data structure is empty, false otherwise.

Returns:True if nfa_data structure is empty, false otherwise.
Return type:boolean
load_from_file(FileName)

Load nfa_data from file (serialisation).

Parameter:FileName (string) – Name of file from which nfa_data will be loaded.
Returns:Object created from file is returned.
Return type:nfa_data
remove_states(states)

Removes states from automaton. States can be list, tuple, set and frozen set of ids. States can also be an id. If state is final state, the state will be also removed from final.

Parameter:states (list(int), tuple(int), set(int), frozenset(int) or int) – States id to be removed.
Raises:general_unsupported_type if type of states is not supported.
remove_symbols(symbols)

Removes symbols from automaton. Symbols can be list, tuple, set and frozen set of ids. Symbols can also be an id.

Parameter:symbols (list(int), tuple(int), set(int), frozenset(int) or int) – Symbols id to be removed.
Raises:general_unsupported_type if type of symbols is not supported.
remove_transitions(transitions)

Removes transitions from automaton. Transitions can be list, tuple, set and frozen set of tuple(int, int, int). Transitions can also be a tuple(int, int, int).

Parameter:transitions (list(tuple(int, int, int)), tuple(tuple(int, int, int)), set(tuple(int, int, int)), frozenset(tuple(int, int, int)) or tuple(int, int, int)) – Transitions to be removed.
Raises:general_unsupported_type if type of transitions is not supported.
save_to_file(FileName)

Save nfa_data to file (serialisation).

Parameter:FileName (string) – Name of file in which nfa_data will be saved.
Returns:True if success, False otherwise.
Return type:boolean
show(FileName, sizeStr=' size="8, 5"n')

Save graphviz dot file, representing graphical structure of nfa_data.

Parameters:
  • FileName (string) – Name of file into which nfa_data graphical representation will be saved.
  • sizeStr (string) – Size of resulting image. Set to ” ” to set unbounded dimensions. Format of this parameter is / size=”x,y”n/ to set width to x inches and height to y inches.
Returns:

True if success, False otherwise.

Return type:

boolean

The pcre_parser Module

class netbench.pattern_match.pcre_parser.pcre_parser(create_cnt_constr=False, create_eof_symbols=False, *args)

Bases: netbench.pattern_match.nfa_parser.nfa_parser

Class for parsing RE using new C pcre parser. For the correct work of this class it is required that the path to the netbench was in the system variable NETBENCHPATH..

FORMAT of Automata file (MSFM2) version 2:

  1. Number of the States in the automaton
  2. Number of the transition in the automaton
  3. Number of start state
  4. Each transition is represenetd by one line in the file. Line is in format Source_State|Symbol|Target_State|Epsilon
  5. End of the transition table is represented by line of #
  6. Number of the end states
  7. Line with identifikator of the endState. Every endstate is folowed by , (coma)
  8. End of endState section is represented by line of #
  9. Number of the symbols in symbol table
  10. Every symbol is stored on its own line and it is represented as Symbol_Number:Character1|Character2| . Characters are encoded as hexa numbers.
  11. End of the file
get_nfa()

Parse a current line and returns parsed nfa.

Returns:Created automaton in nfa_data format. Returns None if failure happens.
Return type:nfa_data or None

The sym_kchar Module

class netbench.pattern_match.sym_kchar.b_Sym_kchar(new_text, kchar, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a k-char symbol. Chars can be char or char classes.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
  • kchar (tuple(frozenset(char) or char)) – k-char
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(compSymbol)

Compute collision between self and compSymbol.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars (list(set(char))) – List of Set of local_chars. Set of all posible chars. used to have striding deterministic. Number of sets in the list is equal to len(self.kchar).
Returns:

New strided symbol.

Return type:

sym_kchar

compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_kchar”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean

The aux_func Module

Module containing auxiliary functions for classes, scripts, etc.

netbench.pattern_match.aux_func.deprecation_warning(element_type, old_name, new_name)

Prints deprecation warning on stderr.

Parameters:
  • element_type (string) – Type of deprecated element. Eg. module, class, method, function.
  • old_name (string) – Name of deprecated element.
  • new_name (string) – Name of new element.
netbench.pattern_match.aux_func.getPatternMatchDir()

Parses the NETBENCHPATH environment variable and returns path to pattern match.

Raises:no_netbench_path_variable() if environment variable NETBENCHPATH doesn’t exist.
Returns:Path to Netbench base dir.
Return type:string
netbench.pattern_match.aux_func.getstatusoutput(cmd, input_data)
Run command with arguments and return its output as a string. :param cmd: Command to run. :type cmd: string :param input_data: Data to be passed to stdin. If no data are being passes set this param to None. :type input_data: string :returns: Exit code and content of std_out and std_err. :rtype: tuple(int, string, string)
netbench.pattern_match.aux_func.isPickle(fileName)

Acording to suffix of file name determinates if file is pickled object (.pkl) or ruleset.

Parameter:fileName (string) – Name of input file
Returns:True if pickled object, otherwise return False.
Return type:boolean

The nfa_parser Module

class netbench.pattern_match.nfa_parser.nfa_parser

A base class for parsing of regular expressions.

get_nfa()

Parse a current line and returns parsed nfa.

Returns:Created automaton in nfa_data format. Returns None if failure happens.
Return type:nfa_data or None
get_position()

Returns position in ruleset.

Returns:Position in ruleset.
Return type:int
load_file(filename)

This function is used to specify input file and load the whole file into the input text atribute. Format of this file is simple - one regular expression on each line. Regular expression (RE) must have following form: /RE/M - where RE is regular expression and M are PCRE modifiers.

Parameter:filename (string) – Name of file.
move_to_line(line)

Move to the specified line

Parameter:line (int) – Line number.
Returns:True if move was performed, Otherwise False is returned.
Return type:boolean
next_line()

Move to the next line (next regular expression)

Returns:True if move was performed, Otherwise False is returned.
Return type:boolean
num_lines()

Returns number of lines.

Returns:Number of lines. Each line corespond to single regular expression.
Return type:int
reset()
Reset the position counter to 0. Parsing will continue from the begining.
set_text(input_text)

Set text to parse - can have multiple text lines. Format of the text is simple - one regular expression on each line. Regular expression (RE) must have following form: /RE/M - where RE is regular expression and M are PCRE modifiers.

Parameter:input_text (string) – Regular expressions.

The b_symbol Module

class netbench.pattern_match.b_symbol.DEF_SYMBOLS(new_text, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

Class for default symbol, which is used in Delay DFA.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Returns True if def_symbol is present in set_of_symbols otherwise returns False. From definition this symbol can be in collision only with self.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(other)

Compute collision with def_symbol. Collision can be only with other def_symbol.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“DEF_SYMBOLS”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()
Return supported types of symbols for current type of symbol.
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean
class netbench.pattern_match.b_symbol.b_Symbol(new_text, new_id)

A base class to represent a symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars – List of set of local_chars. Set of all posible chars for each new subsymbol of final kchar (char + char = 2 strided kchar, list will have exactly 1 set present). used to have striding deterministic.
Returns:

New strided symbol and unused symbols from local chars.

Return type:

tuple(b_Symbol, set())

double_stride(compSymbol, last, local_chars)

Double stride using self and compSymbol.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • local_chars – List of set of local_chars. Set of all posible chars for each new subsymbol of final kchar (char + char = 2 strided kchar, list will have exactly 1 set present). used to have striding deterministic.
Returns:

New strided symbol and unused symbols from local chars.

Return type:

tuple(b_Symbol, set())

Raises:

symbol_double_stride_exception if double stride can’t be computed.

export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class spcification char should be in [0-9a-zA-Z] and must be unique.

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_id()

Return symbol identification number.

Returns:Symbol identification number.
Return type:int
get_support_type()

TODO: dohodnout se na schuzi Return supported types of symbols for current type of symbol and method.

Parameter:method – Specifies method for which supported symbols are requested.
Returns:Set of supported types.
Return type:set(int)
get_text()

Return symbol description for graph representation.

Returns:Symbol description for graph representation.
Return type:string
get_type()

Return type of symbol.

Returns:Type of symbol.
Return type:int
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean
resolve_collision(compSymbol)

Resolve collision with self and compSymbol.

Parameter:compSymbol (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
Raises:symbol_resolve_collision_exception() if collision can’t be computed.
set_id(new_id)

Sets the id of symbol to new_id.

Parameter:new_id (int) – Symbol identification value.
set_text(text)

Sets the text description of symbol to text.

Parameter:text (string) – Description of symbol.
netbench.pattern_match.b_symbol.io_mapper
Maps class name to coresponding class specification char. If new symbol is added, io_mapper, io_reverse_mapper and nfa_data class method ImportFromFsm must be updated.
netbench.pattern_match.b_symbol.io_reverse_mapper
Maps class specification char to coresponding class name. If new symbol is added, io_mapper, io_reverse_mapper and nfa_data class method ImportFromFsm must be updated.
netbench.pattern_match.b_symbol.types
Maps class name to coresponding class specification char. It contains same values as io_mapper, but it can be redefined in future. For testing if object is of specified symbol class this dictionary have to be used.

The sym_string Module

class netbench.pattern_match.sym_string.b_Sym_string(new_text, string, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a string symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
  • string (string) – String symbol
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_string”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean

The b_state Module

class netbench.pattern_match.b_state.b_State(mid=0, rnum=set([], ))

A base class for state representation.

Parameters:
  • mid (int) – State unique identification number
  • rnum (set(int)) – Set of regular expression numbres. If set is empty the state is not final. The set of regular expression numbers identify which RE are matched in particular final state.
compute_join(other)

Computes join of two states. Note that new state ID must be set after. Default value is -2.

Parameter:other (b_State) – Second state.
Returns:Joined states.
Return type:b_State
get_id()

Returns state identification number.

Returns:State identification number
Return type:int
get_regexp_number()

Returns set of indexes of regular expression, which corresponds to the final state. If empty set value is returned the state is not final and do not represent any regular expression.

Returns:Set of regular expression numbres coresponding to the state
Return type:set(int)
get_support_type()

Returns supported types of states for current type of state.

Returns:Returns supported types of states for current type of state.
Return type:list(int)
get_text()

Returns text description for graph representation.

Returns:Text description of state
Return type:string
get_type()

Returns type of state.

Returns:Returns type of state.
Return type:int
is_final()

Returns true if the state is a final state.

Returns:True if the state is a final state, False otherwise.
Return type:boolean
join(other)

Joins using self and other state. Creates new state.

Parameter:other (b_State) – Other state.
Returns:New joined state.
Return type:b_State
Raises:state_join_exception() if join can’t be computed.
set_id(new_id)

Sets the id of state to new_id.

Parameter:new_id (int) – New unique state identification number.
set_regexp_number(new_rnum)

Sets set of indexes of regular expression, which corresponds to the final state. If empty set is set the state is not final and do not represent any regular expression.

Parameter:new_rnum (Set of Int) – New set of regular expression numbres
netbench.pattern_match.b_state.types
Maps class name to coresponding class specification char. For testing if object is of specified state class this dictionary have to be used.

The sym_char_class Module

class netbench.pattern_match.sym_char_class.b_Sym_char_class(new_text, charClass, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a char class symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
  • charClass (set(string of length 1)) – Char class
accept(text)

If symbol is at the beginning of the text, it’s removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(compSymbol)

Compute collision between self and compSymbol.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars (set(char)) – Set of local_chars. Set of all posible chars. used to have striding deterministic.
Returns:

New strided symbol.

Return type:

sym_kchar

compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_char_class”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
get_text()

Return symbol description for graph representation.

Returns:Symbol description for graph representation.
Return type:string
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean

The sym_char Module

class netbench.pattern_match.sym_char.b_Sym_char(new_text, char, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a char symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
  • char (string of length 1) – Char symbol
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(compSymbol)

Compute collision between self and compSymbol.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars (list(set(char))) – List of set of local_chars. Set of all posible chars for each new subsymbol of final kchar (char + char = 2 strided kchar, list will have exactly 1 set present). used to have striding deterministic.
Returns:

New strided symbol and unused symbols from local chars.

Return type:

tuple(b_Symbol, list(set()))

compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_char”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean

The pattern_exceptions Module

This module groups all exceptions of Netbench - Pattern Match.

exception netbench.pattern_match.pattern_exceptions.ALPHABET_COLLISION_FREE_ERROR(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising ALPHABET_COLLISION_FREE_ERROR exception if something happen with “Alphabet collision free” flag of automaton. For example, does not exist or has unexpected value.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.COMPUTE_ERROR(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising COMPUTE_ERROR exception if something happen with variable self._compute of automaton. For example, does not exist or has unexpected value.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.DETERMINISTIC_ERROR(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising DETERMINISTIC_ERROR exception if something happen with “Deterministic” flag of automaton. For example, does not exist or has unexpected value.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.MINIMAL_ERROR(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising MINIMAL_ERROR exception if something happen with “Minimal” flag of automaton. For example, does not exist or has unexpected value.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.empty_automaton_exception

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if automaton is empty.

exception netbench.pattern_match.pattern_exceptions.general_not_implemented(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising general NOT IMPLEMENTED Exceptions. This exception should be used when some method, class, function or feature is defined but not implemented yet.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.general_pattern_exception

Bases: exceptions.Exception

Base class for all exceptions defined in pattern match part of the Netbench.

exception netbench.pattern_match.pattern_exceptions.general_unsupported_type(method, name, obj, msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if unsuported data typy was pased to method.

Parameters:
  • method (string) – String with name of method which raised the exception.
  • name (string) – Name of object which caused the exception.
  • obj (Any type) – Object which caused the exception.
  • msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.nfa_data_import_exception(msg)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if the detected type of class in import string doesn’t corespond to any class of the symbol.

Parameter:msg (string) – String containing the detected type of class in import string.
exception netbench.pattern_match.pattern_exceptions.no_netbench_path_variable

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if environment variable NETBENCHPATH is not set.

exception netbench.pattern_match.pattern_exceptions.not_a_timbuk_file(msg)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if file is not a timbuk formated file.

Parameter:msg (string) – String containing the detected type of class in import string.
exception netbench.pattern_match.pattern_exceptions.not_epsilon_free_automaton_exception

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if automaton is empty.

exception netbench.pattern_match.pattern_exceptions.not_strided_exception

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if automaton is not strided.

exception netbench.pattern_match.pattern_exceptions.pcre_parser_failure

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising exception if pcre parser C-based implementation can not be run or compiled. Version for PCRE parser.

exception netbench.pattern_match.pattern_exceptions.state_colour_operation_not_supported_exception(operation)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising state exception if unsupported join colour operation is required.

Parameter:operation (string) – Requested operation.
exception netbench.pattern_match.pattern_exceptions.state_id_collision(state_id)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising state exception if state id collision in automaton occured.

Parameter:state_id (int) – Id of the state, which caused the collision.
exception netbench.pattern_match.pattern_exceptions.state_join_exception(ftype, stype)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising state exception if join cannot be resolved. Neither state is able to compute join with other symbol.

Parameters:
  • ftype (int) – Class of the first state.
  • stype (int) – Class of the second state.
exception netbench.pattern_match.pattern_exceptions.symbol_accept_exception(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol acception Exceptions.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.symbol_double_stride_exception(ftype, stype)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if double stride cannot be resolved. Neither symbol is able to compute double stride with other symbol.

Parameters:
  • ftype (int) – Class of the first symbol.
  • stype (int) – Class of the second symbol.
exception netbench.pattern_match.pattern_exceptions.symbol_equality_exception(ftype, stype)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if symbol equality cannot be resolved. Neither symbol is able to resolve equality with other symbol.

Parameters:
  • ftype (int) – Class of the first symbol.
  • stype (int) – Class of the second symbol.
exception netbench.pattern_match.pattern_exceptions.symbol_id_collision(symbol_id)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if symbol id collision in automaton occured.

Parameter:symbol_id (int) – Id of the symbol, which caused the collision.
exception netbench.pattern_match.pattern_exceptions.symbol_import_exception(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if the detected type of class in import string doesn’t corespond to the class of the symbol.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.symbol_not_found(msg)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if symbol have not been found in alphabet during retrival of its alphabet id.

Parameter:msg (string) – String containing string description of the symbol.
exception netbench.pattern_match.pattern_exceptions.symbol_resolve_collision_exception(ftype, stype)

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if collision cannot be resolved. Neither symbol is able to resolve collision with other symbol.

Parameters:
  • ftype (int) – Class of the first symbol.
  • stype (int) – Class of the second symbol.
exception netbench.pattern_match.pattern_exceptions.symbol_string_to_short(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising symbol exception if the passed string to accept method is shorter than length of the symbol.

Parameter:msg (string) – Optional string describing the cause of the exception. Defaults to empty string.
exception netbench.pattern_match.pattern_exceptions.unknown_parser(msg='')

Bases: netbench.pattern_match.pattern_exceptions.general_pattern_exception

Class for raising unknown parser exception. This exception is used when unknown parser class name is passed to parser mataclass for wrapping any parser under single interface.

Parameter:msg (string) – Name of the class that caused this exception.

The coloured_state Module

class netbench.pattern_match.coloured_state.ColouredState(mid, rnum, colours, join_method='union')

Bases: netbench.pattern_match.b_state.b_State

State class with asociated colours.

Parameters:
  • mid (int) – State unique identification number.
  • rnum (set(int)) – Set of regular expression numbres. If set is empty the state is not final. The set of regular expression numbers identify which RE are matched in particular final state.
  • colours (set(int)) – State colours.
  • join_method (string) – Method of merging colours during join operation. Supported values: union, intersection, difference and symmetric_difference. NOTE: In automaton should be used only one method for all coloured states, as colour operation is determinated only by first state in join operation.
compute_join(other)

Computes join of two states. Note that new state ID must be set after. Default value is -2.

Parameter:other (b_State) – Second state.
Returns:Joined state.
Return type:ColouredState
Raises:state_colour_operation_not_supported_exception if unsupported join method is set.
get_colours()

Returns colours of state.

Returns:Colours of state.
Return type:set(int)
get_join_method()

Returns join method of state.

Returns:Join method of state.
Return type:string
get_text()

Returns text description for graph representation.

Returns:Text description of state
Return type:string
set_colours(colours)

Sets colours of state.

Parameter:colours (set(int)) – New colours of state.
set_join_method(join_method)

Sets join method of state.

Parameter:join_method (string) – New join method of state.

The b_ptrn_match Module

class netbench.pattern_match.b_ptrn_match.b_ptrn_match

Base class for the pattern_match experiments. It defines only basic functions.

report_logic()

Reports amount of logic consumed by the algorithm. The meaning of this value can differs depending on the approach (number of cores in multicore versus number of LUTs in FPGA. For specific information check the report function of the child.

Returns:Amount of logic consumed by the algorithm.
Return type:Depends on the algorithm.
report_memory()

Reports amount of the memory consumed by this algorithm. The returned number is in the bytes.

Returns:Amount of the memory consumed/
Return type:int
search(input_string)

This function will find patterns in the given string by the specified approach.

Parameter:input_string (string) – String in which will be the patterns found.
Returns:Bitmap of matched patterns. Match is indicated by 1, mismatch by 0. Number of fields in this array is equal to the count of patterns.
Return type:list(int)

The sym_eof Module

class netbench.pattern_match.sym_eof.b_Sym_EOF(new_text, new_id)

Bases: netbench.pattern_match.b_symbol.b_Symbol

A base class to represent a EOF (end of file or input) symbol.

Parameters:
  • new_text (string) – Text description of symbol, should be human readeable.
  • new_id (int) – Symbol identification number, must be unique.
accept(text)

If symbol is at the beginning of the text, is removed from the text and reminder is returned. Otherwise accept_exception is raised.

Parameter:text (string) – Text to be parsed.
Returns:Text without begining.
Return type:string
Rises:accept_exception if symbol is not at the begining of the text.
collision(set_of_symbols)

Return True if two or more symbols from set_of_symbols can be accepted for the same text.

Parameter:set_of_symbols (set(b_Symbol)) – Set of symbols.
Returns:True if at least two symbols are in collision, otherwise False is returned.
Return type:boolean
compute_collision(compSymbol)

Compute collision between self and compSymbol.

Parameter:other (b_Symbol) – Other symbol.
Returns:Resolved collision - changes to the symbols and new ones, if they are created.
Return type:tuple(set(b_Symbol), set(b_Symbol), set(b_Symbol))
compute_double_stride(compSymbol, reverse, last, local_chars)

Compute double stride using self and compSymbol. This method should be called only by double_stride method.

Parameters:
  • compSymbol (b_Symbol) – Other symbol.
  • last (int) – Number of last chars. Usualy equal to current stride. Used only if self is final state.
  • reverse (boolean) – Determinates if self and compSymbol have to be swaped (usefull when original self in double_stride() method can’t compute the double stride).
  • local_chars (list(set(char))) – List of set of local_chars. Set of all posible chars for each new subsymbol of final kchar (char + char = 2 strided kchar, list will have exactly 1 set present). used to have striding deterministic.
Returns:

New strided symbol and unused symbols from local chars.

Return type:

tuple(b_Symbol, list(set()))

compute_equal(other)

Compute if two symbols (self and other) are equivalent.

Parameter:other – Other symbol.
Returns:True if the symbols are equivalent, otherwise returns False.
Return type:boolean
export_symbol()

Returns symbol representation compatible with FSM tools - http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html. The symbol is encoded in string without whitespace chars. First char is used for symbol class specification and therefore is not part of the encoded symbol. Symbol class specification char for this symbol class is defined by b_symbol.io_mapper[“b_Sym_char”].

Returns:Symbol representation compatible with FSM tools.
Return type:string
get_support_type()

Return supported types of symbols for current type of symbol.

Returns:Supported types of symbols for current type of symbol.
Return type:list(int)
import_symbol(text_repr, tid)

Creates symbol from its string representation compatible with FSM tools. See export method for more datails.

Parameters:
  • text_repr (string) – String representation.
  • tid (int) – Symbol identification number, must be unique.
is_empty()

Return True if symbol is empty. False if is not empty.

Returns:True if symbol is empty. False if is not empty.
Return type:boolean

The b_automaton Module

class netbench.pattern_match.b_automaton.b_Automaton

Bases: netbench.pattern_match.b_ptrn_match.b_ptrn_match

A base class for the pattern matching approaches based on finite automaton.

compute()
Base function for special automaton models. Inherited class should implement this method.
create_by_parser(nfa_parser_class)

This function is used to create automaton from set of regular expressions.

Parameter:nfa_parser_class (nfa_parser) – An instation of nfa_parser class.
Returns:False if creation of automaton failed or True if creation was successful.
Return type:boolean

This method sets _compute to False, and get_compute() will return False until compute() is called.

create_char_classes()

Creates char classes, if they can be created. Replaced transitions are removed. Unused symbols are removed after creation of char classes.

Flags:Sets Alphabet collision free flag to False

This method sets _compute to False, and get_compute() will return False until compute() is called.

create_from_nfa_data(nfa, safe=True)

Create automaton from nfa_data object.

Parameters:
  • nfa (nfa_data) – nfa_data object, from which automaton is created.
  • safe (boolean) – If True perform deep copy, otherwise set reference. Default value is True.

This method sets _compute to False, and get_compute() will return False until compute() is called.

epsilon_closure(state, StateOutSymbols)

Compute epsilon closure for selected state.

Parameters:
  • state (int) – State from which epsilon closure is computed.
  • StateOutSymbols (dict(int, tuple(int, int))) – Mapping between states and their transitions.
Returns:

Set containing epsilon closure for state.

Return type:

set(int)

get_alpha_num()
Number of symbols in alphabet in nfa_data object
get_automaton(safe=True)

Return nfa_data object from the automaton.

Parameter:safe (boolean) – If True return deep copy, otherwise return reference. Default value is True.
Returns:nfa_data object from the automaton.
Return type:nfa_data

Warning: If safe=False is used, the result should be used as read-only (Only reference on self._automaton object is returned). Otherwise it can cause undefined behavior!

get_compute()
Return value of variable _compute. Which mean flag call to compute()
get_flag(flag)

Returns values of flag. If flag is not in dict() of flags, dict() exception is thrown (use has_flag() before every get_flag()).

Parameter:flag (string) – String key (for example “Deterministic”)
Returns:Value of flag.
Return type:Type depends on the flag
get_multilanguage()

Returns value of _multilanguage attribute. Determinates behavior of automata minimisation and reduction methods. If set to True, methods preserves relation between final states and coresponding regular expresions (languages). If set to False, methods works exactly by their definitions - final states are joined if possible.

Returns:Value of the _multilanguage attribute.
Return type:boolean
get_set_of_nondeterministic_states(mapper=None)

Computes set of nondeterministic states.

Parameter:mapper (dict(int, dict(int, set(int)))) – Mapping between states and their transitions. If set to None the mapping is computed.
Returns:Nondeterministic states.
Return type:set(int)
get_state_num()
Number of states of nfa_data object
get_trans_num()
Number of transitions of nfa_data object
has_cycle()

Detects if cycle exists in automaton.

Returns:True if nfa_data contain cycle. False otherwise.
Return type:boolean
has_flag(flag)

Returns existence of flag, not value.

Parameter:flag (string) – String key (for example “Deterministic”)
Returns:True if flag exists. Otherwise returns False.
Return type:boolean
join(nfa, modify_reg_exp_num=False)

Join two automata. Joins current automaton with second. Current automaton is modified.

Parameters:
  • nfa (nfa_data) – nfa_data object which contains second automaton.
  • modify_reg_exp_num (boolean) – if set to True, regular expression numbers in final states are modified to not collide with current automaton numbers. The formula is NEW_NUMBER = MAX_CURRENT_NUMBER + OLD_NUMBER + 1. If set to False, regular expression numbers in final states are uneffected.
Flags:

Sets flags Deterministic and Epsilon Free to False.

This method sets _compute to False, and get_compute() will return False until compute() is called.

reduce_alphabet()

Reduce alphabet by character classes. This create another type of char class compatibil with DFA. Alphabet should contain only symbols of b_sym_char class. This function reduces the alphabet into its equivalence class. Creates minimal deterministic alphabet.

Flags:Sets Alphabet collision free flag to True

This method sets _compute to False, and get_compute() will return False until compute() is called.

removeCharClasses()

Remove char classes from automaton. Removed char classes are substituted with equivalent chars and coresponding transitions are added.

NOTE: This method is deprecated. Use remove_char_classes().

remove_char_classes()

Remove char classes from automaton. Removed char classes are substituted with equivalent chars and coresponding transitions are added.

Flags:Sets Alphabet collision free flag to True

This method sets _compute to False, and get_compute() will return False until compute() is called.

remove_epsilons()

Remove all epsilon transitions from automaton. Also removes all isolated, unreachable and blind states.

Flags:Sets flag Epsilon Free to True.

This method sets _compute to False, and get_compute() will return False until compute() is called.

remove_unreachable()

Removes isolated, unreachable and blind states.

This method sets _compute to False, and get_compute() will return False until compute() is called.

report_memory_naive()

Report consumed memory in bytes. Naive mapping algorithm is used (2D array).

Returns:Returns number of bytes.
Return type:int
report_memory_optimal()

Report consumed memory in bytes. Optimal mapping algorithm is used (with oracle).

Returns:Returns number of bytes.
Return type:int
resolve_alphabet()

Alphabet collision free. Example: Alphabet {1:[“a”, “b”, “c”], 2: [“a”, “d”]}.

After resolve_alphabet() will be Alphabet {1:[“b”, “c”], 2:[“d”], 3:[“a]}.

This method sets _compute to False, and get_compute() will return False until compute() is called.

search(input_string)

This function will find patterns in the given string by the specified approach. This default version uses nfa_data for it. Approaches should reimplement this function.

Parameters:
  • input_string – Input string.
  • input_string – string
Returns:

Bitmap of matched regular expressions.

Return type:

list(int)

set_flag(flag, value)

Sets flag to the value.

Parameters:
  • flag (string) – String key (for example “Deterministic”)
  • value (Any type) – Value of the key
set_multilanguage(value)

Sets value of _multilanguage attribute. Determinates behavior of automata minimisation and reduction methods. If set to True, methods preserves relation between final states and coresponding regular expresions (languages). If set to False, methods works exactly by their definitions - final states are joined if possible.

Parameter:value (boolean) – New value of the _multilanguage attribute.
show(fileName, sizeStr=' size="8, 5"n')

Save graphviz dot file, representing graph of automaton.

Parameters:
  • fileName (string) – Name of file into which nfa_data graphical representation will be saved.
  • sizeStr (string) – Size of resulting image. Set to ” ” to set unbounded dimensions. Format of this parameter is / size=”x,y”n/ to set width to x inches and height to y inches.
Returns:

True if success, False otherwise.

Return type:

boolean

stride_2(all_chars=None)

Transform automaton to 2-stride automaton. This new automaton accept 2 chars per transition. If automaton is already strided, then 2*stride automaton will be created. In this case the automaton accept 2*stride chars per cycle. Removes Deterministic Flag. Input automaton must be eps free.

Parameter:all_chars (set(char)) – set of all chars in alphabet. If Not set, it defaults to ASCII 0 - 255.
Flags:Sets Strided Flag to True and set Stride Flag to current stride.

This method sets _compute to False, and get_compute() will return False until compute() is called.

This algorithm is based on article: Brodie et al. A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching, http://dx.doi.org/10.1109/ISCA.2006.7

The b_nfa Module

class netbench.pattern_match.b_nfa.b_nfa

Bases: netbench.pattern_match.b_automaton.b_Automaton

A base class for NFA automata.

report_memory_naive()

Report consumed memory in bytes. Naive mapping algorithm is used (2D array). Basic algorithm for this variant of mapping is: M = |states| * |alphabet| * ceil(log(|states| + 1, 2) / 8) + cnt * ceil(log(|states| + 1, 2) / 8) where cnt is number of nondeterministic transitions.

Returns:Returns number of bytes.
Return type:int
report_memory_optimal()

Report consumed memory in bytes. Optimal mapping algorithm is used (with oracle). Basic algorithm for this variant of mapping is: M = |transitions| * ceil(log(|states|, 2)+1 / 8)

Returns:Returns number of bytes.
Return type:int