rmgpy.data.kinetics.KineticsFamily

class rmgpy.data.kinetics.KineticsFamily(entries=None, top=None, label='', name='', reverse='', reversible=True, short_desc='', long_desc='', forward_template=None, forward_recipe=None, reverse_template=None, reverse_recipe=None, forbidden=None, boundary_atoms=None, tree_distances=None, save_order=False)

A class for working with an RMG kinetics family: a set of reactions with similar chemistry, and therefore similar reaction rates. The attributes are:

Attribute          Type                  Description
reverse            string                The name of the reverse reaction family
reversible         Boolean               Whether the family is reversible (True by default)
forward_template   Reaction              The forward reaction template
forward_recipe     ReactionRecipe        The steps to take when applying the forward reaction to a set of reactants
reverse_template   Reaction              The reverse reaction template
reverse_recipe     ReactionRecipe        The steps to take when applying the reverse reaction to a set of reactants
forbidden          ForbiddenStructures   (Optional) Forbidden product structures in either direction
own_reverse        Boolean               Whether the family is its own reverse
boundary_atoms     list                  Labels which define the boundaries of end groups in backbone/end families
tree_distances     dict                  The default distance from parent along each tree; if not set, the default is 1 for every tree
save_order         Boolean               Whether to preserve atom order when manipulating structures
groups             KineticsGroups        The set of kinetics group additivity values
rules              KineticsRules         The set of kinetics rate rules from RMG-Java
depositories       list                  A set of additional depositories used to store kinetics data from various sources

There are a few reaction families that are their own reverse (hydrogen abstraction and intramolecular hydrogen migration); for these, reverse_template and reverse_recipe will both be None.

add_atom_labels_for_reaction(reaction, output_with_resonance=True, save_order=False, relabel_atoms=False)

Apply atom labels on a reaction using the appropriate atom labels from this reaction family.

The reaction is modified in place so that it contains Species objects with labeled atoms, and None is returned. If output_with_resonance is True, all resonance structures are generated with labels. If False, only the first resonance structure that successfully maps to the reaction is used. If save_order is True, the atom order is reset after performing atom isomorphism. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.

add_entry(parent, grp, name)

Adds a group entry with parent parent, group structure grp, and group name name.

add_reverse_attribute(rxn, react_non_reactive=True)

For rxn (with Species objects) from families with own_reverse, this method adds a reverse attribute that contains the reverse reaction information (such as degeneracy).

Returns True if successful and False if the reverse reaction is forbidden. Will raise a KineticsError if unsuccessful for other reasons.

add_rules_from_training(thermo_database=None, train_indices=None)

For each reaction involving real reactants and products in the training set, add a rate rule for that reaction.

ancestors(node)

Returns all the ancestors of a node, climbing up the tree to the top.
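
As a rough illustration of the climb described above, the sketch below walks parent links from a node up to the top of a tree. The Node class is a hypothetical stand-in for a database entry (only a label and a parent reference are modeled), and the nearest-parent-first ordering is an assumption of the sketch rather than a statement about RMG's implementation.

```python
class Node:
    """Hypothetical stand-in for a tree entry; only label and parent are modeled."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


def ancestors(node):
    """Return the ancestors of node, nearest parent first, ending at the top node."""
    result = []
    current = node.parent
    while current is not None:
        result.append(current)
        current = current.parent
    return result


root = Node("Root")
child = Node("Root_Ext1", parent=root)
leaf = Node("Root_Ext1_Ext2", parent=child)
print([n.label for n in ancestors(leaf)])  # ['Root_Ext1', 'Root']
```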

apply_recipe(reactant_structures, forward=True, unique=True, relabel_atoms=True)

Apply the recipe for this reaction family to the list of Molecule or Group objects reactant_structures. The atoms of the reactant structures must already be tagged with the appropriate labels. Returns a list of structures corresponding to the products after checking that the correct number of products was produced. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.

are_siblings(node, node_other)

Return True if node and node_other have the same parent node. Otherwise, return False. Both node and node_other must be Entry types with items containing Group or LogicNode types.

calculate_degeneracy(reaction, resonance=True)

For a reaction with Molecule or Species objects given in the direction in which the kinetics are defined, compute the reaction-path degeneracy. Can specify whether to consider resonance.

By default, this method adjusts for double counting of identical reactants. This adjustment should be made only once per reaction. To skip it (because identical reactants will be reduced later in the algorithm), add ignoreSameReactants=True to this method.

clean_tree_groups()

Clears groups and rules in the tree, generates an appropriate root group to start from, and then reads training reactions. Note that this only works if a single top node (not a logic node) can be generated.

cross_validate(folds=5, template_rxn_map=None, test_rxn_inds=None, T=1000.0, iters=0, random_state=1)

Perform K-fold cross validation on an automatically generated tree at temperature T. After finding an appropriate node for kinetics estimation, it will move up the tree iters times. Returns a dictionary mapping {rxn: Ln(k_Est/k_Train)}.
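
To make the shape of the returned dictionary concrete, here is a minimal K-fold sketch in plain Python. It does not build a tree: the estimator (the geometric mean of the training folds) is a placeholder assumed purely for illustration, as are the function name k_fold_log_errors and the rates input, which maps a reaction label to a rate coefficient at a single temperature.

```python
import math
import random


def k_fold_log_errors(rates, folds=5, random_state=1):
    """Toy K-fold cross validation returning {label: ln(k_est / k_train)}.

    The geometric-mean estimator is an assumption made for this sketch; it
    stands in for the tree-based estimate and is not RMG's method.
    """
    labels = sorted(rates)
    random.Random(random_state).shuffle(labels)
    errors = {}
    for i in range(folds):
        test = labels[i::folds]  # every folds-th label forms one test fold
        train = [lab for lab in labels if lab not in test]
        if not train:
            continue  # nothing to estimate from (only happens when folds == 1)
        ln_k_est = sum(math.log(rates[lab]) for lab in train) / len(train)
        for lab in test:
            errors[lab] = ln_k_est - math.log(rates[lab])
    return errors


rates = {"rxn%d" % i: 10.0 ** i for i in range(1, 11)}
errors = k_fold_log_errors(rates, folds=5)
for label in sorted(errors):
    print(label, round(errors[label], 2))
```

Each reaction appears in exactly one test fold, so the dictionary has one entry per reaction; values near zero mean the held-out rate was predicted well.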

cross_validate_old(folds=5, T=1000.0, random_state=1, estimator='rate rules', thermo_database=None, get_reverse=False, uncertainties=True)

Perform K-fold cross validation on an automatically generated tree at temperature T. Returns a dictionary mapping {rxn: Ln(k_Est/k_Train)}.

descend_tree(structure, atoms, root=None, strict=False)

Descend the tree in search of the functional group node that best matches the local structure around atoms in structure.

If root=None then uses the first matching top node.

Returns None if there is no matching root.

Set strict to True if all labels in the final matched node must match those of the structure. This is used in kinetics groups to find the correct reaction template, but is not generally used in other GAVs because species are generally not prelabeled.
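
The descent can be pictured with a small stand-alone sketch. Everything here is hypothetical: GroupNode reduces a functional group to a dictionary of required atom labels and element symbols, the matching test is a plain dictionary comparison rather than subgraph isomorphism, and the strict option is not modeled.

```python
class GroupNode:
    """Hypothetical tree node: 'required' maps atom labels to element symbols."""

    def __init__(self, label, required, children=()):
        self.label = label
        self.required = required
        self.children = list(children)


def matches(node, atoms):
    """True if every labeled atom the node requires is present with that element."""
    return all(atoms.get(lab) == elem for lab, elem in node.required.items())


def descend_tree(atoms, root):
    """Return the most specific matching node under root, or None if root fails."""
    if not matches(root, atoms):
        return None
    node = root
    while True:
        for child in node.children:
            if matches(child, atoms):
                node = child
                break
        else:
            return node  # no child matched, so this node is the best match


tree = GroupNode("X_H", {"*2": "H"}, [
    GroupNode("C_H", {"*1": "C", "*2": "H"}),
    GroupNode("O_H", {"*1": "O", "*2": "H"}),
])
print(descend_tree({"*1": "O", "*2": "H"}, tree).label)  # O_H
print(descend_tree({"*1": "O", "*2": "C"}, tree))        # None
```

A structure that matches the root but none of its children (for example {"*1": "N", "*2": "H"}) stays at the root node X_H.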

descendants(node)

Returns all the descendants of a node, climbing down the tree to the bottom.

distribute_tree_distances()

Fills in nodal_distance (the distance between an entry and its parent), if not already entered, with the value from tree_distances associated with the tree the entry comes from.
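
A minimal sketch of this fill-in step, assuming a flat list of dictionaries in place of real entries (the 'tree' key naming the tree an entry belongs to is a hypothetical layout chosen for the example):

```python
def distribute_tree_distances(entries, tree_distances):
    """Fill missing nodal_distance values from the per-tree defaults.

    entries: list of dicts with keys 'label', 'tree', and 'nodal_distance'
    (None when unset). This flat layout is an assumption of the sketch.
    """
    for entry in entries:
        if entry["nodal_distance"] is None:
            # Fall back to 1 when the tree has no default of its own.
            entry["nodal_distance"] = tree_distances.get(entry["tree"], 1)
    return entries


entries = [
    {"label": "Root", "tree": "backbone", "nodal_distance": None},
    {"label": "Root_Ext", "tree": "backbone", "nodal_distance": 0.5},
    {"label": "End", "tree": "end", "nodal_distance": None},
]
distribute_tree_distances(entries, {"backbone": 2})
print([e["nodal_distance"] for e in entries])  # [2, 0.5, 1]
```

Values that were already set (0.5 here) are left untouched.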

estimate_kinetics_using_rate_rules(template, degeneracy=1)

Determine the appropriate kinetics for a reaction with the given template using rate rules.

Returns a tuple (kinetics, entry) where entry is the database entry used to determine the kinetics only if it is an exact match, and is None if some averaging or use of a parent node took place.
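
The exact-match convention for the returned tuple can be sketched as follows. The function name estimate_from_rules, the rules and parents dictionaries, and the single-label template are all simplifications assumed for illustration; real templates contain several groups and non-exact estimates may involve averaging.

```python
def estimate_from_rules(label, rules, parents):
    """Return (kinetics, entry); entry is None unless the match is exact.

    rules maps a node label to a rate value and parents maps a label to its
    parent label. Both are simplified stand-ins chosen for this sketch.
    """
    if label in rules:
        return rules[label], label  # exact match: report the entry used
    current = parents.get(label)
    while current is not None:
        if current in rules:
            return rules[current], None  # fell back to an ancestor
        current = parents.get(current)
    raise KeyError("no rate rule found for %r or its ancestors" % label)


rules = {"C_H": 1.0e6, "X_H": 2.0e5}
parents = {"C_pri_H": "C_H", "C_H": "X_H", "X_H": None}
print(estimate_from_rules("C_H", rules, parents))      # (1000000.0, 'C_H')
print(estimate_from_rules("C_pri_H", rules, parents))  # (1000000.0, None)
```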

eval_ext(parent, ext, extname, template_rxn_map, obj=None, T=1000.0)

Evaluates the objective function obj for the extension ext with name extname to the parent entry parent.

extend_node(parent, template_rxn_map, obj=None, T=1000.0, iter_max=inf, iter_item_cap=inf)

Constructs an extension to the group parent based on evaluation of the objective function obj

extract_source_from_comments(reaction)

Returns the rate rule associated with the kinetics of a reaction by parsing the comments. Will return the template associated with the matched rate rule. Returns a tuple containing (Boolean_Is_Kinetics_From_Training_reaction, Source_Data)

For a training reaction, the Source_Data is:

[Family_Label, Training_Reaction_Entry, Kinetics_In_Reverse?]

For a reaction from rate rules, the Source_Data is a tuple containing:

[Family_Label, {'template': originalTemplate,
                'degeneracy': degeneracy,
                'exact': boolean_exact?,
                'rules': a list of (original rate rule entry, weight in average),
                'training': a list of (original rate rule entry associated with training entry, original training entry, weight in average)}]

where 'exact' is a boolean indicating whether the rate is an exact match, 'template' is the reaction template used, 'rules' is a list of the rate rule entries containing the kinetics used, and 'training' lists the training reactions that created rules used in the estimate.
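
The two Source_Data layouts described above can be unpacked as in the following sketch. The helper describe_source is hypothetical, and the sample values use plain strings where the real method would return entry and template objects.

```python
def describe_source(result):
    """Summarize a (is_training, source_data) tuple in the layout described above."""
    is_training, source_data = result
    if is_training:
        family_label, training_entry, in_reverse = source_data
        direction = "reverse" if in_reverse else "forward"
        return "%s: training reaction %s (%s)" % (family_label, training_entry, direction)
    family_label, details = source_data
    kind = "exact" if details["exact"] else "averaged"
    return "%s: %s rate rule for %s, from %d rules and %d training reactions" % (
        family_label, kind, details["template"],
        len(details["rules"]), len(details["training"]))


# Illustrative values only; strings stand in for entry objects.
from_training = (True, ["H_Abstraction", "training_entry_12", False])
from_rules = (False, ["H_Abstraction", {
    "template": "C_H;O_rad",
    "degeneracy": 2,
    "exact": False,
    "rules": [("rule_1", 0.5), ("rule_2", 0.5)],
    "training": [],
}])
print(describe_source(from_training))
print(describe_source(from_rules))
```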

fill_rules_by_averaging_up(verbose=False)

Fill in gaps in the kinetics rate rules by averaging child nodes recursively starting from the top level root template.
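
The recursion can be sketched on a toy tree. This example assumes each node carries at most one number, log10(k), and fills gaps with the arithmetic mean of the child values; the real method averages full kinetics objects, so the function fill_by_averaging_up and its children and log_k inputs should be read as illustrative only.

```python
def fill_by_averaging_up(label, children, log_k):
    """Recursively fill missing values with the mean of the children's values.

    children maps a label to its child labels; log_k maps a label to a known
    log10(k) value and is updated in place. Returns the value for label, or
    None if nothing is known at or below it.
    """
    child_values = []
    for child in children.get(label, []):
        value = fill_by_averaging_up(child, children, log_k)
        if value is not None:
            child_values.append(value)
    if label not in log_k and child_values:
        log_k[label] = sum(child_values) / len(child_values)
    return log_k.get(label)


children = {"Root": ["A", "B"], "A": ["A1", "A2"]}
log_k = {"A1": 2.0, "A2": 4.0, "B": 5.0}
fill_by_averaging_up("Root", children, log_k)
print(log_k["A"], log_k["Root"])  # 3.0 4.0
```

Children are processed before their parent, so the value filled in for A is available when Root is averaged.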

generate_old_tree(entries, level)

Generate a multi-line string representation of the current tree using the old-style syntax.

generate_product_template(reactants0)

Generate the product structures by applying the reaction template to the top-level nodes. For reactants defined by multiple structures, only the first is used here; it is assumed to be the most generic.

generate_reactions(reactants, products=None, prod_resonance=True, delete_labels=True, relabel_atoms=True)

Generate all reactions between the provided list of one, two, or three reactants, which should be either single Molecule objects or lists of same. Does not estimate the kinetics of these reactions at this time. Returns a list of TemplateReaction objects using Molecule objects for both reactants and products. The reactions are constructed such that the forward direction is consistent with the template of this reaction family.

Parameters:
  • reactants (list) – List of Molecules to react.

  • products (list, optional) – List of Molecules or Species of desired product structures.

  • prod_resonance (bool, optional) – Flag to generate resonance structures for product checking. Defaults to True, in which case resonance structures are compared.

  • delete_labels (bool, optional) – Delete the labeled atoms from each generated reaction. Defaults to True, in which case atom labels are deleted.

  • relabel_atoms (bool, optional) – Defaults to True, in which case atoms are relabeled.

Returns:

List of all reactions containing Molecule objects with the specified reactants and products within this family. Degenerate reactions are returned as separate reactions.

generate_tree(rxns=None, obj=None, thermo_database=None, T=1000.0, nprocs=1, min_splitable_entry_num=2, min_rxns_to_spawn=20, max_batch_size=800, outlier_fraction=0.02, stratum_num=8, new_fraction_threshold_to_reopt_node=0.25, extension_iter_max=inf, extension_iter_item_cap=inf)

Generate a tree by greedy optimization based on the objective function obj. The optimization iterates through every group; if a group has more than one training reaction associated with it, a set of potential, more specific extensions is generated and the extension that best optimizes the objective function is chosen. The iteration then starts over at the beginning.

Additionally, the tree structure is simplified on the fly by removing groups that have no kinetics data associated, if their parent also has no kinetics data associated and they either have only one child or have two children, one of which has no kinetics data and no children (the removed group’s parent becomes the parent of its only relevant child node).

Parameters:
  • rxns – List of reactions to generate tree from (if None pull the whole training set)

  • obj – Object to expand tree from (if None uses top node)

  • thermo_database – Thermodynamic database used for reversing training reactions

  • T – Temperature the tree is optimized for

  • nprocs – Number of processes for parallel tree generation

  • min_splitable_entry_num – the minimum number of splittable reactions at a node in order to spawn a new process solving that node

  • min_rxns_to_spawn – the minimum number of reactions at a node to spawn a new process solving that node

  • max_batch_size – the maximum number of reactions allowed in a batch; most batches will be this size and the last will be smaller. If the number of reactions is less than max_batch_size, the cascade algorithm is not used

  • outlier_fraction – Fraction of reactions that are fastest/slowest and will be automatically included in the first batch

  • stratum_num – Number of strata used in stratified sampling scheme

  • max_rxns_to_reopt_node – Nodes with more matching reactions than this will not be pruned

get_backbone_roots()

Returns: the top level backbone node in a unimolecular family.

get_end_roots()

Returns: A list of top level end nodes in a unimolecular family

get_entries_to_save()

Return a sorted list of the entries in this database that should be saved to the output file.

Then renumber the entry indexes so that we never have any duplicate indexes.

get_extension_edge(parent, template_rxn_map, obj, T, iter_max=inf, iter_item_cap=inf)

Finds the set of all extension groups to parent such that (1) the extension group divides the set of reactions under parent and (2) no generalization of the extension group divides the set of reactions under parent.

We find this by generating all possible extensions of the initial group. Extensions that split reactions are added to the list. All extensions that do not split reactions and do not create bonds are ignored (although those that match every reaction are labeled so we don’t search them twice). Those that match all reactions and involve bond creation undergo this process again.

Principle: Say you have two elementary changes to a group, ext1 and ext2. If applying ext1 and ext2 together results in a split, at least one of ext1 and ext2 must result in a split on its own.

The speed of this algorithm relies heavily on searching non-bond-creation dimensions only once.

get_kinetics(reaction, template_labels, degeneracy=1, estimator='', return_all_kinetics=True)

Return the kinetics for the given reaction by searching the various depositories as well as generating a result using the user-specified estimator. Currently, only ‘rate rules’ is a supported estimator. Unlike the regular get_kinetics() method, this returns a list of results, with each result comprising:

  1. the kinetics

  2. the source - this will be None if from a template estimate

  3. the entry - this will be None if from a template estimate

  4. is_forward - a boolean denoting whether the matched entry is in the same direction as the input reaction. This will always be True if using rate rules. This can be True or False if using a depository

If return_all_kinetics==False, only the first (best?) matching kinetics is returned.

get_kinetics_for_template(template, degeneracy=1, method='rate rules')

Return an estimate of the kinetics for a reaction with the given template and reaction-path degeneracy. There is currently only one method to use: ‘rate rules’ (old RMG-Java behavior, and default RMG-Py behavior). Group additivity was removed in August 2023.

Returns a tuple (kinetics, entry): If it’s estimated via ‘rate rules’ and an exact match is found in the tree, then the entry is returned as the second element of the tuple. But if an average is used, then the tuple returned is (kinetics, None).

get_kinetics_from_depository(depository, reaction, template, degeneracy)

Search the given depository in this kinetics family for kinetics for the given reaction. Returns a list of all of the matching kinetics, the corresponding entries, and True if the kinetics match the forward direction or False if they match the reverse direction.

get_labeled_reactants_and_products(reactants, products, relabel_atoms=True)

Given reactants, a list of Molecule objects, and products, a list of Molecule objects, return two new lists of Molecule objects with atoms labeled: one for reactants, one for products. Returned molecules are totally new entities in memory so input molecules reactants and products won’t be affected. If RMG cannot find appropriate labels, (None, None) will be returned. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.

get_rate_rule(template)

Return the rate rule with the given template. Raises a ValueError if no corresponding entry exists.

get_reaction_matches(rxns=None, thermo_database=None, remove_degeneracy=False, estimate_thermo=True, fix_labels=False, exact_matches_only=False, get_reverse=False, rxns_with_kinetics_only=False)

Returns a dictionary mapping, for each entry in the tree, (entry.label, entry.item) to the list of all training reactions (or of the reactions given) that match that entry.

get_reaction_pairs(reaction)

For a given reaction with properly-labeled Molecule objects as the reactants, return the reactant-product pairs to use when performing flux analysis.

get_reaction_template(reaction)

For a given reaction with properly-labeled Molecule objects as the reactants, determine the most specific nodes in the tree that describe the reaction.

get_reaction_template_labels(reaction)

Retrieve the template for the reaction and return the corresponding labels for each of the groups in the template.

get_root_template()

Return the root template for the reaction family. Most of the time this is the top-level nodes of the tree (as stored in the KineticsGroups object), but there are a few exceptions (e.g. R_Recombination).

get_rxn_batches(rxns, T=1000.0, max_batch_size=800, outlier_fraction=0.02, stratum_num=8)

Breaks reactions into batches based on a modified stratified sampling scheme. Effectively:

  • The top and bottom outlier_fraction of all reactions are always included in the first batch.

  • The remaining reactions are ordered by the rate coefficients at T.

  • The list of reactions is then split into stratum_num similarly sized intervals.

  • Batches sample equally from each interval, but randomly within each interval, until they reach max_batch_size reactions.

A list of lists of reactions containing the batches is returned.
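
One possible reading of this scheme is sketched below in plain Python. The function get_batches, the rates input (a mapping from reaction label to rate coefficient at one temperature), the rounding rules, the round-robin draw across intervals, and the seed argument are all assumptions of the sketch, not a description of RMG's implementation.

```python
import random


def get_batches(rates, max_batch_size=6, outlier_fraction=0.1, stratum_num=4, seed=0):
    """Split reaction labels into batches by a simplified stratified scheme."""
    ordered = sorted(rates, key=rates.get)  # slowest to fastest
    if len(ordered) <= max_batch_size:
        return [ordered]  # everything fits in a single batch
    n_out = int(outlier_fraction * len(ordered))
    # Slowest and fastest reactions always go into the first batch.
    outliers = ordered[:n_out] + ordered[len(ordered) - n_out:]
    middle = ordered[n_out:len(ordered) - n_out]
    # Cut the ordered remainder into stratum_num similarly sized intervals.
    size = max(1, -(-len(middle) // stratum_num))  # ceiling division
    strata = [middle[i:i + size] for i in range(0, len(middle), size)]
    rng = random.Random(seed)
    for stratum in strata:
        rng.shuffle(stratum)  # random order within each interval
    batches = [list(outliers)]
    while any(strata):
        for stratum in strata:  # draw one reaction per interval in turn
            if stratum:
                if len(batches[-1]) >= max_batch_size:
                    batches.append([])
                batches[-1].append(stratum.pop())
    return batches


rates = {"r%02d" % i: float(i + 1) for i in range(20)}
batches = get_batches(rates)
print([len(batch) for batch in batches])
```

Drawing from the intervals in turn keeps every batch spread across the whole range of rate coefficients, which is the point of stratifying before sampling.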

get_sources_for_template(template)

Returns the set of rate rules and training reactions used to average this template. Note that the tree must be averaged with verbose=True for this to work.

Returns a tuple (rules, training), where rules is a list of tuples [(original_entry, weight_used_in_average), …] and training is a list of tuples [(rate_rule_entry, training_reaction_entry, weight_used_in_average), …].

get_species(path, resonance=True)

Load the dictionary containing all of the species in a kinetics library or depository.

get_top_level_groups(root)

Returns a list of group nodes that are the highest in the tree starting at node root. If root is a group node, then it will return a single-element list containing root. Otherwise, for every child of root, we descend until no logic nodes remain. We then return a list of all group nodes found along the way.
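
A small recursive sketch of this search, using a hypothetical TreeNode class in which a boolean flag marks logic nodes (the real tree stores Group or LogicNode items on entries):

```python
class TreeNode:
    """Hypothetical node; is_logic marks a logic node rather than a group node."""

    def __init__(self, label, is_logic=False, children=()):
        self.label = label
        self.is_logic = is_logic
        self.children = list(children)


def get_top_level_groups(root):
    """Return the highest group (non-logic) nodes at or below root."""
    if not root.is_logic:
        return [root]
    groups = []
    for child in root.children:
        groups.extend(get_top_level_groups(child))
    return groups


group_a = TreeNode("A", children=[TreeNode("A1")])
inner_or = TreeNode("OR_2", is_logic=True, children=[TreeNode("B"), TreeNode("C")])
top = TreeNode("OR_1", is_logic=True, children=[group_a, inner_or])
print([g.label for g in get_top_level_groups(top)])  # ['A', 'B', 'C']
```

The descent stops at the first group node on each branch, so A1 is not returned even though it lies below A.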

get_training_depository()

Returns the training depository from self.depositories

get_training_set(thermo_database=None, remove_degeneracy=False, estimate_thermo=True, fix_labels=False, get_reverse=False, rxns_with_kinetics_only=False)

Retrieves all reactions in the training set, assigns thermo to the species objects, and reverses reactions as necessary so that all reactions are in the forward direction. Returns the resulting list of reactions in the forward direction with thermo assigned.

has_rate_rule(template)

Return True if a rate rule with the given template currently exists, or False otherwise.

is_entry_match(mol, entry, resonance=True)

Determines whether the labeled molecule object of reactants, mol, matches the entry entry.

is_molecule_forbidden(molecule)

Return True if the molecule is forbidden in this family, or False otherwise.

load(path, local_context=None, global_context=None, depository_labels=None)

Load a kinetics database from a file located at path on disk.

If depository_labels is a list, e.g. ['training', 'PrIMe'], then only those depositories are loaded, and they are searched in that order when generating kinetics.

If depository_labels is None then load ‘training’ first then everything else. If depository_labels is not None then load in the order specified in depository_labels.
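
The ordering rule can be sketched as a small helper. The function depository_load_order and its available argument (the depository names found on disk) are hypothetical, and silently skipping requested labels that are not available is an assumption of the sketch.

```python
def depository_load_order(available, depository_labels=None):
    """Return depository labels in the order described for load()."""
    if depository_labels is None:
        # 'training' first, then everything else in its existing order.
        rest = [label for label in available if label != "training"]
        return (["training"] if "training" in available else []) + rest
    # Only the requested depositories, in the requested order.
    return [label for label in depository_labels if label in available]


available = ["NIST", "training", "PrIMe"]
print(depository_load_order(available))                         # ['training', 'NIST', 'PrIMe']
print(depository_load_order(available, ["PrIMe", "training"]))  # ['PrIMe', 'training']
```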

load_forbidden(label, group, shortDesc='', longDesc='')

Load information about a forbidden structure. Note that argument names are retained for backward compatibility with loading database files.

load_old(dictstr, treestr, libstr, num_parameters, num_labels=1, pattern=True)

Load a dictionary-tree-library based database. The database is stored in three files: dictstr is the path to the dictionary, treestr to the tree, and libstr to the library. The tree is optional, and should be set to '' if not desired.

load_old_dictionary(path, pattern)

Parse an old-style RMG database dictionary located at path. An RMG dictionary is a list of key-value pairs of a one-line string key and a multi-line string value. Each record is separated by at least one empty line. Returns a dict object with the values converted to Molecule or Group objects depending on the value of pattern.
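
The record layout described above (a one-line key, a multi-line value, blank lines between records) can be parsed with a few lines of plain Python. This sketch takes the file contents as a string and keeps the values as raw text; the conversion to Molecule or Group objects, and any comment handling the real parser may perform, are left out. The sample records are illustrative only.

```python
def parse_old_dictionary(text):
    """Parse records separated by blank lines into {key: multi-line value}.

    The first line of each record is the key and the remaining lines are the
    value, kept here as a plain string.
    """
    records = {}
    block = []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last record
        if line.strip():
            block.append(line.rstrip())
        elif block:
            records[block[0].strip()] = "\n".join(block[1:])
            block = []
    return records


sample = "\n".join([
    "C_H",
    "1 *1 C 0 {2,S}",
    "2 *2 H 0 {1,S}",
    "",
    "",
    "O_rad",
    "1 *3 O 1",
])
parsed = parse_old_dictionary(sample)
print(sorted(parsed))  # ['C_H', 'O_rad']
```

Consecutive blank lines are treated as a single separator, which matches the "at least one empty line" rule.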

load_old_library(path, num_parameters, num_labels=1)

Parse an RMG database library located at path.

load_old_tree(path)

Parse an old-style RMG database tree located at path. An RMG tree is an n-ary tree representing the hierarchy of items in the dictionary.

load_recipe(actions)

Load information about the reaction recipe.

load_template(reactants, products, ownReverse=False)

Load information about the reaction template. Note that argument names are retained for backward compatibility with loading database files.

make_tree(obj=None, regularization=<function KineticsFamily.simple_regularization>, thermo_database=None, T=1000.0)

Generates the tree structure and then generates rules for the tree.

match_node_to_child(parent_node, child_node)

Return True if parent_node is a parent of child_node. Otherwise, return False. Both parent_node and child_node must be Entry types with items containing Group or LogicNode types. If parent_node and child_node are identical, the function will also return False.

match_node_to_node(node, node_other)

Return True if node and node_other are identical. Otherwise, return False. Both node and node_other must be Entry types with items containing Group or LogicNode types.

match_node_to_structure(node, structure, atoms, strict=False)

Return True if the structure centered at atoms matches the structure at node in the dictionary. The structure at node should have atoms with the appropriate labels because they are set on loading and never change. However, the atoms in structure may not have the correct labels, hence the atoms parameter. The atoms parameter may include extra labels, and so we only require that every labeled atom in the functional group represented by node has an equivalent labeled atom in structure.

Matching to structure is more strict than to node. All labels in structure must be found in node. However the reverse is not true, unless strict is set to True.

Attribute   Description
node        Either an Entry or a key in the self.entries dictionary which has a Group or LogicNode as its Entry.item
structure   A Group or a Molecule
atoms       Dictionary of {label: atom} in the structure. A possible dictionary is the one produced by structure.get_all_labeled_atoms()
strict      If set to True, ensures that all the node’s atom labels are matched by atoms in the structure

parse_old_library(path, num_parameters, num_labels=1)

Parse an RMG database library located at path, returning the loaded entries (rather than storing them in the database). This method does not discard duplicate entries.

prune_tree(rxns, newrxns, thermo_database=None, new_fraction_threshold_to_reopt_node=0.25, fix_labels=True, exact_matches_only=True, get_reverse=True)

Remove nodes that have fewer than maxRxnToReoptNode matching reactions and clear the regularization dimensions of their parent. This is used to remove smaller, easier-to-optimize nodes that are more likely to change before adding a new batch in cascade model generation.

regularize(regularization=<function KineticsFamily.simple_regularization>, keep_root=True, thermo_database=None, template_rxn_map=None, rxns=None)

Regularizes the tree according to the regularization function regularization

remove_group(group_to_remove)

Removes a group that is in a tree from the database. In addition to deleting from self.entries, it must also update the parent/child relationships

Returns the removed group

retrieve_original_entry(template_label)

Retrieves the original entry, be it a rule or training reaction, given the template label in the form ‘group1;group2’ or ‘group1;group2;group3’

Returns a tuple in the form (RateRuleEntry, TrainingReactionEntry), where the TrainingReactionEntry is only present if the entry comes from a training reaction.

retrieve_template(template_labels)

Reconstruct the groups associated with the labels of the reaction template and return a list.

save(path)

Save the current database to the file at location path on disk.

save_depository(depository, path)

Save the given kinetics family depository to the location path on disk.

save_dictionary(path)

Extract species from all entries associated with a kinetics library or depository and save them to the path given.

save_entry(f, entry)

Write the given entry in the kinetics database to the file object f.

save_generated_tree(path=None)

Clears the rules and saves the family to its current location in the database.

save_groups(path)

Save the current database to the file at location path on disk.

save_old(dictstr, treestr, libstr)

Save the current database to a set of text files using the old-style syntax.

save_old_dictionary(path)

Save the current database dictionary to a text file using the old-style syntax.

save_old_library(path)

Save the current database library to a text file using the old-style syntax.

save_old_tree(path)

Save the current database tree to a text file using the old-style syntax.

save_training_reactions(reactions, reference=None, reference_type='', short_desc='', long_desc='', rank=3)

This function takes a list of reactions and appends it to the training reactions file. It ignores the existence of duplicate reactions.

The rank for each new reaction’s kinetics is set to a default value of 3 unless the user specifies differently for those reactions.

For each entry, the long description is imported from the kinetics comment.

simple_regularization(node, template_rxn_map, test=True)

Simplest regularization algorithm. All nodes are made as specific as their descendant reactions. Training reactions are assumed not to generalize. For example, if a particular atom at a node is oxygen for all of its descendant reactions, a reaction where it is sulfur will never hit that node (unless it is the top node), even if the tree did not split on the identity of that atom.

The test option to this function determines whether or not the reactions under a node match the extended group before adding an extension. If the test fails the extension is skipped.

In general test=True is needed if the cascade algorithm was used to generate the tree and test=False is ok if the cascade algorithm wasn’t used.