rmgpy.data.kinetics.KineticsFamily¶
- class rmgpy.data.kinetics.KineticsFamily(entries=None, top=None, label='', name='', reverse='', reversible=True, short_desc='', long_desc='', forward_template=None, forward_recipe=None, reverse_template=None, reverse_recipe=None, forbidden=None, boundary_atoms=None, tree_distances=None, save_order=False)¶
A class for working with an RMG kinetics family: a set of reactions with similar chemistry, and therefore similar reaction rates. The attributes are:
================  ===================  ===========
Attribute         Type                 Description
================  ===================  ===========
reverse           string               The name of the reverse reaction family
reversible        Boolean              Is family reversible? (True by default)
forward_template  Reaction             The forward reaction template
forward_recipe                         The steps to take when applying the forward reaction to a set of reactants
reverse_template  Reaction             The reverse reaction template
reverse_recipe                         The steps to take when applying the reverse reaction to a set of reactants
forbidden         ForbiddenStructures  (Optional) Forbidden product structures in either direction
own_reverse       Boolean              Whether the family is its own reverse
boundary_atoms    list                 Labels which define the boundaries of end groups in backbone/end families
tree_distances    dict                 The default distance from parent along each tree; if not set, the default is 1 for every tree
save_order        Boolean              Whether to preserve atom order when manipulating structures
groups                                 The set of kinetics group additivity values
rules                                  The set of kinetics rate rules from RMG-Java
depositories      list                 A set of additional depositories used to store kinetics data from various sources
================  ===================  ===========
There are a few reaction families that are their own reverse (hydrogen abstraction and intramolecular hydrogen migration); for these, reverse_template and reverse_recipe will both be None.
- add_atom_labels_for_reaction(reaction, output_with_resonance=True, save_order=False, relabel_atoms=False)¶
Apply atom labels on a reaction using the appropriate atom labels from this reaction family.
The reaction is modified in place so that it contains species objects with the atoms labeled. If output_with_resonance is True, all resonance structures are generated with labels. If False, only the first resonance structure that maps successfully to the reaction is used. None is returned. If save_order is True, the atom order is reset after performing atom isomorphism. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.
- add_entry(parent, grp, name)¶
Adds a group entry with parent parent, group structure grp, and group name name.
- add_reverse_attribute(rxn, react_non_reactive=True)¶
For rxn (with species objects) from families with own_reverse set, this method adds a reverse attribute that contains the reverse reaction information (like degeneracy).
Returns True if successful and False if the reverse reaction is forbidden. Will raise a KineticsError if unsuccessful for other reasons.
- add_rules_from_training(thermo_database=None, train_indices=None)¶
For each reaction involving real reactants and products in the training set, add a rate rule for that reaction.
- ancestors(node)¶
Returns all the ancestors of a node, climbing up the tree to the top.
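As a rough illustration of this upward walk, the sketch below climbs parent links and collects each ancestor, nearest parent first. The `Node` class is a hypothetical stand-in for a database entry with parent/children links; it is not RMG's entry type.

```python
class Node:
    """Hypothetical tree entry holding a label and parent/children links."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def ancestors(node):
    """Collect the ancestors of `node`, nearest parent first, root last."""
    result = []
    parent = node.parent
    while parent is not None:
        result.append(parent)
        parent = parent.parent
    return result


# Made-up three-level tree: Root -> C_H -> C_pri
root = Node("Root")
mid = Node("C_H", parent=root)
leaf = Node("C_pri", parent=mid)
ancestor_labels = [n.label for n in ancestors(leaf)]
```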
- apply_recipe(reactant_structures, forward=True, unique=True, relabel_atoms=True)¶
Apply the recipe for this reaction family to the list of Molecule or Group objects reactant_structures. The atoms of the reactant structures must already be tagged with the appropriate labels. Returns a list of structures corresponding to the products after checking that the correct number of products was produced. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.
- are_siblings(node, node_other)¶
Return True if node and node_other have the same parent node. Otherwise, return False. Both node and node_other must be Entry types with items containing Group or LogicNode types.
- calculate_degeneracy(reaction, resonance=True)¶
For a reaction with Molecule or Species objects given in the direction in which the kinetics are defined, compute the reaction-path degeneracy. Can specify whether to consider resonance.
By default, this method adjusts for double counting of identical reactants. This adjustment should be made only once per reaction. To skip the adjustment for identical reactants (because they will be reduced later in the algorithm), add ignoreSameReactants=True to this method.
- clean_tree_groups()¶
Clears groups and rules in the tree, generates an appropriate root group to start from, and then reads training reactions. Note that this only works if a single top node (not a logic node) can be generated.
- cross_validate(folds=5, template_rxn_map=None, test_rxn_inds=None, T=1000.0, iters=0, random_state=1)¶
Perform K-fold cross validation on an automatically generated tree at temperature T. After finding an appropriate node for kinetics estimation, it will move up the tree iters times. Returns a dictionary mapping {rxn: Ln(k_Est/k_Train)}.
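The shape of the returned mapping can be sketched in plain Python. In the sketch below, `k_fold_log_errors` is a hypothetical helper, and the estimator (the geometric mean of the rates in the other folds) is a placeholder assumption chosen only to keep the example self-contained; it does not build or query a tree.

```python
import math
import random


def k_fold_log_errors(train_rates, folds=5, random_state=1):
    """Return {rxn: ln(k_est / k_train)} from K-fold cross validation.

    `train_rates` maps a reaction label to its training rate coefficient.
    Assumes folds >= 2 and at least `folds` reactions.
    """
    labels = sorted(train_rates)
    random.Random(random_state).shuffle(labels)
    errors = {}
    for i in range(folds):
        test = labels[i::folds]          # every folds-th label is held out
        held_out = set(test)
        train = [r for r in labels if r not in held_out]
        # Placeholder estimator: geometric mean of the remaining rates.
        mean_ln_k = sum(math.log(train_rates[r]) for r in train) / len(train)
        for rxn in test:
            errors[rxn] = mean_ln_k - math.log(train_rates[rxn])
    return errors


rates = {f"rxn{i}": 10.0 ** i for i in range(10)}
errors = k_fold_log_errors(rates, folds=5)
```

Each reaction appears exactly once as a held-out test case, so the result has one entry per training reaction.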
- cross_validate_old(folds=5, T=1000.0, random_state=1, estimator='rate rules', thermo_database=None, get_reverse=False, uncertainties=True)¶
Perform K-fold cross validation on an automatically generated tree at temperature T. Returns a dictionary mapping {rxn: Ln(k_Est/k_Train)}.
- descend_tree(structure, atoms, root=None, strict=False)¶
Descend the tree in search of the functional group node that best matches the local structure around atoms in structure.
If root=None then uses the first matching top node.
Returns None if there is no matching root.
Set strict to True if all labels in the final matched node must match those of the structure. This is used in kinetics groups to find the correct reaction template, but is not generally used in other GAVs because species are generally not prelabeled.
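A minimal sketch of the descent, under strong simplifying assumptions: each hypothetical `GroupNode` demands a set of features, and a structure is reduced to a plain set of features, so "matching" is a subset test rather than subgraph isomorphism. The `strict` option is not modeled.

```python
class GroupNode:
    """Hypothetical tree node; `required` is the set of features it demands."""

    def __init__(self, label, required, children=()):
        self.label = label
        self.required = frozenset(required)
        self.children = list(children)

    def matches(self, features):
        return self.required <= features


def descend_tree(features, root):
    """Return the most specific node matching `features`, or None."""
    if not root.matches(features):
        return None                      # no matching root
    node = root
    while True:
        for child in node.children:
            if child.matches(features):
                node = child             # step down to the first matching child
                break
        else:
            return node                  # no child matched; this is the best match


# Made-up tree for illustration only.
tree = GroupNode("C", {"C"}, [
    GroupNode("C_H", {"C", "H"}, [GroupNode("C_H_O", {"C", "H", "O"})]),
    GroupNode("C_N", {"C", "N"}),
])
best = descend_tree(frozenset({"C", "H", "O"}), tree)
```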
- descendants(node)¶
Returns all the descendants of a node, climbing down the tree to the bottom.
- distribute_tree_distances()¶
Fills in nodal_distance (the distance between an entry and its parent), if not already set, with the value from tree_distances associated with the tree the entry comes from.
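The fill-in rule can be sketched as follows, assuming entries are plain dictionaries with a hypothetical `'tree'` key naming the tree they belong to. The fallback of 1 follows the tree_distances attribute description; the tree names are made up.

```python
def distribute_tree_distances(entries, tree_distances, default=1):
    """Fill each entry's missing 'nodal_distance' from its tree's default."""
    for entry in entries:
        if entry.get("nodal_distance") is None:
            entry["nodal_distance"] = tree_distances.get(entry["tree"], default)
    return entries


entries = [
    {"label": "A", "tree": "backbone", "nodal_distance": None},
    {"label": "B", "tree": "end", "nodal_distance": 0.5},   # already set, kept
    {"label": "C", "tree": "other"},                        # falls back to 1
]
distribute_tree_distances(entries, {"backbone": 2.0})
```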
- estimate_kinetics_using_rate_rules(template, degeneracy=1)¶
Determine the appropriate kinetics for a reaction with the given template using rate rules.
Returns a tuple (kinetics, entry) where entry is the database entry used to determine the kinetics only if it is an exact match, and is None if some averaging or use of a parent node took place.
- eval_ext(parent, ext, extname, template_rxn_map, obj=None, T=1000.0)¶
evaluates the objective function obj for the extension ext with name extname to the parent entry parent
- extend_node(parent, template_rxn_map, obj=None, T=1000.0, iter_max=inf, iter_item_cap=inf)¶
Constructs an extension to the group parent based on evaluation of the objective function obj
- extract_source_from_comments(reaction)¶
Returns the rate rule associated with the kinetics of a reaction by parsing the comments. Will return the template associated with the matched rate rule. Returns a tuple containing (Boolean_Is_Kinetics_From_Training_reaction, Source_Data)
For a training reaction, the Source_Data contains:
[Family_Label, Training_Reaction_Entry, Kinetics_In_Reverse?]
For a reaction from rate rules, the Source_Data contains:
[Family_Label, {'template': originalTemplate, 'degeneracy': degeneracy, 'exact': boolean_exact?, 'rules': a list of (original rate rule entry, weight in average), 'training': a list of (original rate rule entry associated with training entry, original training entry, weight in average)}]
where 'exact' is a boolean indicating whether the rate is an exact match, 'template' is the reaction template used, 'rules' is a list of the rate rule entries containing the kinetics used, and 'training' lists the training reactions that created rules used in the estimate.
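To make the two layouts concrete, the sketch below unpacks a result shaped like the description above. `describe_source` is a hypothetical helper, and the entries are replaced by made-up strings so the example runs without a database.

```python
def describe_source(result):
    """Summarize an (is_training, source_data) pair as one line of text."""
    is_training, source_data = result
    family_label = source_data[0]
    if is_training:
        _, training_entry, in_reverse = source_data
        direction = "reverse" if in_reverse else "forward"
        return f"{family_label}: training reaction {training_entry} ({direction})"
    details = source_data[1]
    kind = "exact" if details["exact"] else "averaged"
    return (f"{family_label}: {kind} rate rule for {details['template']} "
            f"from {len(details['rules'])} rule(s) and "
            f"{len(details['training'])} training reaction(s)")


# Mock results; strings stand in for real entry objects.
training_result = (True, ["H_Abstraction", "entry_12", False])
rule_result = (False, ["H_Abstraction", {
    "template": "C_H;O_rad",
    "degeneracy": 2,
    "exact": False,
    "rules": [("rule_a", 0.5), ("rule_b", 0.5)],
    "training": [],
}])
training_summary = describe_source(training_result)
rule_summary = describe_source(rule_result)
```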
- fill_rules_by_averaging_up(verbose=False)¶
Fill in gaps in the kinetics rate rules by averaging child nodes recursively starting from the top level root template.
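The recursion can be sketched on a toy tree. This is an illustration under assumptions, not the library's averaging scheme: nodes are plain dictionaries, each rule is reduced to a single hypothetical `ln_k` value, and a missing value is filled with the unweighted mean of the children's values.

```python
def fill_by_averaging_up(node):
    """Fill missing 'ln_k' values from the mean of the children's values.

    `node` is a dict with keys 'ln_k' (float or None) and 'children' (list).
    Returns the node's value after filling, or None if there is no data
    anywhere beneath it.
    """
    child_values = [fill_by_averaging_up(c) for c in node["children"]]
    child_values = [v for v in child_values if v is not None]
    if node["ln_k"] is None and child_values:
        node["ln_k"] = sum(child_values) / len(child_values)
    return node["ln_k"]


def leaf(value):
    return {"ln_k": value, "children": []}


branch = {"ln_k": None, "children": [leaf(4.0), leaf(8.0)]}
root = {"ln_k": None, "children": [leaf(2.0), branch, leaf(None)]}
root_value = fill_by_averaging_up(root)
```

Children are filled before their parent, so a gap at the root can draw on values that were themselves produced by averaging deeper in the tree.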
- generate_old_tree(entries, level)¶
Generate a multi-line string representation of the current tree using the old-style syntax.
- generate_product_template(reactants0)¶
Generate the product structures by applying the reaction template to the top-level nodes. For reactants defined by multiple structures, only the first is used here; it is assumed to be the most generic.
- generate_reactions(reactants, products=None, prod_resonance=True, delete_labels=True, relabel_atoms=True)¶
Generate all reactions between the provided list of one, two, or three reactants, which should be either single Molecule objects or lists of same. Does not estimate the kinetics of these reactions at this time. Returns a list of TemplateReaction objects using Molecule objects for both reactants and products. The reactions are constructed such that the forward direction is consistent with the template of this reaction family.
- Parameters:
reactants (list) – List of Molecules to react.
products (list, optional) – List of Molecules or Species of desired product structures.
prod_resonance (bool, optional) – Flag to generate resonance structures for product checking. Defaults to True, in which case resonance structures are compared.
delete_labels (bool, optional) – Delete the atom labels from each generated reaction. Defaults to True, in which case atom labels are deleted.
relabel_atoms (bool, optional) – Defaults to True, in which case atoms are re-labeled.
- Returns:
List of all reactions containing Molecule objects with the specified reactants and products within this family. Degenerate reactions are returned as separate reactions.
- generate_tree(rxns=None, obj=None, thermo_database=None, T=1000.0, nprocs=1, min_splitable_entry_num=2, min_rxns_to_spawn=20, max_batch_size=800, outlier_fraction=0.02, stratum_num=8, new_fraction_threshold_to_reopt_node=0.25, extension_iter_max=inf, extension_iter_item_cap=inf)¶
Generate a tree by greedy optimization based on the objective function obj. The optimization iterates through every group; if a group has more than one training reaction associated with it, a set of potential more specific extensions is generated, the extension that optimizes the objective function is chosen, and the iteration starts over at the beginning.
Additionally, the tree structure is simplified on the fly by removing groups that have no kinetics data associated if their parent has no kinetics data associated and they either have only one child or have two children, one of which has no kinetics data and no children (its parent becomes the parent of its only relevant child node).
- Parameters:
rxns – List of reactions to generate tree from (if None pull the whole training set)
obj – Object to expand tree from (if None uses top node)
thermo_database – Thermodynamic database used for reversing training reactions
T – Temperature the tree is optimized for
nprocs – Number of processes for parallel tree generation
min_splitable_entry_num – the minimum number of splitable reactions at a node in order to spawn a new process solving that node
min_rxns_to_spawn – the minimum number of reactions at a node to spawn a new process solving that node
max_batch_size – the maximum number of reactions allowed in a batch; most batches will be this size and the last will be smaller. If the number of reactions is less than max_batch_size, the cascade algorithm is not used
outlier_fraction – Fraction of reactions that are fastest/slowest and will be automatically included in the first batch
stratum_num – Number of strata used in stratified sampling scheme
max_rxns_to_reopt_node – Nodes with more matching reactions than this will not be pruned
- get_backbone_roots()¶
Returns: the top level backbone node in a unimolecular family.
- get_end_roots()¶
Returns: A list of top level end nodes in a unimolecular family
- get_entries_to_save()¶
Return a sorted list of the entries in this database that should be saved to the output file.
Then renumber the entry indexes so that we never have any duplicate indexes.
- get_extension_edge(parent, template_rxn_map, obj, T, iter_max=inf, iter_item_cap=inf)¶
Finds the set of all extension groups to parent such that 1) the extension group divides the set of reactions under parent and 2) no generalization of the extension group divides the set of reactions under parent.
We find this by generating all possible extensions of the initial group. Extensions that split reactions are added to the list. All extensions that do not split reactions and do not create bonds are ignored (although those that match every reaction are labeled so we don’t search them twice). Those that match all reactions and involve bond creation undergo this process again.
Principle: say you have two elementary changes to a group, ext1 and ext2. If applying ext1 and ext2 together results in a split, at least one of ext1 and ext2 must result in a split on its own.
The speed of this algorithm relies heavily on searching non-bond-creation dimensions only once.
- get_kinetics(reaction, template_labels, degeneracy=1, estimator='', return_all_kinetics=True)¶
Return the kinetics for the given reaction by searching the various depositories as well as generating a result using the user-specified estimator. Currently, only ‘rate rules’ is a supported estimator. Unlike the regular get_kinetics() method, this returns a list of results, with each result comprising:
the kinetics
the source - this will be None if from a template estimate
the entry - this will be None if from a template estimate
is_forward - a boolean denoting whether the matched entry is in the same direction as the input reaction. This is always True when using rate rules, and can be True or False when using a depository
If return_all_kinetics==False, only the first (best?) matching kinetics is returned.
- get_kinetics_for_template(template, degeneracy=1, method='rate rules')¶
Return an estimate of the kinetics for a reaction with the given template and reaction-path degeneracy. There is currently only one method to use: ‘rate rules’ (old RMG-Java behavior, and default RMG-Py behavior). Group additivity was removed in August 2023.
Returns a tuple (kinetics, entry): If it’s estimated via ‘rate rules’ and an exact match is found in the tree, then the entry is returned as the second element of the tuple. But if an average is used, then the tuple returned is (kinetics, None).
- get_kinetics_from_depository(depository, reaction, template, degeneracy)¶
Search the given depository in this kinetics family for kinetics for the given reaction. Returns a list of all of the matching kinetics, the corresponding entries, and True if the kinetics match the forward direction or False if they match the reverse direction.
- get_labeled_reactants_and_products(reactants, products, relabel_atoms=True)¶
Given reactants, a list of Molecule objects, and products, a list of Molecule objects, return two new lists of Molecule objects with atoms labeled: one for reactants, one for products. The returned molecules are new objects in memory, so the input molecules in reactants and products are not affected. If RMG cannot find appropriate labels, (None, None) will be returned. If relabel_atoms is True, product atom labels of reversible families will be reversed to assist in identifying forbidden structures.
- get_rate_rule(template)¶
Return the rate rule with the given template. Raises a ValueError if no corresponding entry exists.
- get_reaction_matches(rxns=None, thermo_database=None, remove_degeneracy=False, estimate_thermo=True, fix_labels=False, exact_matches_only=False, get_reverse=False, rxns_with_kinetics_only=False)¶
Returns a dictionary that maps each entry in the tree, keyed as (entry.label, entry.item), to the list of all training reactions (or the reactions given) that match that entry.
- get_reaction_pairs(reaction)¶
For a given reaction with properly-labeled Molecule objects as the reactants, return the reactant-product pairs to use when performing flux analysis.
- get_reaction_template(reaction)¶
For a given reaction with properly-labeled Molecule objects as the reactants, determine the most specific nodes in the tree that describe the reaction.
- get_reaction_template_labels(reaction)¶
Retrieve the template for the reaction and return the corresponding labels for each of the groups in the template.
- get_root_template()¶
Return the root template for the reaction family. Most of the time this is the top-level nodes of the tree (as stored in the KineticsGroups object), but there are a few exceptions (e.g. R_Recombination).
- get_rxn_batches(rxns, T=1000.0, max_batch_size=800, outlier_fraction=0.02, stratum_num=8)¶
Breaks reactions into batches based on a modified stratified sampling scheme. Effectively:
The top and bottom outlier_fraction of all reactions are always included in the first batch.
The remaining reactions are ordered by their rate coefficients at T.
The list of reactions is then split into stratum_num similarly sized intervals.
Batches sample equally from each interval, but randomly within each interval, until they reach max_batch_size reactions.
A list of lists of reactions containing the batches is returned.
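The sampling scheme above can be sketched in plain Python. This is one reading of the description, not the library's code: `rates` is a hypothetical mapping from reaction label to a rate coefficient already evaluated at T, `seed` is an added parameter for reproducibility, and "sample equally" is interpreted as drawing one reaction per stratum per round.

```python
import random


def get_rxn_batches(rates, max_batch_size=800, outlier_fraction=0.02,
                    stratum_num=8, seed=0):
    """Split reaction labels into batches by modified stratified sampling."""
    ordered = sorted(rates, key=rates.get)          # slowest to fastest
    if len(ordered) <= max_batch_size:
        return [ordered]                            # no batching needed
    n_out = int(outlier_fraction * len(ordered))
    first = ordered[:n_out] + ordered[len(ordered) - n_out:]
    middle = ordered[n_out:len(ordered) - n_out]
    rng = random.Random(seed)
    # Cut the rate-ordered middle into similarly sized strata, shuffle each.
    size = max(1, -(-len(middle) // stratum_num))   # ceiling division
    strata = [middle[i:i + size] for i in range(0, len(middle), size)]
    for stratum in strata:
        rng.shuffle(stratum)
    batches = [first]
    while any(strata):
        if len(batches[-1]) >= max_batch_size:
            batches.append([])                      # current batch is full
        for stratum in strata:                      # one draw per stratum
            if stratum and len(batches[-1]) < max_batch_size:
                batches[-1].append(stratum.pop())
    return batches


rates = {f"r{i}": float(i) for i in range(100)}
batches = get_rxn_batches(rates, max_batch_size=30, outlier_fraction=0.1,
                          stratum_num=4)
```

With 100 reactions and a batch size of 30, the slowest and fastest ten reactions land in the first batch, and every batch except the last is filled to the limit.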
- get_sources_for_template(template)¶
Returns the set of rate rules and training reactions used to average this template. Note that the tree must be averaged with verbose=True for this to work.
Returns a tuple (rules, training), where rules is a list of tuples of the form [(original_entry, weight_used_in_average), …]
and training is a list of tuples of the form [(rate_rule_entry, training_reaction_entry, weight_used_in_average), …]
- get_species(path, resonance=True)¶
Load the dictionary containing all of the species in a kinetics library or depository.
- get_top_level_groups(root)¶
Returns a list of group nodes that are the highest in the tree starting at node “root”. If “root” is a group node, then it will return a single-element list with “root”. Otherwise, for every child of root, we descend until we find no nodes with logic nodes. We then return a list of all group nodes found along the way.
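A simplified sketch of this descent follows. It assumes nodes are plain dictionaries with a hypothetical `'is_logic'` flag and that a logic node's components are reachable through `'children'`; the real entry and logic-node types are not modeled.

```python
def get_top_level_groups(node):
    """Return the highest group nodes at or below `node`, skipping logic nodes."""
    if not node["is_logic"]:
        return [node]                    # a group node is its own top level
    groups = []
    for child in node["children"]:
        groups.extend(get_top_level_groups(child))
    return groups


def group(label):
    return {"label": label, "is_logic": False, "children": []}


def logic(label, children):
    return {"label": label, "is_logic": True, "children": children}


# Made-up tree: a logic root whose second child is itself a logic node.
root = logic("OR_root", [group("A"), logic("OR_inner", [group("B"), group("C")])])
top_labels = [g["label"] for g in get_top_level_groups(root)]
```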
- get_training_depository()¶
Returns the training depository from self.depositories
- get_training_set(thermo_database=None, remove_degeneracy=False, estimate_thermo=True, fix_labels=False, get_reverse=False, rxns_with_kinetics_only=False)¶
Retrieves all reactions in the training set, assigns thermo to the species objects, reverses reactions as necessary so that all reactions are in the forward direction, and returns the resulting list of forward-direction reactions with thermo assigned.
- has_rate_rule(template)¶
Return True if a rate rule with the given template currently exists, or False otherwise.
- is_entry_match(mol, entry, resonance=True)¶
determines if the labeled molecule object of reactants matches the entry entry
- is_molecule_forbidden(molecule)¶
Return True if the molecule is forbidden in this family, or False otherwise.
- load(path, local_context=None, global_context=None, depository_labels=None)¶
Load a kinetics database from a file located at path on disk.
If depository_labels is a list, e.g. ['training', 'PrIMe'], then only those depositories are loaded, and they are searched in that order when generating kinetics.
If depository_labels is None then load ‘training’ first then everything else. If depository_labels is not None then load in the order specified in depository_labels.
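The ordering rule can be illustrated with a small helper. `depository_load_order` is hypothetical, and skipping requested labels that are not available is an assumption of this sketch; the text does not say how missing depositories are handled.

```python
def depository_load_order(available, depository_labels=None):
    """Return depository labels in the order they would be loaded."""
    if depository_labels is None:
        # 'training' first, then everything else in its existing order.
        rest = [label for label in available if label != "training"]
        return (["training"] if "training" in available else []) + rest
    # Otherwise follow the caller's order (assumption: unknown labels skipped).
    return [label for label in depository_labels if label in available]


available = ["NIST", "training", "PrIMe"]
default_order = depository_load_order(available)
custom_order = depository_load_order(available, ["PrIMe", "training"])
```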
- load_forbidden(label, group, shortDesc='', longDesc='')¶
Load information about a forbidden structure. Note that argument names are retained for backward compatibility with loading database files.
- load_old(dictstr, treestr, libstr, num_parameters, num_labels=1, pattern=True)¶
Load a dictionary-tree-library based database. The database is stored in three files: dictstr is the path to the dictionary, treestr to the tree, and libstr to the library. The tree is optional, and should be set to ‘’ if not desired.
- load_old_dictionary(path, pattern)¶
Parse an old-style RMG database dictionary located at path. An RMG dictionary is a list of key-value pairs of a one-line string key and a multi-line string value. Each record is separated by at least one empty line. Returns a dict object with the values converted to Molecule or Group objects depending on the value of pattern.
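The record layout described above (a one-line key, a multi-line value, blank lines between records) can be parsed with a few lines of plain Python. This sketch takes the text directly rather than a path and leaves the values as strings; the conversion to Molecule or Group objects is omitted, and the sample content is made up.

```python
def parse_old_dictionary(text):
    """Split old-style dictionary text into {key: multi-line value}."""
    records = {}
    block = []
    for line in text.splitlines() + [""]:    # trailing "" flushes last record
        if line.strip():
            block.append(line)
        elif block:
            # First line of the block is the key; the rest is the value.
            records[block[0].strip()] = "\n".join(block[1:])
            block = []
    return records


sample = """C_H
1 *1 C 0
2 *2 H 0


O_rad
1 *1 O 1
"""
parsed = parse_old_dictionary(sample)
```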
- load_old_library(path, num_parameters, num_labels=1)¶
Parse an RMG database library located at path.
- load_old_tree(path)¶
Parse an old-style RMG database tree located at path. An RMG tree is an n-ary tree representing the hierarchy of items in the dictionary.
- load_recipe(actions)¶
Load information about the reaction recipe.
- load_template(reactants, products, ownReverse=False)¶
Load information about the reaction template. Note that argument names are retained for backward compatibility with loading database files.
- make_tree(obj=None, regularization=<function KineticsFamily.simple_regularization>, thermo_database=None, T=1000.0)¶
generates tree structure and then generates rules for the tree
- match_node_to_child(parent_node, child_node)¶
Return True if parent_node is a parent of child_node. Otherwise, return False. Both parent_node and child_node must be Entry types with items containing Group or LogicNode types. If parent_node and child_node are identical, the function will also return False.
- match_node_to_node(node, node_other)¶
Return True if node and node_other are identical. Otherwise, return False. Both node and node_other must be Entry types with items containing Group or LogicNode types.
- match_node_to_structure(node, structure, atoms, strict=False)¶
Return True if the structure centered at atoms matches the structure at node in the dictionary. The structure at node should have atoms with the appropriate labels because they are set on loading and never change. However, the atoms in structure may not have the correct labels, hence the atoms parameter. The atoms parameter may include extra labels, and so we only require that every labeled atom in the functional group represented by node has an equivalent labeled atom in structure.
Matching to structure is more strict than to node. All labels in structure must be found in node. However, the reverse is not true unless strict is set to True.
=========  ===========
Attribute  Description
=========  ===========
node       Either an Entry or a key in the self.entries dictionary which has a Group or LogicNode as its Entry.item
structure  A Group or a Molecule
atoms      Dictionary of {label: atom} in the structure. A possible dictionary is the one produced by structure.get_all_labeled_atoms()
strict     If set to True, ensures that all the node’s atomLabels are matched in the structure
=========  ===========
- parse_old_library(path, num_parameters, num_labels=1)¶
Parse an RMG database library located at path, returning the loaded entries (rather than storing them in the database). This method does not discard duplicate entries.
- prune_tree(rxns, newrxns, thermo_database=None, new_fraction_threshold_to_reopt_node=0.25, fix_labels=True, exact_matches_only=True, get_reverse=True)¶
Remove nodes that have fewer than maxRxnToReoptNode matching reactions and clear the regularization dimensions of their parent. This is used to remove smaller, easier-to-optimize, and more-likely-to-change nodes before adding a new batch in cascade model generation.
- regularize(regularization=<function KineticsFamily.simple_regularization>, keep_root=True, thermo_database=None, template_rxn_map=None, rxns=None)¶
Regularizes the tree according to the regularization function regularization
- remove_group(group_to_remove)¶
Removes a group that is in a tree from the database. In addition to deleting from self.entries, it must also update the parent/child relationships
Returns the removed group
- retrieve_original_entry(template_label)¶
Retrieves the original entry, be it a rule or training reaction, given the template label in the form ‘group1;group2’ or ‘group1;group2;group3’
Returns tuple in the form (RateRuleEntry, TrainingReactionEntry)
Where the TrainingReactionEntry is only present if it comes from a training reaction
- retrieve_template(template_labels)¶
Reconstruct the groups associated with the labels of the reaction template and return a list.
- save(path)¶
Save the current database to the file at location path on disk.
- save_depository(depository, path)¶
Save the given kinetics family depository to the location path on disk.
- save_dictionary(path)¶
Extract species from all entries associated with a kinetics library or depository and save them to the path given.
- save_entry(f, entry)¶
Write the given entry in the kinetics database to the file object f.
- save_generated_tree(path=None)¶
clears the rules and saves the family to its current location in database
- save_groups(path)¶
Save the current database to the file at location path on disk.
- save_old(dictstr, treestr, libstr)¶
Save the current database to a set of text files using the old-style syntax.
- save_old_dictionary(path)¶
Save the current database dictionary to a text file using the old-style syntax.
- save_old_library(path)¶
Save the current database library to a text file using the old-style syntax.
- save_old_tree(path)¶
Save the current database tree to a text file using the old-style syntax.
- save_training_reactions(reactions, reference=None, reference_type='', short_desc='', long_desc='', rank=3)¶
This function takes a list of reactions and appends it to the training reactions file. It ignores the existence of duplicate reactions.
The rank for each new reaction’s kinetics is set to a default value of 3 unless the user specifies differently for those reactions.
For each entry, the long description is imported from the kinetics comment.
- simple_regularization(node, template_rxn_map, test=True)¶
Simplest regularization algorithm. All nodes are made as specific as their descendant reactions. Training reactions are assumed to not generalize. For example, if a particular atom at a node is oxygen for all of its descendant reactions, a reaction where it is sulfur will never hit that node (unless it is the top node), even if the tree did not split on the identity of that atom.
The test option to this function determines whether or not the reactions under a node match the extended group before adding an extension. If the test fails the extension is skipped.
In general test=True is needed if the cascade algorithm was used to generate the tree and test=False is ok if the cascade algorithm wasn’t used.
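The oxygen/sulfur example can be made concrete with a toy model. In this sketch a reaction is reduced to a hypothetical `{atom_label: element}` dictionary, and a node is "regularized" by restricting each labeled atom to the elements seen in its descendant reactions. The test option and real group extensions are not modeled.

```python
def allowed_elements(reactions):
    """Map each atom label to the set of elements seen across `reactions`."""
    allowed = {}
    for rxn in reactions:
        for label, element in rxn.items():
            allowed.setdefault(label, set()).add(element)
    return allowed


def node_matches(allowed, rxn):
    """True if every labeled atom of the node is satisfied by `rxn`."""
    return all(rxn.get(label) in elements
               for label, elements in allowed.items())


# Made-up descendant reactions: atom *1 is oxygen in every one of them.
training = [{"*1": "O", "*2": "C"}, {"*1": "O", "*2": "N"}]
node = allowed_elements(training)
sulfur_hits = node_matches(node, {"*1": "S", "*2": "C"})
oxygen_hits = node_matches(node, {"*1": "O", "*2": "N"})
```

Because the node was made as specific as its descendants, the sulfur reaction no longer matches it, even though nothing in the tree explicitly split on that atom.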