# rmgpy.molecule.Molecule

class rmgpy.molecule.Molecule(atoms=None, symmetry=-1, multiplicity=-187, reactive=True, props=None, inchi='', smiles='')

A representation of a molecular structure using a graph data type, extending the Graph class. Attributes are:

| Attribute | Type | Description |
|---|---|---|
| atoms | list | A list of Atom objects in the molecule |
| symmetry_number | float | The (estimated) external + internal symmetry number of the molecule, modified for chirality |
| multiplicity | int | The multiplicity of this species, multiplicity = 2*total_spin+1 |
| reactive | bool | True (by default) if the molecule participates in reaction families. It is set to False by the filtration functions if a non-representative resonance structure was generated by a template reaction |
| props | dict | A dictionary of properties describing the state of the molecule. |
| inchi | str | A string representation of the molecule in InChI |
| smiles | str | A string representation of the molecule in SMILES |
| fingerprint | str | A representation for fast comparison, set as molecular formula |
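As a rough illustration of the relation multiplicity = 2*total_spin+1 listed above, the sketch below computes the multiplicity from a count of unpaired electrons. It assumes every unpaired electron has parallel spin (the high-spin case), and `multiplicity_from_unpaired` is a hypothetical helper, not part of RMG.

```python
from fractions import Fraction


def multiplicity_from_unpaired(n_unpaired):
    """Return 2*S + 1, assuming all unpaired electrons are spin-parallel (S = n/2)."""
    if n_unpaired < 0:
        raise ValueError("number of unpaired electrons must be non-negative")
    total_spin = Fraction(n_unpaired, 2)  # each unpaired electron contributes 1/2
    return int(2 * total_spin + 1)


print(multiplicity_from_unpaired(0))  # closed-shell species
print(multiplicity_from_unpaired(1))  # monoradical
print(multiplicity_from_unpaired(2))  # high-spin biradical
```

Under this assumption a closed-shell species is a singlet (1), a monoradical a doublet (2), and a high-spin biradical a triplet (3).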

A new Molecule object can be instantiated by passing the smiles or inchi argument, a string representing the molecular structure.

add_atom(self, Atom atom)

Add an atom to the graph. The atom is initialized with no bonds.

add_bond(self, Bond bond)

Add a bond to the graph as an edge connecting the two atoms atom1 and atom2.

add_edge(self, Edge edge)

Add an edge to the graph. The two vertices in the edge must already exist in the graph, or a ValueError is raised.

add_vertex(self, Vertex vertex)

Add a vertex to the graph. The vertex is initialized with no edges.

assign_atom_ids(self)

Assigns an index to every atom in the molecule for tracking purposes. Uses the entire range of Cython’s integer values to reduce the chance of duplicates.

atom_ids_valid(self) bool

Checks to see if the atom IDs are valid in this structure

atoms

List of atoms contained in the current molecule.

Renames the inherited vertices attribute of Graph.

calculate_cp0(self) double

Return the value of the heat capacity at zero temperature in J/(mol*K).

calculate_cpinf(self) double

Return the value of the heat capacity at infinite temperature in J/(mol*K).
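The two limits can be estimated from classical equipartition. The sketch below is not RMG's implementation; it assumes ideal-gas behavior, that each vibrational mode contributes R at infinite temperature, and that each internal rotor is treated as a free rotor contributing R/2. The function names and arguments are hypothetical.

```python
R = 8.314462618  # molar gas constant in J/(mol*K)


def cp0_sketch(n_atoms, linear):
    """Zero-temperature limit: translation + rotation + R (Cp = Cv + R)."""
    if n_atoms == 1:
        return 2.5 * R  # translation only
    return (3.5 if linear else 4.0) * R


def cpinf_sketch(n_atoms, linear, n_rotors=0):
    """Infinite-temperature limit: add R per vibration and R/2 per free rotor."""
    if n_atoms == 1:
        return 2.5 * R
    n_vib = 3 * n_atoms - (5 if linear else 6) - n_rotors
    return cp0_sketch(n_atoms, linear) + (n_vib + 0.5 * n_rotors) * R


# Methane: 5 atoms, nonlinear, no internal rotors
print(cp0_sketch(5, linear=False) / R)    # in units of R
print(cpinf_sketch(5, linear=False) / R)  # in units of R
```

For methane this gives 4R and 13R; for ethane (8 atoms, one rotor) the upper limit is 21.5R under the same assumptions.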

calculate_symmetry_number(self) float

Return the symmetry number for the structure. The symmetry number includes both external and internal modes.

clear_labeled_atoms(self)

Remove the labels from all atoms in the molecule.

connect_the_dots(self, critical_distance_factor=0.45, raise_atomtype_exception=True)

Delete all bonds, and set them again based on the Atoms’ coords. Does not detect bond type.

contains_labeled_atom(self, unicode label) bool

Return True if the molecule contains an atom with the label label and False otherwise.

contains_surface_site(self) bool

Returns True iff the molecule contains an ‘X’ surface site.

copy(self, bool deep=False)

Create a copy of the current graph. If deep is True, a deep copy is made: copies of the vertices and edges are used in the new graph. If deep is False or not specified, a shallow copy is made: the original vertices and edges are used in the new graph.

copy_and_map(self) dict

Create a deep copy of the current graph, and return the dict ‘mapping’. Method was modified from Graph.copy() method

count_aromatic_rings(self) int

Count the number of aromatic rings in the current molecule, as determined by the benzene bond type. This is purely dependent on representation and is unrelated to the actual aromaticity of the molecule.

Returns an integer corresponding to the number of aromatic rings.

count_internal_rotors(self) int

Determine the number of internal rotors in the structure. Any single bond that is not in a cycle and that connects two atoms which each have other bonds is considered to be an internal rotor.
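The rule above can be sketched on a plain bond list. This is an illustration only, not RMG's code: the `(atom_a, atom_b, order)` tuple format and both helper functions are assumptions made for the example, and any further special cases RMG may apply are not modeled.

```python
from collections import defaultdict


def count_rotors_sketch(bonds):
    """Count rotor-like bonds in a list of (atom_a, atom_b, order) tuples."""
    adjacency = defaultdict(set)
    for a, b, _ in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def bond_in_cycle(a, b):
        # The bond a-b lies in a cycle if b can be reached from a
        # without traversing the a-b bond itself.
        stack, seen = [a], {a}
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if node == a and neighbor == b:
                    continue  # skip the bond under test
                if neighbor == b:
                    return True
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return False

    count = 0
    for a, b, order in bonds:
        both_have_other_bonds = len(adjacency[a]) > 1 and len(adjacency[b]) > 1
        if order == 1 and both_have_other_bonds and not bond_in_cycle(a, b):
            count += 1
    return count


def with_hydrogens(heavy_bonds, h_counts):
    """Attach h_counts[atom] explicit hydrogens to each heavy atom."""
    bonds = list(heavy_bonds)
    for atom, n_h in h_counts.items():
        bonds += [(atom, f"H_{atom}_{i}", 1) for i in range(n_h)]
    return bonds


ethane = with_hydrogens([("C1", "C2", 1)], {"C1": 3, "C2": 3})
propane = with_hydrogens([("C1", "C2", 1), ("C2", "C3", 1)],
                         {"C1": 3, "C2": 2, "C3": 3})
cyclopropane = with_hydrogens([("C1", "C2", 1), ("C2", "C3", 1), ("C3", "C1", 1)],
                              {"C1": 2, "C2": 2, "C3": 2})

print(count_rotors_sketch(ethane), count_rotors_sketch(propane),
      count_rotors_sketch(cyclopropane))
```

C-H bonds are never counted because the hydrogen has no other bonds, and ring bonds are excluded by the cycle test.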

delete_hydrogens(self)

Irreversibly delete all non-labeled hydrogens without updating connectivity values. If there’s nothing but hydrogens, it does nothing. It destroys information; be careful with it.

draw(self, unicode path)

Generate a pictorial representation of the chemical graph using the draw module. Use path to specify the file to save the generated image to; the image type is automatically determined by extension. Valid extensions are .png, .svg, .pdf, and .ps; of these, the first is a raster format and the remainder are vector formats.

enumerate_bonds(self) dict

Count the number of each type of bond (e.g. ‘C-H’, ‘C=C’) present in the molecule.

Returns: a dictionary with bond strings as keys and counts as values

find_h_bonds(self)

Generate a list of possible new H-bond coordinates [(i1,j1),(i2,j2),…], where the i and j values are the indices of the atoms involved. Existing H-bonds are ignored. H-bonds are allowed if they meet the following constraints:

1. between a H and [O,N] atoms

2. the hydrogen is covalently bonded to an O or N

3. the Hydrogen bond must complete a ring with at least 5 members

4. An atom can only be hydrogen bonded to one other atom

find_isomorphism(self, Graph other, dict initial_map=None, bool save_order=False, bool strict=True) list

Returns True if other is isomorphic and False otherwise, along with the matching mapping. The initial_map parameter can be used to specify a required mapping from self to other (i.e. the atoms of self are the keys, while the atoms of other are the values). The returned mapping also uses the atoms of self for the keys and the atoms of other for the values. The other parameter must be a Molecule object, or a TypeError is raised.

Parameters:
• initial_map (dict, optional) – initial atom mapping to use

• save_order (bool, optional) – if True, reset atom order after performing atom isomorphism

• strict (bool, optional) – if False, perform isomorphism ignoring electrons

find_subgraph_isomorphisms(self, Graph other, dict initial_map=None, bool save_order=False) list

Returns True if other is subgraph isomorphic and False otherwise. Also returns the list of all valid mappings. The initial_map parameter can be used to specify a required mapping from self to other (i.e. the atoms of self are the keys, while the atoms of other are the values). The returned mappings also use the atoms of self for the keys and the atoms of other for the values. The other parameter must be a Group object, or a TypeError is raised.

fingerprint

Fingerprint used to accelerate graph isomorphism comparisons with other molecules. The fingerprint is a short string containing a summary of selected information about the molecule. Two fingerprint strings matching is a necessary (but not sufficient) condition for the associated molecules to be isomorphic.

Use an expanded molecular formula to also enable sorting.
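The "necessary but not sufficient" property can be seen with a formula-based fingerprint. The zero-padded format below is a hypothetical stand-in chosen for the example (RMG's actual fingerprint string may differ), and `formula_fingerprint` is not an RMG function.

```python
from collections import Counter


def formula_fingerprint(symbols):
    """Build a sortable, zero-padded formula string from element symbols."""
    counts = Counter(symbols)
    return "".join(f"{element}{counts[element]:02d}" for element in sorted(counts))


ethanol = ["C", "C", "O"] + ["H"] * 6         # CH3CH2OH
dimethyl_ether = ["C", "O", "C"] + ["H"] * 6  # CH3OCH3
propane = ["C"] * 3 + ["H"] * 8

print(formula_fingerprint(ethanol))
print(formula_fingerprint(dimethyl_ether))
print(formula_fingerprint(propane))
```

Ethanol and dimethyl ether share a fingerprint even though they are different structures, so a full isomorphism check is still required; propane differs from both and can be rejected immediately.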

Convert a string adjacency list adjlist to a molecular structure. Skips the first line (assuming it’s a label) unless withLabel is False.

from_augmented_inchi(self, aug_inchi, raise_atomtype_exception=True)

Convert an Augmented InChI string aug_inchi to a molecular structure.

from_inchi(self, unicode inchistr, backend=u'try-all', bool raise_atomtype_exception=True)

Convert an InChI string inchistr to a molecular structure.

from_smarts(self, smartsstr, raise_atomtype_exception=True)

Convert a SMARTS string smartsstr to a molecular structure. Uses RDKit to perform the conversion. This Kekulizes everything, removing all aromatic atom types.

from_smiles(self, unicode smilesstr, backend=u'try-all', bool raise_atomtype_exception=True)

Convert a SMILES string smilesstr to a molecular structure.

from_xyz(self, ndarray atomic_nums, ndarray coordinates, float critical_distance_factor=0.45, bool raise_atomtype_exception=True)

Create an RMG molecule from a list of coordinates and a corresponding list of atomic numbers. These are typically received from cclib, and the molecule is passed to connect_the_dots, so it will only contain single bonds.

generate_h_bonded_structures(self)

Generate a list of H-bonded molecular structures. In addition to the constraints on hydrogen bonds applied in find_h_bonds, the generated structures are constrained as follows:

1. An atom can only be hydrogen bonded to one other atom

2. Only two H-bonds can exist in a given molecule

The second constraint avoids explosive growth in the number of structures: without it, the number of possible structures grows as 2^n, where n is the number of possible H-bonds.
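The effect of the two-bond cap can be checked by counting subsets of candidate H-bonds. This sketch only counts combinations; it ignores the one-bond-per-atom constraint and includes the structure with no H-bonds, so the numbers are upper bounds. The function names are hypothetical.

```python
from math import comb


def unrestricted_count(n_possible):
    """Every candidate H-bond is independently present or absent."""
    return 2 ** n_possible


def capped_count(n_possible, max_bonds=2):
    """Number of ways to choose at most max_bonds of the candidate H-bonds."""
    return sum(comb(n_possible, k) for k in range(max_bonds + 1))


for n in (2, 5, 10, 20):
    print(n, unrestricted_count(n), capped_count(n))
```

With 10 candidate H-bonds the cap reduces 1024 combinations to 56, and the capped count grows only quadratically with n.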

generate_resonance_structures(self, bool keep_isomorphic=False, bool filter_structures=True, bool save_order=False) list

Returns a list of resonance structures of the molecule.

get_adatoms(self) list

Get a list of adatoms in the molecule.

Returns: a list containing the adatoms (Atom objects) in the molecule

get_all_cycles(self, Vertex starting_vertex) list

Given a starting vertex, returns a list of all the cycles containing that vertex.

This function returns a duplicate of each cycle because [0,1,2,3] is counted as separate from [0,3,2,1]
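If the duplicates are unwanted, one way to remove them is to reduce each cycle to a canonical form that ignores starting point and direction. The helpers below are hypothetical and assume the vertices are represented by comparable, unique values such as integer indices.

```python
def canonical_cycle(cycle):
    """Return a tuple that is identical for all rotations and both directions."""
    n = len(cycle)
    start = cycle.index(min(cycle))  # rotate so the smallest vertex comes first
    forward = [cycle[(start + i) % n] for i in range(n)]
    backward = [cycle[(start - i) % n] for i in range(n)]
    return tuple(min(forward, backward))  # pick one direction consistently


def drop_duplicate_cycles(cycles):
    """Keep the first occurrence of each distinct cycle."""
    seen = set()
    unique = []
    for cycle in cycles:
        key = canonical_cycle(cycle)
        if key not in seen:
            seen.add(key)
            unique.append(list(cycle))
    return unique


print(drop_duplicate_cycles([[0, 1, 2, 3], [0, 3, 2, 1], [4, 5, 6]]))
```

Here [0, 1, 2, 3] and [0, 3, 2, 1] collapse to the same key, so only one of them is kept.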

get_all_cycles_of_size(self, int size) list

Return a list of all the non-duplicate rings with length ‘size’. The algorithm implemented was adapted from a description by Fan, Panaye, Doucet, and Barbu (doi: 10.1021/ci00015a002)

B. T. Fan, A. Panaye, J. P. Doucet, and A. Barbu. “Ring Perception: A New Algorithm for Directly Finding the Smallest Set of Smallest Rings from a Connection Table.” J. Chem. Inf. Comput. Sci. 33, p. 657-662 (1993).

get_all_cyclic_vertices(self) list

Returns all vertices belonging to one or more cycles.

get_all_edges(self) list

Returns a list of all edges in the graph.

get_all_labeled_atoms(self) dict

Return the labeled atoms as a dict with the keys being the labels and the values the atoms themselves. If two or more atoms have the same label, the value is converted to a list of these atoms.
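The shape of the returned dictionary can be sketched with plain strings standing in for Atom objects. The `(label, atom)` pair input and the helper name are assumptions made for the example.

```python
def group_by_label(labeled_pairs):
    """Map each non-empty label to its atom, or to a list if the label repeats."""
    grouped = {}
    for label, atom in labeled_pairs:
        if not label:
            continue  # unlabeled atoms are skipped
        if label not in grouped:
            grouped[label] = atom
        elif isinstance(grouped[label], list):
            grouped[label].append(atom)
        else:
            grouped[label] = [grouped[label], atom]  # promote to a list
    return grouped


pairs = [("*1", "C_a"), ("", "H_a"), ("*2", "O_a"), ("*1", "C_b")]
print(group_by_label(pairs))
```

Callers therefore need to handle both cases: a single atom for a unique label and a list of atoms for a repeated one.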

get_all_polycyclic_vertices(self) list

Return all vertices belonging to two or more cycles, fused or spirocyclic.

get_all_simple_cycles_of_size(self, int size) list

Return a list of all non-duplicate monocyclic rings with length ‘size’.

Naive approach that eliminates the polycyclic rings returned by get_all_cycles_of_size.

get_aromatic_rings(self, list rings=None) tuple

Returns all aromatic rings as a list of atoms and a list of bonds.

Identifies rings using Graph.get_smallest_set_of_smallest_rings(), then uses RDKit to perceive aromaticity. RDKit uses an atom-based pi-electron counting algorithm to check aromaticity based on Huckel’s Rule. Therefore, this method identifies “true” aromaticity, rather than simply the RMG bond type.

The method currently restricts aromaticity to six-membered carbon-only rings. This is a limitation imposed by RMG, and not by RDKit.

get_bond(self, Atom atom1, Atom atom2) Bond

Returns the bond connecting atoms atom1 and atom2.

get_bonds(self, Atom atom) dict

Return a dictionary of the bonds involving the specified atom.

get_charge_span(self)

Iterate through the atoms in the structure and calculate the charge span on the overall molecule. The charge span is a measure of the number of charge separations in a molecule.
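One way to quantify charge separation from per-atom formal charges is sketched below. It assumes the charge span is defined as half the difference between the sum of absolute atomic charges and the absolute net charge; that definition and the function names are assumptions of this example rather than a statement of RMG's code.

```python
def net_charge_sketch(charges):
    """Sum of the formal charges on all atoms."""
    return sum(charges)


def charge_span_sketch(charges):
    """Number of +/- charge separations, under the assumed definition."""
    sum_of_abs_charges = sum(abs(q) for q in charges)
    return (sum_of_abs_charges - abs(net_charge_sketch(charges))) / 2


print(charge_span_sketch([0, 0, 0]))       # neutral, no formal charges
print(charge_span_sketch([+1, -1, 0]))     # zwitterionic structure
print(charge_span_sketch([+1, 0]))         # cation, no separation
print(charge_span_sketch([+1, -1, +1, -1]))
```

A structure with one positive and one negative formal charge has a span of 1, while a simple ion has a span of 0 because its charge is not separated.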

get_desorbed_molecules(self) list

Get a list of desorbed molecules by desorbing the molecule from the surface.

Returns a list of Molecules. Each molecule’s atoms will be labeled corresponding to the bond order with the surface:

• ‘*1’ - single bond

• ‘*2’ - double bond

• ‘*3’ - triple bond

• ‘*4’ - quadruple bond

get_deterministic_sssr(self) list

Modified version of the Graph method get_smallest_set_of_smallest_rings that sorts the calculated cycles by short length and then by high atomic number, instead of just by short length (for cases where multiple cycles of the same length are found, get_smallest_set_of_smallest_rings gives non-deterministic output).

For instance, molecule with this smiles: C1CC2C3CSC(CO3)C2C1, will have non-deterministic output from get_smallest_set_of_smallest_rings, which leads to non-deterministic bicyclic decomposition. Using this new method can effectively prevent this situation.

Important Note: This method returns an incorrect set of SSSR in certain molecules (such as cubane). It is recommended to use the main Graph.get_smallest_set_of_smallest_rings method in new applications. Alternatively, consider using Graph.get_relevant_cycles for deterministic output.

In future development, this method should ideally be replaced by some method to select a deterministic set of SSSR from the set of Relevant Cycles, as that would be a more robust solution.

get_disparate_cycles(self) tuple

Get all disjoint monocyclic and polycyclic cycle clusters in the molecule. Takes the relevant cycles (RC) and recursively merges all cycles which share vertices.

Returns: monocyclic_cycles, polycyclic_cycles

get_edge(self, Vertex vertex1, Vertex vertex2) Edge

Returns the edge connecting vertices vertex1 and vertex2.

get_edges(self, Vertex vertex) dict

Return a dictionary of the edges involving the specified vertex.

get_edges_in_cycle(self, list vertices, bool sort=False) list

For a given list of atoms comprising a ring, return the set of bonds connecting them, in order around the ring.

If sort=True, then sort the vertices to match their connectivity. Otherwise, assumes that they are already sorted, which is true for cycles returned by get_relevant_cycles or get_smallest_set_of_smallest_rings.

get_element_count(self) dict

Returns the element count for the molecule as a dictionary.

get_formula(self) unicode

Return the molecular formula for the molecule.

get_labeled_atoms(self, unicode label) list

Return the atoms in the molecule that are labeled.

get_largest_ring(self, Vertex vertex) list

Returns the largest ring containing vertex. This is typically useful for finding the longest path in a polycyclic ring, since the polycyclic rings returned from get_polycycles are not necessarily in order in the ring structure.

get_max_cycle_overlap(self) int

Return the maximum number of vertices that are shared between any two cycles in the graph. For example, if there are only disparate monocycles or no cycles, the maximum overlap is zero; if there are “spiro” cycles, it is one; if there are “fused” cycles, it is two; and if there are “bridged” cycles, it is three.
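The classification above reduces to a pairwise set intersection once the cycles are known. The sketch below takes the cycles as lists of integer vertex indices (an assumption of the example; in RMG they would come from ring perception) and uses a hypothetical function name.

```python
from itertools import combinations


def max_cycle_overlap_sketch(cycles):
    """Largest number of vertices shared by any pair of cycles (0 if fewer than two)."""
    return max(
        (len(set(a) & set(b)) for a, b in combinations(cycles, 2)),
        default=0,
    )


disparate = [[0, 1, 2], [3, 4, 5]]           # two separate rings
spiro = [[0, 1, 2], [0, 3, 4]]               # rings share one atom
fused = [[0, 1, 2], [0, 1, 3]]               # rings share one bond (two atoms)
bridged = [[0, 1, 2, 3, 6], [0, 4, 5, 3, 6]] # norbornane-like, three shared atoms

for name, cycles in [("disparate", disparate), ("spiro", spiro),
                     ("fused", fused), ("bridged", bridged)]:
    print(name, max_cycle_overlap_sketch(cycles))
```

The four test systems give 0, 1, 2, and 3, matching the disparate, spiro, fused, and bridged cases described above.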

get_molecular_weight(self) double

Return the molecular weight of the molecule in kg/mol.

get_monocycles(self) list

Return a list of cycles that are monocyclic.

get_net_charge(self)

Iterate through the atoms in the structure and calculate the net charge on the overall molecule.

get_nth_neighbor(self, starting_atoms, distance_list, ignore_list=None, n=1)

Recursively get the Nth non-hydrogen neighbors of the starting_atoms, and return them in a list. starting_atoms is a list of Atom objects for which we will get the nth neighbor. distance_list is a list of integers, corresponding to the desired neighbor distances. ignore_list is a list of Atom objects that have been counted in the (n-1)th neighbor, and will not be returned. n is an integer, corresponding to the distance to be calculated in the current iteration.

get_num_atoms(self, unicode element=None) int

Return the number of atoms in the molecule. If element is given, e.g. “H” or “C”, the number of atoms of that element is returned.

get_polycycles(self) list

Return a list of cycles that are polycyclic. In other words, merge the cycles which are fused or spirocyclic into a single polycyclic cycle, and return only those cycles. Cycles which are not polycyclic are not returned.

get_radical_atoms(self)

Return the atoms in the molecule that have unpaired electrons.

get_radical_count(self)

Return the total number of radical electrons on all atoms in the molecule. In this function, monoradical atoms count as one, biradicals count as two, etc.

get_relevant_cycles(self) list

Returns the set of relevant cycles as a list of lists. Uses RingDecomposerLib for ring perception.

Kolodzik, A.; Urbaczek, S.; Rarey, M. Unique Ring Families: A Chemically Meaningful Description of Molecular Ring Topologies. J. Chem. Inf. Model., 2012, 52 (8), pp 2013-2021

Flachsenberg, F.; Andresen, N.; Rarey, M. RingDecomposerLib: An Open-Source Implementation of Unique Ring Families and Other Cycle Bases. J. Chem. Inf. Model., 2017, 57 (2), pp 122-126

get_singlet_carbene_count(self) short

Return the total number of singlet carbenes (lone pair on a carbon atom) in the molecule. Counts the number of carbon atoms with a lone pair. In the case of [C] with two lone pairs, this method will return 1.

get_smallest_set_of_smallest_rings(self) list

Returns the smallest set of smallest rings as a list of lists. Uses RingDecomposerLib for ring perception.

Kolodzik, A.; Urbaczek, S.; Rarey, M. Unique Ring Families: A Chemically Meaningful Description of Molecular Ring Topologies. J. Chem. Inf. Model., 2012, 52 (8), pp 2013-2021

Flachsenberg, F.; Andresen, N.; Rarey, M. RingDecomposerLib: An Open-Source Implementation of Unique Ring Families and Other Cycle Bases. J. Chem. Inf. Model., 2017, 57 (2), pp 122-126

get_surface_sites(self) list

Get a list of surface site atoms in the molecule.

Returns: a list containing the surface site atoms (Atom objects) in the molecule

get_symmetry_number(self)

Returns the symmetry number of Molecule. First checks whether the value is stored as an attribute of Molecule. If not, it calls the calculate_symmetry_number method.

get_url(self)

Get a URL to the molecule’s info page on the RMG website.

has_atom(self, Atom atom) bool

Returns True if atom is an atom in the graph, or False if not.

has_bond(self, Atom atom1, Atom atom2) bool

Returns True if atoms atom1 and atom2 are connected by a bond, or False if not.

has_charge(self)

has_edge(self, Vertex vertex1, Vertex vertex2) bool

Returns True if vertices vertex1 and vertex2 are connected by an edge, or False if not.

has_halogen(self) bool

Return True if the molecule contains at least one halogen (F, Cl, Br, or I), or False otherwise.

has_lone_pairs(self) bool

Return True if the molecule contains at least one lone electron pair, or False otherwise.

has_vertex(self, Vertex vertex) bool

Returns True if vertex is a vertex in the graph, or False if not.

identify_ring_membership(self)

Performs ring perception and saves ring membership information to the Atom.props attribute.

inchi

InChI string for this molecule. Read-only.

is_aromatic(self)

Returns True if the molecule is aromatic, or False if not. Iterates over the SSSR’s and searches for rings that consist solely of Cb atoms. Assumes that aromatic rings always consist of 6 atoms. In cases of naphthalene, where a 6 + 4 aromatic system exists, there will be at least one 6 membered aromatic ring so this algorithm will not fail for fused aromatic rings.

is_aryl_radical(self, list aromatic_rings=None, bool save_order=False) bool

Return True if the molecule only contains aryl radicals, i.e. radicals on an aromatic ring, or False otherwise.

is_atom_in_cycle(self, Atom atom) bool

Return True if atom is in one or more cycles in the structure, and False if not.

is_bond_in_cycle(self, Bond bond) bool

Return True if the given bond is in one or more cycles in the graph, or False if not.

is_cyclic(self) bool

Return True if one or more cycles are present in the graph or False otherwise.

is_edge_in_cycle(self, Edge edge) bool

Return True if the given edge is in one or more cycles in the graph, or False if not.

is_heterocyclic(self) bool

Returns True if the molecule is heterocyclic, or False if not.

is_identical(self, Graph other, bool strict=True) bool

Performs isomorphism checking, with the added constraint that atom IDs must match.

Primary use case is tracking atoms in reactions for reaction degeneracy determination.

Returns True if two graphs are identical and False otherwise.

If strict=False, performs the check ignoring electrons and resonance structures.

is_isomorphic(self, Graph other, dict initial_map=None, bool generate_initial_map=False, bool save_order=False, bool strict=True) bool

Returns True if two graphs are isomorphic and False otherwise. The initial_map parameter can be used to specify a required mapping from self to other (i.e. the atoms of self are the keys, while the atoms of other are the values). The other parameter must be a Molecule object, or a TypeError is raised. Also ensures that the multiplicities are equal.

Parameters:
• initial_map (dict, optional) – initial atom mapping to use

• generate_initial_map (bool, optional) – if True, initialize map by pairing atoms with same labels

• save_order (bool, optional) – if True, reset atom order after performing atom isomorphism

• strict (bool, optional) – if False, perform isomorphism ignoring electrons

is_linear(self) bool

Return True if the structure is linear and False otherwise.

is_mapping_valid(self, Graph other, dict mapping, bool equivalent=True, bool strict=True) bool

Check that a proposed mapping of vertices from self to other is valid by checking that the vertices and edges involved in the mapping are mutually equivalent. If equivalent is True it checks if atoms and edges are equivalent, if False it checks if they are specific cases of each other. If strict is True, electrons and bond orders are considered, and ignored if False.

is_radical(self) bool

Return True if the molecule contains at least one radical electron, or False otherwise.

is_subgraph_isomorphic(self, Graph other, dict initial_map=None, bool generate_initial_map=False, bool save_order=False) bool

Returns True if other is subgraph isomorphic and False otherwise. The initial_map parameter can be used to specify a required mapping from self to other (i.e. the atoms of self are the keys, while the atoms of other are the values). The other parameter must be a Group object, or a TypeError is raised.

is_surface_site(self) bool

Returns True iff the molecule is nothing but a surface site ‘X’.

is_vertex_in_cycle(self, Vertex vertex) bool

Return True if the given vertex is contained in one or more cycles in the graph, or False if not.

kekulize(self)

Kekulizes an aromatic molecule.

merge(self, Graph other)

Merge two molecules so as to store them in a single Molecule object. The merged Molecule object is returned.

multiplicity

Type: int

ordered_vertices

Type: list

props

Type: dict

reactive

Type: bool

remove_atom(self, Atom atom)

Remove atom and all bonds associated with it from the graph. Does not remove atoms that no longer have any bonds as a result of this removal.

remove_bond(self, Bond bond)

Remove the given bond from the graph. Does not remove atoms that no longer have any bonds as a result of this removal.

remove_edge(self, Edge edge)

Remove the specified edge from the graph. Does not remove vertices that no longer have any edges as a result of this removal.

remove_h_bonds(self)

Removes any hydrogen bonds present in the molecule.

remove_van_der_waals_bonds(self)

Remove all van der Waals bonds.

remove_vertex(self, Vertex vertex)

Remove vertex and all edges associated with it from the graph. Does not remove vertices that no longer have any edges as a result of this removal.

replace_halogen_with_hydrogen(self, bool raise_atomtype_exception=True)

Replace all halogens in a molecule with hydrogen atoms. Changes self molecule object.

reset_connectivity_values(self)

Reset any cached connectivity information. Call this method when you have modified the graph.

restore_vertex_order(self)

Reorder the vertices to what they were before sorting, if the order was saved.

saturate_radicals(self, raise_atomtype_exception=True)

Saturate the molecule by replacing all radicals with bonds to hydrogen atoms. Changes self molecule object.

saturate_unfilled_valence(self, update=True)

Saturate the molecule by adding H atoms to any unfilled valence.

smiles

SMILES string for this molecule. Read-only.

sort_atoms(self)

Sort the atoms in the graph. This can make certain operations, e.g. the isomorphism functions, much more efficient.

This function orders atoms using several attributes in atom.getDescriptor(). Currently it sorts by placing heaviest atoms first and hydrogen atoms last. Placing hydrogens last during sorting ensures that functions with hydrogen removal work properly.

sort_cyclic_vertices(self, list vertices) list

Given a list of vertices comprising a cycle, sort them such that adjacent entries in the list are connected to each other. Warning: Assumes that the cycle is elementary, i.e. no bridges.
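The ordering can be sketched as a walk around the ring. This illustration uses an adjacency dictionary keyed by integer vertex indices (an assumption of the example) and, like the method above, only works for an elementary cycle; `sort_cycle_sketch` is a hypothetical name.

```python
def sort_cycle_sketch(vertices, adjacency):
    """Order vertices so that consecutive entries (and last/first) are bonded."""
    remaining = set(vertices[1:])
    ordered = [vertices[0]]
    while remaining:
        current = ordered[-1]
        # Step to any neighbor in the ring that has not been placed yet.
        next_vertex = next((v for v in adjacency[current] if v in remaining), None)
        if next_vertex is None:
            raise ValueError("vertices do not form a single elementary cycle")
        ordered.append(next_vertex)
        remaining.remove(next_vertex)
    if ordered[0] not in adjacency[ordered[-1]]:
        raise ValueError("walk did not close back onto the starting vertex")
    return ordered


# Six-membered ring 0-1-2-3-4-5-0, given in scrambled order
hexagon = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
print(sort_cycle_sketch([0, 3, 1, 4, 2, 5], hexagon))
```

With a bridged system a vertex can have more than two neighbors inside the vertex list, so the greedy walk may take a shortcut and fail to visit every vertex, which is why the elementary-cycle assumption matters.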

sort_vertices(self, bool save_order=False)

Sort the vertices in the graph. This can make certain operations, e.g. the isomorphism functions, much more efficient.

sorting_key

Returns a sorting key for comparing Molecule objects. Read-only.

split(self) list

Convert a single Molecule object containing two or more unconnected molecules into separate Molecule objects.

symmetry_number

Type: float

to_adjacency_list(self, unicode label=u'', bool remove_h=False, bool remove_lone_pairs=False, bool old_style=False)

Convert the molecular structure to a string adjacency list.

to_augmented_inchi(self) unicode

Adds an extra layer to the InChI denoting the multiplicity of the molecule.

Separate layer with a forward slash character.

to_augmented_inchi_key(self) unicode

Adds an extra layer to the InChIKey denoting the multiplicity of the molecule.

Simply append the multiplicity string, do not separate by a character like forward slash.

to_group(self)

This method converts a list of atoms in a Molecule to a Group object.

to_inchi(self) unicode

Convert a molecular structure to an InChI string. Uses either RDKit (which perceives aromaticity) or OpenBabel to perform the conversion.

to_inchi_key(self) unicode

Convert a molecular structure to an InChI Key string. Uses either OpenBabel or RDKit to perform the conversion.

to_rdkit_mol(self, *args, **kwargs)

Convert a molecular structure to a RDKit rdmol object.

to_single_bonds(self, raise_atomtype_exception=True)

Returns a copy of the current molecule, consisting of only single bonds.

This is useful for isomorphism comparison against something that was made via from_xyz, which does not attempt to perceive bond orders

to_smarts(self)

Convert a molecular structure to a SMARTS string. Uses RDKit to perform the conversion. Perceives aromaticity and removes hydrogen atoms.

to_smiles(self) unicode

Convert a molecular structure to a SMILES string.

If there is a nitrogen atom present, it uses OpenBabel to perform the conversion, and the SMILES may or may not be canonical.

Otherwise, it uses RDKit to perform the conversion, so the result will be canonical SMILES. While converting to an RDKit molecule, it perceives aromaticity and removes hydrogen atoms.

update(self, log_species=True, raise_atomtype_exception=True, sort_atoms=True)

Update the charge and atom types of atoms. Update multiplicity, and sort atoms (if sort_atoms is True). Does not necessarily update the connectivity values (which are used in isomorphism checks). If you need that, call update_connectivity_values().

update_atomtypes(self, bool log_species=True, bool raise_exception=True)

Iterate through the atoms in the structure, checking their atom types to ensure they are correct (i.e. accurately describe their local bond environment) and complete (i.e. are as detailed as possible).

If raise_exception is False, then the generic atomtype ‘R’ will be prescribed to any atom when get_atomtype fails. Currently used for resonance hybrid atom types.

update_connectivity_values(self)

Update the connectivity values for each vertex in the graph. These are used to accelerate the isomorphism checking.

update_lone_pairs(self)

Iterate through the atoms in the structure and calculate the number of lone electron pairs, assuming a neutral molecule.

update_multiplicity(self)

Update the multiplicity of a newly formed molecule.

vertices

Type: list