Adjacency Lists

Note

The adjacency list syntax changed in July 2014. The minimal requirement for most translations is to prefix the number of unpaired electrons with the letter u. The new syntax, however, allows much greater flexibility, including definition of lone pairs, partial charges, wildcards, and molecule multiplicities.

Note

To quickly visualize any adjacency list, or to generate an adjacency list from other types of molecular representations such as SMILES, InChI, or even common species names, use the Molecule Search tool found here: https://rmg.mit.edu/molecule_search

An adjacency list is the most general way of specifying a chemical molecule or molecular pattern in RMG. It is based on the adjacency list representation of the graph data type – the underlying data type for molecules and patterns in RMG – but extended to allow for specification of extra semantic information.

The first line of most adjacency lists is a unique identifier for the molecule or pattern the adjacency list represents. This is not strictly required, but is recommended in most cases. Generally the identifier should only use alphanumeric characters and the underscore, as if an identifier in many popular programming languages. However, strictly speaking any non-space ASCII character is allowed.

The subsequent lines may contain keyword-value pairs. Currently there can be: multiplicity, metal and facet.

For species or molecule declarations, the value after multiplicity defines the spin multiplicity of the molecule. E.g. multiplicity 1 for most ground state closed shell species, multiplicity 2 for most radical species, and multiplicity 3 for a triplet biradical. If the multiplicity line is not present then a value of (1 + number of unpaired electrons) is assumed. Thus, it can usually be omitted, but if present can be used to distinguish, for example, singlet CH2 from triplet CH2.

If defining a Functional Group, then the value must be a list, which defines the multiplicities that will be matched by the group, eg. multiplicity [1,2,3] or, for a single value, multiplicity [1]. If a wildcard is desired, the line 'multiplicity x can be used instead to accept all multiplicities. If the multiplicity line is omitted altogether, then a wildcard is assumed.

metal and facet work similarly and will correspond to lines like metal Fe, metal [Fe,Cu,Ag], facet 111, facet [111,211,110].

e.g. the following two group adjlists represent identical groups.

group1
multiplicity x
1    R!H u0
group2
1    R!H u0

After the identifier line and keyword-value lines, each subsequent line describes a single atom and its local bond structure. The format of these lines is a whitespace-delimited list with tokens

<number> [<label>] <element> u<unpaired> [p<pairs>] [c<charge>] [s<site>] [m<morphology>] <bondlist>

The first item is the number used to identify that atom. Any number may be used, though it is recommended to number the atoms sequentially starting from one. Next is an optional label used to tag that atom; this should be an asterisk followed by a unique number for the label, e.g. *1. In some cases (e.g. thermodynamics groups) there is only one labeled atom, and the label is just an asterisk with no number: *.

After that is the atom’s element or atom type, indicated by its atomic symbol, followed by a sequence of tokens describing the electronic state of the atom:

  • u0 number of unpaired electrons (eg. radicals)

  • p0 number of lone pairs of electrons, common on oxygen and nitrogen.

  • c0 formal charge on the atom, e.g. c-1 (negatively charged), c0, c+1 (positively charged)

  • s the site type a site atom is e.g. s"fcc"

  • m the morphology of a site atom e.g. m"terrace"

For Molecule definitions: The value must be a single integer (and for charge must have a + or - sign if not equal to 0) The number of unpaired electrons (i.e. radical electrons) is required, even if zero. The number of lone pairs and the formal charge are assumed to be zero if omitted.

For Group definitions: The value can be an integer or a list of integers (with signs, for charges), eg. u[0,1,2], c[0,+1,+2,+3,+4], or s["hcp","fcc"] or may be a wildcard ``x which matches any valid value, eg. px is the same as p[0,1,2,3,4, ...] and cx is the same as c[...,-4,-3,-2,-1,0,+1,+2,+3,+4,...]. Lists must be enclosed is square brackets, and separated by commas, without spaces. If lone pairs or formal charges are omitted from a group definition, the wildcard is assumed.

The last set of tokens is the list of bonds. To indicate a bond, place the number of the atom at the other end of the bond and the bond type within curly braces and separated by a comma, e.g. {2,S}. Multiple bonds from the same atom should be separated by whitespace.

Note

You must take care to make sure each bond is listed on the lines of both atoms in the bond, and that these entries have the same bond type. RMG will raise an exception if it encounters such an invalid adjacency list.

When writing a molecular substructure pattern, you may specify multiple elements, radical counts, and bond types as a comma-separated list inside square brackets. For example, to specify any carbon or oxygen atom, use the syntax [C,O]. For a single or double bond to atom 2, write {2,[S,D]}.

Atom types such as R!H or Cdd may also be used as a shorthand. (Atom types like Cdd can also be used in full molecules, but this use is discouraged, as RMG can compute them automatically for full molecules.)

Below is an example adjacency list, for 1,3-hexadiene, with the weakest bond in the molecule labeled with *1 and *2. Note that hydrogen atoms can be omitted if desired, as their presence is inferred, provided that unpaired electrons, lone pairs, and charges are all correctly defined:

HXD13
multiplicity 1
1    C u0       {2,D}
2    C u0 {1,D} {3,S}
3    C u0 {2,S} {4,D}
4    C u0 {3,D} {5,S}
5 *1 C u0 {4,S} {6,S}
6 *2 C u0 {5,S}

The allowed element types, radicals, and bonds are listed in the following table:

Notation

Explanation

Chemical Element

C

Carbon atom

O

Oxygen atom

H

Hydrogen atom

S

Sulfur atom

N

Nitrogen atom

Nonreactive Elements

Si

Silicon atom

Cl

Chlorine atom

He

Helium atom

Ar

Argon atom

Chemical Bond

S

Single Bond

D

Double Bond

T

Triple bond

B | Benzene bond

vdW | Van der Waals bond

H | Hydrogen bond

R | Reaction bond

rmgpy.molecule.adjlist.from_adjacency_list(adjlist, group=False, saturate_h=False, check_consistency=True)

Convert a string adjacency list adjlist into a set of Atom and Bond objects.

rmgpy.molecule.adjlist.to_adjacency_list(atoms, multiplicity, metal='', facet='', label=None, group=False, remove_h=False, remove_lone_pairs=False, old_style=False)

Convert a chemical graph defined by a list of atoms into a string adjacency list.