4. Creating Input Files

The syntax and parameters within an RMG input file are explained below. We recommend trying to build your first input file while referencing one of the Example Input Files. Alternatively, you can use our web form found at http://rmg.mit.edu/input to assist in creating an input file.

4.1. Syntax

The format of RMG-Py input.py is based on Python syntax.

Each section is made up of one or more function calls, where parameters are specified as text strings, numbers, or objects. Text strings must be wrapped in either single or double quotes.

4.2. Datasources

This section explains how to specify various reaction and thermo data sources in the input file.

4.2.1. Thermo Libraries

By default, RMG will calculate the thermodynamic properties of the species from Benson additivity formulas. In general, the group-additivity results are suitably accurate. However, if you would like to override the default settings, you may specify the thermodynamic properties of species in the ThermoLibrary. When a species is specified in the ThermoLibrary, RMG will automatically use those thermodynamic properties instead of generating them from Benson’s formulas. Multiple libraries may be created, if so desired. The order in which the thermo libraries are specified is important: If a species appears in multiple thermo libraries, the first instance will be used.

Now in RMG, you have two types of thermo libraries: gas and liquid thermo libraries. As species thermo in liquid phase depends on the solvent, those libraries can only be used in liquid phase simulation with the corresponding solvent. Gas phase thermo library can be used either in gas phase simulation or in liquid phase simulation. (see more details on the two thermo library types and how to use thermo librairies in liquid phase simulation)

Please see Section editing thermo database for more details. In general, it is best to leave the ThermoLibrary set to its default value. In particular, the thermodynamic properties for H and H2 must be specified in one of the primary thermo libraries as they cannot be estimated by Benson’s method.

For example, if you wish to use the GRI-Mech 3.0 mechanism [GRIMech3.0] as a ThermoLibrary in your model, the syntax will be:

thermoLibraries = ['primaryThermoLibrary','GRI-Mech3.0']
[GRIMech3.0]Gregory P. Smith, David M. Golden, Michael Frenklach, Nigel W. Moriarty, Boris Eiteneer, Mikhail Goldenberg, C. Thomas Bowman, Ronald K. Hanson, Soonho Song, William C. Gardiner, Jr., Vitali V. Lissianski, and Zhiwei Qin http://www.me.berkeley.edu/gri_mech/

This library is located in the $RMG/RMG-database/input/thermo/libraries directory. All “Locations” for the ThermoLibrary field must be with respect to the $RMG/RMG-database/input/thermo/libraries directory.

Note

Checks during the initialization are maid to avoid users to use “liquid thermo librairies” in gas phase simulations or to use “liquid phase libraries” obtained in another solvent that the one defined in the input file in liquid phase simulations.

4.2.2. Reaction Libraries

The next section of the input.py file specifies which, if any, Reaction Libraries should be used. When a reaction library is specified, RMG will first use the reaction library to generate all the relevant reactions for the species in the core before going through the reaction templates. Unlike the Seed Mechanism, reactions present in a Reaction Library will not be included in the core automatically from the start.

You can specify your own reaction library in the location section. In the following example, the user has created a reaction library with a few additional reactions specific to n-butane, and these reactions are to be used in addition to the Glarborg C3 library:

reactionLibraries = [('Glarborg/C3',False)],

The keyword False/True permits user to append all unused reactions (= kept in the edge) from this library to the chemkin file. True means those reactions will be appended. Using just the string inputs would lead to a default value of False. In the previous example, this would look like:

reactionLibraries = ['Glarborg/C3'],

The reaction libraries are stored in $RMG-database/input/kinetics/libraries/ and the Location: should be specified relative to this path.

Because the units for the Arrhenius parameters are given in each mechanism, the different mechanisms can have different units.

Note

While using a Reaction Library the user must be careful enough to provide all instances of a particular reaction in the library file, as RMG will ignore all reactions generated by its templates. For example, suppose you supply the Reaction Library with butyl_1 –> butyl_2. Although RMG would find two unique instances of this reaction (via a three- and four-member cyclic Transition State), RMG would only use the rate coefficient supplied by you in generating the mechanism.

RMG will not handle irreversible reactions correctly, if supplied in a Reaction Library.

4.2.3. Seed Mechanisms

The next section of the input.py file specifies which, if any, Seed Mechanisms should be used. If a seed mechanism is passed to RMG, every species and reaction present in the seed mechanism will be placed into the core, in addition to the species that are listed in the Species Representation section.

For details of the kinetics libraries included with RMG that can be used as a seed mechanism, see Reaction Libraries.

You can specify your own seed mechanism in the location section. Please note that the oxidation library should not be used for pyrolysis models. The syntax for the seed mechanisms is similar to that of the primary reaction libraries.

seedMechanisms = ['GRI-Mech3.0']

The seed mechanisms are stored in RMG-database/input/kinetics/libraries/

As the units for the Arrhenius parameters are given in each mechanism, different mechanisms can have different units. Additionally, if the same reaction occurs more than once in the combined mechanism, the instance of it from the first mechanism in which it appears is the one that gets used.

4.2.4. Kinetics Depositories

kineticsDepositories = ['training']

4.2.5. Kinetics Families

In this section users can specify the particular reaction families that they wish to use to generate their model. for example you can use only Intra_RH_Add_Endocyclic family to build the model by:

kineticsFamilies = ['Intra_RH_Add_Endocyclic']

Otherwise, by typing ‘default’ (and excluding the brackets that are shown in the example above), RMG will use recommended reaction families to generate the mechanism. The recommended reaction families can be found in RMG-database/input/families/recommended.py.

4.2.6. Kinetics Estimator

The last section is specifying that RMG is estimating kinetics of reactions from rate rules. For more details on how kinetic estimations is working check Kinetics Estimation:

kineticsEstimator = 'rate rules'

The following is an example of a database block, based on above chosen libraries and options:

database(
        thermoLibraries = ['primaryThermoLibrary', 'GRI-Mech3.0'],
        reactionLibraries = [('Glarborg/C3',False)],
        seedMechanisms = ['GRI-Mech3.0'],
        kineticsDepositories = ['training'],
        kineticsFamilies = 'defult',
        kineticsEstimator = 'rate rules',
)

4.3. List of species

Species to be included in the core at the start of your RMG job are defined in the species block. The label, reactive or inert, and structure of each reactant must be specified.

The label field will be used throughout your mechanism to identify the species. Inert species in the model can be defined by setting reactive to be False. Reaction families will no longer be applied to these species, but reactions of the inert from libraries and seed mechanisms will still be considered. For all other species the reactive status must be set as True. The structure of the species can be defined using either by using SMILES or adjacencyList.

The following is an example of a typical species item, based on methane using SMILE or adjacency list to define the structure:

species(
        label='CH4',
        reactive=True,
        structure=SMILES("C"),
)

species(
        label='CH4',
        reactive=True,
        structure=adjacencyList(
                """
                1 C 0
                """
)

4.4. Reaction System

Every reaction system we want the model to be generated at must be defined individually. Currently, RMG can only model constant temperature and pressure systems. Future versions will allow for variable temperature and pressure. To define a reaction system we need to define the temperature, pressure and initial mole fractions of the reactant species. The initial mole fractions are defined using the label for the species in the species block. Every reaction system can have its termination criterion based on species conversion or termination time or both. When both termination criterion are specified the model generation will stop when either of the termination criterion is satisfied.

The following is an example of a simple reactor system:

simpleReactor(
        temperature=(1350,'K'),
        pressure=(1.0,'bar'),
        initialMoleFractions={
                "CH4": 0.104,
                "H2": 0.0156,
                "N2": 0.8797,
        },
        terminationConversion={
                'CH4': 0.9,
        },
        terminationTime=(1e0,'s'),
        sensitivity=['CH4','H2'],
        sensitivityThreshold=0.001,

)

Troubleshooting tip: if you are using a goal conversion rather than time, the reaction systems may reach equilibrium below the goal conversion, leading to a job that cannot converge physically. Therefore it is may be necessary to reduce the goal conversion or set a goal reaction time.

For sensitivity analysis, RMG-Py must be compiled with the DASPK solver, which is done by default but has some dependency restrictions. (See License Restrictions on Dependencies for more details.) The sensitivity and sensitivityThrehold are optional arguments for when the user would like to conduct sensitivity analysis with respect to the reaction rate coefficients for the list of species given for sensitivity.

Sensitivity analysis is conducted for the list of species given for sensitivity argument in the input file. The normalized concentration sensitivities with respect to the reaction rate coefficients dln(C_i)/dln(k_j) are saved to a csv file with the file name sensitivity_1_SPC_1.csv with the first index value indicating the reactor system and the second naming the index of the species the sensitivity analysis is conducted for. Sensitivities to thermo of individual species is also saved as semi normalized sensitivities dln(C_i)/d(G_j) where the units are given in 1/(kcal mol-1). The sensitivityThreshold is set to some value so that only sensitivities for dln(C_i)/dln(k_j) > sensitivityThreshold or dlnC_i/d(G_j) > sensitivityThreshold are saved to this file.

Note that in the RMG job, after the model has been generated to completion, sensitivity analysis will be conducted in one final simulation (sensitivity is not performed in intermediate iterations of the job).

4.5. Simulator Tolerances

The next two lines specify the absolute and relative tolerance for the ODE solver, respectively. Common values for the absolute tolerance are 1e-15 to 1e-25. Relative tolerance is usually 1e-4 to 1e-8:

simulator(
    atol=1e-16,
    rtol=1e-8,
    sens_atol=1e-6,
    sens_rtol=1e-4,
)

The sens_atol and sens_rtol are optional arguments for the sensitivity absolute tolerance and sensitivity relative tolerances, respectively. They are set to a default value of 1e-6 and 1e-4 respectively unless the user specifies otherwise. They do not apply when sensitivity analysis is not conducted.

4.6. Model Tolerances

Model tolerances dictate how species get included in the model. For more information, see the theory behind how RMG builds models using the Flux-based Algorithm. For running an initial job, it is recommended to only change the toleranceMoveToCore and toleranceInterruptSimulation values to an equivalent desired value. We find that typically a value between 0.01 and 0.05 is best. If your model cannot converge within a few hours, more advanced settings such as reaction filtering or pruning can be turned on to speed up your simulation at a slight risk of omitting chemistry.

model(
    toleranceMoveToCore=0.1,
    toleranceInterruptSimulation=0.1,
)
  • toleranceMoveToCore indicates how high the edge flux ratio for a species must get to enter the core model. This tolerance is designed for controlling the accuracy of final model.
  • toleranceInterruptSimulation indicates how high the edge flux ratio must get to interrupt the simulation (before reaching the terminationConversion or terminationTime). This value should be set to be equal to toleranceMoveToCore unless the advanced pruning feature is desired.

4.6.1. Advanced Setting: Speed Up by Filtering Reactions

For generating models for larger molecules, RMG-Py may have trouble converging because it must find reactions on the order of \((n_{reaction\: sites})^{{n_{species}}}\). Thus it can be further sped up by pre-filtering reactions that are added to the model. This modification to the algorithm does not react core species together until their concentrations are deemed high enough. It is recommended to turn on this flag when the model does not converge with normal parameter settings. See Filtering Reactions within the Flux-based Algorithm. for more details.

model(
    toleranceMoveToCore=0.1,
    toleranceInterruptSimulation=0.1,
    filterReactions=True,
)

Additional parameters:

  • filterReactions: set to True if reaction filtering is turned on. By default it is set to False.

4.6.2. Advanced Setting: Speed Up by Pruning

For further speed-up, it is also possible to perform mechanism generation with pruning of “unimportant” edge species to reduce memory usage.

A typical set of parameters for pruning is:

model(
    toleranceMoveToCore=0.5,
    toleranceInterruptSimulation=1e8,
    toleranceKeepInEdge=0.05,
    maximumEdgeSpecies=200000
    minCoreSizeForPrune=50,
    minSpeciesExistIterationsForPrune=2,
    )

Additional parameters:

  • toleranceKeepInEdge indicates how low the edge flux ratio for a species must be to keep on the edge. This should be set to zero, which is its default.
  • maximumEdgeSpecies indicates the upper limit for the size of the edge. The default value is set to 1000000 species.
  • minCoreSizeForPrune ensures that a minimum number of species are in the core before pruning occurs, in order to avoid pruning the model when it is far away from completeness. The default value is set to 50 species.
  • minSpeciesExistIterationsForPrune is set so that the edge species stays in the job for at least that many iterations before it can be pruned. The default value is 2 iterations.

Recommendations:

We recommend setting toleranceKeepInEdge to not be larger than 10% of toleranceMoveToCore, based on a pruning case study. In order to always enable pruning, toleranceInterruptSimulation should be set as a high value, e.g. 1e8. maximumEdgeSpecies can be adjusted based on user’s RAM size. Usually 200000 edge species would cause memory shortage of 8GB computer, setting maximumEdgeSpecies = 200000 (or lower values) could effectively prevent memory crash.

Additional Notes:

Note that when using pruning, RMG will not prune unless all reaction systems reach the goal reaction time or conversion without exceeding the toleranceInterruptSimulation. Therefore, you may find that RMG is not pruning even though the model edge size exceeds maximumEdgeSpecies, or an edge species has flux below the toleranceKeepInEdge. This is a safety check within RMG to ensure that species are not pruned too early, resulting in inaccurate chemistry. In order to increase the likelihood of pruning you can try increasing toleranceInterruptSimulation to an arbitrarily high value.

As a contrast, a typical set of parameters for non-pruning is:

model(
    toleranceKeepInEdge=0,
    toleranceMoveToCore=0.5,
    toleranceInterruptSimulation=0.5,
)

where toleranceKeepInEdge is always 0, meaning all the edge species will be kept in edge since all the edge species have positive flux. toleranceInterruptSimulation equals to toleranceMoveToCore so that ODE simulation get interrupted once discovering a new core species. Because the ODE simulation is always interrupted, no pruning is performed.

Please find more details about the theory behind pruning at Pruning Theory.

4.7. On the fly Quantum Calculations

This block is used when quantum mechanical calculations are desired to determine thermodynamic parameters. These calculations are only run if the molecule is not included in a specified thermo library. The onlyCyclics option, if True, only runs these calculations for cyclic species. In this case, group additive estimates are used for all other species.

Molecular geometries are estimated via RDKit [RDKit]. Either MOPAC (2009 and 2012) or GAUSSIAN (2003 and 2009) can be used with the semi-empirical pm3, pm6, and pm7 (pm7 only available in MOPAC2012), specified in the software and method blocks. A folder can be specified to store the files used in these calculations, however if not specified this defaults to a QMfiles folder in the output folder.

The calculations are also only run on species with a maximum radical number set by the user. If a molecule has a higher radical number, the molecule is saturated with hydrogen atoms, then quantum mechanical calculations with subsequent hydrogen bond incrementation is used to determine the thermodynamic parameters.

The following is an example of the quantum mechanics options

quantumMechanics(
        software='mopac',
        method='pm3',
        fileStore='QMfiles',
        scratchDirectory = None,
        onlyCyclics = True,
        maxRadicalNumber = 0,
        )
[RDKit]RDKit: Open-source cheminformatics; http://www.rdkit.org

4.8. Pressure Dependence

This block is used when the model should account for pressure dependent rate coefficients. RMG can estimate pressure dependence kinetics based on Modified Strong Collision and Reservoir State methods. The former utilizes the modified strong collision approach of Chang, Bozzelli, and Dean [Chang2000], and works reasonably well while running more rapidly. The latter utilizes the steady-state/reservoir-state approach of Green and Bhatti [Green2007], and is more theoretically sound but more expensive.

The pressure dependence block should specify the following:

4.8.1. Method used for estimating pressure dependent kinetics

To specify the modified strong collision approach, this item should read

method='Modified Strong Collision'

To specify the reservoir state approach, this item should read

method='Reservoir State'

For more information on the two methods, consult the following resources :

[Chang2000]A.Y. Chang, J.W. Bozzelli, and A. M. Dean. “Kinetic Analysis of Complex Chemical Activation and Unimolecular Dissociation Reactions using QRRK Theory and the Modified Strong Collision Approximation.” Z. Phys. Chem. 214 (11), p. 1533-1568 (2000).
[Green2007]N.J.B. Green and Z.A. Bhatti. “Steady-State Master Equation Methods.” Phys. Chem. Chem. Phys. 9, p. 4275-4290 (2007).

4.8.2. Grain size and minimum number of grains

Since the \(k(E)\) requires discretization in the energy space, we need to specify the number of energy grains to use when solving the Master Equation. The default value for the minimum number of grains is 250; this was selected to balance the speed and accuracy of the Master Equation solver method. However, for some pressure-dependent networks, this number of energy grains will result in the pressure-dependent \(k(T, P)\) being greater than the high-P limit

maximumGrainSize=(0.5,'kcal/mol')
minimumNumberOfGrains=250

4.8.3. Temperature and pressure for the interpolation scheme

To generate the \(k(T,P)\) interpolation model, a set of temperatures and pressures must be used. RMG can do this automatically, but it must be told a few parameters. We need to specify the limits of the temperature and pressure for the fitting of the interpolation scheme and the number of points to be considered in between this limit. For typical combustion model temperatures of the experiments range from 300 - 2000 K and pressure 1E-2 to 100 bar

temperatures=(300,2000,'K',8)
pressures=(0.01,100,'bar',5)

4.8.4. Interpolation scheme

To use logarithmic interpolation of pressure and Arrhenius interpolation for temperature, use the line

interpolation=('PDepArrhenius',)

The auxillary information printed to the Chemkin chem.inp file will have the “PLOG” format. Refer to Section 3.5.3 of the CHEMKIN_Input.pdf document and/or Section 3.6.3 of the CHEMKIN_Theory.pdf document. These files are part of the CHEMKIN manual.

To fit a set of Chebyshev polynomials on inverse temperature and logarithmic pressure axes mapped to [-1,1], use the line

interpolation=('Chebyshev', 6, 4)

You should also specify the number of temperature and pressure basis functions by adding the appropriate integers. For example, the following specifies that six basis functions in temperature and four in pressure should be used

interpolation=('Chebyshev', 6, 4)

The auxillary information printed to the Chemkin chem.inp file will have the “CHEB” format. Refer to Section 3.5.3 of the CHEMKIN_Input.pdf document and/or Section 3.6.4 of the CHEMKIN_Theory.pdf document.

4.8.5. Maximum size of adduct for which pressure dependence kinetics be generated

By default pressure dependence is run for every system that might show pressure dependence, i.e. every isomerization, dissociation, and association reaction. In reality, larger molecules are less likely to exhibit pressure-dependent behavior than smaller molecules due to the presence of more modes for randomization of the internal energy. In certain cases involving very large molecules, it makes sense to only consider pressure dependence for molecules smaller than some user-defined number of atoms. This is specified e.g. using the line

maximumAtoms=16

to turn off pressure dependence for all molecules larger than the given number of atoms (16 in the above example).

The following is an example of pressure dependence options

pressureDependence(
        method='modified strong collision',
        maximumGrainSize=(0.5,'kcal/mol'),
        minimumNumberOfGrains=250,
        temperatures=(300,2000,'K',8),
        pressures=(0.01,100,'bar',5),
        interpolation=('Chebyshev', 6, 4),
        maximumAtoms=16,
)

Regarding the number of polynomial coeffients for Chebyshev interpolated rates, plese refer to the rmgpy.kinetics.Chebyshev documentation. The number of pressures and temperature coefficents should always be smaller than the respective number of user-specified temperatures and pressures.

4.9. Miscellaneous Options

Miscellaneous options:

options(
    units='si',
    saveRestartPeriod=(1,'hour'),
    generateOutputHTML=True,
    generatePlots=False,
    saveSimulationProfiles=True,
    verboseComments=False,
    saveEdgeSpecies=True,
)

The units field is set to si. Currently there are no other unit options.

The saveRestartPeriod indictes how frequently you wish to save restart files. For very large/long RMG jobs, this process can take a significant amount of time. In such cases, the user may wish to increase the time period for saving these restart files.

Setting generateOutputHTML to True will let RMG know that you want to save 2-D images (png files in the local species folder) of all species in the generated core model. It will save a visualized HTML file for your model containing all the species and reactions. Turning this feature off by setting it to False may save memory if running large jobs.

Setting generatePlots to True will generate a number of plots describing the statistics of the RMG job, including the reaction model core and edge size and memory use versus execution time. These will be placed in the output directory in the plot/ folder.

Setting saveSimulationProfiles to True will make RMG save csv files of the simulation in .csv files in the solver/ folder. The filename will be simulation_1_26.csv where the first number corresponds to the reaciton system, and the second number corresponds to the total number of species at the point of the simulation. Therefore, the highest second number will indicate the latest simulation that RMG has complete while enlarging the core model. The information inside the csv file will provide the time, reactor volume in m^3, as well as mole fractions of the individual species.

Setting verboseComments to True will make RMG generate chemkin files with complete verbose commentary for the kinetic and thermo parameters. This will be helpful in debugging what values are being averaged for the kinetics. Note that this may produce very large files.

Setting saveEdgeSpecies to True will make RMG generate chemkin files of the edge reactions in addition to the core model in files such as chem_edge.inp and chem_edge_annotated.inp files located inside the chemkin folder. These files will be helpful in viewing RMG’s estimate for edge reactions and seeing if certain reactions one expects are actually in the edge or not.

Setting keepIrreversible to True will make RMG import library reactions as is, whether they are reversible or irreversible in the library. Otherwise, if False (default value), RMG will force all library reactions to be reversible, and will assign the forward rate from the relevant library.

4.10. Species Constraints

RMG can generate mechanisms with a number of optional species constraints, such as total number of carbon atoms or electrons per species. These are applied to all of RMG’s reaction families.

generatedSpeciesConstraints(
    allowed=['input species','seed mechanisms','reaction libraries'],
    maximumCarbonAtoms=10,
    maximumHydrogenAtoms=10,
    maximumOxygenAtoms=10,
    maximumNitrogenAtoms=10,
    maximumSiliconAtoms=10,
    maximumSulfurAtoms=10,
    maximumHeavyAtoms=10,
    maximumRadicalElectrons=10,
    allowSingletO2 = False,
)

An additional flag allowed can be set to allow species from either the input file, seed mechanisms, or reaction libraries to bypass these constraints. Note that this should be done with caution, since the constraints will still apply to subsequent products that form.

Note that under all circumstances all forbidden species will still be banned unless they are manually removed from the database. See Kinetics Database for more information on forbidden groups.

By default, the allowSingletO2 flag is set to False. See Representing Oxygen for more information.