6. Running a Job¶
Note: In all these examples rmg.py should be the path to your installed RMG (e.g., yours might be /Users/joeblogs/Code/RMG-Py/rmg.py) and input.py is the path to the input file you wish to run (e.g., yours might be RMG-runs/hexadiene/input.py). If you get an error like python: can't open file 'rmg.py': [Errno 2] No such file or directory, then the first of these is probably wrong. If you get an error like IOError: [Errno 2] No such file or directory: '/some/path/to/input.py', then the second of these is probably wrong.
Running a basic RMG job is straightforward, as shown in the example below. Depending on your case, however, you may want to add the flags outlined in the following section. We recommend creating a job-specific directory for each RMG simulation. Some jobs can take quite a while to complete, so we also recommend using a job scheduler if you are working in a Linux environment.
Basic run:
python-jl rmg.py input.py
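For example, a job-specific directory can be set up along the lines of the sketch below; the directory names are illustrative and follow the example paths from the note above, and the rmg.py path should be adjusted to your installation:

mkdir -p ~/RMG-runs/hexadiene              # one directory per job (name is illustrative)
cp input.py ~/RMG-runs/hexadiene/          # keep the input file alongside its results
cd ~/RMG-runs/hexadiene
python-jl ~/Code/RMG-Py/rmg.py input.py    # adjust to the path of your installed RMG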
6.1. Input flags¶
The options for input flags can be found in /RMG-Py/rmgpy/util.py. Running python-jl rmg.py -h at the command line will print the documentation from util.py, which is reproduced below for convenience:
usage: rmg.py [-h] [-q | -v | -d] [-o DIR] [-r path/to/seed/] [-p] [-P]
              [-t DD:HH:MM:SS] [-i MAXITER] [-n MAXPROC] [-k]
              FILE

Reaction Mechanism Generator (RMG) is an automatic chemical reaction mechanism
generator that constructs kinetic models composed of elementary chemical
reaction steps using a general understanding of how molecules react.

positional arguments:
  FILE                  a file describing the job to execute

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           only print warnings and errors
  -v, --verbose         print more verbose output
  -d, --debug           print debug information
  -o DIR, --output-directory DIR
                        use DIR as output directory
  -r path/to/seed/, --restart path/to/seed/
                        restart RMG from a seed
  -p, --profile         run under cProfile to gather profiling statistics, and
                        postprocess them if job completes
  -P, --postprocess     postprocess profiling statistics from previous
                        [failed] run; does not run the simulation
  -t DD:HH:MM:SS, --walltime DD:HH:MM:SS
                        set the maximum execution time
  -i MAXITER, --maxiter MAXITER
                        set the maximum number of RMG iterations
  -n MAXPROC, --maxproc MAXPROC
                        max number of processes used during reaction
                        generation
  -k, --kineticsdatastore
                        output a folder, kinetics_database, that contains a
                        .txt file for each reaction family listing the
                        source(s) for each entry
Some representative example usages are shown below.
Run by restarting from a seed mechanism:
python-jl rmg.py -r path/to/seed/ input.py
Run with CPU time profiling:
python-jl rmg.py -p input.py
Run with multiprocessing for reaction generation and QMTP:
python-jl rmg.py -n <Max number of processes allowed> input.py
Run with a limit on the maximum execution time:
python-jl rmg.py -t <DD:HH:MM:SS> input.py
Run with a limit on the maximum number of iterations:
python-jl rmg.py -i <Max number of desired iterations> input.py
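These flags can also be combined in a single command. For instance (the values shown are only illustrative), a run limited to one day of wall time and at most four processes:
python-jl rmg.py -t 01:00:00:00 -n 4 input.py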
6.2. Details on the multiprocessing implementation¶
Currently, multiprocessing is implemented for reaction generation and for the generation of QM files when the QMTP option is used to compute thermodynamic properties of species. The processes are spawned and closed within each function. The number of processes is determined from the ratio of currently available RAM to currently used RAM. The user can specify the maximum number of allowed processes from the command line. For each reaction generation or QMTP call, the number of processes used is the minimum of the user-specified limit and the value obtained from the RAM ratio. The RAM limitation is employed because multiprocessing forks the base process, and the memory limit (swap + RAM) might be exceeded if too many processes are spawned from a base process with a large memory footprint.
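As a rough illustration of this heuristic, the sketch below is not RMG's actual implementation; it assumes the psutil package is available and that the relevant "used RAM" is the memory held by the base process:

import psutil

def allowed_processes(user_maxproc):
    """Sketch of the process-count cap described above."""
    available_ram = psutil.virtual_memory().available         # RAM currently available, in bytes
    used_by_base = psutil.Process().memory_info().rss         # RAM held by the base process, in bytes
    ram_limited = max(1, available_ram // max(used_by_base, 1))  # each fork roughly duplicates the base process
    return min(user_maxproc, ram_limited)                     # never exceed the user's -n limit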
6.3. Details on profiling RMG jobs¶
Here, we explain how to profile an RMG job. For starters, use the saveSeedModulus option in the input file, as described in the Section Miscellaneous Options, to save the seed mechanism at regular intervals, perhaps every 50 or 100 iterations depending on the size of the mechanism. This option is particularly important for saving intermediate steps when working with large mechanisms; it may be prudent to save and examine how the chemistry changes over the course of mechanism development rather than only obtaining the final seed mechanism.
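Assuming this keyword is set in the options() block of the input file (see the Miscellaneous Options section for the authoritative syntax), a minimal sketch with an illustrative interval of 100 iterations, and with all other input-file blocks omitted, is:

options(
    # save the seed mechanism every 100 iterations (value is illustrative)
    saveSeedModulus=100,
)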
These seeds can then be restarted with the -r flag, as described in the Section Input Flags above. Additionally, restarting these seeds with the -i flag allows examination of how computational effort, time spent in each module, individual processor memory consumption (if using the -n flag), and overall memory consumption change over the course of mechanism development. To profile execution time, one could use:
rmg.py -r <path_to_seed>/seed -p -i 15 restart_from_seed.py
where 15 iterations is arbitrarily chosen as a representative sample size for obtaining profiling information. To run memory profiling, one option is to install a Python memory profiler as an additional dependency. As detailed in its linked GitHub repository, it offers options for line-by-line memory usage of small functions and for time-based memory usage. An example of memory profiling is:
mprof run --multiprocess rmg.py -r <path_to_seed>/seed -i 15 -n 3 restart_from_seed.py
This example demonstrates how to obtain memory consumption for each of the three specified processes, again using 15 iterations to obtain representative profiling information. Please see the linked GitHub repository to learn more about how the memory profiler tool can help characterize your process.
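If the profiler referenced above is the mprof-based memory_profiler package used in the example command, the recorded memory-versus-time data can then typically be visualized with:

mprof plot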