Installation instructions¶
ARC can be installed on a server, as well as on your local desktop / laptop, submitting jobs to your server(s). The instructions below make this differentiation when relevant (the only difference is that ARC should be “aware” of software installed on the same machine, where the communication isn’t done via SSH).
- Note:
ARC was only tested on Linux (Ubuntu 18.04.1 and 20.04 LTS) and Mac machines. We don’t expect it to work smoothly on Windows machines.
- Note:
These installation instructions assume you already have access to a server with a cluster scheduling software (ARC currently supports SGE, Slurm, PBS, and HTCondor) and with properly installed electronic structure software (ARC currently supports Gaussian, QChem, Molpro, TeraChem, Orca, and Psi4). We further assume that you have some experience working with the server, e.g., writing appropriate submit scripts (example scripts are given, but modifications are usually required).
Clone and setup path¶
Download and install the Anaconda Python Platform for Python 3.7 or higher if you haven’t already.
Get git and appropriate compilers if you don’t have them already by typing
sudo apt install git gcc g++ make
in a terminal.
Clone ARC’s repository by typing the following command in the desired folder (e.g., under ~/Code/):
git clone https://github.com/ReactionMechanismGenerator/ARC.git
Add ARC to your local path in .bashrc (make sure to change ~/Path/to/ARC/ accordingly):
export PYTHONPATH=$PYTHONPATH:~/Path/to/ARC/
Install dependencies¶
Create the Anaconda environment for ARC (after changing the directory to the installation folder by, e.g., cd ~/Code/ARC/):
conda env create -f environment.yml
Activate the ARC environment every time before you run ARC:
conda activate arc_env
Install the latest DEVELOPER version of RMG (which has Arkane). It is recommended to follow RMG’s Developer installation by source using Anaconda instructions. Make sure to add RMG-Py to your PATH and PYTHONPATH variables as explained in RMG’s documentation.
Type
make install-all
under the ARC repository folder to install the following third-party repositories: AutoTST (West et al.), KinBot (Van de Vijver et al.), and TS-GCN (Pattanaik et al.). Note that this should be done on each machine on which ARC is expected to run. For advanced users: ARC will look for the paths to these repos. If your path on a server is not conventional, you can help ARC discover your external repos in settings.py.
Test ARC by typing
make test
under the ARC folder after activating the anaconda arc_env environment.
Create a .arc folder (optional but recommended)¶
Users are encouraged to create a .arc folder under their HOME folder on the machine running ARC.
Copy (and modify as appropriate, see below) the following Python files from the ARC repository into the newly created folder:
<base_folder>/ARC/arc/settings/settings.py –> HOME/.arc/settings.py
<base_folder>/ARC/arc/settings/input.py –> HOME/.arc/input.py
<base_folder>/ARC/arc/settings/submit.py –> HOME/.arc/submit.py
By doing this, ARC will use the respective settings and definitions from the files under .arc to override its defaults. Users may (carefully) modify the definitions in the local files as appropriate. Note that you may choose to copy only some of these files, in which case the definitions from any non-copied files will be taken from ARC’s defaults (e.g., most users will not need to modify input.py). Note also that definitions within these files may be partial (i.e., you may keep only the parameters you wish to change within each file), and that any missing parameter will be assigned its value from ARC’s defaults. It is recommended to keep only the parameters you modified in settings.py, so that other parameters will be updated as you update the ARC repository in the future. This recommendation helps avoid merge conflicts when updating ARC, and allows a single ARC instance on a server to be used by different users with different preferences.
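The override behaviour described above can be pictured as a dictionary update. The following is a minimal sketch under stated assumptions, not ARC's actual loading code: the function name apply_user_overrides and the flat ARC_DEFAULTS dictionary are hypothetical, and only the key names and default values are taken from this page.

```python
"""Illustrative sketch of partial overrides; not ARC's actual loading code."""

# Hypothetical stand-in for ARC's built-in defaults (values quoted on this page).
ARC_DEFAULTS = {
    'job_total_memory_gb': 14,
    'job_cpu_cores': 8,
    'job_time_limit_hrs': 120,
}


def apply_user_overrides(defaults: dict, user: dict) -> dict:
    """Return a new dict in which user-supplied keys replace the defaults."""
    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(user)      # keys missing from `user` keep their default value
    return merged


# A partial local settings file would define only what the user changed.
user_settings = {'job_cpu_cores': 16}
effective = apply_user_overrides(ARC_DEFAULTS, user_settings)
print(effective)
```

Keeping the local file this small is what lets the remaining parameters follow ARC's defaults as the repository is updated.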
In principle, ARC would also work fine if users directly changed the respective files within ARC’s repository instead of making copies. However, modifying the files in ARC directly may cause merge conflicts when updating ARC. The downside of keeping copies is that users are responsible for keeping them up to date with ARC’s format if major changes are made. Such changes will be listed under the Release Notes and will result in an increase of the MINOR version number (i.e., major.MINOR.patch, e.g., 1.1.5 –> 1.2.0).
Generating RSA SSH keys and defining servers¶
The first two directives are only required if you’d like ARC to access remote servers (ARC could also run “locally” when it’s installed on a server).
Generate RSA SSH keys for your favorite server(s) on which relevant electronic structure software (ESS, e.g., Gaussian) is installed. Instructions for generating RSA keys can be found online.
Copy the RSA SSH key path(s) on your local machine to settings.py, in the servers dictionary under keys.
Update the servers dictionary in your copy of ARC’s settings.py.
- A local server must be named with the reserved keyword local. cluster_soft and username (un) are mandatory.
- A remote server has no limitations for naming. cluster_soft, address, username (un), and key (the path to the local RSA SSH key) are mandatory.
- Optional parameters for both local and remote servers are cpus, memory, and path. The first two parameters stand for the maximum number of cpu cores and the maximum memory in GB available on a node. If a job crashes due to cpu or memory issues, ARC will automatically re-run the job with different cpu and memory allocations within the specified limits. By default, cpus is 8 and memory is 14 GB. The optional path key is used if it is necessary to direct ARC to create project files in a specific location on the server. E.g., if the value /storage/group_name/ is given for path, ARC will create project files under /storage/group_name/$USER/runs/ARC_Projects/project_name/.
- Although ARC currently does not allocate computing resources dynamically based on system size or ESS, the user can manually control memory specifications for each project. See Advanced Features for details.
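As an illustration of the rules above, a servers dictionary might look like the sketch below. The key names come from this page; the server name server1, the address, the user name, the key path, and the cluster_soft values are placeholders, and missing_server_keys is a hypothetical helper that is not part of ARC.

```python
"""Illustrative servers dictionary with a hypothetical sanity check."""

# Server names, addresses, user names, and key paths are placeholders.
servers = {
    'local': {
        'cluster_soft': 'Slurm',
        'un': 'my_username',
    },
    'server1': {
        'cluster_soft': 'OGE',
        'address': 'server1.example.edu',
        'un': 'my_username',
        'key': '/home/my_username/.ssh/id_rsa',
        'cpus': 24,                      # optional: maximum cpu cores on a node
        'memory': 128,                   # optional: maximum node memory, in GB
        'path': '/storage/group_name/',  # optional: base folder for project files
    },
}

LOCAL_REQUIRED = {'cluster_soft', 'un'}
REMOTE_REQUIRED = {'cluster_soft', 'address', 'un', 'key'}


def missing_server_keys(servers_dict: dict) -> dict:
    """Map each server name to the mandatory keys it lacks (hypothetical helper)."""
    problems = {}
    for name, definition in servers_dict.items():
        required = LOCAL_REQUIRED if name == 'local' else REMOTE_REQUIRED
        missing = required - set(definition)
        if missing:
            problems[name] = sorted(missing)
    return problems


print(missing_server_keys(servers))  # an empty dict means nothing is missing
```

A check of this kind is only a convenience for catching typos before launching ARC; ARC's own validation, if any, may differ.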
A default ESS job in ARC has 14 GB of memory, 8 cpu cores, and 120 hours of maximum execution time. The default settings can be changed by providing different values to the job_total_memory_gb, job_cpu_cores, and job_time_limit_hrs keys in the default_job_settings dictionary in settings.py.
ARC will alter job memory, cpu, and time settings when troubleshooting jobs that crashed due to resource allocation issues. The job_max_server_node_memory_allocation key stands for the maximum percentage of total node memory ARC will use when troubleshooting a job. The default value is 80%.
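To make the memory limit concrete, the sketch below writes out a default_job_settings dictionary with the defaults quoted above and computes the resulting cap for a node. This is an assumption-laden illustration: the 80% limit is written here as an integer percentage, which may not be the representation ARC's settings.py expects, and max_troubleshooting_memory_gb is a hypothetical helper.

```python
"""Illustrative default_job_settings and a hypothetical memory-cap calculation."""

default_job_settings = {
    'job_total_memory_gb': 14,
    'job_cpu_cores': 8,
    'job_time_limit_hrs': 120,
    # Assumption: the 80% limit is written as an integer percentage;
    # check ARC's own settings.py for the representation it expects.
    'job_max_server_node_memory_allocation': 80,
}


def max_troubleshooting_memory_gb(node_memory_gb: float, settings: dict) -> float:
    """Return the largest memory request allowed on a node, in GB."""
    percent = settings['job_max_server_node_memory_allocation']
    return node_memory_gb * percent / 100


# For a node whose servers entry declares 128 GB of memory:
print(max_troubleshooting_memory_gb(128, default_job_settings))
```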
Update the submit scripts in your copy of ARC’s submit.py according to your servers’ definitions.
- See the given template examples, and follow the structure of nested dictionaries (by server name, then by ESS name).
- Preserve the variables in curly braces (e.g., {memory}), so that ARC is able to auto-fill them.
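The nested structure and the role of the curly-brace variables can be sketched as follows. This is not one of ARC's shipped templates: the script body is a made-up Slurm example, the placeholders other than {memory} are assumptions, and filling the template with str.format is assumed here only for illustration.

```python
"""Illustrative submit-script template filled with str.format."""

# Nested by server name, then by ESS name, as described above.
# The script body and the {name} and {cpus} placeholders are assumptions.
submit_scripts = {
    'server1': {
        'gaussian': """#!/bin/bash -l
#SBATCH -J {name}
#SBATCH -n {cpus}
#SBATCH --mem={memory}

g16 < input.gjf > input.log
""",
    },
}

# Filling the placeholders, roughly as a job launcher might do.
script = submit_scripts['server1']['gaussian'].format(name='a101', cpus=8, memory=14336)
print(script)
```

If a template is filled with str.format, as assumed in this sketch, any literal braces in the shell code (e.g., in ${HOME}) have to be doubled so that they are not mistaken for placeholders.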
Associating software with servers¶
ARC keeps track of software location on servers using a Python dictionary associating the different software (keys) with the servers they are installed on (values). The server name must be consistent with the respective definition in the servers dictionary mentioned above. Typically, you would update the global_ess_settings dictionary in your copy of ARC’s settings.py to reflect your software and servers, for example:
global_ess_settings = {
    'gaussian': ['server1', 'server2'],
    'molpro': 'server2',
    'qchem': 'local',
}
Note that the above example reflects a situation where QChem is installed on the same machine as ARC, while Gaussian and Molpro are installed on different servers ARC has access to. You can of course use any combination you’d like. The servers can be listed as a simple string for a single server, or as a list for multiple servers, where relevant.
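Because a value may be either a string or a list, code that consumes such a dictionary typically normalizes it first. The sketch below shows one way to do so; normalize_ess_settings is a hypothetical helper and does not claim to mirror how ARC handles the dictionary internally.

```python
"""Illustrative normalization of an ESS-to-servers mapping (hypothetical helper)."""

global_ess_settings = {
    'gaussian': ['server1', 'server2'],
    'molpro': 'server2',
    'qchem': 'local',
}


def normalize_ess_settings(ess_settings: dict) -> dict:
    """Return a copy in which every value is a list of server names."""
    normalized = {}
    for ess, servers in ess_settings.items():
        # A bare string denotes a single server; wrap it in a list.
        normalized[ess] = [servers] if isinstance(servers, str) else list(servers)
    return normalized


normalized = normalize_ess_settings(global_ess_settings)
print(normalized)
```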
These global settings are used by default unless ARC is given an ess_settings dictionary through an input file or the API, thus allowing more flexibility when running several instances of ARC simultaneously (e.g., if Gaussian is installed on two servers, where one has more memory in its nodes, the user can request ARC to use that specific server for the more memory-intensive jobs). More about the ess_settings dictionary can be found in the Advanced Features section of the documentation.
If neither global_ess_settings (in settings.py) nor ess_settings (via an input file or the API) is specified, ARC will use its “radar” feature to scan the servers it has access to, and assign the relevant ESS it is familiar with to the respective servers. In order for this feature to function properly, make sure your .bashrc file on the remote servers does not have an interactive shell check. If it does, disable it.
It is recommended, though, to use the global_ess_settings and/or ess_settings dictionaries rather than allowing the “radar” to do its thing blindly. The “radar” feature, however, is very useful for diagnostics (see Tests below).
You can check what the “radar” detects using the ARC ESS diagnostics notebook.
Cluster software definitions¶
ARC supports Slurm and Oracle/Sun Grid Engine (OGE / SGE). If you’re using other cluster software, or if your server’s definitions differ from ARC’s, you should also modify the following variables in your copy of ARC’s settings.py:
check_status_command
submit_command
delete_command
list_available_nodes_command
submit_filename
t_max_format
You will find the values for check_status_command, submit_command, delete_command, and list_available_nodes_command by running the which command on the respective server, e.g.:
which sbatch
If you have different servers with the same cluster software but different cluster software definitions, just name them differently, e.g., Slurm1 and Slurm2, and make sure to pair them accordingly under the servers dictionary.
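The which lookup can also be done for several commands at once from Python, using the standard library's shutil.which. The sketch below has to be run on the server in question; the command names listed per scheduler are common ones given as assumptions, and the resulting paths still need to be entered by hand into the variables in settings.py.

```python
"""Illustrative lookup of scheduler executables with shutil.which."""
import shutil

# Common command names per scheduler; adjust to what your cluster provides.
candidate_commands = {
    'Slurm': ['sbatch', 'squeue', 'scancel', 'sinfo'],
    'OGE': ['qsub', 'qstat', 'qdel'],
}


def locate_commands(names: list) -> dict:
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for scheduler, names in candidate_commands.items():
    print(scheduler, locate_commands(names))
```

A value of None means that the command was not found on the PATH of the machine where the script ran.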
Tests¶
If you’d like to make sure ARC has access to your servers and recognises your ESS, use the “radar” tool, available as an iPython notebook (see Standalone tools).
Run the minimal example (see Examples), and a couple more examples, if you’d like, using both input files and the API (via iPython notebooks or any other method).
Run ARC’s unit tests. Note that for all tests to pass, ARC expects to find the unmodified settings in settings.py. If you made changes to any settings instead of creating respective files under a local .arc folder, it is recommended to first stash your changes (git stash). To run the tests, type:
make test
After the tests complete, you may unstash your changes, if relevant (git stash pop).
In addition, functional tests are helpful in making sure that ARC is installed and functioning correctly. Again, before performing these tests, it is recommended to first stash your changes (git stash). To trigger the functional tests, type:
make test-functional
After the tests complete, you may unstash your changes, if relevant (git stash pop).
Optional: Add ARC aliases to your .bashrc (for convenience)¶
Below are optional aliases to make ARC (even) more convenient (make sure to change ~/Path/to/ARC/ accordingly).
Add these to your .bashrc file (edit it by typing, e.g., nano ~/.bashrc):
export arc_path=$HOME'/Path/to/ARC/'
alias arce='source activate arc_env'
alias arc='python $arc_path/ARC.py input.yml'
alias arcrestart='python $arc_path/ARC.py restart.yml'
alias arcode='cd $arc_path'
alias j='cd $arc_path/ipython/ && jupyter notebook'
Updating ARC¶
ARC is being updated frequently. Make sure to update ARC and enjoy new features and bug fixes.
- Note:
If you change ARC’s parameters within the repository rather than in copies thereof as explained above, it is highly recommended to back up the files you manually changed before updating ARC. These are usually ARC/arc/settings/settings.py and ARC/arc/settings/submit.py.
You can update ARC to a specific version, or to the most recent developer version. To get the most recent developer version, do the following (and make sure to change ~/Path/to/ARC/ accordingly):
cd ~/Path/to/ARC/
git stash
git fetch origin
git pull origin main
git stash pop
The above will update your main branch of ARC.
To update to a specific version (e.g., version 1.1.0), do the following (and make sure to change ~/Path/to/ARC/ accordingly):
cd ~/Path/to/ARC/
git stash
git fetch origin
git checkout tags/1.1.0 -b v1.1.0
git stash pop
The above will create a v1.1.0 branch which replicates the stable 1.1.0 version.
Note: This process might cause merge conflicts if the updated version (either the developer version or a stable version) changes a file you changed locally. Although we try to avoid causing merge conflicts for ARC’s users as much as we can, it could still sometimes happen. You’ll identify a merge conflict if git prints a message similar to this:
$ git merge BRANCH-NAME
> Auto-merging settings.py
> CONFLICT (content): Merge conflict in settings.py
> Automatic merge failed; fix conflicts and then commit the result
Detailed steps to resolve a git merge conflict can be found online.
Principally, you should open the files that have merge conflicts, and look for the following markings:
<<<<<<< HEAD
this is some content introduced by updating ARC
=======
totally different content the user added, adding different changes
to the same lines that were also updated remotely
>>>>>>> new_branch_to_merge_later
Resolving a merge conflict consists of three stages:
Determine which version of the code you’d like to keep (usually you should manually append your own changes to the more updated ARC code). Make the changes and get rid of the unneeded <<<<<<< HEAD, =======, and >>>>>>> new_branch_to_merge_later markings. Repeat for all conflicts.
Stage the changes by typing:
git add .
If you don’t plan to commit your changes, unstage them by typing:
git reset
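Before staging, it can help to confirm that no conflict markings were left behind in the files you edited. The sketch below is a hypothetical helper, not part of ARC or git, that scans a block of text for the three kinds of marker lines shown above.

```python
"""Illustrative scan for leftover git conflict markers (hypothetical helper)."""


def find_conflict_markers(text: str) -> list:
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip()
        if (stripped.startswith('<<<<<<< ')
                or stripped == '======='
                or stripped.startswith('>>>>>>> ')):
            hits.append((number, stripped))
    return hits


sample = (
    "<<<<<<< HEAD\n"
    "content introduced by updating ARC\n"
    "=======\n"
    "content the user added\n"
    ">>>>>>> new_branch_to_merge_later\n"
)
for number, line in find_conflict_markers(sample):
    print(number, line)
```

To check a real file, pass its contents to the function, e.g., find_conflict_markers(open('settings.py').read()); an empty list means that no markers were found.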