Installation instructions

ARC can be installed on a server, as well as on your local desktop / laptop, submitting jobs to your server(s). The instructions below make this differentiation when relevant (the only difference is that ARC should be “aware” of software installed on the same machine, where the communication isn’t done via SSH).

Note:: ARC was only tested on Linux (Ubuntu 18.04.1 and 20.04 LTS) and Mac machines. We don’t expect it to work smoothly on Windows machines.
Note:: These installation instructions assume you already have access to a server with a cluster scheduling software (ARC currently supports SGE, Slurm, PBS, and HTCondor) and with properly installed electronic structure software (ARC currently supports Gaussian, QChem, Molpro, TeraChem, Orca, and Psi4). We further assume that you have some experience working with the server, e.g., writing appropriate submit scripts (example scripts are given, but modifications are usually required).

Clone and setup path

Download and install the Anaconda Python Platform for Python 3.7 or higher if you haven’t already.
Get git and appropriate compilers if you don’t have them already by typing sudo apt install git gcc g++ make in a terminal.
Clone ARC’s repository to by typing the following command in the desired folder (e.g., under ~/Code/):
```
git clone https://github.com/ReactionMechanismGenerator/ARC.git
```
Add ARC to your local path in .bashrc (make sure to change ~/Path/to/ARC/ accordingly):
```
export PYTHONPATH=$PYTHONPATH:~/Path/to/ARC/
```

Install dependencies

Create the Anaconda environment for ARC (after changing the directory to the installation folder by, e.g., cd ~/Code/ARC/):
```
conda env create -f environment.yml
```
Activate the ARC environment every time before you run ARC:
```
conda activate arc_env
```
Install the latest DEVELOPER version of RMG (which has Arkane). It is recommended to follow RMG’s Developer installation by source using Anaconda instructions. Make sure to add RMG-Py to your PATH and PYTHONPATH variables as explained in RMG’s documentation.
Type make install-all under the ARC repository folder to install the following 3rd party repositories: AutoTST (West et al.), KinBot (Van de Vijver et al.), and TS-GCN (Pattanaik et al.). Note that this should be done on each machine ARC is expected to be executed on. For advanced users: ARC will look for the paths to these repos. If your path on a server in not conventional, you can assist ARC discover your external repos in settings.py.
Test ARC by typing make test under the ARC folder after activating the anaconda arc_env environment.

Create a `.arc` folder (optional but recommended)

Users are encouraged to create a .arc folder under their HOME folder on the machine running ARC. Copy (and modify as appropriate, see below) the following python files from the ARC repository into the newly created folder: <base_folder>/ARC/arc/settings/settings.py –> HOME/.arc/settings.py <base_folder>/ARC/arc/settings/input.py –> HOME/.arc/input.py <base_folder>/ARC/arc/settings/submit.py –> HOME/.arc/submit.py

By doing this, ARC will use the respective settings and definitions from the files under .arc to override its defaults. Users many (carefully) modify the definitions in the local files as appropriate. Note that you may choose to copy only some of these files, in which case the definitions from any non-copied files will be taken from ARC’s defaults (e.g., most users will not need to modify input.py). Note also that definitions within these files may be partial (i.e., you may keep only those parameters you may wish to change within each file), and that any missing parameter will be assigned its default value from ARC’s defaults. It is recommended to only keep parameters you modified in settings.py, so that other parameters will be updated as you update the ARC repository in the future. This recommendation helps avoid merge conflicts when updating ARC, and allows a single ARC instance on a server to be used by different users with different preferences.

Principally, ARC would also work fine if users directly change the respective files within ARC’s repository instead of making copies. However, modifying the files in ARC directly may cause merging conflicts when updating ARC. The down side is that users are responsible to keep their copies up to date with ARC’s format if major changes are made. Such changes will be listed under the Release Notes and will result in an increase of the MINOR version number (i.e., ,major.MINOR.patch, e.g., 1.1.5 –> 1.2.0).

Generating RSA SSH keys and defining servers

The first two directives are only required if you’d like ARC to access remote servers (ARC could also run “locally” when it’s installed on a server).

Generate RSA SSH keys for your favorite server(s) on which relevant electronic structure software (ESS, e.g., Gaussian) are installed. Instructions for generating RSA keys could be found here.
Copy the RSA SSH key path(s) on your local machine to settings.py in the servers dictionary under keys.
Update the servers dictionary in your copy of ARC’s settings.py.
- A local server must be named with the reserved keyword ``local``. cluster_soft and username (un) are mandatory.
- A remote server has no limitations for naming. cluster_soft, address, username (un), and key (the path to the local RSA SSH key) are mandatory.
- Optional parameters for both local and remote servers are cpus, memory, and path. The first two parameters stand for the maximum amount of cpu cores and memory in GB available on a node. If a job crashes due to cpu or memory issues, ARC will automatically re-run the job with different cpu and memory allocations within the specified limitations. By default, cpus is 8 and memory is 14 GB. The optional path key is used if it is necessary to direct ARC to create project files in a specific location on the server. E.g., if the value /storage/group_name/ is given for path, ARC will create project files under /storage/group_name/$USER/runs/ARC_Projects/project_name/.
- Although ARC currently does not allocate computing resources dynamically based on system size or ESS, the user can manually control memory specifications for each project. See Advanced Features for details.
- A default ESS job in ARC has 14 GB of memory, 8 cpu cores, and 120 hours of maximum execution time. The default settings can be changed by providing different values to the job_total_memory_gb, job_cpu_cores, and job_time_limit_hrs keys in the default_job_settings dictionary under settings.py.
- ARC will alter job memory, cpu, and time settings when troubleshoot jobs crashed due to resource allocation issues. The job_max_server_node_memory_allocation key stands for the maximum percentage of total node memory ARC will use when troubleshoot a job. The default value is 80%.
Update the submit scripts in your copy of ARC’s submit.py according to your servers’ definitions. * See the given template examples, and follow the structure of nested dictionaries (by server name, then by ESS name). * Preserve the variables in curly braces (e.g., {memory}), so that ARC is able to auto-fill them.

Associating software with servers

ARC keeps track of software location on servers using a Python dictionary associating the different software (keys) with the servers they are installed on (values). The server name must be consistent with the respective definition in the servers dictionary mentioned above. Typically, you would update the global_ess_settings dictionary in your copy of ARC’s settings.py to reflect your software and servers, for example:

global_ess_settings = {
    'gaussian': ['server1', 'server2'],
    'molpro': 'server2',
    'qchem': 'local',
}

Note that the above example reflects a situation where QChem in installed on the same machine as ARC, while Gaussian and Molpro are installed on different servers ARC has access to. You can of course make any combination as you’d like. The servers can be listed as a simple string for a single server, or as a list for multiple servers, where relevant.

These global settings are used by default unless ARC is given an ess_settings dictionary through an input file or the API, thus allowing more flexibility when running several instances of ARC simultaneously (e.g., if Gaussian is installed on two servers, where one has more memory in its nodes, the user can request ARC to use that specific server for the more memory-intensive jobs). More about the ess_settings dictionary can be found in the Advanced Features section of the documentation.

If neither global_ess_settings (in settings.py) nor ess_settings (via an input file or the API) are specified, ARC will use its “radar” feature to “scans” the servers it has access to, and assign relevant ESS it is familiar with to the respective server. In order for this feature to function properly, make sure your .bashrc file on the remote servers does not have an interactive shell check. If it does, disable it.

It is recommended, though, to use the global_ess_settings and/or ess_settings dictionaries rather than allowing the “radar” to do its thing blindly. The “radar” feature, however, is very useful for diagnostics (see Tests below).

You can check what the “radar” detects using the ARC ESS diagnostics notebook.

Cluster software definitions

ARC supports Slurm and Oracle/Sun Grid Engine (OGE / SGE). If you’re using other cluster software, or if your server’s definitions are different that ARC’s, you should also modify the following variables in your copy of ARC’s settings.py:

check_status_command
submit_command
delete_command
list_available_nodes_command
submit_filename
t_max_format

You will find the values for check_status_command, submit_command, delete_command, and list_available_nodes_command by typing on the respective server the which command, e.g.:

which sbatch

If you have different servers with the same cluster software that have different cluster software definitions, just name them differently, e.g., Slurm1 and Slurm2, and make sure to pair them accordingly under the servers dictionary.

Tests

If you’d like to make sure ARC has access to your servers and recognises your ESS, use the “radar” tool, available as an iPython notebook (see Standalone tools).
Run the minimal example (see Examples), and a couple more examples, if you’d like, using both input files and the API (via iPython notebooks or any other method).
Run ARC’s unit tests. Note that for all tests to pass, ARC expects to find the unmodified settings in settings.py. If you made changes to any settings instead of creating respective files under a local .arc. folder, it is recommended to first stash your changes (git stash). To run the tests, type:
```
make test
```
After the tests complete, you may unstash your changes, if relevant (git stash pop).

In addition, functional tests are helpful in making sure that ARC is installed and functioning correctly.

Again, before performing these tests, it is reccommended to first stash your changes (git stash).

To trigger the functional tests, type:
```
make test-functional
```
After the tests complete, you may unstash your changes, if relevant (git stash pop).

Optional: Add ARC aliases to your .bashrc (for convenience)

Below are optional aliases to make ARC (even) more convenient (make sure to change ~/Path/to/ARC/ accordingly). Add these to your .bashrc file (edit it by typing, e.g., nano ~/.bashrc):

export arc_path=$HOME'/Path/to/ARC/'
alias arce='source activate arc_env'
alias arc='python $arc_path/ARC.py input.yml'
alias arcrestart='python $arc_path/ARC.py restart.yml'
alias arcode='cd $arc_path'
alias j='cd $arc_path/ipython/ && jupyter notebook'

Updating ARC

ARC is being updated frequently. Make sure to update ARC and enjoy new features and bug fixes.

Note:: If you change ARC’s parameters within the repository rather than copies thereof as explained above, it is highly recommended to backup the files you manually changed before updating ARC. These are usually ARC/arc/settings/settings.py and ARC/arc/settings/submit.py.

You can update ARC to a specific version, or to the most recent developer version. To get the most recent developer version, do the following (and make sure to change ~/Path/to/ARC/ accordingly):

cd ~/Path/to/ARC/
git stash
git fetch origin
git pull origin main
git stash pop

The above will update your main branch of ARC.

To update to a specific version (e.g., version 1.1.0), do the following (and make sure to change ~/Path/to/ARC/ accordingly):

cd ~/Path/to/ARC/
git stash
git fetch origin
git checkout tags/1.1.0 -b v1.1.0
git stash pop

The above will create a v1.1.0 branch which replicates the stable 1.1.0 version.

Note: This process might cause merge conflicts if the updated version (either the developer version or a stable version) changes a file you changed locally. Although we try to avoid causing merge conflicts for ARC’s users as much as we can, it could still sometimes happen. You’ll identify a merge conflict if git prints a message similar to this:

$ git merge BRANCH-NAME
> Auto-merging settings.py
> CONFLICT (content): Merge conflict in styleguide.md
> Automatic merge failed; fix conflicts and then commit the result

Detailed steps to resolve a git merge conflict can be found online.

Principally, you should open the files that have merge conflicts, and look for the following markings:

<<<<<<< HEAD
this is some content introduced by updating ARC
=======
totally different content the user added, adding different changes
to the same lines that were also updated remotely
>>>>>>> new_branch_to_merge_later

Resolving a merge conflict consists of three stages:

determine which version of the code you’d like to keep (usually you should manually append your oun changes to the more updated ARC code). Make the changes and get rid of the unneeded <<<<<<< HEAD, =======, and >>>>>>> new_branch_to_merge_later markings. Repeat for all conflicts.
Stage the changed by typing: git add .
If you don’t plan to commit your changes, unstage them by typing: git reset --soft origin/main