.. _installation: Installation instructions ========================= ARC can be installed on a server, as well as on your local desktop / laptop, submitting jobs to your server(s). The instructions below make this differentiation when relevant (the only difference is that ARC should be "aware" of software installed on the same machine, where the communication isn't done via SSH). Note: ARC was only tested on Linux (Ubuntu_ 18.04.1 and 20.04 LTS) and Mac machines. We don't expect it to work smoothly on Windows machines. Note: These installation instructions assume you already have access to a server with a cluster scheduling software (ARC currently supports SGE, Slurm, PBS, and HTCondor) and with properly installed electronic structure software (ARC currently supports Gaussian, QChem, Molpro, TeraChem, Orca, and Psi4). We further assume that you have some experience working with the server, e.g., writing appropriate submit scripts (example scripts are given, but modifications are usually required). .. _path: Clone and setup path ^^^^^^^^^^^^^^^^^^^^ - Download and install the `Anaconda Python Platform`__ for Python 3.7 or higher if you haven't already. - Get git and appropriate compilers if you don't have them already by typing ``sudo apt install git gcc g++ make`` in a terminal. - Clone ARC's repository to by typing the following command in the desired folder (e.g., under `~/Code/`):: git clone https://github.com/ReactionMechanismGenerator/ARC.git - Add ARC to your local path in ``.bashrc`` (make sure to change ``~/Path/to/ARC/`` accordingly):: export PYTHONPATH=$PYTHONPATH:~/Path/to/ARC/ __ anaconda_ .. _arce: Install dependencies ^^^^^^^^^^^^^^^^^^^^ - Create the Anaconda environment for ARC (after changing the directory to the installation folder by, e.g., ``cd ~/Code/ARC/``):: conda env create -f environment.yml Activate the ARC environment every time before you run ARC:: conda activate arc_env - Install the latest **DEVELOPER** version of RMG (which has Arkane). It is recommended to follow RMG's `Developer installation by source using Anaconda `_ instructions. Make sure to add RMG-Py to your PATH and PYTHONPATH variables as explained in RMG's documentation. - Type ``make install-all`` under the ARC repository folder to install the following 3rd party repositories: `AutoTST `_ (`West et al. `_), `KinBot `_ (`Van de Vijver et al. `_), and `TS-GCN `_ (`Pattanaik et al. `_). Note that this should be done on each machine ARC is expected to be executed on. For advanced users: ARC will look for the paths to these repos. If your path on a server in not conventional, you can assist ARC discover your external repos in ``settings.py``. - Test ARC by typing ``make test`` under the ARC folder after activating the anaconda `arc_env` environment. Create a ``.arc`` folder (optional but recommended) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Users are encouraged to create a ``.arc`` folder under their ``HOME`` folder on the machine running ARC. Copy (and modify as appropriate, see below) the following python files from the ARC repository into the newly created folder: ``/ARC/arc/settings/settings.py`` --> ``HOME/.arc/settings.py`` ``/ARC/arc/settings/input.py`` --> ``HOME/.arc/input.py`` ``/ARC/arc/settings/submit.py`` --> ``HOME/.arc/submit.py`` By doing this, ARC will use the respective settings and definitions from the files under ``.arc`` to override its defaults. Users many (carefully) modify the definitions in the local files as appropriate. Note that you may choose to copy only some of these files, in which case the definitions from any non-copied files will be taken from ARC's defaults (e.g., most users will not need to modify ``input.py``). Note also that definitions within these files may be partial (i.e., you may keep only those parameters you may wish to change within each file), and that any missing parameter will be assigned its default value from ARC's defaults. It is recommended to only keep parameters you modified in ``settings.py``, so that other parameters will be updated as you update the ARC repository in the future. This recommendation helps avoid merge conflicts when updating ARC, and allows a single ARC instance on a server to be used by different users with different preferences. Principally, ARC would also work fine if users directly change the respective files within ARC's repository instead of making copies. However, modifying the files in ARC directly may cause merging conflicts when updating ARC. The down side is that users are responsible to keep their copies up to date with ARC's format if major changes are made. Such changes will be listed under the Release Notes and will result in an increase of the MINOR version number (i.e., ,major.MINOR.patch, e.g., `1.1.5` --> `1.2.0`). Generating RSA SSH keys and defining servers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The first two directives are only required if you'd like ARC to access remote servers (ARC could also run "locally" when it's installed on a server). - Generate RSA SSH keys for your favorite server(s) on which relevant electronic structure software (ESS, e.g., Gaussian) are installed. `Instructions for generating RSA keys could be found here `_. - Copy the RSA SSH key path(s) on your local machine to ``settings.py`` in the ``servers`` dictionary under keys. - Update the ``servers`` dictionary in your copy of ARC's `settings.py `_. * **A local server must be named with the reserved keyword ``local``**. ``cluster_soft`` and username (``un``) are mandatory. * A remote server has no limitations for naming. ``cluster_soft``, ``address``, username (``un``), and ``key`` (the path to the local RSA SSH key) are mandatory. * Optional parameters for both local and remote servers are ``cpus``, ``memory``, and ``path``. The first two parameters stand for the maximum amount of cpu cores and memory in GB available on a node. If a job crashes due to cpu or memory issues, ARC will automatically re-run the job with different cpu and memory allocations within the specified limitations. By default, ``cpus`` is 8 and ``memory`` is 14 GB. The optional ``path`` key is used if it is necessary to direct ARC to create project files in a specific location on the server. E.g., if the value ``/storage/group_name/`` is given for ``path``, ARC will create project files under ``/storage/group_name/$USER/runs/ARC_Projects/project_name/``. * Although ARC currently does not allocate computing resources dynamically based on system size or ESS, the user can manually control memory specifications for each project. See :ref:`Advanced Features ` for details. * A default ESS job in ARC has 14 GB of memory, 8 cpu cores, and 120 hours of maximum execution time. The default settings can be changed by providing different values to the ``job_total_memory_gb``, ``job_cpu_cores``, and ``job_time_limit_hrs`` keys in the ``default_job_settings`` dictionary under ``settings.py``. * ARC will alter job memory, cpu, and time settings when troubleshoot jobs crashed due to resource allocation issues. The ``job_max_server_node_memory_allocation`` key stands for the maximum percentage of total node memory ARC will use when troubleshoot a job. The default value is 80%. - Update the submit scripts in your copy of ARC's `submit.py `_ according to your servers' definitions. * See the given template examples, and follow the structure of nested dictionaries (by server name, then by ESS name). * Preserve the variables in curly braces (e.g., ``{memory}``), so that ARC is able to auto-fill them. Associating software with servers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ARC keeps track of software location on servers using a Python dictionary associating the different software (keys) with the servers they are installed on (values). The server name must be consistent with the respective definition in the ``servers`` dictionary mentioned above. Typically, you would update the ``global_ess_settings`` dictionary in your copy of ARC's `settings.py `_ to reflect your software and servers, for example:: global_ess_settings = { 'gaussian': ['server1', 'server2'], 'molpro': 'server2', 'qchem': 'local', } Note that the above example reflects a situation where QChem in installed on the same machine as ARC, while Gaussian and Molpro are installed on different servers ARC has access to. You can of course make any combination as you'd like. The servers can be listed as a simple string for a single server, or as a list for multiple servers, where relevant. These global settings are used by default unless ARC is given an ``ess_settings`` dictionary through an input file or the API, thus allowing more flexibility when running several instances of ARC simultaneously (e.g., if Gaussian is installed on two servers, where one has more memory in its nodes, the user can request ARC to use that specific server for the more memory-intensive jobs). More about the ``ess_settings`` dictionary can be found in the :ref:`Advanced Features ` section of the documentation. If neither ``global_ess_settings`` (in settings.py) nor ``ess_settings`` (via an input file or the API) are specified, ARC will use its "radar" feature to "scans" the servers it has access to, and assign relevant ESS it is familiar with to the respective server. In order for this feature to function properly, make sure your .bashrc file on the remote server\s does not have an interactive shell check. If it does, disable it. It is recommended, though, to use the ``global_ess_settings`` and/or ``ess_settings`` dictionaries rather than allowing the "radar" to do its thing blindly. The "radar" feature, however, is very useful for diagnostics (see Tests_ below). You can check what the "radar" detects using the ARC ESS diagnostics notebook. Cluster software definitions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ARC supports Slurm and Oracle/Sun Grid Engine (OGE / SGE). If you're using other `cluster software`__, or if your server's definitions are different that ARC's, you should also modify the following variables in your copy of ARC's `settings.py `_: - ``check_status_command`` - ``submit_command`` - ``delete_command`` - ``list_available_nodes_command`` - ``submit_filename`` - ``t_max_format`` __ cluster_ You will find the values for ``check_status_command``, ``submit_command``, ``delete_command``, and ``list_available_nodes_command`` by typing on the respective server the `which` command, e.g.:: which sbatch If you have different servers with the same cluster software that have different cluster software definitions, just name them differently, e.g., `Slurm1` and `Slurm2`, and make sure to pair them accordingly under the ``servers`` dictionary. .. _Tests: Tests ^^^^^ - If you'd like to make sure ARC has access to your servers and recognises your ESS, use the "radar" tool, available as an iPython notebook (see :ref:`Standalone tools `). - Run the minimal example (see :ref:`Examples `), and a couple more examples, if you'd like, using both input files and the API (via iPython notebooks or any other method). - Run ARC's unit tests. Note that for all tests to pass, ARC expects to find the unmodified settings in settings.py. If you made changes to any settings instead of creating respective files under a local ``.arc.`` folder, it is recommended to first stash your changes (``git stash``). To run the tests, type:: make test After the tests complete, you may unstash your changes, if relevant (``git stash pop``). In addition, functional tests are helpful in making sure that ARC is installed and functioning correctly. Again, before performing these tests, it is reccommended to first stash your changes (``git stash``). To trigger the functional tests, type:: make test-functional After the tests complete, you may unstash your changes, if relevant (``git stash pop``). .. _aliases: Optional: Add ARC aliases to your .bashrc (for convenience) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Below are optional aliases to make ARC (even) more convenient (make sure to change `~/Path/to/ARC/` accordingly). Add these to your ``.bashrc`` file (edit it by typing, e.g., ``nano ~/.bashrc``):: export arc_path=$HOME'/Path/to/ARC/' alias arce='source activate arc_env' alias arc='python $arc_path/ARC.py input.yml' alias arcrestart='python $arc_path/ARC.py restart.yml' alias arcode='cd $arc_path' alias j='cd $arc_path/ipython/ && jupyter notebook' Updating ARC ^^^^^^^^^^^^ ARC is being updated frequently. Make sure to update ARC and enjoy new features and bug fixes. Note: If you change ARC's parameters within the repository rather than copies thereof as explained above, it is highly recommended to backup the files you manually changed before updating ARC. These are usually `ARC/arc/settings/settings.py` and `ARC/arc/settings/submit.py`. You can update ARC to a specific version, or to the most recent developer version. To get the most recent developer version, do the following (and make sure to change `~/Path/to/ARC/` accordingly):: cd ~/Path/to/ARC/ git stash git fetch origin git pull origin main git stash pop The above will update your `main` branch of ARC. To update to a specific version (e.g., version 1.1.0), do the following (and make sure to change `~/Path/to/ARC/` accordingly):: cd ~/Path/to/ARC/ git stash git fetch origin git checkout tags/1.1.0 -b v1.1.0 git stash pop The above will create a `v1.1.0` branch which replicates the stable 1.1.0 version. **Note:** This process might cause merge conflicts if the updated version (either the developer version or a stable version) changes a file you changed locally. Although we try to avoid causing merge conflicts for ARC's users as much as we can, it could still sometimes happen. You'll identify a merge conflict if git prints a message similar to this:: $ git merge BRANCH-NAME > Auto-merging settings.py > CONFLICT (content): Merge conflict in styleguide.md > Automatic merge failed; fix conflicts and then commit the result Detailed steps to resolve a git merge conflict can be found `online`__. __ mergeConflict_ Principally, you should open the files that have merge conflicts, and look for the following markings:: <<<<<<< HEAD this is some content introduced by updating ARC ======= totally different content the user added, adding different changes to the same lines that were also updated remotely >>>>>>> new_branch_to_merge_later Resolving a merge conflict consists of three stages: - determine which version of the code you'd like to keep (usually you should manually append your oun changes to the more updated ARC code). Make the changes and get rid of the unneeded ``<<<<<<< HEAD``, ``=======``, and ``>>>>>>> new_branch_to_merge_later`` markings. Repeat for all conflicts. - Stage the changed by typing: ``git add .`` - If you don't plan to commit your changes, unstage them by typing: ``git reset --soft origin/main`` .. include:: links.txt