User Manual¶

LigFlow¶

LigFlow handles the curation of compound libraries, stored as SMILES or MOL2 files, automating 3D conformer generation, compound parameterization and charge calculation. It also probes the ChemBase to avoid redundancy.

LigFlow does this through a series of functions designed to prepare compounds for DockFlow and ScoreFlow. It also supports resuming unfinished calculations.

.mol2 files are stored according to the following hierarchy, with file names determined by molecule name.

|--${project}.chemflow
|  |--LigFlow
|     |--original/${compound}.mol2
|     |--${charge}/${compound}.mol2 (gas, bcc or resp charges)

gas - Gasteiger-Marsili charges ; bcc - AM1 with Bond Charge Correction (AM1-BCC) ; resp - Restrained Electrostatic Potential (RESP) fitted charges
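Because the layout is fixed, it is easy to check which charge models have already been computed for a compound. The following is a minimal sketch, not part of ChemFlow: the function name `ligflow_charge_status` is hypothetical, and it assumes the project folder sits in the current working directory.

```shell
# Hypothetical helper (not a ChemFlow command): report which charge models
# already have a stored .mol2 file for one compound, following the
# ${project}.chemflow/LigFlow/${charge}/${compound}.mol2 layout shown above.
# Assumption: the project folder is located in the current directory.
ligflow_charge_status() {
    project=$1 compound=$2
    for model in gas bcc resp; do
        file="${project}.chemflow/LigFlow/${model}/${compound}.mol2"
        if [ -f "$file" ]; then
            printf '%s: found\n' "$model"
        else
            printf '%s: missing\n' "$model"
        fi
    done
}
```

For example, `ligflow_charge_status myproject CHEMBL123` prints one line per charge model.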

Note

LigFlow uses /tmp/${molecule} during calculations, when running in parallel.

Step 1a - Starting from a .smi file. (SMILES)¶

Conversion of a SMILES library to 3D and conformer generation can be achieved through integration with RDKit, OpenBabel or Chemaxon’s molconvert (licence required); pick your favorite. A 3D structure is generated for each compound and stored as an individual ${compound}.mol2 file.

By default, only the most probable tautomer at pH 7.0 and a single 3D conformer are generated. Users are therefore strongly encouraged to provide isomeric SMILES (ism) or to inspect the output library carefully to avoid mistakes.

Step 1b - Starting from a .mol2 file.¶

Provide a complete .mol2 file: all hydrogens present and correct bond valences. No exceptions. LigFlow will split multi-mol2 files and store each compound as an individual ${compound}.mol2 file.
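To make the splitting step concrete, here is a minimal sketch of how a multi-mol2 file can be cut into one file per compound. It is an illustration only, not LigFlow's own code: the function name `split_mol2` is hypothetical, and it assumes each record starts with `@<TRIPOS>MOLECULE` followed by the molecule name on the next line, with names that are unique and valid as file names.

```shell
# Hypothetical sketch (not LigFlow's implementation): split a multi-mol2 file
# into one file per compound, named after the molecule name, i.e. the line
# that follows each @<TRIPOS>MOLECULE record.
# Usage: split_mol2 library.mol2 output_dir
split_mol2() {
    mkdir -p "$2"
    awk -v outdir="$2" '
        /^@<TRIPOS>MOLECULE/ {
            if (out != "") close(out)      # finish the previous compound
            expect_name = 1
            next
        }
        expect_name {
            name = $0
            gsub(/^[ \t]+|[ \t\r]+$/, "", name)   # trim surrounding whitespace
            if (name == "") {
                print "unnamed molecule at line " NR > "/dev/stderr"
                exit 1
            }
            out = outdir "/" name ".mol2"
            print "@<TRIPOS>MOLECULE" > out   # re-emit the record header
            print name > out
            expect_name = 0
            next
        }
        out != "" { print > out }             # copy the rest of the record
    ' "$1"
}
```

The sketch stops with an error on an unnamed molecule rather than inventing a name, which mirrors the requirement that input files carry proper names.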

Tip

Chemical library curation is a crucial step. Consider using a specialized tool for tautomer and pKa prediction.

Warning

LigFlow will never autogenerate names for your molecules, never. Make sure you provide proper input files.
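Since names are never generated for you, it can help to verify them before running LigFlow. The sketch below checks a SMILES library for missing or duplicated names. It is not a ChemFlow tool: the function name `check_smi_names` is hypothetical, and it assumes the common layout of one compound per line with the SMILES string followed by whitespace and the name.

```shell
# Hypothetical pre-flight check (not part of ChemFlow): report SMILES lines
# without a name and names that occur more than once.
# Assumption: one compound per line, laid out as "SMILES<whitespace>name".
# Returns a non-zero status if any problem is found.
check_smi_names() {
    awk '
        NF == 0 { next }                  # skip blank lines
        NF < 2  { printf "line %d: missing name\n", NR; bad = 1; next }
        seen[$2]++ { printf "line %d: duplicate name %s\n", NR, $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

Typical use would be `check_smi_names library.smi && LigFlow -l library.smi -p myproject`, so that curation only starts on a cleanly named library.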

Step 2 - Compound parameterization¶

The appropriate parameterization depends on the purpose. For docking, a Tripos .mol2 file suffices, since DockFlow has specific routines to prepare it for the target software.

If, however, one chooses to rescore a complex using more accurate free energy methods, LigFlow automates the parameterization with the General Amber Force Field (GAFF) and the charge calculation through QM methods, either AM1 with BCC charges or HF/6-31G* with RESP charges. GAFF works well for small, drug-like molecules, but remember that it is a general force field.

Tip

For large screenings we recommend using the less accurate BCC charges to prioritize compounds, then migrating to the more time-consuming HF/6-31G* with RESP charges.

Tip

To improve accuracy, one must carefully parameterize each molecule. Search for warnings in the ${molecule}.frcmod file.
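One way to search many .frcmod files at once is sketched below. This is an illustration rather than a ChemFlow feature: the function name `frcmod_needing_review` is hypothetical, the location of the .frcmod files is not specified here and is therefore passed as an argument, and the marker text `ATTN` is an assumption based on the "ATTN, need revision" comment that AmberTools' parmchk2 typically writes next to parameters it could not assign reliably. Check the marker against your AmberTools version.

```shell
# Hypothetical helper (not part of ChemFlow): list the .frcmod files under a
# directory that contain parameters flagged for manual review.
# Assumption: flagged parameters carry the text "ATTN" (as in parmchk2's
# "ATTN, need revision" comment).
# Usage: frcmod_needing_review directory
frcmod_needing_review() {
    find "$1" -type f -name '*.frcmod' -exec grep -l 'ATTN' {} + | sort
}
```

An empty result means no file contained the marker; it does not by itself guarantee that the parameters are adequate.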

Usage¶

To prepare the compound library in the file ligand.mol2 for the project myproject, use the command below. Make sure to choose the appropriate charge model for your project.

LigFlow -l ligand.mol2 -p myproject [--bcc] [--resp]

Options¶

The compound file name (.mol2 file) and project name are mandatory, and you’re done. Check the advanced options below.

[Help]
-h/--help           : Show this help message and quit
-hh/--full-help      : Detailed help

[Required]
-p/--project        : ChemFlow project.
-l/--ligand         : Ligands .mol2 input file.

Advanced options¶

These options let you better control the execution, including charge calculation and parallel (local) or HPC execution. Refer to the HPC Run topic for guidance on using a High Performance Computer.

[ Optional ]
--gas                  : Compute Gasteiger-Marsili charges
--bcc                  : Compute bcc charges
--resp                 : Compute resp charges

[ Parallel execution ]
-nc/--cores        INT : Number of cores per node [8]
--pbs/--slurm          : Workload manager, PBS or SLURM
--header          FILE : Header file provided to run on your cluster.

[ Development ]
--charges-file    FILE : Contains the net charges for all ligands in a library.
                        ( name charge )  ( CHEMBL123 -1 )
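
The two-column layout of the charges file can be validated before a long run. The sketch below is not a ChemFlow tool: the function name `check_charges_file` is hypothetical, and it assumes integer net charges and no header or comment lines, neither of which is stated above.

```shell
# Hypothetical validator (not part of ChemFlow) for a --charges-file input.
# Assumptions: exactly two whitespace-separated columns per line, a molecule
# name followed by an integer net charge (for example "CHEMBL123 -1"),
# with no header or comment lines.
# Returns a non-zero status if any line does not match.
check_charges_file() {
    awk '
        NF == 0 { next }                  # skip blank lines
        NF != 2 || $2 !~ /^[+-]?[0-9]+$/ {
            printf "line %d: expected \"name charge\", got: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A run could then be guarded with `check_charges_file charges.dat && LigFlow -l ligand.mol2 -p myproject --bcc --charges-file charges.dat`, where charges.dat is a placeholder file name.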

Note

RESP charges require a GAUSSIAN 09+ licence.

DockFlow¶

DockFlow covers docking and Virtual High Throughput Screening (vHTS) of compound(s) against a target (receptor) using the docking software implemented so far: AutoDock Vina and PLANTS. The vHTS is efficiently distributed over the available computational resources.

Docking output files are stored according to the following hierarchy, with file names determined by molecule name.

|--${project}.ChemFlow
|  |--DockFlow
|     |--${project}/${receptor}/${protocol}/${compound}/ligand.out
|     |--${project}/${receptor}/${protocol}/${compound}/ligand.pdbqt (VINA)
|     |--${project}/${receptor}/${protocol}/${compound}/ligand.mol2  (PLANTS)

Usage¶

The user should first curate the compound library (.smi or .mol2) using LigFlow, then provide that same input file to DockFlow. DockFlow only uses the molecule names from this file and gets all structural data from the LigFlow-generated library.

DockFlow -r receptor.mol2 -l ligand.mol2 -p myproject --center X Y Z [--protocol protocol-name] [-n 10] [-sf chemplp]

Note

Make sure to use the same project name and protocol.
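The two-step sequence (curate with LigFlow, then dock with DockFlow using the same project and library file) can be sketched as a small script. This is only an illustration: the helper names `run` and `dock_library` are hypothetical, the center coordinates and the protocol name `plants_chemplp` are placeholders, and by default the commands are printed instead of executed so that they can be reviewed first.

```shell
# Hypothetical dry-run wrapper: prints the command line by default;
# set RUN=1 in the environment to execute it instead.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical driver: curate a library, then dock it.
# Usage: dock_library project library.mol2 receptor.mol2
dock_library() {
    project=$1 library=$2 receptor=$3
    # 1) Curate the library once with LigFlow.
    run LigFlow -l "$library" -p "$project" || return 1
    # 2) Dock with the same project name and the same library file.
    #    The center coordinates and protocol name below are placeholders.
    run DockFlow -r "$receptor" -l "$library" -p "$project" \
        --center 10.0 12.5 30.0 --protocol plants_chemplp -n 10 -sf chemplp
}

dock_library myproject ligand.mol2 receptor.mol2
```

Keeping the project name and library file in variables is the point of the sketch: both tools then receive identical values.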

Options¶

DockFlow requires the receptor and ligand files, together with the center of the binding site.

[Help]
-h/--help              : Show this help message and quit
-hh/--fullhelp         : Detailed help

[ Required ]
-p/--project       STR : ChemFlow project
-r/--receptor     FILE : Receptor MOL2 file
-l/--ligand       FILE : Ligands  MOL2 file
--center         X Y Z : Binding site coordinates (space separated)

Advanced options¶

These options let you better control the execution, including the scoring function and specific parameters for each implemented docking software. They also control parallel (local) or HPC execution. Refer to the HPC Run topic for guidance on using a High Performance Computer.

[ Post Processing ]
--postprocess          : Process DockFlow output for the specified
                         project/protocol/receptor.
--postprocess-all      : Process all DockFlow outputs in a ChemFlow project.
-n/--n-poses       INT : Number of docked poses to keep.
--archive              : Compress the docking folder for a project/protocol/receptor.
--archive-all          : Compress all docking folders in a ChemFlow project.

[ Optional ]
--protocol         STR : Name for this specific protocol [default]
-n/--n-poses       INT : Maximum number of docking poses per ligand [10]
-sf/--function     STR : vina, chemplp, plp, plp95  [chemplp]

[ Parallel execution ]
-nc/--cores        INT : Number of cores per node [${NCORES}]
--pbs/--slurm          : Workload manager, PBS or SLURM
--header          FILE : Header file provided to run on your cluster.

[ Additional ]
--overwrite            : Overwrite results
--yes                  : Yes to all questions
_________________________________________________________________________________
[ Options for docking program ]

[ PLANTS ]
--radius         FLOAT : Radius of the spherical binding site [15]
--speed            INT : Search speed for Plants. 1, 2 or 4 [1]
--ants             INT : Number of ants [20]
--evap_rate      FLOAT : Evaporation rate of pheromones [0.15]
--iter_scaling   FLOAT : Iteration scaling factor [1.0]
--cluster_rmsd   FLOAT : RMSD similarity threshold between poses, in Å [2.0]
--water           FILE : Path to a structural water molecule (.mol2)
--water_xyzr      LIST : xyz coordinates and radius of the water sphere, separated by a space
_________________________________________________________________________________
[ Vina ]
--size            LIST : Size of the grid along the x, y and z axis, separated by a space [15 15 15]
--exhaustiveness   INT : Exhaustiveness of the global search [8]
--energy_range   FLOAT : Max energy difference (kcal/mol) between the best and worst poses [3.00]
_________________________________________________________________________________

Options to Postprocess and Archive¶

Docking produces a number of poses and their associated energies, but each program reports them in its own way. --postprocess[-all] standardizes the output into two files: docked_ligands.mol2 and DockFlow.csv.

|--${project}.ChemFlow
|  |--DockFlow
|     |--${project}/${receptor}/${protocol}/docked_ligands.mol2
|     |--${project}/${receptor}/${protocol}/DockFlow.csv

ScoreFlow¶

ScoreFlow is a bash script designed to work with PLANTS, Vina, IChem and AmberTools16+. It can rescore molecular complexes such as protein-ligand complexes.

ScoreFlow requires a project folder named ‘myproject’.chemflow. If absent, one will be created.

Usage:¶

# For VINA and PLANTS scoring functions:
ScoreFlow -r receptor.mol2 -l ligand.mol2 -p myproject --center X Y Z [--protocol protocol-name] [-sf vina]

# For MMGBSA only:
ScoreFlow -r receptor.pdb -l ligand.mol2 -p myproject [--protocol protocol-name] -sf mmgbsa

Options¶

[Help]
-h/--help           : Show this help message and quit
-hh/--fullhelp      : Detailed help

[Required]
-r/--receptor       : Receptor .mol2 or .pdb file.
-l/--ligand         : Ligands .mol2 input file.
-p/--project        : ChemFlow project.

Advanced Options¶

[ Required ]
-p/--project       STR : ChemFlow project
-r/--receptor     FILE : Receptor MOL2 file
-l/--ligand       FILE : Ligands  MOL2 file

[ Optional ]
--protocol         STR : Name for this specific protocol [default]
-sf/--function     STR : vina, chemplp, plp, plp95, mmgbsa, mmpbsa [chemplp]

[ Charges for ligands - MMGBSA ]
--gas                  : Gasteiger-Marsili (default)
--bcc                  : AM1-BCC charges
--resp                 : RESP charges (requires Gaussian)

[ Simulation - MMGBSA ]
--maxcyc           INT : Maximum number of energy minimization steps for implicit solvent simulations [1000]
--water                : Explicit solvent simulation
--md                   : Molecular dynamics

[ Parallel execution - MMGBSA ]
-nc/--cores        INT : Number of cores per node [${NCORES}]
--pbs/--slurm          : Workload manager, PBS or SLURM
--header          FILE : Header file provided to run on your cluster.
--write-only           : Write the command template file (ScoreFlow.run.template) without running.
--run-only             : Run using the ScoreFlow.run.template file.

[ Additional ]
--overwrite            : Overwrite results

[ Rescoring with vina or plants ]

--center           STR : xyz coordinates of the center of the binding site, separated by a space

[ PLANTS ]
--radius         FLOAT : Radius of the spherical binding site [15]

[ Vina ]
--size            LIST : Size of the grid along the x, y and z axis, separated by a space [15 15 15]
--vina-mode        STR : local_only (local search then score) or score_only [local_only]

[ Post Processing ]
--postprocess          : Process ScoreFlow output for the specified project/protocol/receptor.

Note: You can automatically get the center and radius/size
    for a particular ligand .mol2 file using the bounding_shape.py script

_________________________________________________________________________________
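
The interface of bounding_shape.py is not described here, so the following is an independent sketch of the underlying idea rather than a substitute for that script. It computes the center and the extent of the axis-aligned bounding box of the atoms in a reference ligand. The function name `mol2_bounding_box` is hypothetical, and it assumes a single-molecule Tripos .mol2 file whose `@<TRIPOS>ATOM` lines carry x, y and z in columns 3 to 5.

```shell
# Hypothetical sketch (not the bounding_shape.py script): print the center and
# the extent of the axis-aligned bounding box of all atoms in a
# single-molecule .mol2 file, as "cx cy cz sx sy sz".
# Assumption: ATOM lines are "id name x y z type ...".
mol2_bounding_box() {
    awk '
        /^@<TRIPOS>/ { in_atoms = ($1 == "@<TRIPOS>ATOM"); next }
        in_atoms && NF >= 6 {
            x = $3 + 0; y = $4 + 0; z = $5 + 0
            if (n++ == 0) { xmin = xmax = x; ymin = ymax = y; zmin = zmax = z }
            if (x < xmin) xmin = x; if (x > xmax) xmax = x
            if (y < ymin) ymin = y; if (y > ymax) ymax = y
            if (z < zmin) zmin = z; if (z > zmax) zmax = z
        }
        END {
            if (n == 0) exit 1            # no atoms found
            printf "%.3f %.3f %.3f %.3f %.3f %.3f\n",
                (xmin + xmax) / 2, (ymin + ymax) / 2, (zmin + zmax) / 2,
                xmax - xmin, ymax - ymin, zmax - zmin
        }
    ' "$1"
}
```

The first three values can be passed to --center. The last three are the raw extent of the ligand atoms only; a docking box or sphere normally needs some margin around them, and how much to add is left to the user.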

Advanced Use¶

By using the --write-only flag, all input files will be written in the following scheme: PROJECT.chemflow/ScoreFlow/PROTOCOL/receptor/

System Setup: One can customize the system setup (tleap.in) inside a job.
Simulation protocol: The procedures for each protocol can also be modified, the user must review “ScoreFlow.run.template”.

The run input files for Amber and MM(PB,GB)-SA, namely min1.in, heat.in, equil.in, md.in …, can also be modified manually as you wish :) After the modifications, rerun ScoreFlow using --run-only.