DLPNO-CCSD(T): Domain-Based Local Pair Natural Orbital CCSD(T)¶
Code author: Andy Jiang
Section author: Andy Jiang
Module: Keywords, PSI Variables, DLPNOCCSD(T)
Introduction¶
The CCSD(T) method is considered the “gold standard” method of quantum chemistry, often yielding chemically accurate (< 1 kcal/mol) results relative to experiment or full configuration interaction (FCI) computations. Unfortunately, the high \(\mathcal{O}(N^{7})\) scaling of the canonical gold-standard CCSD(T) method limits its applicability for larger molecules.
The domain-based local pair natural orbital (DLPNO)-CCSD(T) approach of Neese and coworkers [Riplinger:2013:034106] [Riplinger:2013:134101] [Riplinger:2016:024109] [Guo:2018:011101] uses localized orbitals to exploit molecular sparsity for a linear-scaling approximation to canonical CCSD(T). These errors are controllable through user set parameters. Computations on large molecules, such as crambin (636 atoms) [Riplinger:2013:134101] and insulin [Jiang:2024:082502] have been performed with this algorithm. This algorithm was originally implemented in the ORCA software package, but it is now available in PSI4!
For a more comprehensive overview on local correlation and the DLPNO-CCSD(T) algorithm, the reader is referred to the DLPNO-MP2 documentation, and the published work of Jiang et al. describing the implementation of DLPNO-CCSD(T) in PSI4 [Jiang:2024:082502].
An example input file for a DLPNO-CCSD computation is:
memory 2 GB
molecule h2o {
0 1
O
H 1 1.0
H 1 1.0 2 104.5
symmetry c1
}
set basis cc-pvdz
set scf_type df
set freeze_core True
set pno_convergence tight
energy('dlpno-ccsd')
An example input file for a DLPNO-CCSD(T) computation is:
memory 2 GB
molecule h2o {
0 1
O
H 1 1.0
H 1 1.0 2 104.5
symmetry c1
}
set basis cc-pvdz
set scf_type df
set freeze_core True
set pno_convergence tight
energy('dlpno-ccsd(t)') # dlpno-ccsd(t0) for the semicanonical (T0) computation
PNO Convergence Settings¶
Here we present a table of the PNO convergence settings, paramaters, and recommended use cases. Most of these parameters and settings are similar to what is found in ORCA, with two added parameters (T_CUT_TRACE and T_CUT_ENERGY) to increase the robustness of the PNO space. These added parameters truncate by percent recovery of the total occupation number, as well as the percentage energy recovery of the PNOs compared to the non-truncated basis.
T_CUT_PNO |
T_CUT_TRACE |
T_CUT_ENERGY |
T_CUT_DO |
T_CUT_MKN |
T_CUT_PAIRS |
Recommended Applications |
|
|---|---|---|---|---|---|---|---|
Loose |
1.0e-6 |
0.9 |
0.9 |
2e-2 |
1e-3 |
1e-3 |
High-throughput screening |
Normal |
3.33e-7 |
0.99 |
0.99 |
1e-2 |
1e-3 |
1e-4 |
Thermochemistry |
Tight |
1.0e-7 |
0.999 |
0.997 |
5e-3 |
1e-3 |
1e-5 |
Non-covalent Interactions |
Very_Tight |
1.0e-8 |
0.999 |
0.997 |
5e-3 |
1e-4 |
1e-6 |
Benchmarking, Focal Point |
Practical Advice¶
DLPNO-CCSD/(T) is almost always faster than the corresponding canonical CCSD/(T) computation, and computations involving DLPNO-CCSD/(T) are encouraged to be performed on large molecules as a more accurate alternative to DFT.
For most computations, PNO_CONVERGENCE
TIGHTis recommended, especially those involving non-covalent interactions. For larger systems whereTIGHTis too expensive,NORMALfor PNO_CONVERGENCE while setting T_CUT_PAIRS to1.0e-5is recommended. This has been shown to yield errors on the order of kJ/mol for non-covalent interactions [Jiang:2024:082502].Based on user allocated memory, disk/core storage for various integrals and tensors for DLPNO-CCSD/(T) are automatically determined. There is no need to toggle with the disk/core options for the average user.
In DLPNO methods, it is recommended to freeze core orbitals (by setting FREEZE_CORE to
True), since core excitations are known to be more sensitive to PNO truncations than valence truncations. If a non-frozen core computation is requested, all PNOs corresponding to core-core or core-virtual pairs have cutoffs scaled byT_CUT_PNO_CORE_SCALE(default1.0e-2).Note that DLPNO does not yet have molecular point group symmetry implemented and will run in C1 symmetry.
At this time, DLPNO-CCSD/(T) is only available for closed-shell RHF computations.
Computation Size Limits¶
Since DLPNO-CCSD(T) is linear-scaling, with access to sufficient computing resources, a DLPNO-CCSD(T) computation can be possible with any system. In fact, for larger systems, Hartree-Fock becomes the bottleneck (not the coupled-cluster)! Below, we tabulate the roughly projected limits (number of atoms) of DLPNO-CCSD(T) across different access to hardware. All these limits are with a standard polarized double zeta basis set. For larger basis sets, divide by 3 for an increase in cardinality, 2 for full set of diffuse functions, and 1.5 for partial diffuse functions (e.g. if the size limit is 100 for cc-pVDZ, expect 30 for cc-pVTZ, 50 for aug-cc-pVDZ, and 70 for jun-cc-pVDZ). Estimates for larger amounts of RAM should be taken with a grain of salt, as the computation may be theoretically possible with DLPNO-CCSD, but may be hindered by the cost of the preceeding Hartree-Fock computation (in time or memory). New approaches to make Hartree-Fock more efficient in PSI4 are currently under investigation.
Hardware |
RAM |
Normal |
Tight |
|---|---|---|---|
Home Desktop |
32 GB |
90-100 |
80-90 |
Lab Workstation |
64 GB |
120-150 |
100-120 |
Lab Workstation |
192 GB |
300-400 |
200-300 |
Lab Cluster |
512 GB |
700+ |
400+ |
Lab/HPC Cluster |
1 TB |
1000+ |
700+ |
HPC Cluster |
3 TB |
1500+ |
1000+ |
Key Differences with DLPNO-CCSD(T) in ORCA¶
While the DLPNO-CCSD(T) formulation in PSI4 is heavily inspired by the original method proposed by Neese and coworkers in ORCA [Riplinger:2013:034106] [Riplinger:2013:134101] [Riplinger:2016:024109] [Guo:2018:011101], PSI4 employs different algorithms for certain parts of the procedure. Both represent linear-scaling CCSD(T) algorithms solved in the the local pair natural orbital basis, with convergence to canonical CCSD(T) results as the local tolerances are tightened. However, the manner in which the PNO spaces are truncated as well as how the CCSD equations are solved are different. Notable differences in implementation between the two algorithms are highlighted below:
The most notable difference is that the DLPNO-CCSD equations in PSI4 utilize T1-dressed Hamiltonian and Fock matrix elements, which significantly simplifies the number of working equations as well as the number of mathematical operations involved in solving the CCSD residual equations. Because of the reduced number of intermediates that are required, we find that the DLPNO-CCSD code in PSI4 potentially uses less RAM than the ORCA formulation for a given PNO_CONVERGENCE. The runtimes are expected to be similar.
PSI4 often recovers slightly more CCSD correlation energy than ORCA due to additional PNO cutoffs T_CUT_TRACE and T_CUT_ENERGY used in addition to the normal T_CUT_PNO. These result in larger and more robust PNO spaces at a given PNO_CONVERGENCE in PSI4. The difference is typically small in terms of absolute energies (on the order of
1.0e-3to1.0e-4Hartrees), with agreement on the order of 99.95% or better. Due to the different T1 formulation, it is not possible to exactly match ORCA and PSI4 DLPNO-CCSD energies by adjusting keywords.PSI4 also couples MP2 weak pair amplitudes to CCSD strong pair amplitudes in solving the CCSD residual equations. In our formulation, this does not add significantly more time and memory since the most expensive algorithmic steps result from self-coupling terms, such as the particle-particle ladder (\(R_{ij}^{ab} \mathrel{+}= B^{Q}_{ac} t_{ij}^{cd} B^{Q}_{bd}\)). This is likely an additional source of the increased recovery of correlation energy in PSI4 compared to ORCA.
The more robust PNO space combined with strong pair/weak pair couplings often give PSI4 a slight edge in terms of relative energies compared to canonical CCSD(T) at a given PNO_CONVERGENCE, with comparable runtimes. This is showcased by the comparing conformation energies on large water clusters (16-17 waters), as shown in [Jiang:2024:082502].
Another difference with ORCA is the use of Full LMP2 prescreening across all PNO convergence levels, as opposed to only TightPNO in ORCA. For a more direct comparison, one can set
UseFullLMP2Guess truein the corresponding DLPNO-CCSD(T) input file in ORCA. For the triples computation, ORCA defaults to the semicanonical (T0) approach given in [Riplinger:2013:134101] when specifyingDLPNO-CCSD(T)in the input file, while our code defaults to the iterative (T) computation given in [Guo:2018:011101] with anenergy('dlpno-ccsd(t)')call. To perform the iterative (T) computation in ORCA, one needs to specifyDLPNO-CCSD(T1). In our code, the semicanonical (T0) implementation of triples can be requested throughenergy('dlpno-ccsd(t0)').In benchmarking studies, users are encouraged to use both ORCA and PSI4’s implementation. The main advantage of the code in ORCA is the capability to run across different nodes through MPI, while our code is designed for an optimal performance on a single node through OpenMP. Both codes should converge to similar answers for relative energies with larger basis sets and tighter cutoffs, especially when extrapolated to the complete basis set (CBS) limit.