Linear scaling QM Methods - Gerald Monard

Linear scaling QM Methods
Gérald MONARD

Equipe de Chimie et Biochimie Théoriques, UMR 7565 CNRS - Université Henri Poincaré
Faculté des Sciences - B.P. 239, 54506 Vandœuvre-lès-Nancy Cedex - FRANCE
http://www.monard.info/

Outline

- Fundamentals of Linear Scaling methods
  - QM Bottlenecks
  - General ideas and solutions
- Focus on the Divide & Conquer method
  - D&C scheme at the semiempirical level
  - Energy Decomposition; Charge Transfer & Polarization
  - SemiEmpirical Born-Oppenheimer Molecular Dynamics
- Other linear scaling methods
  - Mozyme
  - CG-DMS
- Comments on the Parallelization of QM/MM and Linear Scaling methods

How to solve the QM scaling problem?

Theoretical CPU scaling order for different QM methods

QM method       Scaling       Bottleneck
semiempirical   O(N^3)        diagonalization
DFT             O(N^3)        electronic integrals
ab initio       O(N^4)        electronic integrals
MP2             O(N^5)
Full CI         O(exp N)

→ change the methods: use approximate quantum methods
  - semiempirical QM methods
  - molecular mechanics (MM) force fields
  - combined QM/MM methods
→ change the algorithms
  - Linear scaling algorithms

Linear Scaling in Quantum Chemistry

Linear Scaling objectives
- A QM calculation with O(N) CPU scaling
- Standard algorithms must be changed
- New algorithms are interesting beyond the crossover point

[Figure: time (arbitrary unit) versus number of basis functions for linear, cubic, and quartic scaling]

Changing the algorithms towards linear scaling

CPU bottleneck in Hartree-Fock methods
- bielectronic integrals: HF: O(N^4), DFT: O(N^3)
- orthogonal transformation: O(N^3)
- Fock matrix diagonalization: O(N^3)
→ reaching linear scaling for HF (or DFT) methods involves multiple targets and changes of algorithms

CPU bottleneck in semiempirical methods
- Fock matrix diagonalization: O(N^3)
→ reaching linear scaling is/should be easier with semiempirical methods

Linear Scaling Foundations

Exploiting Electronic Density Locality

$\rho(r, r') \sim \exp\left(-\sqrt{E_{gap}}\,|r - r'|\right)$

- $E_{gap}$ is the energy difference between HOMO and LUMO
- Thus, in large atomic systems, the density matrix P is sparse
- Except in metallic or conjugated systems
- But true for proteins, small solutes in solution, etc.
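The locality argument can be illustrated with a toy model. The sketch below is not taken from the slides: it assumes a 1D chain of equally spaced sites and a model density matrix $P_{ij} = \exp(-\text{decay} \cdot |r_i - r_j|)$, where `decay` stands in for $\sqrt{E_{gap}}$ and `threshold` is a hypothetical drop tolerance. It only shows that the number of retained elements grows linearly with the number of sites, while the full matrix grows quadratically.

```python
import math


def count_significant_elements(n_sites, spacing, decay, threshold):
    """Count elements with |P_ij| >= threshold for the model
    P_ij = exp(-decay * |r_i - r_j|) on a 1D chain (illustrative model only)."""
    count = 0
    for i in range(n_sites):
        for j in range(n_sites):
            distance = spacing * abs(i - j)
            if math.exp(-decay * distance) >= threshold:
                count += 1
    return count


# Hypothetical parameters: unit spacing, unit decay constant, 1e-6 tolerance.
for n in (100, 200, 400):
    kept = count_significant_elements(n, spacing=1.0, decay=1.0, threshold=1e-6)
    # kept grows linearly with n, so the retained fraction shrinks like 1/n
    print(n, kept, kept / (n * n))
```

With these assumed parameters only elements with $|i - j| \le 13$ survive, so doubling the chain length roughly doubles the number of stored elements.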

Some Linear Scaling Algorithms

An example: Revisiting the Hartree-Fock SCF algorithm
- Compute mono- and bielectronic integrals → Fast Multipole Method
- Build the Fock matrix F
- Orthogonal transformation → sparse matrix Cholesky decomposition
- Diagonalization of F → replaced by Conjugate-Gradient Density Matrix Search

Among the published linear scaling methods
- Divide & Conquer (SemiEmp., DFT, HF, etc.)
- Localized Molecular Orbitals (SemiEmp. only)
- Density Matrix Minimization (SemiEmp., DFT, HF, etc.)

The Divide and Conquer Method

- If $\chi_\mu$ and $\chi_\nu$ are well separated in space (> 9-11 Å): $P_{\mu\nu} = 0$
→ division of the system into $N_{sub}$ subsystems $R^\alpha$:

  Full System: $F$ → Small Systems: $\{F^\alpha\}$ (for all subsystems) → $C^\alpha_i$ (subsystem M.O.) → $P^\alpha$ → Full System: $P = \sum_{\alpha=1}^{N_{sub}} P^\alpha$

→ theoretical speed-up: diagonalization cost relative to the full system ∼ $N_{sub} (N^\alpha/N)^3$

[Figure: peptide chain partitioned into a core region flanked by buffer1 and buffer2 regions]
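To make the estimate $N_{sub} (N^\alpha/N)^3$ concrete, here is a minimal sketch. The function name and the example sizes are hypothetical; the only input from the slide is the formula itself, and $N^\alpha$ is assumed to count the basis functions of one subsystem including its buffers.

```python
def relative_diagonalization_cost(n_total, n_sub, n_per_subsystem):
    """Ratio N_sub * (N_alpha / N)**3 between the cost of diagonalizing
    all subsystem Fock matrices and the cost of one full diagonalization."""
    return n_sub * (n_per_subsystem / n_total) ** 3


# Hypothetical example: 1000 basis functions split into 10 subsystems of
# 250 functions each (core + buffers overlap, hence 10 * 250 > 1000).
ratio = relative_diagonalization_cost(n_total=1000, n_sub=10, n_per_subsystem=250)
print(ratio)  # fraction of the full-diagonalization cost
```

If the subsystem size stays fixed while the system grows, $N_{sub}$ grows like $N$ and the total D&C diagonalization cost $N_{sub} (N^\alpha)^3$ grows linearly.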

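The Divide and Conquer Method

- Fock matrix of subsystem $R^\alpha$:

  $F^\alpha_{\mu\nu} = \begin{cases} F_{\mu\nu} & \text{if } \chi_\mu \in R^\alpha \text{ and } \chi_\nu \in R^\alpha \\ 0 & \text{otherwise} \end{cases}$

- For all subsystems, the Roothaan equations are solved: $F^\alpha C^\alpha = C^\alpha \varepsilon^\alpha$
- Density submatrices are built:

  $P^\alpha_{\mu\nu} = \sum_i^{MOs} n_i^\alpha (c^\alpha_{\mu i})^* c^\alpha_{\nu i}$ with $n_i^\alpha = \frac{2}{1 + \exp[(\varepsilon_i^\alpha - \varepsilon_F)/kT]}$

- The Fermi energy $\varepsilon_F$ is obtained by solving the equation:

  $n_{elec} = \sum_{\mu=1}^{N} P_{\mu\mu} = \sum_{\mu=1}^{N} \sum_{\alpha=1}^{N_{sub}} D^\alpha_{\mu\mu} \sum_i^{MOs} n_i^\alpha |c^\alpha_{\mu i}|^2 = \sum_{\alpha=1}^{N_{sub}} \sum_i^{MOs} n_i^\alpha B_i^\alpha$

- The total density matrix P is built from:

  $P_{\mu\nu} = \frac{\sum_{\alpha=1}^{N_{sub}} D^\alpha_{\mu\nu} P^\alpha_{\mu\nu}}{\sum_{\alpha=1}^{N_{sub}} D^\alpha_{\mu\nu}}$ with $D^\alpha_{\mu\nu} = \begin{cases} 1 & \text{if } (\mu, \nu) \in (core^\alpha, buffer1^\alpha) \\ 0 & \text{otherwise} \end{cases}$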
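The Fermi-energy condition above is a one-dimensional root-finding problem, because the electron count increases monotonically with $\varepsilon_F$. The following is a minimal sketch, assuming bisection as the solver (the slides do not say which solver DivCon uses). The `(energy, weight)` pairs stand for $(\varepsilon_i^\alpha, B_i^\alpha)$ collected over all subsystems; the example values and the function names are made up for illustration.

```python
import math


def occupation(energy, fermi, kT):
    """Fermi-Dirac occupation n = 2 / (1 + exp((energy - fermi) / kT))."""
    x = (energy - fermi) / kT
    if x > 700.0:  # exp would overflow; the occupation is numerically zero
        return 0.0
    return 2.0 / (1.0 + math.exp(x))


def electron_count(fermi, levels, kT):
    """Sum of n_i * B_i over all (energy, weight) pairs of all subsystems."""
    return sum(occupation(e, fermi, kT) * b for e, b in levels)


def find_fermi_energy(levels, n_elec, kT, tol=1e-10):
    """Bisection on the Fermi energy; assumes 0 < n_elec < 2 * sum of weights."""
    lo = min(e for e, _ in levels) - 50.0 * kT  # all levels empty
    hi = max(e for e, _ in levels) + 50.0 * kT  # all levels doubly occupied
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if electron_count(mid, levels, kT) < n_elec:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up symmetric spectrum (energies in eV, unit weights), 4 electrons.
levels = [(-1.0, 1.0), (-0.5, 1.0), (0.5, 1.0), (1.0, 1.0)]
fermi = find_fermi_energy(levels, n_elec=4.0, kT=0.025)
print(fermi, electron_count(fermi, levels, 0.025))
```

For this symmetric test spectrum the Fermi energy lies in the middle of the gap. Reading off the last equality of the electron-count equation, the weights correspond to $B_i^\alpha = \sum_\mu D^\alpha_{\mu\mu} |c^\alpha_{\mu i}|^2$.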

Semiempirical SCF algorithm

1. Compute mono- and bielectronic integrals
2. Build the core Hamiltonian $H^c$ (invariant)
3. Guess an initial density matrix
4. Build the Fock matrix F
5. For each subsystem $R^\alpha$ at iteration (k):
   - Assemble $F^\alpha$ from $F^{(k)}$
   - Solve $F^\alpha C^\alpha = C^\alpha \varepsilon^\alpha$
   - Store $B_i^\alpha$
   - Build $P^\alpha$ from $C^\alpha$, $\varepsilon^\alpha$, and $\varepsilon_F^{(k-1)}$
   - Add $P^\alpha$ to $P^{(k+1)}$
6. Determine $\varepsilon_F^{(k)}$ from all $B_i^\alpha$ and $N_{elec}$
   Back to 4 unless convergence

Semiempirical Divide & Conquer: timings

[Figure: wall clock CPU time (seconds) per energy calculation versus number of water molecules; curves: DivCon PM3 (STANDARD), DivCon PM3 (D&C)]
[Figure: wall clock CPU time (seconds) per SCF cycle versus number of water molecules; curves: DivCon PM3 (STANDARD), DivCon PM3 (D&C)]

- (H2O)n water cluster (n from 1 to 500)
- 1 energy calculation
- DivCon99 (Merz et al. + modifications from Nancy Group)
- Standard algorithm or Divide & Conquer (ncore = 1, buffer1 = 4 Å, buffer2 = 2 Å)
- Intel(R) Xeon(R) CPU E5620 2.40GHz (8 cores), 32 GB RAM

Semiempirical Divide & Conquer implementations

- Linear-scaling semiempirical quantum calculations for macromolecules
  Lee, T.-S.; York, D. M.; Yang, W. J. Chem. Phys. 1996, 105, 2744–2750
- Semiempirical molecular orbital calculations with linear system size scaling
  Dixon, S. L.; Merz, K. M., Jr. J. Chem. Phys. 1996, 104, 6643–6649
- Fast, accurate semiempirical molecular orbital calculations for macromolecules
  Dixon, S. L.; Merz, K. M., Jr. J. Chem. Phys. 1997, 107, 879–893
- Critical assessment of the performance of the semiempirical divide and conquer method for single point calculations and geometry optimizations of large chemical systems
  van der Vaart, A.; Suárez, D.; Merz, K. M., Jr. J. Chem. Phys. 2000, 113, 10512–10523

Other Divide & Conquer implementations

- DFT: Yang, W.; Lee, T.-S. J. Chem. Phys. 1995, 103, 5674–5678
- HF: Akama, T.; Kobayashi, M.; Nakai, H. J. Comput. Chem. 2007, 28, 2003
  He, X.; Merz, K. M., Jr. J. Chem. Theory Comput. 2010, 6, 405–411
- MP2: Kobayashi, M.; Nakai, H. Int. J. Quant. Chem. 2009, 109, 2227–2237
- CCSD(T): Kobayashi, M.; Nakai, H. J. Chem. Phys. 2008, 129, 044103
- Open-Shell (DFT and HF): Kobayashi, M.; Yoshikawa, T.; Nakai, H. Chem. Phys. Lett. 2010, 500, 172–177
- Delocalized Systems: Akama, T.; Fujii, A.; Kobayashi, M.; Nakai, H. Mol. Phys. 2007, 105, 2799–2804
  Kobayashi, M.; Nakai, H. Phys. Chem. Chem. Phys. 2012, 14, 7629–7639

Scaling factor in Divide & Conquer

He, X.; Merz, K. M., Jr. J. Chem. Theory Comput. 2010, 6, 405–411

1700 basis functions ∼ 20 alanine amino acids (200 atoms)

Divide & Conquer Interaction Energy Decomposition

- The theoretical framework of the Divide & Conquer method allows for a natural decomposition of the interaction energy into electrostatic, polarization, and charge transfer components:

  $E_{int} = E_{es} + E_{pol} + E_{CT}$

- The total energy $E[\varepsilon_F, P_r, r, b]$ of a multimolecular system is obtained from

  $E[\varepsilon_F, P_r, r, b] = \frac{1}{2} \sum_\mu \sum_\nu P_{\mu\nu} [H^c_{\mu\nu} + F_{\mu\nu}] + E_{core}$

- The total energy is a function of:
  - the Fermi energy $\varepsilon_F$
  - the intermolecular distance r
  - the density matrix $P_r$ at this intermolecular distance
  - and the buffer region b

Divide & Conquer Interaction Energy Decomposition

- The interaction energy of a multimolecular system can then be expressed as

  $E_{int} = E[\varepsilon_F, P_r, r, b] - E[\varepsilon_F, P_\infty, \infty, b]$

- The number of electrons per subsystem can be constrained by using multiple Fermi energies $\varepsilon_F^\alpha$
- Electron flow between subsystems can be inhibited by applying zero-buffering (= every subsystem consists of one core)
- For an infinitely separated system, the buffer region between interacting molecules will always be empty, therefore:

  $E[\varepsilon_F, P_\infty, \infty, b] = E[\varepsilon_F^\alpha, P_\infty, \infty, 0]$

Divide & Conquer Interaction Energy Decomposition

- The electrostatic contribution to the interaction energy $E_{es}$ can be obtained from:

  $E_{es} = E[\varepsilon_F^\alpha, P_\infty, r, 0] - E[\varepsilon_F, P_\infty, \infty, 0]$

  This corresponds to the energy obtained by bringing the infinitely separated system to the equilibrium distance, without a change in the charge distribution.
- By allowing intramolecular charge rearrangement, without charge flow, the contribution of polarization is obtained:

  $E_{pol} = E[\varepsilon_F^\alpha, P, r, 0] - E[\varepsilon_F^\alpha, P_\infty, r, 0]$

- By allowing intermolecular charge flow, the contribution of charge transfer is:

  $E_{CT} = E[\varepsilon_F, P_r, r, b] - E[\varepsilon_F^\alpha, P, r, 0]$
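The three contributions are differences between four total energies, and they telescope to the interaction energy. The sketch below just encodes that bookkeeping; the function name, argument names, and numerical values are hypothetical (the energies would come from four separate D&C calculations, which are not shown).

```python
def decompose_interaction_energy(e_full, e_no_flow, e_frozen, e_separated):
    """Split the interaction energy into its three D&C components.

    e_full      : E[eF,   P_r,   r,   b]  (charge flow allowed)
    e_no_flow   : E[eF^a, P,     r,   0]  (polarized, no intermolecular flow)
    e_frozen    : E[eF^a, P_inf, r,   0]  (frozen densities at distance r)
    e_separated : E[eF,   P_inf, inf, 0]  (infinitely separated molecules)
    """
    e_es = e_frozen - e_separated
    e_pol = e_no_flow - e_frozen
    e_ct = e_full - e_no_flow
    return {"es": e_es, "pol": e_pol, "ct": e_ct, "int": e_es + e_pol + e_ct}


# Made-up energies (arbitrary units), chosen only to show the arithmetic.
parts = decompose_interaction_energy(
    e_full=-106.25, e_no_flow=-105.5, e_frozen=-104.0, e_separated=-100.0
)
print(parts)
```

By construction `parts["int"]` equals `e_full - e_separated`, which is the definition of $E_{int}$ once the infinite-separation identity is used.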

Divide & Conquer Interaction Energy Decomposition

64 water system (JPCA 1999, 103, 3321–3329)
- classical molecular dynamics with 64 TIP3P water molecules
- NPT: T = 330 K, P = 1 bar, ∆t = 1.5 fs, t = 330 ps
- 3 snapshots at 210, 270 and 330 ps
- single point PM3 energy calculations within the D&C framework
- Interaction energy decomposition for all water molecules


Divide & Conquer Interaction Energy Decomposition

CspA + water (JACS 1999, 121, 9182–9190)
- CspA: cold shock protein A (69 residues, charged -1e)
- Molecular Dynamics 500 ps
- MM force field: AMBER + TIP3P
- single point PM3 energy calculations within the D&C framework (keep only water molecules < 9 Å → ∼ 4500 atoms)


Divide & Conquer Interaction Energy Decomposition

Docking of urate in Urate Oxidase (Theochem, 2009, 898, 31–41)

[Reaction scheme: Uric Acid + H2O + O2 → (Urate Oxidase) → 5-Hydroxyisourate + H2O2; 5-Hydroxyisourate + H2O → Allantoin + CO2]

→ Active site (12 Å box) ∼ 700 atoms, full QM

Divide & Conquer Interaction Energy Decomposition

Docking of urate in Urate Oxidase (Theochem, 2009, 898, 31–41)

$Pol_{res} = \frac{1}{N_{res}} \sum_{a \in res} |q_a^{pol} - q_a^{vac}|$

$CT_{res} = \frac{1}{N_{res}} \sum_{a \in res} (q_a^{E:S} - q_a^{pol})$
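The two per-residue indices are plain averages over the atoms of a residue. The sketch below assumes that $N_{res}$ is the number of atoms in the residue and that the three charge sets are given as equally ordered lists; the slides do not define the superscripts, so reading them as vacuum charges, polarized charges without charge transfer, and charges in the enzyme:substrate complex is an assumption. The charge values are made up.

```python
def polarization_index(q_pol, q_vac):
    """Pol_res: mean absolute change of atomic charge upon polarization."""
    return sum(abs(p - v) for p, v in zip(q_pol, q_vac)) / len(q_pol)


def charge_transfer_index(q_complex, q_pol):
    """CT_res: mean signed change of atomic charge due to charge transfer."""
    return sum(c - p for c, p in zip(q_complex, q_pol)) / len(q_pol)


# Made-up charges for a three-atom "residue" (units of e).
q_vac = [-0.50, 0.25, 0.25]
q_pol = [-0.75, 0.50, 0.25]
q_complex = [-0.75, 0.50, 0.50]

print(polarization_index(q_pol, q_vac))
print(charge_transfer_index(q_complex, q_pol))
```

Note the asymmetry in the definitions: the polarization index uses absolute values, so internal rearrangements do not cancel, while the charge-transfer index keeps the sign and therefore reports the net charge gained or lost per atom.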

WHAT IS SEBOMD?

SemiEmpirical Born-Oppenheimer Molecular Dynamics

Born-Oppenheimer
Only the positions of the nuclei are defined. At each step of the molecular dynamics, molecular orbitals, energy, and gradient are obtained after a fully converged SCF calculation.
→ Full quantum description of a molecular system.

Semiempirical
The NDDO approximation is used (MNDO, AM1, PM3, ...). This enables fast and repetitive computations and allows for "long" MD trajectories (∼ 100 ps) of "large" systems (several hundred atoms) on commodity computers (single PC, clusters).

A SEBOMD program developed in Nancy

The software
The coupling of two programs:
- a classical MD program (AMBER [1])
- + a semiempirical program (DivCon [2])
- MD routines are modified: instead of computing energy and forces from a classical force field, the semiempirical program is fed the system coordinates and returns the energy and the gradient.

Why DivCon?
1. It can compute molecular energies in a linear scaling way (Divide & Conquer methodology)
2. It is parallelized (D&C calculations are spread among nodes)
→ access to "large" systems

[1] D. A. Case et al., AMBER9, Univ. of California (2006)
[2] S. L. Dixon et al., DivCon 99, Penn State Univ. (1999)

TEST CASE: LIQUID WATER

Simulation protocol:
- box of 64 or 216 water molecules (H2O or D2O)
- cubic box (a = 12.417 Å or 18.626 Å)
- dt = 1 fs
- equilibration: NVT, 20 ps with strong temperature coupling (Andersen thermostat)
  0 → 5 ps = 10 K, 5 → 10 ps = 100 K, 10 → 20 ps = 300 K
- sampling: 80 ps NVT at 300 K
- comparison with experimental values: Radial Distribution Functions, self-diffusion constant, heat of vaporization
- Various semiempirical methods
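As a quick consistency check on the protocol, the two box sizes can be converted into mass densities. This is a small worked example, not part of the original protocol; it assumes light water (H2O, 18.015 g/mol) and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
WATER_MOLAR_MASS = 18.015  # g/mol, assuming H2O (not D2O)


def box_density(n_molecules, box_length_angstrom, molar_mass=WATER_MOLAR_MASS):
    """Mass density in g/cm^3 of n_molecules in a cubic box of given edge."""
    mass_g = n_molecules * molar_mass / AVOGADRO
    volume_cm3 = (box_length_angstrom * 1.0e-8) ** 3  # 1 Angstrom = 1e-8 cm
    return mass_g / volume_cm3


for n, a in ((64, 12.417), (216, 18.626)):
    print(n, a, round(box_density(n, a), 3))
```

Both boxes come out very close to 1.00 g/cm^3, i.e. the box edges were chosen to reproduce the density of liquid water at ambient conditions.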

LIQUID WATER: MNDO

[Figure: radial distribution functions g(OO), g(OH), and g(HH) versus r (Angstroms); curves: exp., Minimum Image, Ewald-CM1]

- 64 water molecules
- cubic box (a = 12.417 Å)
- 100 ps, NVT, 300 K, dt = 1 fs
- MNDO, SCF standard (full diagonalization)

LIQUID WATER: AM1

G. Monard et al. J. Phys. Chem. A, 109, 3425 (2005)

[Figure: radial distribution functions g(OO), g(OH), and g(HH) versus r (Angstroms); curves: exp., Minimum Image, Ewald-CM1]

- 64 water molecules
- cubic box (a = 12.417 Å)
- 100 ps, NVT, 300 K, dt = 1 fs
- AM1, SCF standard (full diagonalization)

LIQUID WATER: PM3

[Figure: radial distribution functions g(OO), g(OH), and g(HH) versus r (Angstroms); curves: exp., Minimum Image, Ewald-CM1]

- 64 water molecules
- cubic box (a = 12.417 Å)
- 100 ps, NVT, 300 K, dt = 1 fs
- PM3, SCF standard (full diagonalization)

PM3-PIF (PARAMETRIZED INTERACTION FUNCTION) [6]

- Original PM3:

  $V_{AB} = Z'_A Z'_B (s_A s_A | s_B s_B) [1 + e^{-\alpha_A R_{AB}} + e^{-\alpha_B R_{AB}}] + g_{AB}$

  $g_{AB} = \frac{Z'_A Z'_B}{R_{AB}} \left( \sum_i K_{A,i} e^{-L_{A,i} (R_{AB} - M_{A,i})^2} + \sum_i K_{B,i} e^{-L_{B,i} (R_{AB} - M_{B,i})^2} \right)$

- PM3-PIF (intermolecular interactions only):

  $g_{AB} = \alpha_{AB} e^{-\beta_{AB} R_{AB}} + \frac{\chi_{AB}}{R_{AB}^6} + \frac{\delta_{AB}}{R_{AB}^8} + \frac{\varepsilon_{AB}}{R_{AB}^{10}}$

- parameterized from an ab initio potential energy surface (700 water dimer configurations, MP2/aug-cc-pVQZ)

[6] M.I. Bernal-Uruchurtu et al. J. Comput. Chem., 21, 572-581 (2000)
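The PIF correction is a simple analytic function of the interatomic distance, which the following sketch evaluates. The function name and the parameter values in the example are placeholders chosen for easy arithmetic; they are not the published PM3-PIF parameters.

```python
import math


def pif_correction(r_ab, alpha, beta, chi, delta, epsilon):
    """PM3-PIF intermolecular term
    g_AB = alpha * exp(-beta * R) + chi / R**6 + delta / R**8 + epsilon / R**10
    for one atom pair at distance r_ab (all parameters are pair-specific)."""
    return (
        alpha * math.exp(-beta * r_ab)
        + chi / r_ab**6
        + delta / r_ab**8
        + epsilon / r_ab**10
    )


# Placeholder parameters, NOT fitted values: each inverse-power term
# contributes exactly 1.0 at R = 2, and the exponential term is switched off.
print(pif_correction(2.0, alpha=0.0, beta=1.0, chi=64.0, delta=256.0, epsilon=1024.0))
```

In contrast with the Gaussian sums of the original PM3 core-core term, this form has an exponential repulsion plus a dispersion-like inverse-power series, which is why it can be fitted directly to an ab initio dimer surface.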

LIQUID WATER: PM3-PIF

[Figure: radial distribution functions g(OO), g(OH), and g(HH) versus r (Angstroms); curves: exp., 64 STD Ewald-CM1, 64 STD Minimum Image, 64 STD Ewald-Mulliken, 216 DC-D-1-6-0 Ewald-CM1]

- 64 / 216 water molecules
- cubic box (12.417 Å / 18.626 Å)
- 100 ps, NVT, 300 K, dt = 1 fs / 2 fs
- PM3-PIF, SCF standard or Divide & Conquer
- Hydrogen / Deuterium

A COMPARISON BETWEEN PM3-PIF, SPCE AND TIP3P MODELS

[Figure: radial distribution function g(OO) versus r (Angstroms); curves: exp., Minimum Image, Ewald CM1, TIP3P, SPCE]

HEAT OF VAPORIZATION / SELF-DIFFUSION CONSTANT

Method        Heat of vaporization (kcal/mol)   Self-diffusion constant (10^-9 m^2 s^-1)
exp           10.5                              2.3
TIP3P         10.27                             5.0
SPCE          11.84                             2.2
AM1 (a)       7.72
AM1 (b)       8.99
PM3 (a)       4.11
PM3 (b)       5.38
PM3-PIF (a)   8.64
PM3-PIF (b)   10.12
PM3-PIF (c)   8.71
PM3-PIF (d)   9.98

Two further self-diffusion values (3.2 and 4.7) belong to semiempirical rows; their row assignment is not recoverable here.

(a): 64 water molecules, standard SCF, Minimum Image
(b): 64 water molecules, standard SCF, Ewald Summation, CM1 atomic charges
(c): 64 water molecules, standard SCF, Ewald Summation, Mulliken atomic charges
(d): 216 water molecules, D&C, Ewald Summation, CM1 atomic charges

MOLECULAR GEOMETRY / CM1 DIPOLE MOMENT

              gas phase                  liquid phase
Method        d (Å)   α (°)   µ (D)      d (Å)   α (°)   µ (D)
exp           0.957   104.5   1.85       0.970   106.0
TIP3P         0.957   104.5   2.35       0.957   104.5   2.35
SPCE          1.000   109.5   2.35       1.000   109.5   2.35
AM1 (a)       0.961   103.5   2.02       0.967   103.7   2.19
AM1 (b)       0.961   103.5   2.02       0.970   102.8   2.22
PM3 (a)       0.951   107.7   1.90       0.962   108.0   2.13
PM3 (b)       0.951   107.7   1.90       0.966   106.8   2.19
PM3-PIF (a)   0.951   107.7   1.90       0.966   106.7   2.18
PM3-PIF (b)   0.951   107.7   1.90       0.972   105.1   2.26
PM3-PIF (c)   0.951   107.7   1.90       0.966   106.8   2.17
PM3-PIF (d)   0.951   107.7   1.90       0.971   105.2   2.23

(a): 64 water molecules, standard SCF, Minimum Image
(b): 64 water molecules, standard SCF, Ewald Summation, CM1 atomic charges
(c): 64 water molecules, standard SCF, Ewald Summation, Mulliken atomic charges
(d): 216 water molecules, D&C, Ewald Summation, CM1 atomic charges

POLARIZATION / CHARGE TRANSFER (PM3-PIF)

[Figure: distribution of the molecular net charge, Probability (%) versus molecular net charge; curves: Mulliken, CM1, CM2]
[Figure: distribution of the molecular dipole moment, Probability (%) versus molecular dipole moment (Debye); curves: Mulliken, CM1, CM2]
[Figure: fluctuation of the CM1 net molecular charge of one water molecule versus time (ps) (SEBOMD: 64 H2O, NVT, 300 K, dt = 1 fs, standard SCF, Ewald-CM1)]

BEYOND LIQUID WATER...

- PM3-PIF provides a good model for liquid water
- What about small organic solutes in water?
→ on-going tests of:
  - formamide,
  - ethanol,
  - dimethyl ether,
  - N-methylacetamide,
  - glycine (zwitterionic or "neutral"),
  - etc.

INTERMOLECULAR INTERACTIONS
- Water-Water interactions → PM3-PIF
- Solute-Water interactions → PM3 or PM3-PIF

DIMETHYL ETHER IN WATER (SOLUTE-WATER = PM3)

[Figure: radial distribution functions versus r (Angstroms)
 g(OO): exp. (liquid water), O(Water)-O(Water), O(DME)-O(Water)
 g(OH): exp. (liquid water), O(Water)-H(Water), O(DME)-H(Water), H(DME)-O(Water)
 g(HH): exp. (liquid water), H(Water)-H(Water), H(DME)-H(Water)]

- 1 DME + 61 H2O
- cubic box (12.417 Å)
- 100 ps, NVT, 300 K, dt = 1 fs / 2 fs
- PM3-PIF, SCF standard
- DME ↔ Water = PM3

H-H INTERMOLECULAR INTERACTIONS IN PM3 [7]

- PM3-PIF extensions to hydrated systems:

[Figure: CH4···CH4 interaction energy Eint (kcal/mol) versus R(H-H) (Å); curves: PM3-MAIS, MP2, PM3]
[Table: solvent atom types (H, C, N, O) versus solute atom types (H, C, N, O), with marked pairs; individual entries not recoverable]

[7] W. Harb et al. Theor. Chem. Acc., 112, 204-216 (2004)

DIMETHYL ETHER IN WATER (SOLUTE-WATER = PM3-PIF)

[Figure: radial distribution functions versus r (Angstroms)
 g(OO): exp. (liquid water), O(Water)-O(Water), O(DME)-O(Water)
 g(OH): exp. (liquid water), O(Water)-H(Water), O(DME)-H(Water), H(DME)-O(Water)
 g(HH): exp. (liquid water), H(Water)-H(Water), H(DME)-H(Water)]

- 1 DME + 61 H2O
- cubic box (12.417 Å)
- 100 ps, NVT, 300 K, dt = 1 fs / 2 fs
- PM3-PIF, SCF standard
- DME ↔ Water = PIF


Infra-Red spectra of N-methylacetamide in Water: a SEBOMD study

Computational details (JCTC, 2011, 7, 1840–1849)
- 1 NMA + 64 water molecules in a cubic box (ρ = 0.996 g/ml)
- 500 ps + 1 ns AMBER + SPC/E MD (NVT 300 K, ∆t = 1 fs)
- 100 + 300 ps PM3-PIF SEBOMD (NVT 300 K, ∆t = 1 fs)
- Ewald sum to treat long-range electrostatics (CM1 charges)
- IR spectra calculated from the Fourier transform of the electric dipole time correlation function:

  $I(\omega) \sim Q(\omega) \int_{-\infty}^{+\infty} dt\, e^{-i\omega t} \langle \mu(0) \cdot \mu(t) \rangle$

- The NMA molecular dipole moment is computed from Mulliken, CM1, or CM2 atomic charges (with the NMA center of mass as the origin of the Cartesian frame)
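The route from a dipole trajectory to a spectrum can be sketched in a few lines. This is a simplified illustration, not the procedure of the cited study: it sets the quantum correction factor $Q(\omega)$ to 1, uses a direct cosine transform (valid because the classical autocorrelation function is even in time), and feeds in a synthetic dipole oscillating at a single made-up frequency. All function names are hypothetical.

```python
import math


def dipole_autocorrelation(dipoles, max_lag):
    """<mu(0).mu(t)> averaged over all time origins, for lags 0..max_lag-1."""
    n = len(dipoles)
    corr = []
    for lag in range(max_lag):
        total = 0.0
        for t in range(n - lag):
            a, b = dipoles[t], dipoles[t + lag]
            total += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
        corr.append(total / (n - lag))
    return corr


def spectrum(corr, dt, omegas):
    """Cosine transform of the even correlation function; Q(omega) = 1 assumed."""
    result = []
    for w in omegas:
        s = corr[0]
        for k in range(1, len(corr)):
            s += 2.0 * corr[k] * math.cos(w * k * dt)
        result.append(dt * s)
    return result


# Synthetic trajectory: one dipole component oscillating at omega0 = 2.0
# (arbitrary units); a real run would use the SEBOMD dipole time series.
dt = 0.05
dipoles = [(math.cos(2.0 * k * dt), 0.0, 0.0) for k in range(2000)]
corr = dipole_autocorrelation(dipoles, max_lag=400)
omegas = [1.0, 1.5, 2.0, 2.5, 3.0]
intensities = spectrum(corr, dt, omegas)
peak = omegas[intensities.index(max(intensities))]
print(peak)
```

The spectrum peaks at the frequency of the underlying oscillation. In a production analysis one would normally use an FFT and a window function, which are omitted here to keep the example dependency-free.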

Infra-Red spectra of N-methylacetamide in Water: a SEBOMD study

AMBER classical MD
[Figure: IR spectra versus ω (cm^-1), from 500 to 3500, for isolated NMA and for NMA in water; curves: cis NMA, trans NMA]

Infra-Red spectra of N-methylacetamide in Water: a SEBOMD study

PM3-PIF SEBOMD
[Figure: IR spectra versus ω (cm^-1), from 500 to 3500, for isolated NMA and for NMA in water, computed with Mulliken, CM1, and CM2 charges, with insets covering about 2800 to 3400 cm^-1; curves: cis NMA, trans NMA]

Infra-Red spectra of N-methylacetamide in Water: a SEBOMD study

Comparison of the IR peaks

Ingrosso, F.; Monard, G.; Farag, M. H.; Bastida, A.; Ruiz-López, M. F. J. Chem. Theory Comput. 2011, 7, 1840–1849

What we learn from SEBOMD studies

- Semiempirical methods can tackle very large molecular systems in static or dynamical simulations
- Semiempirical methods are fast enough to run molecular dynamics simulations (SEBOMD) on commodity computers or small clusters
- Results depend mainly on the parameters of the "semiempirical force field", which naturally includes intermolecular polarization and charge transfer
- Now, the semiempirical parameters need to be improved so that they correctly account for intermolecular interactions
→ "SemiAbInitio" parameters: the lack of empirical data to represent intermolecular interactions imposes the fitting of semiempirical parameters from ab initio data

Localized Molecular Orbitals

Semiempirical foundations (Stewart, 1996) [1]
- P is sparse, thus MOs have a localized extension, i.e. MOs are expanded over a limited set of atoms
- For SCF convergence, F does not need to be fully diagonalized
- It is only necessary to block diagonalize F between occupied and unoccupied MOs

Algorithm
- Starting MO set: a set of strictly localized MOs (LMOs)
- For all non-orthogonal (occupied MO, unoccupied MO) pairs that share at least one atom, combine them to form two new orthogonal MOs
- Repeat until F is block diagonal

[1] Stewart, J. J. P. Int. J. Quant. Chem. 1996, 58, 133–146

Localized Molecular Orbitals

Starting set of LMOs
- They are built from a hybrid orbital basis set
- To each bonding LMO corresponds an orthogonal antibonding LMO
- They are usually built from a Lewis structure

LMO Orthogonalization
- Performed through the annihilation of the F elements connecting occupied and unoccupied MOs that share at least one atom (pseudo-diagonalization)
- Semiempirical fact: if an occupied LMO does not extend to any atom of an unoccupied LMO, then they are orthogonal (S = I)

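Localized Molecular Orbitals

MO energy:

  $\varepsilon_{ij} = \langle \psi_i | F | \psi_j \rangle = \sum_\mu \sum_\nu c_{\mu i} F_{\mu\nu} c_{\nu j}$ with $\psi_i = \sum_\mu c_{\mu i} \varphi_\mu$ (LCAO)

Pseudo-diagonalization step:

  $\psi_i' = \alpha \psi_i + \beta \psi_j$
  $\psi_j' = -\beta \psi_i + \alpha \psi_j$

  $\alpha = \sqrt{\frac{1}{2} \left( 1 + \frac{D}{\sqrt{4 \varepsilon_{ij}^2 + D^2}} \right)}$ with $D = \varepsilon_{jj} - \varepsilon_{ii}$

  $\beta = \phi \sqrt{1 - \alpha^2}$ with $\phi = 1$ if $\varepsilon_{ij} < 0$, $\phi = -1$ otherwise

The procedure stops when all occupied LMOs are pairwise orthogonal to all unoccupied LMOs. Then, P can be built.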

Localized Molecular Orbitals

Pros and cons

Pros
- Avoids diagonalization
- Fast computations (small prefactor)
- Can handle very large molecular systems

Cons
- Tricks not published
- No extension to ab initio or DFT theories
- Parallelization?
- Linear Scaling?

→ available as MOZYME in MOPAC

Localized Molecular Orbitals

Is LMO a linear scaling method?

[Figure: computation time (s) versus system size (atoms), up to 1000 atoms; curves: MOZYME, MOPAC 93]

Stewart, J. J. P. Int. J. Quant. Chem. 1996, 58, 133–146

Localized Molecular Orbitals

Some LMO applications
- Calculation of the geometry of a small protein using semiempirical methods
  Stewart, J. J. P. J. Mol. Struct. (THEOCHEM) 1997, 401, 195–205
- A practical method for modeling solids using semiempirical methods
  Stewart, J. J. P. J. Mol. Struct. 2000, 556, 59–67
- Application of the PM6 method to modeling proteins
  Stewart, J. J. P. J. Mol. Model. 2009, 15, 765–805
- Application of the PM6 method to modeling the solid state
  Stewart, J. J. P. J. Mol. Model. 2008, 14, 499–535

Localized Molecular Orbitals

Another method using LMOs in a linear scaling fashion: LocalSCF
- LocalSCF method for semiempirical quantum-chemical calculation of ultralarge biomolecules
  Anikin, N. A.; Anisimov, V. M.; Bugaenko, V. L.; Bobrikov, V. V.; Andreyev, A. M. J. Chem. Phys. 2004, 121, 1266–1270
- Validation of linear scaling semiempirical LocalSCF method
  Anisimov, V. M.; Bugaenko, V. L.; Bobrikov, V. V. J. Chem. Theory Comput. 2006, 2, 1685–1692
- Charge Transfer Effects in the GroEL-GroES Chaperonin Tetramer in Solution
  Anisimov, V. M.; Bliznyuk, A. A. J. Phys. Chem. B 2012, 116, 6261–6268

Density Matrix Minimization

McWeeny, 1962 [2]
- The eigensolution of the SCF equations can be avoided by direct minimization of $E_{elec}$ subject to the constraints of normalization, $N_{e^-} = \mathrm{Tr}\{P\}$, and idempotency, $P = PP$
- McWeeny purification to obtain idempotency:

  $\tilde{P} = 3PP - 2PPP$

[2] McWeeny, R. Rev. Mod. Phys. 1960, 32, 335–369
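The purification formula can be iterated until the matrix is idempotent. The sketch below does this for a small made-up symmetric matrix in an orthogonal basis, with occupations normalized so that eigenvalues converge to 0 or 1. It uses dense pure-Python matrix products for clarity; a linear scaling code would use sparse algebra instead. Function names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcweeny_step(p):
    """One purification step: P -> 3 P^2 - 2 P^3."""
    p2 = matmul(p, p)
    p3 = matmul(p2, p)
    n = len(p)
    return [[3.0 * p2[i][j] - 2.0 * p3[i][j] for j in range(n)] for i in range(n)]


def idempotency_error(p):
    """Largest element of |P^2 - P|."""
    p2 = matmul(p, p)
    n = len(p)
    return max(abs(p2[i][j] - p[i][j]) for i in range(n) for j in range(n))


# Made-up, nearly idempotent starting matrix (eigenvalues about 0.91 and 0.19).
p = [[0.9, 0.1], [0.1, 0.2]]
for step in range(8):
    p = mcweeny_step(p)
print(idempotency_error(p), p[0][0] + p[1][1])
```

Each step pushes eigenvalues above 1/2 towards 1 and those below 1/2 towards 0, with quadratic convergence near the fixed points, so a handful of iterations is enough for this example.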

Density Matrix Minimization

Li, Nunes, Vanderbilt, 1993 [3] [4]
- P is the quantity minimizing (in the orthogonal basis set):

  $\Omega(P) = \mathrm{Tr}\{(3PP - 2PPP)F\} + \mu(\mathrm{Tr}\{P\} - N_{el})$

- µ: Lagrange multiplier enforcing the correct number of electrons
- To achieve O(N) complexity: sparse matrix algebra + gradient-only minimization

[3] Li, X. P.; Nunes, R. W.; Vanderbilt, D. Phys. Rev. B 1993, 47, 10891–10894
[4] Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569–5577

Density Matrix Minimization

Conjugate-Gradient Density Matrix Search (CG-DMS) [5]

1. initial search direction (steepest descent step):

   $G_0 = H_0 = -\nabla\Omega(P_0)$

2. conjugate gradient search:

   $P_{i+1} = P_i + \lambda_i H_i$ (the step)
   $G_i = -\nabla\Omega(P_i)$ (the gradient)
   $H_{i+1} = G_{i+1} + \gamma_i H_i$ (the search direction)
   $\gamma_i = \frac{(G_{i+1} - G_i) \cdot G_{i+1}}{G_i \cdot G_i}$

[5] Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569–5577

Density Matrix Minimization

Conjugate-Gradient Density Matrix Search (CG-DMS) [6]

$\Omega(P) = \mathrm{Tr}\{(3PP - 2PPP)F\} + \mu(\mathrm{Tr}\{P\} - N_{el})$

$\nabla\Omega(P) = 3(PF + FP) - 2(PPF + PFP + FPP) - \mu I$

µ is chosen such that the gradient of the functional Ω(P) is traceless:

$\mu = \mathrm{Tr}\{3(PF + FP) - 2(PPF + PFP + FPP)\} / N_{basis}$

Then, any finite step preserves the number of electrons [7].

[6], [7] Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569–5577
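The claim that a traceless gradient preserves the electron count is easy to verify numerically. The sketch below builds the gradient expression $3(PF + FP) - 2(PPF + PFP + FPP) - \mu I$ with µ set to the trace of the first two terms divided by $N_{basis}$, then takes one step along the negative gradient. The matrices are small made-up examples in an orthogonal basis, and the function names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def traceless_gradient(p, f):
    """Return (gradient, mu) with gradient = 3(PF+FP) - 2(PPF+PFP+FPP) - mu*I
    and mu chosen so that the gradient has zero trace."""
    n = len(p)
    pf, fp = matmul(p, f), matmul(f, p)
    ppf, pfp, fpp = matmul(p, pf), matmul(pf, p), matmul(fp, p)
    raw = [[3.0 * (pf[i][j] + fp[i][j])
            - 2.0 * (ppf[i][j] + pfp[i][j] + fpp[i][j])
            for j in range(n)] for i in range(n)]
    mu = trace(raw) / n
    grad = [[raw[i][j] - (mu if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    return grad, mu


# Made-up symmetric density and Fock matrices.
p = [[0.9, 0.1], [0.1, 0.2]]
f = [[-1.0, 0.3], [0.3, 0.5]]
grad, mu = traceless_gradient(p, f)

step = 0.1  # arbitrary step length along the steepest-descent direction
p_new = [[p[i][j] - step * grad[i][j] for j in range(2)] for i in range(2)]
print(trace(grad), trace(p), trace(p_new))
```

Because the update is linear in the gradient, the trace of P changes by the step length times the trace of the gradient, which is zero for any step length. Only matrix products and additions are needed, which is what makes the scheme compatible with sparse algebra.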


Density Matrix Minimization

Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569–5577

- The basis set affects the computational cost significantly. For a given molecule, if we change the basis from 3-21G to 6-31G**, the number of basis functions increases substantially, and so does the CPU time. However, for a given number of basis functions, a 6-31G** calculation on a given system is more expensive than a 3-21G calculation with an identical number of basis functions (and consequently on a larger, similar molecule) because polarized basis sets normally yield much less sparse matrices than unpolarized basis sets.

Density Matrix Minimization Daniels, A. D.; Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 107, 879–893 1D is different than 3D!

Density Matrix Minimization

Pros and cons

Pros
- Scales linearly if sparse matrix algebra is well implemented
- Easy to implement?
- Parallelism is possible (matrix multiplication only)

Cons
- Not enough to ensure linear scaling for ab initio methods (other limiting factors like integral computations, etc.)
- Threshold in the matrix multiplication (affects the prefactor)
- Efficient for small basis sets

Conclusions on Linear Scaling Methods

As of today
- Standard algorithms are being modified to allow O(N) scaling
- O(N) publications: semiempirical, DFT, HF, MPx, CCSD(T), etc.
- The smaller the basis set, the smaller the prefactor
- Computer codes: fine. Chemical answer: ?

Some Available codes
- semiempirical: MOPAC (Mozyme), LocalSCF, DivCon (AMBER)
- DFT: Siesta, ADF
- ab initio: Gaussian, MondoSCF (Freeon), MolPro

Linear Scaling QM methods

Some selected reviews
- Linear scaling electronic structure methods
  Goedecker, S. Rev. Mod. Phys. 1999, 71, 1085–1123
- Order-N methodologies and their applications
  Wu, S. Y.; Jayanthi, C. S. Phys. Rep. 2002, 358, 1–74
- Linear-Scaling Methods in Quantum Chemistry
  Ochsenfeld, C.; Kussmann, J.; Lambrecht, D. In Reviews in Computational Chemistry; Lipkowitz, K. B., Cundari, T. R., Eds., Vol. 23; Wiley-VCH, John Wiley & Sons, Inc., 2007; chapter 1
- O(N) methods in electronic structure calculations
  Bowler, D. R.; Miyazaki, T. Rep. Prog. Phys. 2012, 75, 036503
- Linear-scaling self-consistent field methods for large molecules
  Kussmann, J.; Beer, M.; Ochsenfeld, C. WIREs Comput. Mol. Sci. 2013, in press

QM/MM methods, Linear Scaling methods and Parallelism

Bottlenecks for parallelization
- bielectronic integrals → parallelization: good
- matrix multiplication → parallelization: good
- matrix diagonalization → parallelization: poor
→ linear scaling methods employ schemes that can be better parallelized, thus reducing the prefactor.

GPUs in quantum chemistry (ab initio and semiempirical)
- bielectronic integrals: Ufimtsev, I. S.; Martinez, T. J. J. Chem. Theory Comput. 2008, 4, 222–231
- self-consistent field: Ufimtsev, I. S.; Martinez, T. J. J. Chem. Theory Comput. 2009, 5, 1004–1015
- Born-Oppenheimer Molecular Dynamics: Ufimtsev, I. S.; Martinez, T. J. J. Chem. Theory Comput. 2009, 5, 2619–2628
→ What about linear scaling + GPU?