Data stored in TREXIO
For simplicity, the singular form is always used for the names of groups and attributes, and all data are stored in atomic units. The dimensions of the arrays in the tables below are given in column-major order (as in Fortran), and the ordering of the dimensions is reversed in the produced `trex.json` configuration file as the library is written in C.
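The relation between the two conventions can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of the TREXIO API; the sketch only shows that reversing the dimension list leaves the flat memory offset of an element unchanged.

```python
def c_order_dims(fortran_dims):
    """Reverse a column-major (Fortran) dimension list into row-major (C) order."""
    return list(reversed(fortran_dims))

def fortran_offset(x, n, dim_x):
    """Flat offset of element (x, n) in a column-major array of shape (dim_x, N)."""
    return x + dim_x * n

def c_offset(n, x, dim_x):
    """Flat offset of element [n][x] in a row-major array of shape [N][dim_x]."""
    return n * dim_x + x

# Dimensions of nucleus.coord as listed in the tables: (3, nucleus.num)
print(c_order_dims(["3", "nucleus.num"]))
```

With this convention, coordinate `x` of nucleus `n` sits at the same offset whether the array is viewed as Fortran `(3, nucleus.num)` or as C `[nucleus.num][3]`.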
1 Metadata (metadata group)
As we expect TREXIO files to be archived in open-data repositories, we give users the possibility to store some metadata inside the files. We propose to store the list of names of the codes which have participated in the creation of the file, a list of authors of the file, and a textual description.
| Variable | Type | Dimensions (for arrays) | Description |
|---|---|---|---|
| `code_num` | `dim` | | Number of codes used to produce the file |
| `code` | `str` | `(metadata.code_num)` | Names of the codes used |
| `author_num` | `dim` | | Number of authors of the file |
| `author` | `str` | `(metadata.author_num)` | Names of the authors of the file |
| `package_version` | `str` | | TREXIO version used to produce the file |
| `description` | `str` | | Text describing the content of file |
| `unsafe` | `int` | | `1`: true, `0`: false |
Note: The `unsafe` attribute of the `metadata` group indicates whether the file has been previously opened with `'u'` mode. It is automatically written in the file upon the first unsafe opening. If the user has checked that the TREXIO file is valid (e.g. using `trexio-tools`) after unsafe operations, then the `unsafe` attribute value can be manually overwritten (in unsafe mode) from `1` to `0`.
2 System
2.1 Nucleus (nucleus group)
The nuclei are considered as fixed point charges. Coordinates are given in Cartesian \((x,y,z)\) format.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `num` | `dim` | | Number of nuclei |
| `charge` | `float` | `(nucleus.num)` | Charges of the nuclei |
| `coord` | `float` | `(3,nucleus.num)` | Coordinates of the atoms |
| `label` | `str` | `(nucleus.num)` | Atom labels |
| `point_group` | `str` | | Symmetry point group |
| `repulsion` | `float` | | Nuclear repulsion energy |
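As an illustration of how these attributes relate, the nuclear repulsion energy of fixed point charges can be recomputed from `charge` and `coord`. This is a minimal sketch in plain Python, not a call into the TREXIO library; the function name is hypothetical and the coordinates are assumed to be given per nucleus, in atomic units.

```python
import math

def nuclear_repulsion(charge, coord):
    """Sum Z_A * Z_B / |R_A - R_B| over all distinct pairs of nuclei (atomic units)."""
    energy = 0.0
    for a in range(len(charge)):
        for b in range(a):
            energy += charge[a] * charge[b] / math.dist(coord[a], coord[b])
    return energy

# Two protons separated by 1.4 bohr along z
charge = [1.0, 1.0]
coord = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
print(nuclear_repulsion(charge, coord))
```

A value computed this way could be compared with the stored `repulsion` attribute as a consistency check.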
2.2 Cell (cell group)
Three lattice vectors define a box containing the system, as used for example in periodic calculations.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `a` | `float` | `(3)` | First real space lattice vector |
| `b` | `float` | `(3)` | Second real space lattice vector |
| `c` | `float` | `(3)` | Third real space lattice vector |
| `G_a` | `float` | `(3)` | First reciprocal space lattice vector |
| `G_b` | `float` | `(3)` | Second reciprocal space lattice vector |
| `G_c` | `float` | `(3)` | Third reciprocal space lattice vector |
| `two_pi` | `int` | | `0` or `1`. If `two_pi=1`, \(2\pi\) is included in the reciprocal vectors. |
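The role of the `two_pi` flag can be illustrated with a short sketch that builds reciprocal vectors from `a`, `b` and `c`. It assumes the usual convention \(\mathbf{G}_i \cdot \mathbf{a}_j = \delta_{ij}\) (or \(2\pi\,\delta_{ij}\) when `two_pi=1`); the function names are hypothetical and the convention is an assumption, not something stated by the specification above.

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reciprocal_vectors(a, b, c, two_pi):
    """Return (G_a, G_b, G_c); the 2*pi factor is included only if two_pi == 1."""
    volume = dot(a, cross(b, c))
    factor = (2.0 * math.pi if two_pi == 1 else 1.0) / volume
    G_a = tuple(factor * x for x in cross(b, c))
    G_b = tuple(factor * x for x in cross(c, a))
    G_c = tuple(factor * x for x in cross(a, b))
    return G_a, G_b, G_c

# Cubic box of side 2 bohr
a, b, c = (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)
print(reciprocal_vectors(a, b, c, two_pi=0))
```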
2.3 Periodic boundary calculations (pbc group)
A single \(k\)-point per TREXIO file can be stored. The \(k\)-point is defined in this group.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `periodic` | `int` | | `1`: true or `0`: false |
| `k_point` | `float` | `(3)` | \(k\)-point sampling |
2.4 Electron (electron group)
The chemical system consists of nuclei and electrons, where the nuclei are considered as fixed point charges with Cartesian coordinates. The wave function is stored in the spin-free formalism, and therefore, it is necessary for the user to explicitly store the number of electrons with spin up (\(N_\uparrow\)) and spin down (\(N_\downarrow\)). These numbers correspond to the normalization of the spin-up and spin-down single-particle reduced density matrices.
We consider wave functions expressed in the spin-free formalism, where the number of ↑ and ↓ electrons is fixed.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `num` | `dim` | | Number of electrons |
| `up_num` | `int` | | Number of ↑-spin electrons |
| `dn_num` | `int` | | Number of ↓-spin electrons |
2.5 Ground or excited states (state group)
This group contains information about excited states. Since only a single state can be stored in a TREXIO file, it is possible to store in the main TREXIO file the names of auxiliary files containing the information of the other states.
The `file_name` and `label` arrays have to be written only for the main file, e.g. the one containing the ground state wave function together with the basis set parameters, molecular orbitals, integrals, etc.
The `id` and `current_label` attributes need to be specified for each file.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `num` | `dim` | | Number of states (including the ground state) |
| `id` | `int` | | Index of the current state (0 is ground state) |
| `current_label` | `str` | | Label of the current state |
| `label` | `str` | `(state.num)` | Labels of all states |
| `file_name` | `str` | `(state.num)` | Names of the TREXIO files linked to the current one (i.e. containing data for other states) |
3 Basis functions
3.1 Basis set (basis group)
3.1.1 Gaussian and Slater-type orbitals
We consider here basis functions centered on nuclei. Hence, it is possible to define dummy atoms to place basis functions in arbitrary positions.
The atomic basis set is defined as a list of shells. Each shell \(s\) is centered on a center \(A\), possesses a given angular momentum \(l\) and a radial function \(R_s\). The radial function is a linear combination of \(N_{\text{prim}}\) primitive functions that can be of type Slater (\(p=1\)) or Gaussian (\(p=2\)), parameterized by exponents \(\gamma_{ks}\) and coefficients \(a_{ks}\): \[ R_s(\mathbf{r}) = \mathcal{N}_s \vert\mathbf{r}-\mathbf{R}_A\vert^{n_s} \sum_{k=1}^{N_{\text{prim}}} a_{ks}\, f_{ks}(\gamma_{ks},p)\, \exp \left( - \gamma_{ks} \vert \mathbf{r}-\mathbf{R}_A \vert ^p \right). \]
In the case of Gaussian functions, \(n_s\) is always zero.
Different codes normalize functions at different levels. Computing normalization factors requires the ability to compute overlap integrals, so the normalization factors should be written in the file to ensure that the file is self-contained and does not need the client program to have the ability to compute such integrals.
Some codes assume that the contraction coefficients are for a linear combination of normalized primitives. This implies that a normalization constant for the primitive \(ks\) needs to be computed and stored. If this normalization factor is not required, \(f_{ks}=1\).
Some codes assume that the basis functions are normalized. This implies the computation of an extra normalization factor, \(\mathcal{N}_s\). If the basis function is not considered normalized, \(\mathcal{N}_s=1\).
All the basis set parameters are stored in one-dimensional arrays.
3.1.2 Plane waves
A plane wave is defined as
\[ \chi_j(\mathbf{r}) = \exp \left( -i \mathbf{G}_j \cdot \mathbf{r} \right) \]
The basis set is defined as the array of \(k\)-points in the reciprocal space \(\mathbf{G}_j\), defined in the `pbc` group. The kinetic energy cutoff `e_cut` is the only input data relevant to plane waves.
3.1.3 Data definitions
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `type` | `str` | | Type of basis set: "Gaussian", "Slater" or "PW" for plane waves |
| `prim_num` | `dim` | | Total number of primitives |
| `shell_num` | `dim` | | Total number of shells |
| `nucleus_index` | `index` | `(basis.shell_num)` | One-to-one correspondence between shells and atomic indices |
| `shell_ang_mom` | `int` | `(basis.shell_num)` | One-to-one correspondence between shells and angular momenta |
| `shell_factor` | `float` | `(basis.shell_num)` | Normalization factor of each shell (\(\mathcal{N}_s\)) |
| `r_power` | `int` | `(basis.shell_num)` | Power to which \(r\) is raised (\(n_s\)) |
| `shell_index` | `index` | `(basis.prim_num)` | One-to-one correspondence between primitives and shell index |
| `exponent` | `float` | `(basis.prim_num)` | Exponents of the primitives (\(\gamma_{ks}\)) |
| `coefficient` | `float` | `(basis.prim_num)` | Coefficients of the primitives (\(a_{ks}\)) |
| `prim_factor` | `float` | `(basis.prim_num)` | Normalization coefficients for the primitives (\(f_{ks}\)) |
| `e_cut` | `float` | | Energy cut-off for plane-wave calculations |
3.1.4 Example
For example, consider H\(_2\) with the following basis set (in GAMESS format), where both the AOs and primitives are considered normalized:

    HYDROGEN
    S   5
      1   3.387000E+01   6.068000E-03
      2   5.095000E+00   4.530800E-02
      3   1.159000E+00   2.028220E-01
      4   3.258000E-01   5.039030E-01
      5   1.027000E-01   3.834210E-01
    S   1
      1   3.258000E-01   1.000000E+00
    S   1
      1   1.027000E-01   1.000000E+00
    P   1
      1   1.407000E+00   1.000000E+00
    P   1
      1   3.880000E-01   1.000000E+00
    D   1
      1   1.057000E+00   1.000000E+00

In TREXIO representation we have:

    type = "Gaussian"
    prim_num = 20
    shell_num = 12

    # 6 shells per H atom
    nucleus_index = [ 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]

    # 3 shells in S (l=0), 2 in P (l=1), 1 in D (l=2)
    shell_ang_mom = [ 0, 0, 0, 1, 1, 2, 0, 0, 0, 1, 1, 2 ]

    # no need to renormalize shells
    shell_factor = [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1. ]

    # 5 primitives for the first S shell and then 1 primitive per remaining shell in each H atom
    shell_index = [ 0, 0, 0, 0, 0, 1, 2, 3, 4, 5,
                    6, 6, 6, 6, 6, 7, 8, 9, 10, 11 ]

    # parameters of the primitives (10 per H atom)
    exponent = [ 33.87, 5.095, 1.159, 0.3258, 0.1027, 0.3258, 0.1027, 1.407, 0.388, 1.057,
                 33.87, 5.095, 1.159, 0.3258, 0.1027, 0.3258, 0.1027, 1.407, 0.388, 1.057 ]

    coefficient = [ 0.006068, 0.045308, 0.202822, 0.503903, 0.383421, 1.0, 1.0, 1.0, 1.0, 1.0,
                    0.006068, 0.045308, 0.202822, 0.503903, 0.383421, 1.0, 1.0, 1.0, 1.0, 1.0 ]

    prim_factor = [ 1.0006253235944540e+01, 2.4169531573445120e+00, 7.9610924849766440e-01,
                    3.0734305383061117e-01, 1.2929684417481876e-01, 3.0734305383061117e-01,
                    1.2929684417481876e-01, 2.1842769845268308e+00, 4.3649547399719840e-01,
                    1.8135965626177861e+00,
                    1.0006253235944540e+01, 2.4169531573445120e+00, 7.9610924849766440e-01,
                    3.0734305383061117e-01, 1.2929684417481876e-01, 3.0734305383061117e-01,
                    1.2929684417481876e-01, 2.1842769845268308e+00, 4.3649547399719840e-01,
                    1.8135965626177861e+00 ]
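The `prim_factor` values of this example appear to be consistent with the standard normalization of a Cartesian Gaussian primitive of the form \(x^l \exp(-\gamma r^2)\). The sketch below is a plain-Python illustration under that assumption; the function names are hypothetical, and other codes may normalize primitives differently.

```python
import math

def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def gaussian_prim_factor(exponent, l):
    """Normalization of x^l * exp(-exponent * r^2) (assumed convention)."""
    return ((2.0 * exponent / math.pi) ** 0.75
            * (4.0 * exponent) ** (l / 2.0)
            / math.sqrt(double_factorial(2 * l - 1)))

# First S, first P and the D primitive of the hydrogen basis above
for exponent, l in [(33.87, 0), (1.407, 1), (1.057, 2)]:
    print(l, exponent, gaussian_prim_factor(exponent, l))
```

Storing these factors in the file means a client program can evaluate the basis functions without having to compute any overlap integral itself.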
3.2 Effective core potentials (ecp group)
An effective core potential (ECP) \(V_A^{\text{ECP}}\) replacing the core electrons of atom \(A\) can be expressed as \[ V_A^{\text{ECP}} = V_{A \ell_{\max}+1} + \sum_{\ell=0}^{\ell_{\max}} V_{A \ell} \sum_{m=-\ell}^{\ell} \vert Y_{\ell m} \rangle \langle Y_{\ell m} \vert \]
The first term in the equation above is sometimes attributed to the local channel, while the remaining terms correspond to the non-local channel projections.
All the functions \(V_{A\ell}\) are parameterized as: \[ V_{A \ell}(\mathbf{r}) = \sum_{q=1}^{N_{q \ell}} \beta_{A q \ell}\, \vert\mathbf{r}-\mathbf{R}_{A}\vert^{n_{A q \ell}}\, e^{-\alpha_{A q \ell} \vert\mathbf{r}-\mathbf{R}_{A}\vert^2 }. \]
See http://dx.doi.org/10.1063/1.4984046 or https://doi.org/10.1063/1.5121006 for more info.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `max_ang_mom_plus_1` | `int` | `(nucleus.num)` | \(\ell_{\max}+1\), one higher than the max angular momentum in the removed core orbitals |
| `z_core` | `int` | `(nucleus.num)` | Number of core electrons to remove per atom |
| `num` | `dim` | | Total number of ECP functions for all atoms and all values of \(\ell\) |
| `ang_mom` | `int` | `(ecp.num)` | One-to-one correspondence between ECP items and the angular momentum \(\ell\) |
| `nucleus_index` | `index` | `(ecp.num)` | One-to-one correspondence between ECP items and the atom index |
| `exponent` | `float` | `(ecp.num)` | \(\alpha_{A q \ell}\) all ECP exponents |
| `coefficient` | `float` | `(ecp.num)` | \(\beta_{A q \ell}\) all ECP coefficients |
| `power` | `int` | `(ecp.num)` | \(n_{A q \ell}\) all ECP powers |
The meaning of \(\ell_{\max}\) can be ambiguous. It can refer to the maximum angular momentum occupied in the core orbitals, which are removed by the ECP. Alternatively, it can refer to the maximum angular momentum of the ECP that replaces the core electrons. Note that the latter \(\ell_{\max}\) is always higher by 1 than the former.
Note for developers: avoid having variables with similar prefix in their name. The HDF5 back end might cause issues due to the way the `find_dataset` function works. For example, in the ECP group we use `max_ang_mom` and not `ang_mom_max`. The latter causes issues when written before the `ang_mom` array in the TREXIO file.
Update: in fact, the aforementioned issue has only been observed when using HDF5 version 1.10.4 installed via `apt-get`. Installing the same version from the `conda-forge` channel and running it in an isolated `conda` environment works just fine. Thus, it seems to be a bug in the `apt`-provided package.
If you encounter the aforementioned issue, please report it to our issue tracker on GitHub.
3.2.1 Example
For example, consider the H\(_2\) molecule with the following effective core potential (in GAMESS input format for the H atom):

    H-ccECP GEN 0 1
    3
    1.00000000000000 1 21.24359508259891
    21.24359508259891 3 21.24359508259891
    -10.85192405303825 2 21.77696655044365
    1
    0.00000000000000 2 1.000000000000000

In TREXIO representation this would be:

    num = 8

    # lmax+1 per atom
    max_ang_mom_plus_1 = [ 1, 1 ]

    # number of core electrons to remove per atom
    z_core = [ 0, 0 ]

    # first 4 ECP elements correspond to the first H atom ; the remaining 4 elements are for the second H atom
    nucleus_index = [ 0, 0, 0, 0, 1, 1, 1, 1 ]

    # 3 first ECP elements correspond to potential of the P orbital (l=1), then 1 element for the S orbital (l=0) ; similar for the second H atom
    ang_mom = [ 1, 1, 1, 0, 1, 1, 1, 0 ]

    # ECP quantities that can be attributed to atoms and/or angular momenta based on the aforementioned nucleus_index and ang_mom arrays
    coefficient = [ 1.00000000000000, 21.24359508259891, -10.85192405303825, 0.00000000000000,
                    1.00000000000000, 21.24359508259891, -10.85192405303825, 0.00000000000000 ]
    exponent = [ 21.24359508259891, 21.24359508259891, 21.77696655044365, 1.000000000000000,
                 21.24359508259891, 21.24359508259891, 21.77696655044365, 1.000000000000000 ]
    power = [ -1, 1, 0, 0, -1, 1, 0, 0 ]
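To show how the flat ECP arrays map back onto the radial functions \(V_{A\ell}\), here is a minimal sketch that selects the items belonging to one atom and one angular momentum and sums them. The function name is hypothetical; the sample data are the first-atom values of the example, with the signs of the third coefficient and of the first power taken as assumptions.

```python
import math

def ecp_radial(r, atom, l, nucleus_index, ang_mom, coefficient, exponent, power):
    """Evaluate V_{A,l}(r) = sum_q beta_q * r**n_q * exp(-alpha_q * r**2)."""
    total = 0.0
    for q in range(len(coefficient)):
        if nucleus_index[q] == atom and ang_mom[q] == l:
            total += coefficient[q] * r ** power[q] * math.exp(-exponent[q] * r * r)
    return total

# ECP items of a single H atom (signs assumed, see the lead-in)
nucleus_index = [0, 0, 0, 0]
ang_mom = [1, 1, 1, 0]
coefficient = [1.0, 21.24359508259891, -10.85192405303825, 0.0]
exponent = [21.24359508259891, 21.24359508259891, 21.77696655044365, 1.0]
power = [-1, 1, 0, 0]

r = 0.5
print(ecp_radial(r, 0, 1, nucleus_index, ang_mom, coefficient, exponent, power))
```

With these values, the \(\ell=1\) channel behaves like \(1/r\) close to the nucleus, while the \(\ell=0\) channel vanishes because its only coefficient is zero.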
3.3 Numerical integration grid (grid group)
In some applications, such as DFT calculations, integrals have to be computed numerically on a grid. A common choice for the angular grid is the one proposed by Lebedev and Laikov [Russian Academy of Sciences Doklady Mathematics, Volume 59, Number 3, 1999, pages 477-481]. For the radial grids, many approaches have been developed over the years.
The structure of this group is adapted for the numgrid library. Feel free to submit a PR if you find missing options/functionalities.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `description` | `str` | | Details about the used quadratures can go here |
| `rad_precision` | `float` | | Radial precision parameter (not used in some schemes like Krack-Köster) |
| `num` | `dim` | | Number of grid points |
| `max_ang_num` | `int` | | Maximum number of angular grid points (for pruning) |
| `min_ang_num` | `int` | | Minimum number of angular grid points (for pruning) |
| `coord` | `float` | `(grid.num)` | Discretized coordinate space |
| `weight` | `float` | `(grid.num)` | Grid weights according to a given partitioning (e.g. Becke) |
| `ang_num` | `dim` | | Number of angular integration points (if used) |
| `ang_coord` | `float` | `(grid.ang_num)` | Discretized angular space (if used) |
| `ang_weight` | `float` | `(grid.ang_num)` | Angular grid weights (if used) |
| `rad_num` | `dim` | | Number of radial integration points (if used) |
| `rad_coord` | `float` | `(grid.rad_num)` | Discretized radial space (if used) |
| `rad_weight` | `float` | `(grid.rad_num)` | Radial grid weights (if used) |
4 Orbitals
4.1 Atomic orbitals (ao group)
AOs are defined as
\[ \chi_i (\mathbf{r}) = \mathcal{N}_i'\, P_{\eta(i)}(\mathbf{r})\, R_{s(i)} (\mathbf{r}) \]
where \(i\) is the atomic orbital index, \(P\) refers to either polynomials or spherical harmonics, and \(s(i)\) specifies the shell on which the AO is expanded.
\(\eta(i)\) denotes the chosen angular function. The AOs can be expressed using real spherical harmonics or polynomials in Cartesian coordinates. In the case of real spherical harmonics, the AOs are ordered as \(0, +1, -1, +2, -2, \dots, +m, -m\) (see Wikipedia). In the case of polynomials, the canonical (or alphabetical) ordering is used,

| Shell | Ordering |
|---|---|
| \(p\) | \(p_x, p_y, p_z\) |
| \(d\) | \(d_{xx}, d_{xy}, d_{xz}, d_{yy}, d_{yz}, d_{zz}\) |
| \(f\) | \(f_{xxx}, f_{xxy}, f_{xxz}, f_{xyy}, f_{xyz}, f_{xzz}, f_{yyy}, f_{yyz}, f_{yzz}, f_{zzz}\) |
| \(\vdots\) | |

Note that for \(p\) orbitals in spherical coordinates, the ordering is \(0,+1,-1\) which corresponds to \(p_z, p_x, p_y\).
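Both orderings can be generated programmatically. The following sketch uses hypothetical helper names and is only meant to make the conventions explicit: the alphabetical ordering of Cartesian monomials and the \(0, +1, -1, \dots\) ordering of the real spherical harmonics.

```python
def cartesian_order(l):
    """Alphabetical ordering of the monomials x^nx y^ny z^nz with nx+ny+nz = l."""
    labels = []
    for nx in range(l, -1, -1):
        for ny in range(l - nx, -1, -1):
            nz = l - nx - ny
            labels.append("x" * nx + "y" * ny + "z" * nz)
    return labels

def spherical_order(l):
    """Ordering of m for real spherical harmonics: 0, +1, -1, ..., +l, -l."""
    m_values = [0]
    for m in range(1, l + 1):
        m_values += [m, -m]
    return m_values

print(cartesian_order(2))   # d shell
print(spherical_order(2))
```

A Cartesian shell of angular momentum \(l\) contains \((l+1)(l+2)/2\) functions, whereas a spherical shell contains \(2l+1\).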
\(\mathcal{N}_i'\) is a normalization factor that allows for different normalization coefficients within a single shell, as in the GAMESS convention where each individual function is unit-normalized. Using the GAMESS convention, the normalization factor of the shell \(\mathcal{N}_d\) in the `basis` group is appropriate for instance for the \(d_{z^2}\) function (i.e. \(\mathcal{N}_{d}\equiv\mathcal{N}_{z^2}\)) but not for the \(d_{xy}\) AO, so the correction factor \(\mathcal{N}_i'\) for \(d_{xy}\) in the `ao` group is the ratio \(\frac{\mathcal{N}_{xy}}{\mathcal{N}_{z^2}}\).
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `cartesian` | `int` | | `1`: true, `0`: false |
| `num` | `dim` | | Total number of atomic orbitals |
| `shell` | `index` | `(ao.num)` | Basis set shell for each AO |
| `normalization` | `float` | `(ao.num)` | Normalization factor \(\mathcal{N}_i'\) |
4.1.1 One-electron integrals (ao_1e_int group)
- \[ \hat{V}_{\text{ne}} = \sum_{A=1}^{N_\text{nucl}} \sum_{i=1}^{N_\text{elec}} \frac{-Z_A }{\vert \mathbf{R}_A - \mathbf{r}_i \vert} \] : electron-nucleus attractive potential,
- \[ \hat{T}_{\text{e}} = \sum_{i=1}^{N_\text{elec}} -\frac{1}{2}\hat{\Delta}_i \] : electronic kinetic energy,
- \(\hat{h} = \hat{T}_\text{e} + \hat{V}_\text{ne} + \hat{V}_\text{ECP}\) : core electronic Hamiltonian.
The one-electron integrals for a one-electron operator \(\hat{O}\) are \[ \langle p \vert \hat{O} \vert q \rangle \], returned as a matrix over atomic orbitals.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `overlap` | `float` | `(ao.num, ao.num)` | \(\langle p \vert q \rangle\) |
| `kinetic` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{T}_e \vert q \rangle\) |
| `potential_n_e` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{V}_{\text{ne}} \vert q \rangle\) |
| `ecp` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{V}_{\text{ECP}} \vert q \rangle\) |
| `core_hamiltonian` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{h} \vert q \rangle\) |
| `overlap_im` | `float` | `(ao.num, ao.num)` | \(\langle p \vert q \rangle\) (imaginary part) |
| `kinetic_im` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{T}_e \vert q \rangle\) (imaginary part) |
| `potential_n_e_im` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{V}_{\text{ne}} \vert q \rangle\) (imaginary part) |
| `ecp_im` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{V}_{\text{ECP}} \vert q \rangle\) (imaginary part) |
| `core_hamiltonian_im` | `float` | `(ao.num, ao.num)` | \(\langle p \vert \hat{h} \vert q \rangle\) (imaginary part) |
4.1.2 Two-electron integrals (ao_2e_int group)
The two-electron integrals for a two-electron operator \(\hat{O}\) are \[ \langle p q \vert \hat{O} \vert r s \rangle \] in physicists' notation, where \(p,q,r,s\) are indices over atomic orbitals.
- \[ \hat{W}_{\text{ee}} = \sum_{i=2}^{N_\text{elec}} \sum_{j=1}^{i-1} \frac{1}{\vert \mathbf{r}_i - \mathbf{r}_j \vert} \] : electron-electron repulsive potential operator,
- \[ \hat{W}^{lr}_{\text{ee}} = \sum_{i=2}^{N_\text{elec}} \sum_{j=1}^{i-1} \frac{\text{erf}(\mu\, \vert \mathbf{r}_i - \mathbf{r}_j \vert)}{\vert \mathbf{r}_i - \mathbf{r}_j \vert} \] : electron-electron long-range potential.
The Cholesky decomposition of the integrals can also be stored:
\[ \langle ij \vert kl \rangle = \sum_{\alpha} G_{ik\alpha} G_{jl\alpha} \]
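The index pattern of this decomposition is easy to get wrong, so here is a small sketch that rebuilds one integral from Cholesky vectors. It assumes a dense nested-list layout `G[i][k][alpha]` purely for illustration (TREXIO stores these arrays in sparse form), and the function name is hypothetical.

```python
def eri_from_cholesky(G, i, j, k, l):
    """Rebuild <ij|kl> = sum_alpha G[i][k][alpha] * G[j][l][alpha]."""
    n_vectors = len(G[i][k])
    return sum(G[i][k][a] * G[j][l][a] for a in range(n_vectors))

# Toy data: 2 orbitals, 2 Cholesky vectors
G = [[[1.0, 0.5], [0.2, 0.0]],
     [[0.2, 0.0], [0.8, 0.3]]]
print(eri_from_cholesky(G, 0, 1, 0, 1))
```

Note that the first vector carries the pair `(i,k)` and the second the pair `(j,l)`, i.e. each vector is indexed by the two orbitals that share the same electron coordinate.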
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `eri` | `float sparse` | `(ao.num, ao.num, ao.num, ao.num)` | Electron repulsion integrals |
| `eri_lr` | `float sparse` | `(ao.num, ao.num, ao.num, ao.num)` | Long-range electron repulsion integrals |
| `eri_cholesky_num` | `dim` | | Number of Cholesky vectors for ERI |
| `eri_cholesky` | `float sparse` | `(ao.num, ao.num, ao_2e_int.eri_cholesky_num)` | Cholesky decomposition of the ERI |
| `eri_lr_cholesky_num` | `dim` | | Number of Cholesky vectors for long-range ERI |
| `eri_lr_cholesky` | `float sparse` | `(ao.num, ao.num, ao_2e_int.eri_lr_cholesky_num)` | Cholesky decomposition of the long-range ERI |
4.2 Molecular orbitals (mo group)
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `type` | `str` | | Free text to identify the set of MOs (HF, Natural, Local, CASSCF, etc) |
| `num` | `dim` | | Number of MOs |
| `coefficient` | `float` | `(ao.num, mo.num)` | MO coefficients |
| `coefficient_im` | `float` | `(ao.num, mo.num)` | MO coefficients (imaginary part) |
| `class` | `str` | `(mo.num)` | Choose among: Core, Inactive, Active, Virtual, Deleted |
| `symmetry` | `str` | `(mo.num)` | Symmetry in the point group |
| `occupation` | `float` | `(mo.num)` | Occupation number |
| `energy` | `float` | `(mo.num)` | For canonical MOs, corresponding eigenvalue |
| `spin` | `int` | `(mo.num)` | For UHF wave functions, 0 is \(\alpha\) and 1 is \(\beta\) |
4.2.1 One-electron integrals (mo_1e_int group)
The operators are the same as those defined in the AO one-electron integrals section. Here, the integrals are given in the basis of molecular orbitals.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `overlap` | `float` | `(mo.num, mo.num)` | \(\langle i \vert j \rangle\) |
| `kinetic` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{T}_e \vert j \rangle\) |
| `potential_n_e` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{V}_{\text{ne}} \vert j \rangle\) |
| `ecp` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{V}_{\text{ECP}} \vert j \rangle\) |
| `core_hamiltonian` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{h} \vert j \rangle\) |
| `overlap_im` | `float` | `(mo.num, mo.num)` | \(\langle i \vert j \rangle\) (imaginary part) |
| `kinetic_im` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{T}_e \vert j \rangle\) (imaginary part) |
| `potential_n_e_im` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{V}_{\text{ne}} \vert j \rangle\) (imaginary part) |
| `ecp_im` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{V}_{\text{ECP}} \vert j \rangle\) (imaginary part) |
| `core_hamiltonian_im` | `float` | `(mo.num, mo.num)` | \(\langle i \vert \hat{h} \vert j \rangle\) (imaginary part) |
4.2.2 Two-electron integrals (mo_2e_int group)
The operators are the same as those defined in the AO two-electron integrals section. Here, the integrals are given in the basis of molecular orbitals.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `eri` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | Electron repulsion integrals |
| `eri_lr` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | Long-range electron repulsion integrals |
| `eri_cholesky_num` | `dim` | | Number of Cholesky vectors for ERI |
| `eri_cholesky` | `float sparse` | `(mo.num, mo.num, mo_2e_int.eri_cholesky_num)` | Cholesky decomposition of the ERI |
| `eri_lr_cholesky_num` | `dim` | | Number of Cholesky vectors for long-range ERI |
| `eri_lr_cholesky` | `float sparse` | `(mo.num, mo.num, mo_2e_int.eri_lr_cholesky_num)` | Cholesky decomposition of the long-range ERI |
5 Multi-determinant information
5.1 Slater determinants (determinant group)
The configuration interaction (CI) wave function \(\Psi\) can be expanded in the basis of Slater determinants \(D_I\) as follows
\[ \Psi = \sum_I C_I D_I \]
For relatively small expansions, a given determinant can be represented as a list of occupied orbitals. However, this becomes unfeasible for larger expansions and requires more advanced data structures. The bit field representation is used here, namely a given determinant is represented as \(N_{\text{int}}\) 64-bit integers where the j-th bit is set to 1 if there is an electron in the j-th orbital and 0 otherwise. This gives access to larger determinant expansions by optimising the storage of the determinant lists in memory.
\[ D_I = \vert \alpha_1 \alpha_2 \ldots \alpha_{n_\uparrow} \beta_1 \beta_2 \ldots \beta_{n_\downarrow} \rangle \]
where \(\alpha\) and \(\beta\) denote ↑-spin and ↓-spin electrons, respectively, \(n_\uparrow\) and \(n_\downarrow\) correspond to `electron.up_num` and `electron.dn_num`, respectively.
Note: the `special` attribute is present in the types, meaning that the source node is not produced by the code generator.
An illustration on how to read determinants is presented in the examples.
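The bit field encoding itself can be sketched independently of the library. The helpers below are hypothetical and assume 0-based orbital indices and unsigned Python integers for one spin string; the actual storage layout used by TREXIO (signedness, ordering of the ↑ and ↓ parts) should be taken from the library documentation.

```python
def to_bitfield(occupied, n_int):
    """Encode a list of occupied orbital indices as n_int 64-bit words."""
    words = [0] * n_int
    for orbital in occupied:
        words[orbital // 64] |= 1 << (orbital % 64)
    return words

def from_bitfield(words):
    """Decode 64-bit words back into a sorted list of occupied orbital indices."""
    return [64 * w + bit
            for w, word in enumerate(words)
            for bit in range(64)
            if (word >> bit) & 1]

# Orbitals 0 and 1 live in the first word, orbital 65 in the second one
words = to_bitfield([0, 1, 65], n_int=2)
print(words, from_bitfield(words))
```

With this encoding, \(N_{\text{int}}\) must be at least the number of orbitals divided by 64, rounded up.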
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `num` | `dim readonly` | | Number of determinants |
| `list` | `int special` | `(determinant.num)` | List of determinants as integer bit fields |
| `coefficient` | `float buffered` | `(determinant.num)` | Coefficients of the determinants from the CI expansion |
5.2 Configuration state functions (csf group)
The configuration interaction (CI) wave function \(\Psi\) can be expanded in the basis of configuration state functions (CSFs) \(\psi_I\) as follows
\[ \Psi = \sum_I C_I \psi_I. \]
Each CSF \(\psi_I\) is a linear combination of Slater determinants. Slater determinants are stored in the `determinant` section. In this group we store the CI coefficients in the basis of CSFs, and the matrix \(\langle D_I \vert \psi_J \rangle\) needed to project the CSFs in the basis of Slater determinants.
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `num` | `dim readonly` | | Number of CSFs |
| `coefficient` | `float buffered` | `(csf.num)` | Coefficients \(C_I\) of the CSF expansion |
| `det_coefficient` | `float sparse` | `(determinant.num,csf.num)` | Projection on the determinant basis |
5.3 Amplitudes (amplitude group)
The wave function may be expressed in terms of action of the cluster operator \(\hat{T}\):
\[ \hat{T} = \hat{T}_1 + \hat{T}_2 + \hat{T}_3 + \dots \]
on a reference wave function \(\Psi\), where \(\hat{T}_1\) is the single excitation operator,
\[ \hat{T}_1 = \sum_{ia} t_{i}^{a}\, \hat{a}^\dagger_a \hat{a}_i, \]
\(\hat{T}_2\) is the double excitation operator,
\[ \hat{T}_2 = \frac{1}{4} \sum_{ijab} t_{ij}^{ab}\, \hat{a}^\dagger_a \hat{a}^\dagger_b \hat{a}_j \hat{a}_i, \]
etc. Indices \(i\), \(j\), \(a\) and \(b\) denote molecular orbital indices.
Wave functions obtained with perturbation theory or configuration interaction are of the form
\[ \vert\Phi\rangle = \hat{T}\vert\Psi\rangle \]
and coupled-cluster wave functions are of the form
\[ \vert\Phi\rangle = e^{\hat{T}} \vert\Psi \rangle \]
The reference wave function is stored using the `determinant` and/or `csf` groups, and the amplitudes are stored using the current group. The attributes with the `exp` suffix correspond to exponentialized operators.
The order of the indices is chosen such that
- `t(i,a)` = \(t_{i}^{a}\),
- `t(i,j,a,b)` = \(t_{ij}^{ab}\),
- `t(i,j,k,a,b,c)` = \(t_{ijk}^{abc}\),
- `t(i,j,k,l,a,b,c,d)` = \(t_{ijkl}^{abcd}\),
- \(\dots\)
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `single` | `float sparse` | `(mo.num,mo.num)` | Single excitation amplitudes |
| `single_exp` | `float sparse` | `(mo.num,mo.num)` | Exponentialized single excitation amplitudes |
| `double` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num)` | Double excitation amplitudes |
| `double_exp` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num)` | Exponentialized double excitation amplitudes |
| `triple` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num,mo.num,mo.num)` | Triple excitation amplitudes |
| `triple_exp` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num,mo.num,mo.num)` | Exponentialized triple excitation amplitudes |
| `quadruple` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num,mo.num,mo.num,mo.num,mo.num)` | Quadruple excitation amplitudes |
| `quadruple_exp` | `float sparse` | `(mo.num,mo.num,mo.num,mo.num,mo.num,mo.num,mo.num,mo.num)` | Exponentialized quadruple excitation amplitudes |
5.4 Reduced density matrices (rdm group)
The reduced density matrices are defined in the basis of molecular orbitals.
The ↑-spin and ↓-spin components of the one-body density matrix are given by
\begin{eqnarray*}
\gamma_{ij}^{\uparrow} &=& \langle \Psi \vert \hat{a}^{\dagger}_{j\alpha}\, \hat{a}_{i\alpha} \vert \Psi \rangle \\
\gamma_{ij}^{\downarrow} &=& \langle \Psi \vert \hat{a}^{\dagger}_{j\beta} \, \hat{a}_{i\beta} \vert \Psi \rangle
\end{eqnarray*}
and the spin-summed one-body density matrix is \[ \gamma_{ij} = \gamma^{\uparrow}_{ij} + \gamma^{\downarrow}_{ij} \]
The \(\uparrow \uparrow\), \(\downarrow \downarrow\), \(\uparrow \downarrow\) components of the two-body density matrix are given by
\begin{eqnarray*}
\Gamma_{ijkl}^{\uparrow \uparrow} &=& \langle \Psi \vert \hat{a}^{\dagger}_{k\alpha}\, \hat{a}^{\dagger}_{l\alpha} \hat{a}_{j\alpha}\, \hat{a}_{i\alpha} \vert \Psi \rangle \\
\Gamma_{ijkl}^{\downarrow \downarrow} &=& \langle \Psi \vert \hat{a}^{\dagger}_{k\beta}\, \hat{a}^{\dagger}_{l\beta} \hat{a}_{j\beta}\, \hat{a}_{i\beta} \vert \Psi \rangle \\
\Gamma_{ijkl}^{\uparrow \downarrow} &=& \langle \Psi \vert \hat{a}^{\dagger}_{k\alpha}\, \hat{a}^{\dagger}_{l\beta} \hat{a}_{j\beta}\, \hat{a}_{i\alpha} \vert \Psi \rangle + \langle \Psi \vert \hat{a}^{\dagger}_{l\alpha}\, \hat{a}^{\dagger}_{k\beta} \hat{a}_{i\beta}\, \hat{a}_{j\alpha} \vert \Psi \rangle
\end{eqnarray*}
and the spin-summed two-body density matrix is \[ \Gamma_{ijkl} = \Gamma_{ijkl}^{\uparrow \uparrow} + \Gamma_{ijkl}^{\downarrow \downarrow} + \Gamma_{ijkl}^{\uparrow \downarrow}. \]
The total energy can be computed as: \[ E = E_{\text{NN}} + \sum_{ij} \gamma_{ij} \langle j \vert h \vert i \rangle + \frac{1}{2} \sum_{ijkl} \Gamma_{ijkl} \langle k l \vert i j \rangle \]
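The energy expression can be written out as explicit loops. This is a minimal sketch with dense nested lists and a hypothetical function name; in practice the two-body quantities are stored in sparse form and one would contract them with an array library.

```python
def total_energy(e_nn, gamma, h, Gamma, eri):
    """E = E_NN + sum_ij gamma[i][j] h[j][i] + 1/2 sum_ijkl Gamma[i][j][k][l] eri[k][l][i][j]."""
    n = len(gamma)
    one_body = sum(gamma[i][j] * h[j][i] for i in range(n) for j in range(n))
    two_body = 0.5 * sum(Gamma[i][j][k][l] * eri[k][l][i][j]
                         for i in range(n) for j in range(n)
                         for k in range(n) for l in range(n))
    return e_nn + one_body + two_body

# One doubly occupied orbital: gamma_00 = 2 and Gamma_0000 = 2 (only the up-down part contributes)
gamma = [[2.0]]
h = [[-1.0]]
Gamma = [[[[2.0]]]]
eri = [[[[0.5]]]]
print(total_energy(0.0, gamma, h, Gamma, eri))
```

For this closed-shell two-electron toy case the expression reduces to \(2h_{00} + \langle 00 \vert 00 \rangle\), as expected.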
To compress the storage, the Cholesky decomposition of the RDMs can be stored:
\[ \Gamma_{ijkl} = \sum_{\alpha} G_{ij\alpha} G_{kl\alpha} \]
Warning: as opposed to electron repulsion integrals, the decomposition is made such that the Cholesky vectors are expanded in a two-electron basis \(f_{ij}(\mathbf{r}_1,\mathbf{r}_2) = \phi_i(\mathbf{r}_1) \phi_j(\mathbf{r}_2)\), whereas in electron repulsion integrals each Cholesky vector is expressed in a basis of a one-electron function \(g_{ik}(\mathbf{r}_1) = \phi_i(\mathbf{r}_1) \phi_k(\mathbf{r}_1)\).
| Variable | Type | Dimensions | Description |
|---|---|---|---|
| `1e` | `float` | `(mo.num, mo.num)` | One-body density matrix |
| `1e_up` | `float` | `(mo.num, mo.num)` | ↑-spin component of the one-body density matrix |
| `1e_dn` | `float` | `(mo.num, mo.num)` | ↓-spin component of the one-body density matrix |
| `2e` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | Two-body reduced density matrix (spin trace) |
| `2e_upup` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | ↑↑ component of the two-body reduced density matrix |
| `2e_dndn` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | ↓↓ component of the two-body reduced density matrix |
| `2e_updn` | `float sparse` | `(mo.num, mo.num, mo.num, mo.num)` | ↑↓ component of the two-body reduced density matrix |
| `2e_cholesky_num` | `dim` | | Number of Cholesky vectors |
| `2e_cholesky` | `float sparse` | `(mo.num, mo.num, rdm.2e_cholesky_num)` | Cholesky decomposition of the two-body RDM (spin trace) |
| `2e_upup_cholesky_num` | `dim` | | Number of Cholesky vectors |
| `2e_upup_cholesky` | `float sparse` | `(mo.num, mo.num, rdm.2e_upup_cholesky_num)` | Cholesky decomposition of the two-body RDM (↑↑) |
| `2e_dndn_cholesky_num` | `dim` | | Number of Cholesky vectors |
| `2e_dndn_cholesky` | `float sparse` | `(mo.num, mo.num, rdm.2e_dndn_cholesky_num)` | Cholesky decomposition of the two-body RDM (↓↓) |
| `2e_updn_cholesky_num` | `dim` | | Number of Cholesky vectors |
| `2e_updn_cholesky` | `float sparse` | `(mo.num, mo.num, rdm.2e_updn_cholesky_num)` | Cholesky decomposition of the two-body RDM (↑↓) |
6 Correlation factors
6.1 Jastrow factor (jastrow group)
The Jastrow factor is an \(N\)-electron function which multiplies the CI expansion: \(\Psi = \Phi \times \exp(J)\).
In the following, we use the notations \(r_{ij} = \vert\mathbf{r}_i - \mathbf{r}_j\vert\) and \(R_{i\alpha} = \vert\mathbf{r}_i - \mathbf{R}_\alpha\vert\), where indices \(i\) and \(j\) refer to electrons and \(\alpha\) to nuclei.
Parameters for multiple forms of Jastrow factors can be saved in TREXIO files, and are described in the following sections. These are identified by the `type` attribute. The type can be one of the following:
- `CHAMP`
- `Mu`
6.1.1 CHAMP
The first form of Jastrow factor is the one used in the CHAMP program:
\[ J(\mathbf{r},\mathbf{R}) = J_{\text{eN}}(\mathbf{r},\mathbf{R}) + J_{\text{ee}}(\mathbf{r}) + J_{\text{eeN}}(\mathbf{r},\mathbf{R}) \]
\(J_{\text{eN}}\) contains electronnucleus terms:
\[ J_{\text{eN}}(\mathbf{r},\mathbf{R}) = \sum_{i=1}^{N_\text{elec}} \sum_{\alpha=1}^{N_\text{nucl}}\left[ \frac{a_{1,\alpha}\, f_\alpha(R_{i\alpha})}{1+a_{2,\alpha}\, f_\alpha(R_{i\alpha})} + \sum_{p=2}^{N_\text{ord}^a} a_{p+1,\alpha}\, [f_\alpha(R_{i\alpha})]^p  J_{\text{eN}}^\infty \right] \]
\(J_{\text{ee}}\) contains electronelectron terms:
\[ J_{\text{ee}}(\mathbf{r}) = \sum_{i=1}^{N_\text{elec}} \sum_{j=1}^{i1} \left[ \frac{\frac{1}{2}\big(1 + \delta^{\uparrow\downarrow}_{ij}\big)\,b_1\, f_{\text{ee}}(r_{ij})}{1+b_2\, f_{\text{ee}}(r_{ij})} + \sum_{p=2}^{N_\text{ord}^b} b_{p+1}\, [f_{\text{ee}}(r_{ij})]^p  J_{\text{ee},ij}^\infty \right] \]
\(\delta^{\uparrow\downarrow}_{ij}\) is zero when the electrons \(i\) and \(j\) have the same spin, and one otherwise.
\(J_{\text{eeN}}\) contains electronelectronNucleus terms:
\[ J_{\text{eeN}}(\mathbf{r},\mathbf{R}) = \sum_{\alpha=1}^{N_{\text{nucl}}} \sum_{i=1}^{N_{\text{elec}}} \sum_{j=1}^{i1} \sum_{p=2}^{N_{\text{ord}}} \sum_{k=0}^{p1} \sum_{l=0}^{pk2\delta_{k,0}} c_{lkp\alpha} \left[ g_{\text{ee}}({r}_{ij}) \right]^k \nonumber \\ \left[ \left[ g_\alpha({R}_{i\alpha}) \right]^l + \left[ g_\alpha({R}_{j\alpha}) \right]^l \right] \left[ g_\alpha({R}_{i\,\alpha}) \, g_\alpha({R}_{j\alpha}) \right]^{(pkl)/2} \] \(c_{lkp\alpha}\) are nonzero only when \(pkl\) is even.
The terms \(J_{\text{ee},ij}^\infty\) and \(J_{\text{eN}}^\infty\) are shifts to ensure that \(J_{\text{eN}}\) and \(J_{\text{ee}}\) have an asymptotic value of zero:
\[ J_{\text{eN}}^{\infty} = \frac{a_{1,\alpha}\, \kappa_\alpha^{-1}}{1+a_{2,\alpha}\, \kappa_\alpha^{-1}} + \sum_{p=2}^{N_\text{ord}^a} a_{p+1,\alpha}\, \kappa_\alpha^{-p} \]
\[ J_{\text{ee},ij}^{\infty} = \frac{\frac{1}{2}\big(1 + \delta^{\uparrow\downarrow}_{ij}\big)\,b_1\, \kappa_{\text{ee}}^{-1}}{1+b_2\, \kappa_{\text{ee}}^{-1}} + \sum_{p=2}^{N_\text{ord}^b} b_{p+1}\, \kappa_{\text{ee}}^{-p} \]
\(f\) and \(g\) are scaling functions defined as
\[ f_\alpha(r) = \frac{1-e^{-\kappa_\alpha\, r}}{\kappa_\alpha} \text{ and } g_\alpha(r) = e^{-\kappa_\alpha\, r}. \]
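As a numerical illustration of the scaling functions and of the asymptotic shift, the following sketch evaluates the contribution of a single electron-nucleus pair to \(J_{\text{eN}}\). The function names and the list layout of the coefficients (a[0] holds \(a_1\), a[1] holds \(a_2\), a[p] holds \(a_{p+1}\)) are assumptions made for this example, not part of TREXIO.

```python
import math


def f_scaled(r, kappa):
    """f(r) = (1 - exp(-kappa r)) / kappa; tends to 1/kappa at large r."""
    return (1.0 - math.exp(-kappa * r)) / kappa


def g_scaled(r, kappa):
    """g(r) = exp(-kappa r); tends to 0 at large r."""
    return math.exp(-kappa * r)


def j_en_term(r, a, kappa):
    """Contribution of one (electron, nucleus) pair to J_eN, shift included.

    a[0] = a_1, a[1] = a_2, a[p] = a_{p+1} for p >= 2 (assumed layout).
    """
    def unshifted(x):
        value = a[0] * x / (1.0 + a[1] * x)
        for p in range(2, len(a)):
            value += a[p] * x ** p
        return value

    # The shift is the same expression evaluated at f -> 1/kappa.
    return unshifted(f_scaled(r, kappa)) - unshifted(1.0 / kappa)


if __name__ == "__main__":
    coefficients = [0.5, 1.0, 0.1]  # arbitrary illustrative values
    for distance in (0.0, 1.0, 5.0, 50.0):
        print(distance, j_en_term(distance, coefficients, kappa=1.2))
```

Because \(f\) saturates at \(1/\kappa\), the shifted term decays to zero as the electron moves away from the nucleus.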
6.1.2 Mu
Mu-Jastrow is based on a one-parameter correlation factor that has been introduced in the context of transcorrelated methods. This correlation factor imposes the electron-electron cusp, and it is built such that the leading order in \(1/r_{12}\) of the effective two-electron potential reproduces the long-range interaction of range-separated density functional theory. Its analytical expression reads
\[ J(\mathbf{r}, \mathbf{R}) = J_{\text{eeN}}(\mathbf{r}, \mathbf{R}) + J_{\text{eN}}(\mathbf{r}, \mathbf{R}). \]
The electron-electron cusp is incorporated in the three-body term
\[ J_\text{eeN} (\mathbf{r}, \mathbf{R}) = \sum_{i=1}^{N_\text{elec}} \sum_{j=1}^{i-1} \, u\left(\mu, r_{ij}\right) \, \prod_{\alpha=1}^{N_{\text{nucl}}} \, E_\alpha({R}_{i\alpha}) \, E_\alpha({R}_{j\alpha}), \]
where \(u\) is an electron-electron function
\[ u\left(\mu, r\right) = \frac{r}{2} \, \left[ 1 - \text{erf}(\mu\, r) \right] - \frac{1}{2 \, \mu \, \sqrt{\pi}} \exp \left[ -(\mu \, r)^2 \right]. \]
This electron-electron term is tuned by the parameter \(\mu\), which controls the depth and the range of the Coulomb hole between electrons.
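The behaviour of \(u\) is easy to check numerically. The sketch below (the function name is chosen for this example) evaluates \(u(\mu, r)\) with the standard library and prints its value at contact, its slope at small \(r\), and its decay at large \(r\). Differentiating the expression above gives \(\partial u/\partial r = \frac{1}{2}[1 - \text{erf}(\mu r)]\), so the slope at \(r = 0\) is \(1/2\), and \(u(\mu, 0) = -1/(2\mu\sqrt{\pi})\).

```python
import math


def u_mu(mu, r):
    """u(mu, r) = r/2 [1 - erf(mu r)] - exp(-(mu r)^2) / (2 mu sqrt(pi))."""
    return (0.5 * r * (1.0 - math.erf(mu * r))
            - math.exp(-(mu * r) ** 2) / (2.0 * mu * math.sqrt(math.pi)))


if __name__ == "__main__":
    mu = 0.87  # arbitrary illustrative value
    h = 1.0e-6
    print("u(mu, 0)      :", u_mu(mu, 0.0))
    print("slope near 0  :", (u_mu(mu, h) - u_mu(mu, 0.0)) / h)
    print("u(mu, 10)     :", u_mu(mu, 10.0))
```

A larger \(\mu\) gives a shallower and shorter-ranged function, since both the value at contact and the decay length scale as \(1/\mu\).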
An envelope function has been introduced to cancel out the Jastrow effects between two electrons when at least one is close to a nucleus (to perform a frozen-core calculation). The envelope function is given by
\[ E_\alpha(R) = 1 - \exp\left( - \gamma_{\alpha} \, R^2 \right). \]
In particular, for large \(\gamma_\alpha\) every envelope \(E_\alpha\) tends to one, and the Mu-Jastrow factor becomes a two-body Jastrow factor:
\[ J_{\text{ee}}(\mathbf{r}) = \sum_{i=1}^{N_\text{elec}} \sum_{j=1}^{i-1} \, u\left(\mu, r_{ij}\right) \]
whereas if the parameters \(\gamma_\alpha\) tend to zero, the envelopes vanish and it becomes zero.
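The effect of the envelope on the three-body term can be illustrated with a small sketch. All function names are hypothetical, and positions are plain coordinate tuples in atomic units. It evaluates \(J_{\text{eeN}}\) for a pair of electrons, once with an electron sitting on a nucleus and once with both electrons far from it.

```python
import math


def u_mu(mu, r):
    """Electron-electron function u(mu, r) of the Mu-Jastrow factor."""
    return (0.5 * r * (1.0 - math.erf(mu * r))
            - math.exp(-(mu * r) ** 2) / (2.0 * mu * math.sqrt(math.pi)))


def envelope(gamma, dist):
    """E(R) = 1 - exp(-gamma R^2): zero on the nucleus, one far away."""
    return 1.0 - math.exp(-gamma * dist ** 2)


def j_een_mu(mu, gammas, electrons, nuclei):
    """Sum over electron pairs of u times the product of envelopes."""
    total = 0.0
    for i in range(len(electrons)):
        for j in range(i):
            env = 1.0
            for gamma, nucleus in zip(gammas, nuclei):
                env *= envelope(gamma, math.dist(electrons[i], nucleus))
                env *= envelope(gamma, math.dist(electrons[j], nucleus))
            total += u_mu(mu, math.dist(electrons[i], electrons[j])) * env
    return total


if __name__ == "__main__":
    nuclei = [(0.0, 0.0, 0.0)]
    gammas = [1.0]  # arbitrary illustrative value
    on_nucleus = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    far_away = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
    print("electron on nucleus:", j_een_mu(0.87, gammas, on_nucleus, nuclei))
    print("electrons far away :", j_een_mu(0.87, gammas, far_away, nuclei))
    print("bare u at r = 1    :", u_mu(0.87, 1.0))
```

In the first geometry the pair contribution is suppressed entirely. In the second it coincides with the bare \(u(\mu, r_{ij})\).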
To increase the flexibility of the Jastrow and improve the electron density, the following electron-nucleus term is added:
\[ J_{\text{eN}}(\mathbf{r},\mathbf{R}) = \sum_{i=1}^{N_\text{elec}} \sum_{\alpha=1}^{N_\text{nucl}} \, \left[ \exp\left( a_{\alpha} R_{i \alpha}^2 \right) - 1\right]. \]
The parameter \(\mu\) is stored in the ee array, the parameters \(\gamma_\alpha\) are stored in the een array, and the parameters \(a_\alpha\) are stored in the en array.
6.1.3 Table of values
| Variable | Type | Dimensions | Description |
|----------|------|------------|-------------|
| type | string | | Type of Jastrow factor: CHAMP or Mu |
| en_num | dim | | Number of electron-nucleus parameters |
| ee_num | dim | | Number of electron-electron parameters |
| een_num | dim | | Number of electron-electron-nucleus parameters |
| en | float | (jastrow.en_num) | Electron-nucleus parameters |
| ee | float | (jastrow.ee_num) | Electron-electron parameters |
| een | float | (jastrow.een_num) | Electron-electron-nucleus parameters |
| en_nucleus | index | (jastrow.en_num) | Nucleus relative to the eN parameter |
| een_nucleus | index | (jastrow.een_num) | Nucleus relative to the eeN parameter |
| ee_scaling | float | | \(\kappa\) value in CHAMP Jastrow for electron-electron distances |
| en_scaling | float | (nucleus.num) | \(\kappa\) value in CHAMP Jastrow for electron-nucleus distances |
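As an illustration of how the Mu parameters of section 6.1.2 could map onto these variables, the sketch below arranges them in a plain Python dictionary keyed by the names in the table. It does not call the TREXIO library. It also assumes one \(\gamma_\alpha\) and one \(a_\alpha\) per nucleus, and zero-based nucleus indices. These are assumptions made for the example, not statements about the file format.

```python
def mu_jastrow_fields(mu, gammas, a_params):
    """Arrange Mu-Jastrow parameters under the jastrow group variable names.

    Assumes one gamma and one a per nucleus (hypothetical layout).
    """
    if len(gammas) != len(a_params):
        raise ValueError("expected one gamma and one a per nucleus")
    n_nucl = len(gammas)
    return {
        "type": "Mu",
        "ee_num": 1,
        "ee": [mu],                    # the single parameter mu
        "een_num": n_nucl,
        "een": list(gammas),           # envelope parameters gamma_alpha
        "een_nucleus": list(range(n_nucl)),
        "en_num": n_nucl,
        "en": list(a_params),          # electron-nucleus parameters a_alpha
        "en_nucleus": list(range(n_nucl)),
    }


if __name__ == "__main__":
    fields = mu_jastrow_fields(0.87, gammas=[1.0, 2.0], a_params=[-0.1, -0.2])
    for name, value in fields.items():
        print(name, value)
```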
7 Quantum Monte Carlo data (qmc group)
In quantum Monte Carlo calculations, the wave function is evaluated at points of the 3N-dimensional space. Some algorithms require multiple independent walkers, so it is possible to store multiple coordinates, as well as some quantities evaluated at those points.
By convention, the electron coordinates contain first all the electrons of \(\uparrow\)-spin and then all the electrons of \(\downarrow\)-spin.
| Variable | Type | Dimensions | Description |
|----------|------|------------|-------------|
| num | dim | | Number of 3N-dimensional points |
| point | float | (3, electron.num, qmc.num) | 3N-dimensional points |
| psi | float | (qmc.num) | Wave function evaluated at the points |
| e_loc | float | (qmc.num) | Local energy evaluated at the points |
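Since the dimensions are given in column-major order, the Cartesian component is the fastest-varying index of point, followed by the electron index and then the walker index. The sketch below, with hypothetical helper names and zero-based indices, computes the flat offset of one coordinate and splits a walker into its two spin blocks. The count of \(\uparrow\)-spin electrons (called up_num here) is assumed to be known from elsewhere.

```python
def point_offset(coord, electron, walker, elec_num):
    """Flat offset of point(coord, electron, walker), zero-based indices,
    for the column-major dimensions (3, electron.num, qmc.num).

    This is also the row-major offset of a C array declared as
    [qmc.num][electron.num][3] and indexed as [walker][electron][coord].
    """
    return coord + 3 * (electron + elec_num * walker)


def split_spins(walker_points, up_num):
    """Split one walker's list of [x, y, z] rows into (up, down) blocks,
    relying on the convention that up-spin electrons come first."""
    return walker_points[:up_num], walker_points[up_num:]


if __name__ == "__main__":
    elec_num = 4
    print(point_offset(0, 0, 1, elec_num))   # first coordinate of the second walker
    walker = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
    up, down = split_spins(walker, up_num=2)
    print(up, down)
```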