#![cfg_attr(
doc,
feature(prelude_import, custom_inner_attributes, proc_macro_hygiene)
)]
#![cfg_attr(doc, katexit::katexit)]
//! Helper crate for einsum algorithm
//!
//! Introduction to einsum
//! -----------------------
//! The Einstein summation rule, used in theoretical physics and related fields
//! including machine learning, is a convention for abbreviating tensor operations.
//! For example, one of the most basic tensor operations is the inner product of
//! two vectors in $n$-dimensional Euclidean space $x, y \in \mathbb{R}^n$:
//! $$
//! (x, y) = \sum_{i \in I} x_i y_i
//! $$
//! where $I$ denotes a set of indices, i.e. $I = \\{0, 1, \ldots, n-1 \\}$.
//! Another example is matrix multiplication.
//! A product of three square matrices $A, B, C \in M_n(\mathbb{R})$
//! can be written elementwise:
//! $$
//! (ABC)_{il} = \sum_{j \in J} \sum_{k \in K} a_{ij} b_{jk} c_{kl}
//! $$
//!
//! Many such tensor operations appear in various fields,
//! and we usually define a separate function for each operation.
//! For the inner product of vectors, we may define a function like
//! ```ignore
//! fn inner(a: Array1D<R>, b: Array1D<R>) -> R;
//! ```
//! for matrix multiplication:
//! ```ignore
//! fn matmul(a: Array2D<R>, b: Array2D<R>) -> Array2D<R>;
//! ```
//! or taking three matrices:
//! ```ignore
//! fn matmul3(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R>;
//! ```
//! and so on.
//!
//! These definitions are very similar, and in fact
//! they can be represented in a single, uniform manner.
//! Each of these computations multiplies elements of the tensors indexed by some indices,
//! and sums the products over some of those indices.
//! So we have to determine
//!
//! - which indices are used for each tensor in the multiplication
//! - which indices are summed over
//!
//! This can be done by listing the indices of each input tensor
//! and applying the Einstein summation rule, i.e. summing over indices which appear more than once.
//! For example, `inner` is represented by `i,i->`, `matmul` by `ij,jk->ik`,
//! `matmul3` by `ij,jk,kl->il`, and so on,
//! where `,` separates the indices of the input tensors
//! and each index is represented by a single char like `i` or `j`.
//! `->` separates the indices of the input tensors from the indices of the output tensor.
//! If no output indices are given, as in `i,i->`, the output tensor has rank 0, i.e. it is a scalar.
//! "einsum" is an algorithm or runtime that expands such a string
//! into actual tensor operations.
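//!
//! As a rough sketch of what such an expansion could produce
//! (illustrative only, not the code this crate actually generates),
//! the subscript `i,i->` corresponds to a single summation loop
//! over the shared index `i`, using the same pseudo-types as above:
//! ```ignore
//! // Illustrative expansion of `i,i->` (inner product); the code generated
//! // by this crate is not guaranteed to look like this.
//! fn inner(a: Array1D<R>, b: Array1D<R>) -> R {
//!     let mut out = R::zero();
//!     for i in 0..a.len() {
//!         out = out + a[i] * b[i];
//!     }
//!     out
//! }
//! ```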
//!
//! einsum algorithm
//! -----------------
//! We give an overview of the einsum algorithm to explain the structure of this crate.
//!
//! ### Factorize and memorize partial summation
//! Partial summation and memorization of its result reduce the number of floating point operations.
//! For simplicity, both addition `+` and multiplication `*` are counted as one operation,
//! and we do not consider fused multiply-add (FMA).
//! In the above `matmul3` example, there are $\\#K \times \\#J$ additions
//! and $2 \times \\#K \times \\#J$ multiplications for each pair of indices $(i, l)$,
//! where $\\#$ denotes the number of elements in an index set.
//! Assuming all index sets have the same size, denoted by $N$,
//! there are $O(N^4)$ floating point operations in total.
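//!
//! A naive expansion of `ij,jk,kl->il` makes this count explicit.
//! The sketch below uses the same pseudo-types (and assumed helper methods)
//! as above and is only meant to show where the $O(N^4)$ comes from:
//! ```ignore
//! // Naive evaluation of `ij,jk,kl->il`: four nested loops over i, l, j, k,
//! // i.e. O(N^4) additions and multiplications in total.
//! fn matmul3(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R> {
//!     let mut out = Array2D::zeros((a.nrows(), c.ncols()));
//!     for i in 0..a.nrows() {
//!         for l in 0..c.ncols() {
//!             for j in 0..b.nrows() {
//!                 for k in 0..c.nrows() {
//!                     out[(i, l)] = out[(i, l)] + a[(i, j)] * b[(j, k)] * c[(k, l)];
//!                 }
//!             }
//!         }
//!     }
//!     out
//! }
//! ```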
//!
//! When we partially sum along `j`:
//! $$
//! \sum_{k \in K} c_{kl} \left( \sum_{j \in J} a_{ij} b_{jk} \right),
//! $$
//! and memorize its result as $d_{ik}$:
//! $$
//! \sum_{k \in K} c_{kl} d_{ik},
//! \text{where} \space d_{ik} = \sum_{j \in J} a_{ij} b_{jk},
//! $$
//! there are $O(N^3)$ operations both for computing $d_{ik}$ and for the final summation,
//! at the cost of $O(N^2)$ storage for the memorized intermediate.
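//!
//! In the naive sketch above this corresponds to hoisting the inner summation into
//! a separate pass (again illustrative only):
//! ```ignore
//! // Factorized evaluation: first materialize d = AB (O(N^3) operations,
//! // O(N^2) memory), then contract d with C (another O(N^3) operations).
//! fn matmul3_factorized(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R> {
//!     let d = matmul(a, b); // d_{ik} = sum_j a_{ij} b_{jk}
//!     matmul(d, c)          // out_{il} = sum_k d_{ik} c_{kl}
//! }
//! ```
//! For instance, with $N = 1000$ the naive evaluation needs roughly
//! $3 N^4 = 3 \times 10^{12}$ operations, while the factorized one needs
//! about $4 N^3 = 4 \times 10^{9}$.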
//!
//! When is this factorization possible? We know that the above `matmul3` example
//! can also be written as the associative matrix product $ABC = A(BC) = (AB)C$,
//! and the partial summation along $j$ corresponds to storing $D = AB$.
//! This is not always possible. Let us consider the trace of a product of two matrices:
//! $$
//! \text{Tr} (AB) = \sum_{i \in I} \sum_{j \in J} a_{ij} b_{ji}
//! $$
//! This is written as `ij,ji->` in einsum subscript form.
//! We cannot factor either $a_{ij}$ or $b_{ji}$ out of the summation
//! since each of them contains both indices.
//! Whether this factorization is possible can be determined
//! from the einsum subscript alone. We call a subscript "reducible"
//! if factorization is possible, and "irreducible" if it is not,
//! e.g. `ij,jk,kl->il` is reducible and `ij,ji->` is irreducible.
//!
//! ### Subscript representation
//!
//! To discuss subscript factorization, we have to track which tensor is
//! used as each input.
//! In the above `matmul3` example, `ij,jk,kl->il` is factorized into the sub-subscripts
//! `ij,jk->ik` and `ik,kl->il`, where `ik` in the second subscript refers to
//! the output of the first subscript. The names of the tensors
//! are not captured by the sub-subscripts themselves,
//! so we need a mechanism for managing them.
//!
//! We introduce a subscript representation of the `matmul3` case with tensor names:
//!
//! ```text
//! ij,jk,kl->il | a b c -> out
//! ```
//!
//! In this form, the factorization can be described as:
//!
//! ```text
//! ij,jk->ik | a b -> d
//! ik,kl->il | d c -> out
//! ```
//!
//! To clarify whether a tensor is supplied by the user or created during factorization,
//! we use `arg{N}` and `out{N}` identifiers:
//!
//! ```text
//! ij,jk->ik | arg0 arg1 -> out0
//! ik,kl->il | out0 arg2 -> out1
//! ```
//!
//! ### Summation order
//!
//! This factorization is not unique.
//! For the `matmul3` case there are clearly two ways: one corresponding to $(AB)C$:
//!
//! ```text
//! ij,jk->ik | arg0 arg1 -> out0
//! ik,kl->il | out0 arg2 -> out1
//! ```
//!
//! and one corresponding to $A(BC)$:
//!
//! ```text
//! jk,kl->jl | arg1 arg2 -> out0
//! jl,ij->il | out0 arg0 -> out1
//! ```
//!
//! These are different procedures, i.e. the number of floating point operations
//! and the amount of intermediate memory they require differ,
//! but they return the same output tensor
//! (we ignore the non-associativity of floating point numbers in this document).
//! Choosing the order becomes a complicated combinatorial optimization problem
//! when there are many contraction indices,
//! and the objective of this crate is to (heuristically) solve this problem.
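//!
//! As a small illustration (the sizes are chosen here only for the sake of example),
//! suppose $\\#I = \\#K = 100$ and $\\#J = \\#L = 2$, i.e. $A$ is $100 \times 2$,
//! $B$ is $2 \times 100$, and $C$ is $100 \times 2$.
//! Computing $(AB)C$ first builds a $100 \times 100$ intermediate and costs about
//! $\\#I \\#J \\#K + \\#I \\#K \\#L = 40000$ multiply-add pairs,
//! while computing $A(BC)$ first builds a $2 \times 2$ intermediate and costs only
//! $\\#J \\#K \\#L + \\#I \\#J \\#L = 800$.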
//!
pub mod codegen;
pub mod parser;
mod namespace;
mod path;
mod subscripts;
pub use namespace::*;
pub use path::*;
pub use subscripts::*;