Crate einsum_codegen
source ·Expand description
Helper crate for einsum algorithm
§Introduction to einsum
The Einstein summation rule in theoretical physics and related field including machine learning is a rule for abbreviating tensor operations. For example, one of most basic tensor operation is inner product of two vectors in $n$-dimensional Euclidean space $x, y \in \mathbb{R}^n$: $$ (x, y) = \sum_{i \in I} x_i y_i $$ where $I$ denotes a set of indices, i.e. $I = \{0, 1, \ldots, n-1 \}$. Another example is matrix multiplications. A multiplication of three square matrices $A, B, C \in M_n(\mathbb{R})$ can be written as its element: $$ ABC_{il} = \sum_{j \in J} \sum_{k \in K} a_{ij} b_{jk} c_{kl} $$
Many such tensor operations appear in various field, and we usually define many functions corresponding to each operations. For inner product of vectors, we may defines a function like
fn inner(a: Array1D<R>, b: Array1D<R>) -> R;
for matrix multiplication:
fn matmul(a: Array2D<R>, b: Array2D<R>) -> Array2D<R>;
or taking three matrices:
fn matmul3(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R>;
and so on.
These definitions are very similar, and actually, they can be represented in a single manner. These computations multiplicate the element of each tensor with some indices, and sum up them along some indices. So we have to determine
- what indices to be used for each tensors in multiplications
- what indices to be summed up
This can be done by ordering indices for input tensors
with a Einstein summation rule, i.e. sum up indices which appears more than once.
For example, inner
is represented by i,i->
, matmul
is represented by ij,jk->ik
,
matmul3
is represented by ij,jk,kl->il
, and so on
where ,
is the separator of each indices
and each index must be represented by a single char like i
or j
.
->
separates the indices of input tensors and indices of output tensor.
If no indices are placed like i,i->
, it means the tensor is 0-rank, i.e. a scalar value.
“einsum” is an algorithm or runtime to be expand such string
into actual tensor operations.
§einsum algorithm
We discuss an overview of einsum algorithm for understanding the structure of this crate.
§Factorize and Memorize partial summation
Partial summation and its memorization reduces number of floating point operations.
For simplicity, both addition +
and multiplication *
are counted as 1 operation,
and do not consider fused multiplication-addition (FMA).
In the above matmul3
example, there are $\#K \times \#J$ addition
and $2 \times \#K \times \#J$ multiplications for every indices $(i, l)$,
where $\#$ denotes the number of elements in the index sets.
Assuming the all sizes of indices are same and denoted by $N$,
there are $O(N^4)$ floating point operations.
When we sum up partially along j
:
$$
\sum_{k \in K} c_{kl} \left( \sum_{j \in J} a_{ij} b_{jk} \right),
$$
and memorize its result as $d_{ik}$:
$$
\sum_{k \in K} c_{kl} d_{ik},
\text{where} \space d_{ik} = \sum_{j \in J} a_{ij} b_{jk},
$$
there are $O(N^3)$ operations for both computing $d_{ik}$ and final summation
with $O(N^2)$ memorization storage.
When is this factorization possible? We know that above matmul3
example
is also written as associative matrix product $ABC = A(BC) = (AB)C$,
and partial summation along $j$ is corresponding to store $D = AB$.
This is not always possible. Let us consider a trace of two matrix product
$$
\text{Tr} (AB) = \sum_{i \in I} \sum_{j \in J} a_{ij} b_{ji}
$$
This is written as ij,ji->
in einsum subscript form.
We cannot factor out both $a_{ij}$ and $b_{ji}$ out of summation
since they contain both indices.
Whether this factorization is possible or not can be determined only
from einsum subscript form, and we call a subscript is “reducible”
if factorization is possible, and “irreducible” if not possible,
i.e. ij,jk,kl->il
is reducible and ij,ji->
is irreducible.
§Subscript representation
To discuss subscript factorization, we have to track which tensors are
used as each input.
In above matmul3
example, ij,jk,kl->il
is factorized into sub-subscripts
ij,jk->ik
and ik,kl->il
where ik
in the second subscript uses
the output of first subscript. The information of the name of tensors
has been dropped from sub-subscripts,
and we have to create a mechanism for managing it.
We introduce a subscript representation of matmul3
case with tensor names:
ij,jk,kl->il | a b c -> out
In this form, the factorization can be described:
ij,jk->ik | a b -> d
ik,kl->il | d c -> out
To clarify the tensor is given from user or created while factorization,
we use arg{N}
and out{N}
identifiers:
ij,jk->ik | arg0 arg1 -> out0
ik,kl->il | out0 arg2 -> out1
§Summation order
This factorization is not unique.
Apparently, there are two ways for matmul3
case as corresponding to $(AB)C$:
ij,jk->ik | arg0 arg1 -> out0
ik,kl->il | out0 arg2 -> out1
and to $A(BC)$:
jk,kl->jl | arg1 arg2 -> out0
jl,ij->il | out0 arg0 -> out1
These are different procedure i.e. number of floating operations and required intermediate memories are different, but return same output tensor (we ignore non-associativity of floating numbers on this document). This becomes complicated combinational optimization problem if there are many contraction indicies, and the objective of this crate is to (heuristically) solve this problem.
Modules§
- Generate einsum implementation
- Parse einsum subscripts
Structs§
- Names of tensors
- Einsum subscripts with tensor names, e.g.
ab,bc->ac | arg0,arg1->out0
Enums§
- Which tensor the subscript specifies