# Crate einsum_codegen

source ·## Expand description

Helper crate for einsum algorithm

### §Introduction to einsum

The Einstein summation rule in theoretical physics and related field including machine learning is a rule for abbreviating tensor operations. For example, one of most basic tensor operation is inner product of two vectors in $n$-dimensional Euclidean space $x, y \in \mathbb{R}^n$: $$ (x, y) = \sum_{i \in I} x_i y_i $$ where $I$ denotes a set of indices, i.e. $I = \{0, 1, \ldots, n-1 \}$. Another example is matrix multiplications. A multiplication of three square matrices $A, B, C \in M_n(\mathbb{R})$ can be written as its element: $$ ABC_{il} = \sum_{j \in J} \sum_{k \in K} a_{ij} b_{jk} c_{kl} $$

Many such tensor operations appear in various field, and we usually define many functions corresponding to each operations. For inner product of vectors, we may defines a function like

`fn inner(a: Array1D<R>, b: Array1D<R>) -> R;`

for matrix multiplication:

`fn matmul(a: Array2D<R>, b: Array2D<R>) -> Array2D<R>;`

or taking three matrices:

`fn matmul3(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R>;`

and so on.

These definitions are very similar, and actually, they can be represented in a single manner. These computations multiplicate the element of each tensor with some indices, and sum up them along some indices. So we have to determine

- what indices to be used for each tensors in multiplications
- what indices to be summed up

This can be done by ordering indices for input tensors
with a Einstein summation rule, i.e. sum up indices which appears more than once.
For example, `inner`

is represented by `i,i->`

, `matmul`

is represented by `ij,jk->ik`

,
`matmul3`

is represented by `ij,jk,kl->il`

, and so on
where `,`

is the separator of each indices
and each index must be represented by a single char like `i`

or `j`

.
`->`

separates the indices of input tensors and indices of output tensor.
If no indices are placed like `i,i->`

, it means the tensor is 0-rank, i.e. a scalar value.
“einsum” is an algorithm or runtime to be expand such string
into actual tensor operations.

### §einsum algorithm

We discuss an overview of einsum algorithm for understanding the structure of this crate.

#### §Factorize and Memorize partial summation

Partial summation and its memorization reduces number of floating point operations.
For simplicity, both addition `+`

and multiplication `*`

are counted as 1 operation,
and do not consider fused multiplication-addition (FMA).
In the above `matmul3`

example, there are $\#K \times \#J$ addition
and $2 \times \#K \times \#J$ multiplications for every indices $(i, l)$,
where $\#$ denotes the number of elements in the index sets.
Assuming the all sizes of indices are same and denoted by $N$,
there are $O(N^4)$ floating point operations.

When we sum up partially along `j`

:
$$
\sum_{k \in K} c_{kl} \left( \sum_{j \in J} a_{ij} b_{jk} \right),
$$
and memorize its result as $d_{ik}$:
$$
\sum_{k \in K} c_{kl} d_{ik},
\text{where} \space d_{ik} = \sum_{j \in J} a_{ij} b_{jk},
$$
there are $O(N^3)$ operations for both computing $d_{ik}$ and final summation
with $O(N^2)$ memorization storage.

When is this factorization possible? We know that above `matmul3`

example
is also written as associative matrix product $ABC = A(BC) = (AB)C$,
and partial summation along $j$ is corresponding to store $D = AB$.
This is not always possible. Let us consider a trace of two matrix product
$$
\text{Tr} (AB) = \sum_{i \in I} \sum_{j \in J} a_{ij} b_{ji}
$$
This is written as `ij,ji->`

in einsum subscript form.
We cannot factor out both $a_{ij}$ and $b_{ji}$ out of summation
since they contain both indices.
Whether this factorization is possible or not can be determined only
from einsum subscript form, and we call a subscript is “reducible”
if factorization is possible, and “irreducible” if not possible,
i.e. `ij,jk,kl->il`

is reducible and `ij,ji->`

is irreducible.

#### §Subscript representation

To discuss subscript factorization, we have to track which tensors are
used as each input.
In above `matmul3`

example, `ij,jk,kl->il`

is factorized into sub-subscripts
`ij,jk->ik`

and `ik,kl->il`

where `ik`

in the second subscript uses
the output of first subscript. The information of the name of tensors
has been dropped from sub-subscripts,
and we have to create a mechanism for managing it.

We introduce a subscript representation of `matmul3`

case with tensor names:

```
ij,jk,kl->il | a b c -> out
```

In this form, the factorization can be described:

```
ij,jk->ik | a b -> d
ik,kl->il | d c -> out
```

To clarify the tensor is given from user or created while factorization,
we use `arg{N}`

and `out{N}`

identifiers:

```
ij,jk->ik | arg0 arg1 -> out0
ik,kl->il | out0 arg2 -> out1
```

#### §Summation order

This factorization is not unique.
Apparently, there are two ways for `matmul3`

case as corresponding to $(AB)C$:

```
ij,jk->ik | arg0 arg1 -> out0
ik,kl->il | out0 arg2 -> out1
```

and to $A(BC)$:

```
jk,kl->jl | arg1 arg2 -> out0
jl,ij->il | out0 arg0 -> out1
```

These are different procedure i.e. number of floating operations and required intermediate memories are different, but return same output tensor (we ignore non-associativity of floating numbers on this document). This becomes complicated combinational optimization problem if there are many contraction indicies, and the objective of this crate is to (heuristically) solve this problem.

## Modules§

- Generate einsum implementation
- Parse einsum subscripts

## Structs§

- Names of tensors
- Einsum subscripts with tensor names, e.g.
`ab,bc->ac | arg0,arg1->out0`

## Enums§

- Which tensor the subscript specifies