1. Einstein summation convention

Einstein notation, or the Einstein summation convention, is simply a reduced form of the well-known summation notation $\sum_{i=1}^n$, introduced by Albert Einstein in 1916. For example, given two vectors $x,y\in\mathbb{R}^n$, we write the inner product $\langle x,y\rangle=\sum_{i=1}^nx_iy_i$ in the new notation as $\langle x,y\rangle=x^iy_i$. At first glance there is nothing special: we just omit the summation sign (this was exactly my impression when I first saw the notation). But I will show that this reduction brings much more than convenience. Moreover, it indicates the object to which each component belongs; specifically, it distinguishes the type of the tensor.

Before giving the formal statement of the convention, let’s start with a few examples. We will denote vectors of dimension $n$ by lowercase letters $x,y,\ldots$ and matrices of appropriate dimensions by capital letters $A,B,\ldots$.

• inner product: $\langle x,y\rangle=\sum_{i=1}^nx_iy_i=x^iy_i$
• bilinear form: $A(x,y)=\sum_{i=1}^n\sum_{j=1}^nA_{ij}x_iy_j=A_{ij}x^iy^j$
• linear transformation: $(Ax)_i=\sum_{j=1}^nA_{ij}x_j=(Ax)^i=A^i_jx^j$
• matrix multiplication: $(AB)_{ij}=\sum_{k=1}^nA_{ik}B_{kj}=(AB)^i_j=A^i_kB^k_j$
• trace of matrix: $\mathrm{tr} A=\sum_{i=1}^nA_{ii}=A^i_i$
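The index expressions above translate almost verbatim into `numpy.einsum`, which can serve as a quick numerical check. (Note that `einsum` does not distinguish upper from lower indices; the variable names below are my own.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x, y = rng.standard_normal(n), rng.standard_normal(n)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# inner product: x^i y_i
assert np.isclose(np.einsum('i,i->', x, y), x @ y)
# bilinear form: A_{ij} x^i y^j
assert np.isclose(np.einsum('ij,i,j->', A, x, y), x @ A @ y)
# linear transformation: A^i_j x^j
assert np.allclose(np.einsum('ij,j->i', A, x), A @ x)
# matrix multiplication: A^i_k B^k_j
assert np.allclose(np.einsum('ik,kj->ij', A, B), A @ B)
# trace: A^i_i
assert np.isclose(np.einsum('ii->', A), np.trace(A))
```

In each case the subscript string on the left of `->` lists the indices of the operands, and the repeated letter is summed, exactly as in the convention.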

Several remarks are in order. First, summation is taken only over indices that repeat, and repeated indices always occur in pairs, one in the upper and one in the lower position. Once summation is taken, it runs over all possible values of the repeated index. Hence expressions such as $\sum_{i=1}^{n-1}x_iy_i$ or $\sum_{i=1}^nx_iy_j$ cannot be written in this notation.

Second, all indices, repeated and non-repeated alike, must be consistent. Note that repeated indices disappear in the result (the left-hand side of the identity). We call such indices dummy, since they represent nothing in the result, which means we may replace a dummy index by any other admissible letter (one that does not conflict with the existing indices). That is, $A^i_jx^j$ is equivalent to $A^i_kx^k$, but not to $A^i_ix^i$. Non-repeated indices, by contrast, appear on both sides of the identity and at the same position, both upper or both lower. Note that superscripts and subscripts are indices rather than powers: we always use $x^2$ to denote the second component of the vector $x$, not $x$ squared.
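The dummy-index rule can also be checked numerically: renaming the dummy index changes nothing, while colliding with the free index produces a genuinely different expression. This is a sketch with arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# A^i_j x^j and A^i_k x^k: renaming the dummy index changes nothing
assert np.allclose(np.einsum('ij,j->i', A, x),
                   np.einsum('ik,k->i', A, x))
# 'ii,i->i' (diag(A) * x componentwise) is a different expression,
# since the free index i collides with the summed pair
assert not np.allclose(np.einsum('ij,j->i', A, x),
                       np.einsum('ii,i->i', A, x))
```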

Third, it is easy to verify (left to the reader) that each component, as a scalar, satisfies all the arithmetic laws of a field, i.e.

• $A^i_jx_iy^j=(A^i_jx_i)y^j=A^i_j(x_iy^j)\qquad(\text{Associative law})$
• $x_iA^i_jy^j=A^i_jx_iy^j=A^i_jy^jx_i\qquad(\text{Commutative law})$
• $A_{ij}(x^j+y^j)=A_{ij}x^j+A_{ij}y^j\qquad(\text{Distributive law})$

With Einstein notation we can focus on the algebraic computation rather than on checking consistency and deciding which operations are appropriate between terms, because everything works automatically. We can sometimes be surprised to find interesting identities that are far from obvious in vector and matrix notation. For instance, the expressions $\langle x,y\rangle_A, \langle x^TA,y\rangle, A:xy^T, x^TAy, \langle x, Ay \rangle$ are all naturally equal by the identities above, where $\langle \cdot,\cdot\rangle_A$ denotes the inner product with respect to $A$ and $\cdot:\cdot$ denotes the inner product of matrices. We are now ready to summarize and formally state the following.
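That chain of equalities is easy to verify numerically: all five expressions are the single contraction $A_{ij}x^iy^j$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
x, y = rng.standard_normal(n), rng.standard_normal(n)

s = np.einsum('ij,i,j->', A, x, y)               # A_{ij} x^i y^j
assert np.isclose(s, x @ A @ y)                   # x^T A y
assert np.isclose(s, np.dot(x, A @ y))            # <x, Ay>
assert np.isclose(s, np.dot(x @ A, y))            # <x^T A, y>
assert np.isclose(s, np.sum(A * np.outer(x, y)))  # A : x y^T
```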

Einstein summation convention: In an expression, summation is automatically taken over all values of every repeated index that occurs in a pair, once in the upper and once in the lower position.

Readers may ask why we use both superscripts and subscripts to represent vectors (indeed, some authors do not require this). Roughly speaking, a single superscript denotes a component of a column vector, while a single subscript denotes a component of a row vector. The spaces of column vectors and row vectors are dual to each other in the finite-dimensional case. Essentially, we adopt superscripts to denote contravariant components and subscripts to denote covariant components. Contravariance and covariance are likewise a pair of dual concepts, reflecting the different transformation laws of tensors under a change of basis of the underlying vector space. There are also tensors of mixed variance, e.g. linear transformations. But I have to postpone the details of this topic, since it deserves a whole post of its own.

To tell the truth, I did not find the notation useful at first; it seemed too simple to deserve notice. This opinion gradually changed as I found many applications of it in matrix calculus. The derivative with respect to a vector or a matrix is not as simple as the derivative with respect to a scalar, because matrix multiplication is noncommutative: multiplying matrices in different orders usually gives different results, and even objects of different shapes. With the aid of index notation, however, the order does not matter, and the derivative looks just like the scalar one we are familiar with. Then there is no need to learn and memorize the strange rules of matrix calculus. In some sense it is quite enjoyable to solve a “hard” problem as if it were child’s play.

Philosophically, nothing comes from nowhere. Einstein notation is not simply an abbreviation of summation; I think it captures the essence of linearity. When a linear operation is applied to a tensor, which is a multilinear object, linearity suggests that we need only consider each term, or equivalently the general term, and the summation is preserved automatically. This representation takes a micro view of a tensor, explicitly indicating its components and transformation law. It regards a tensor as nothing but an array of numbers, which is too restrictive to show off the full talents of a tensor. I personally emphasize the geometric picture behind all mathematical concepts and theorems, which provides some kind of intuition and imagination. A tensor can instead be viewed as a coordinate-free linear operator; we can then talk about its domain and range, its inverse, adjoint, spectrum, etc., in which case all indices are meaningless. Although it offers no geometric picture, Einstein notation has enough power to surprise everyone from the algebraic perspective once we introduce just one naive symbol, the Kronecker delta.

2. Kronecker delta & Levi-Civita symbol

We now introduce two symbols, just … for fun. (Wait a second: “Ci” in “Civita” is pronounced “chee”.) The Kronecker delta $\delta$ is the indicator function of the coincidence of two indices.

$\delta_{ij}=\delta^i_j=\delta^{ij}=[i=j]=\begin{cases} 1& \text{if } i=j \\ 0 & \text{if } i \neq j\end{cases}$

where $[\cdot]$ is the Iverson bracket, giving 1 if the enclosed statement holds and 0 otherwise. The Kronecker delta looks like the identity matrix and plays the role of replacing an index. For example, $\delta^i_jx^j=x^i$ leaves $x$ invariant and merely replaces $j$ by $i$. And $\delta_{ij}x^j=x_i$ not only replaces the index but also lowers the superscript, which can be seen as taking the transpose of the vector. Note that $\delta^i_i=n$, where $n$ is the dimension of the vector space.
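Both properties, index replacement and $\delta^i_i=n$, can be sanity-checked with the identity matrix standing in for $\delta$:

```python
import numpy as np

n = 5
delta = np.eye(n)          # delta^i_j as an n-by-n array
x = np.arange(1.0, n + 1)

# delta^i_j x^j = x^i: the index is replaced, x itself is unchanged
assert np.allclose(np.einsum('ij,j->i', delta, x), x)
# delta^i_i = n: contracting both indices yields the dimension
assert np.einsum('ii->', delta) == n
```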

The Levi-Civita symbol $\epsilon_{i_1\ldots i_n}$ is defined as the sign of the permutation $\sigma=(i_1,\ldots,i_n)$, equivalently $(-1)^p$, where $p$ is the number of inversions in $\sigma$. The symbol gives 0 if any two of the indices $i_1,\ldots,i_n$ coincide.

$\epsilon_{i_1\ldots i_n}=\epsilon^{i_1\ldots i_n}=\begin{cases}+1 & \text{if } (i_1,\ldots,i_n) \text{ is even permutation of } (1,\ldots,n) \\ -1 & \text{if } (i_1,\ldots,i_n) \text{ is odd permutation of } (1,\ldots,n) \\ 0 & \text{otherwise}\end{cases}$

Readers should bear in mind that $\epsilon_{i_1\ldots i_n}$ is not a tensor, because its transformation law differs; we call it a pseudo-tensor. The interesting thing to do with the Levi-Civita symbol is to compute $D=\epsilon^{i_1\ldots i_n}a^1_{i_1}\cdots a^n_{i_n}$, where $A=(a^i_j)$ is an $n$ by $n$ matrix. Of the $n^n$ terms contained in the summation, only those whose $n$ factors are components of $A$ taken from distinct rows and columns do not vanish. Multiplying each by the sign of the permutation $(i_1,\ldots, i_n)$, we find that $D=\det A$. Surprising, right? As a direct application in vector algebra, we have $(x\times y)_i=\epsilon_{ijk}x^jy^k$ in $\mathbb{R}^3$, recalling the determinant rule for computing the cross product.
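Both claims are easy to test for small $n$ by materializing $\epsilon$ as a dense array (the `levi_civita` helper below is my own construction, feasible only for small $n$ since the array has $n^n$ entries):

```python
import numpy as np
from itertools import permutations

def levi_civita(n):
    """Rank-n Levi-Civita symbol as a dense (n, ..., n) array."""
    eps = np.zeros((n,) * n)
    for perm in permutations(range(n)):
        # sign = (-1)^{number of inversions}
        inv = sum(perm[i] > perm[j]
                  for i in range(n) for j in range(i + 1, n))
        eps[perm] = (-1) ** inv
    return eps

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
eps = levi_civita(n)

# det A = eps^{ijk} a^1_i a^2_j a^3_k  (rows of A indexed by eps)
D = np.einsum('ijk,i,j,k->', eps, A[0], A[1], A[2])
assert np.isclose(D, np.linalg.det(A))

# (x × y)_i = eps_{ijk} x^j y^k
x, y = rng.standard_normal(3), rng.standard_normal(3)
assert np.allclose(np.einsum('ijk,j,k->i', eps, x, y), np.cross(x, y))
```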

Another interesting result also comes from computation. Let $D=(D^i_j)=(\delta^{\mu(i)}_{\nu(j)})$ be the $n$ by $n$ matrix where $\mu,\nu$ are permutations of order $n$, and compute the determinant $\det D$. According to the above,

$\det D=\epsilon^{\sigma(1)\ldots\sigma(n)}D^1_{\sigma(1)}\cdots D^n_{\sigma(n)}=\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\mu(1)}_{\nu(\sigma(1))}\cdots \delta^{\mu(n)}_{\nu(\sigma(n))}$

Note that $\delta^{\mu(i)}_{\nu(\sigma(j))}=[\mu(i)=\nu(\sigma(j))]=[\nu^{-1}(\mu(i))=\sigma(j)]=\delta^{\nu^{-1}(\mu(i))}_{\sigma(j)}$, where $\delta$ merely replaces an index. This leads to

$\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\mu(1)}_{\nu(\sigma(1))}\cdots \delta^{\mu(n)}_{\nu(\sigma(n))}=\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\nu^{-1}(\mu(1))}_{\sigma(1)}\cdots \delta^{\nu^{-1}(\mu(n))}_{\sigma(n)}=\epsilon^{\nu^{-1}(\mu(1))\ldots\nu^{-1}(\mu(n))}$

By the definition of Levi-Civita symbol, it’s not hard to obtain

$\epsilon^{\nu^{-1}(\mu(1))\ldots\nu^{-1}(\mu(n))}=\epsilon_{\nu^{-1}(1)\ldots\nu^{-1}(n)}\epsilon^{\mu(1)\ldots\mu(n)}$
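Since $D$ is the permutation matrix of $\nu^{-1}\circ\mu$, the conclusion $\det D=\mathop{sgn}(\mu)\mathop{sgn}(\nu)$ can be checked numerically; `perm_sign` below is my own inversion-counting helper:

```python
import numpy as np

def perm_sign(p):
    """Sign of a permutation given as a sequence of 0-based images."""
    inv = sum(p[i] > p[j]
              for i in range(len(p)) for j in range(i + 1, len(p)))
    return (-1) ** inv

n = 4
rng = np.random.default_rng(3)
for _ in range(5):
    mu = tuple(rng.permutation(n))
    nu = tuple(rng.permutation(n))
    # D^i_j = delta^{mu(i)}_{nu(j)}
    D = np.array([[1.0 if mu[i] == nu[j] else 0.0 for j in range(n)]
                  for i in range(n)])
    assert np.isclose(np.linalg.det(D), perm_sign(mu) * perm_sign(nu))
```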

Intuitively, $\det D$ gives the sign of the permutation $\tau=\begin{pmatrix}\nu(1) & \cdots & \nu(n) \\ \mu(1) & \cdots & \mu(n) \end{pmatrix}$. Also, it is easy to check that $\det D=0$ whenever $\mu(i)=\mu(j)$ or $\nu(i)=\nu(j)$ for some $i\neq j$. The permutation $\tau$ is so common that it deserves a symbol of its own, the generalized Kronecker delta, defined as

$\delta^{i_1\ldots i_m}_{j_1\ldots j_m}=\begin{cases}+1 & \text{if } (i_1,\ldots,i_m) \text{ is even permutation of } (j_1,\ldots,j_m) \\ -1 & \text{if } (i_1,\ldots,i_m) \text{ is odd permutation of } (j_1,\ldots,j_m) \\ 0 & \text{otherwise}\end{cases}$

Note that the integer $m$ need not equal $n$. When $m=n$, we have $\delta^{i_1\ldots i_n}_{j_1\ldots j_n}=\epsilon^{i_1\ldots i_n}\epsilon_{j_1\ldots j_n}$. When $m<n$, the trick is to add dummy indices and consider $\delta^{i_1\ldots i_m k_{m+1} \ldots k_n}_{j_1 \ldots j_m k_{m+1} \ldots k_n}$. By definition, since the last $(n-m)$ index pairs coincide, we need only consider permutations of the remaining $m$ indices. For the same reason, summation is automatically taken over the dummy indices and produces $(n-m)!$ copies of each permutation of the remaining indices. Therefore,

$\delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}=\frac{1}{(n-m)!} \delta^{i_1 \ldots i_m k_{m+1} \ldots k_n}_{j_1 \ldots j_m k_{m+1} \ldots k_n}$, and more generally, contracting over $(m-l)$ index pairs, $\delta^{i_1 \ldots i_l k_{l+1} \ldots k_m}_{j_1 \ldots j_l k_{l+1} \ldots k_m}=\frac{(n-l)!}{(n-m)!} \delta^{i_1 \ldots i_l}_{j_1 \ldots j_l}$

In particular, $\delta^{i_1 \ldots i_n}_{i_1 \ldots i_n}=n!$. Let us see what role the generalized Kronecker delta plays. It no longer simply replaces indices; otherwise we would have $\delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}=\delta^{i_1}_{j_1} \cdots \delta^{i_m}_{j_m}$, which is obviously wrong. Let $S^{i_1 \ldots i_m k_{m+1} \ldots k_n}=\delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}T^{j_1 \ldots j_m k_{m+1} \ldots k_n}=m!\,T^{[i_1 \ldots i_m] k_{m+1} \ldots k_n}$, where $T^{[i_1 \ldots i_m] k_{m+1} \ldots k_n}=\frac{1}{m!}\sum_{\sigma\in\mathscr{S}(m)}\mathop{sgn}(\sigma)\, T^{i_{\sigma(1)} \ldots i_{\sigma(m)} k_{m+1} \ldots k_n}$. Note that $S^{i_1 \ldots i_m k_{m+1} \ldots k_n}$ changes sign whenever any two indices among $\{i_1,\ldots,i_m\}$ are interchanged: $\delta$ antisymmetrizes part or all of the components, up to a factorial factor!
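For small $m,n$ the antisymmetrization property and the full contraction $\delta^{i_1\ldots i_n}_{i_1\ldots i_n}=n!$ can be verified directly; the `gen_delta` and `perm_sign` helpers are my own constructions for this sketch:

```python
import numpy as np
from itertools import permutations
from math import factorial

def perm_sign(p):
    inv = sum(p[i] > p[j]
              for i in range(len(p)) for j in range(i + 1, len(p)))
    return (-1) ** inv

def gen_delta(m, n):
    """Generalized Kronecker delta as an array of shape (n,)*2m."""
    d = np.zeros((n,) * (2 * m))
    for upper in permutations(range(n), m):      # distinct upper indices
        for sigma in permutations(range(m)):     # lower = permuted upper
            lower = tuple(upper[sigma[k]] for k in range(m))
            d[upper + lower] = perm_sign(sigma)
    return d

n, m = 3, 2

# full contraction: delta^{ijk}_{ijk} = n!
dn = gen_delta(n, n)
assert np.einsum('ijkijk->', dn) == factorial(n)

# delta^{ij}_{kl} T^{kl} = m! T^{[ij]} = T^{ij} - T^{ji}
d = gen_delta(m, n)
T = np.random.default_rng(4).standard_normal((n, n))
S = np.einsum('ijkl,kl->ij', d, T)
assert np.allclose(S, T - T.T)
```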

I think it is the right time to end this post. These notations and symbols are quite simple and common in differential geometry and modern theoretical physics, and they are an unavoidable step toward further study. This is my first post; any comments, corrections, or suggestions are greatly welcome.