An Introduction to Geometric Algebra (Part 1)

1. A little history and philosophy

Numbers and geometry are two fundamental topics from the very beginning of mathematics. Thanks to René Descartes, one of the great philosophers and mathematicians of the 17th century, who invented coordinate (analytic) geometry, a solid bridge was built between the two: points are associated with sets of numbers called coordinates, while lines and surfaces correspond to algebraic equations. The Cartesian coordinate system provides a very powerful tool for handling geometric objects and describing transformations. Indeed, the idea of a coordinate system has influenced all of mathematics, from the discovery of differential calculus by Newton and Leibniz, which regards the derivative (an algebraic quantity) as the slope of a tangent line (a geometric quantity), to the emergence of the differential geometry of curves and surfaces (geometric objects), which are described by parameterized smooth mappings (algebraic objects).

Unfortunately, one has to admit that a lot of computational complexity results from this introduction of numbers, and it is costly. In geometry we mainly care about three things: 1. geometric objects themselves, such as shape, size (length, area, volume), and angle; 2. relationships between them, like intersection and disjointness; 3. transformations, say stretching and contraction, reflection and rotation, as well as projection and rejection. All of these aspects are actually independent of coordinates: no matter how the coordinate system is chosen, these properties are invariant. In this sense the position and orientation of the coordinate system do not matter; whether it is skew or orthogonal, scaled or normalized, has no effect on what we care about. A well-chosen coordinate system is expected to ease the computation, but it is usually hard, and in many cases impossible, to find one.

People immersed in the Cartesian philosophy may devote all their energy to finding methods for choosing a good coordinate system. And certainly there are those who do not, preferring instead to describe geometry without coordinates, that is, coordinate-free geometry. Josiah Willard Gibbs, another great scientist who made indispensable contributions to mathematical physics in the 19th century, found that multiplications of vectors, namely the dot product and the cross product, have interesting geometric interpretations, and he formally gave them proper notation, now well known as part of vector analysis.

Actually, J. W. Gibbs was not the first to discover an algebra for coordinate-free geometry. William Rowan Hamilton generalized the complex numbers: after several years of thought, while walking along the Royal Canal in Dublin, Ireland, the shape of the quaternions flashed into his mind, and he could not resist his excitement and immediately carved the famous quaternion formula into the stone of Broom Bridge. Quaternions describe rotations in three dimensions quite well. Inspired by Hamilton, many algebras such as the bicomplex numbers, hypercomplex numbers, and biquaternions soon appeared, each associated with a distinct geometric picture.

At almost the same time, Hermann Günther Grassmann, the German mathematician who invented multilinear algebra, proposed the exterior product in place of the cross product to multiply (multi)vectors in higher dimensions. In Grassmann algebra, a p-dimensional object is denoted by a p-vector, which carries a magnitude and a direction (orientation). For example, a 1-vector is, as usual, a directed line segment. A 2-vector (or bivector) is an area element, whose magnitude is the area and whose direction is the plane in which it lies, rather than a normal vector as in Gibbs' vector algebra.

Grassmann algebra (also known as exterior algebra) is good enough to answer many geometric questions, since it can describe higher-dimensional objects. But William Kingdon Clifford, mathematician and philosopher, hoped for a unified algebraic framework incorporating all of the above number systems. His research on extending Grassmann algebra led him to geometric algebra, the part of Clifford algebra that emphasizes the geometric rather than the purely formal aspect. Clifford algebra went on to offer great insights in both mathematics and theoretical physics, but its geometric interpretation seems to have been largely forgotten. Only in the late 20th century, when David Hestenes rediscovered it, was geometric algebra revived and applied to areas beyond fundamental physics, including image processing and robotics. As D. Hestenes put it, “geometry without algebra is dumb, algebra without geometry is blind.”

In the following we will take a brief look at geometric algebra, hoping for an interesting journey.

2. Inspiration for geometric algebra

Let’s see what happens if we multiply two vectors in \mathbb R^3. Let \{e_1, e_2,e_3\} be a basis for \mathbb R^3 and take two vectors x=x^1e_1+x^2e_2+x^3e_3,\ y=y^1e_1+y^2e_2+y^3e_3, assuming only the associative and distributive laws.

\begin{aligned} xy&=(x^1e_1+x^2e_2+x^3e_3)(y^1e_1+y^2e_2+y^3e_3)\\&=x^1y^1e_1e_1+x^1y^2e_1e_2+x^1y^3e_1e_3\\&+x^2y^1e_2e_1+x^2y^2e_2e_2+x^2y^3e_2e_3\\&+x^3y^1e_3e_1+x^3y^2e_3e_2+x^3y^3e_3e_3 \end{aligned}

To endow the product with geometric meaning, we wish to have e_ie_i=1,\ e_ie_j=-e_je_i\ (i,j=1,2,3,\ j\neq i). Then the product becomes

\begin{aligned}xy=&\underbrace{x^1y^1+x^2y^2+x^3y^3}_{x\cdot y}\\+&\underbrace{(\overbrace{x^1y^2-x^2y^1}^{(x\times y)_3})e_1e_2+(\overbrace{x^2y^3-x^3y^2}^{(x\times y)_1})e_2e_3+(\overbrace{x^3y^1-x^1y^3}^{(x\times y)_2})e_3e_1}_{x\times y}\end{aligned}

It is readily seen that the product splits into the dot product and a part whose coefficients are exactly those of the cross product; the same construction also works in the 2-dimensional case. Readers may notice that we are adding a non-scalar e_ie_j to a scalar. Does that make any sense? What is the result? Actually we have seen similar scenarios before, say a+b\mathbf{i} in the complex numbers, w+x\mathbf{i}+y\mathbf{j}+z\mathbf{k} in the quaternions, and a_0+a_1x+a_2x^2 in a quadratic polynomial. The sum here, called a formal sum, is just a collector bringing things together; what is meaningful is not the sum itself but each component in it. The dot product is usually explained as the projection of one vector onto the other, but we will see later that this is not quite proper. The cross product can be seen as the normal vector of the parallelogram with the two vectors as sides, its magnitude being the area. However, the cross product has no definition for vectors in higher dimensions, which is the sense in which, from Grassmann's point of view, Gibbs' cross product is flawed. Grassmann instead defined the anticommutative wedge product, which gives

xy=x\cdot y+x\wedge y

Note we also have

\displaystyle{x\cdot y=\frac{xy+yx}{2},\qquad x\wedge y=\frac{xy-yx}{2}}
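As a concrete check (not part of the derivation above), here is a minimal Python sketch that builds the geometric product of \mathbb R^3 from the two rules e_ie_i=1 and e_ie_j=-e_je_i, and confirms numerically that the symmetric part (xy+yx)/2 is the dot product while the antisymmetric part (xy-yx)/2 is the bivector x\wedge y, whose coefficients carry the cross-product components. All names here (gp, vector, and so on) are ad hoc helpers for illustration only.

```python
# A minimal, ad hoc model of G(3): a multivector is a dict mapping sorted index
# tuples (basis blades) to coefficients, with () standing for the scalar part.
import numpy as np

def gp_blades(a, b):
    """Product of two basis blades (sorted index tuples); returns (sign, blade)."""
    sign, factors = 1, list(a)
    for i in b:
        sign *= (-1) ** sum(1 for j in factors if j > i)   # move e_i into place
        if i in factors:
            factors.remove(i)       # e_i e_i = +1 contracts the repeated factor
        else:
            factors.append(i)
    return sign, tuple(sorted(factors))

def gp(A, B):
    """Geometric product, using only associativity and distributivity."""
    out = {}
    for ba, ca in A.items():
        for bb, cb in B.items():
            s, blade = gp_blades(ba, bb)
            out[blade] = out.get(blade, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}

def add(A, B, sign=1.0):
    out = {k: A.get(k, 0.0) + sign * B.get(k, 0.0) for k in set(A) | set(B)}
    return {k: v for k, v in out.items() if abs(v) > 1e-12}

def scale(A, c):
    return {k: c * v for k, v in A.items()}

def vector(v):
    return {(i,): float(c) for i, c in enumerate(v, start=1)}

x, y = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 2.0])
X, Y = vector(x), vector(y)

sym  = scale(add(gp(X, Y), gp(Y, X)), 0.5)         # (xy + yx)/2
anti = scale(add(gp(X, Y), gp(Y, X), -1.0), 0.5)   # (xy - yx)/2

print(sym)             # {(): 6.0}: a pure scalar, equal to x . y
print(np.dot(x, y))    # 6.0
print(anti)            # the bivector x ^ y on the blades e12, e13, e23
print(np.cross(x, y))  # same coefficients, up to blade orientation (e13 = -e31)
```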

Equipped with the wedge product, we can try to compute the triple product with a third vector z. Of course, the product with a vector is expected to subsume the dot product and the wedge product.

\begin{aligned}xyz&=(x\cdot y)z+(x\wedge y)z\\&=(x\cdot y)z+(x\wedge y)\cdot z+(x\wedge y)\wedge z\\&=x(y\cdot z)+x(y\wedge z)\\&=(y\cdot z)x+x\cdot(y\wedge z)+x\wedge(y\wedge z)\\&=(y\cdot z)x-(z\cdot x)y+(x\cdot y)z+x\wedge y\wedge z\end{aligned}

The geometric picture shows that the sum of any two of the first three terms is perpendicular to the third, and that the volume of the parallelepiped equals the magnitude of the last term. Similarly, it is easy to show (try it) that

\displaystyle{x\cdot(y\wedge z)=\frac{x(y\wedge z)-(y\wedge z)x}{2}},\qquad x\wedge(y\wedge z)=\frac{x(y\wedge z)+(y\wedge z)x}{2}

3. Some jargon and formulae

Now that we have had a first contact with geometric algebra, it is time to describe it more formally. (We will not define it rigorously in this post, because that would obscure the visual picture.) Basically, a geometric algebra of dimension n, denoted by \mathscr G(V,n), or simply \mathscr G(n), is built from a vector space V by endowing it with an associative bilinear product, called the geometric product. An element of \mathscr G(n) is called a multivector, composed of a scalar, a vector, an area element, and so on. An r-vector represents an r-dimensional geometric object and is a linear combination of r-blades; an r-blade, written A_r, lies in the same subspace as the r-dimensional analogue of a parallelepiped spanned by r linearly independent vectors. Formally,

\mathscr G(V,n)=\bigoplus\limits_{r=0}^n\bigwedge^r(V)

Evidently, a multivector A can be resolved into different grades,

\displaystyle{A=\sum_{r=0}^n\langle A\rangle_r=\lambda+v+\sum_{i}A_{2,i} +\cdots +\sum_j A_{n,j}}

where \langle A \rangle_r=\sum_iA_{r,i} takes the grade-r part of A, \lambda is a scalar, v is a vector, and A_r=u_1\wedge \cdots \wedge u_r for r linearly independent vectors u_1,\ldots, u_r. The geometric product with a vector a can be written explicitly as

\displaystyle aA_r=a\cdot A_r+a\wedge A_r,\qquad A_ra=A_r\cdot a+A_r\wedge a

The first part of the product is called the inner product and the second part the outer product, defined as follows:

\displaystyle a\cdot A_r:=\langle aA_r\rangle_{r-1}=\frac{aA_r-(-1)^rA_ra}{2},\\ a\wedge A_r:=\langle aA_r\rangle_{r+1}=\frac{aA_r+(-1)^rA_ra}{2}

Note that we have not mentioned a basis, hence coordinates of vectors, at all, which embodies the coordinate-free character of geometric algebra.
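Of course any numerical check has to pick a basis. For a quick sanity check of these formulas, we can continue the Python sketch from section 2 (this snippet reuses the ad hoc helpers gp, vector, add and scale defined there); grade projection \langle\cdot\rangle_r just keeps the blades with r indices.

```python
# Continuing the sketch from section 2 (gp, vector, add, scale defined there):
# the grade projection <A>_r keeps exactly the blades with r indices.
def grade(A, r):
    return {k: v for k, v in A.items() if len(k) == r}

a = vector([1.0, -2.0, 0.5])
A2 = grade(gp(vector([2.0, 0.0, 1.0]), vector([0.0, 3.0, -1.0])), 2)   # a 2-blade u ^ v

aA, Aa = gp(a, A2), gp(A2, a)
inner = scale(add(aA, Aa, -1.0), 0.5)   # (a A_2 - (-1)^2 A_2 a)/2, here r = 2
outer = scale(add(aA, Aa, +1.0), 0.5)   # (a A_2 + (-1)^2 A_2 a)/2

print(inner, grade(aA, 1))   # the two agree: a . A_2 = <a A_2>_{r-1}
print(outer, grade(aA, 3))   # and so do these: a ^ A_2 = <a A_2>_{r+1}
```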

4. Properties of geometric algebra

We now list a collection of properties of geometric algebra. Readers will see that each property is supported by a very intuitive geometric fact. By convention, we use lower-case letters for vectors and upper-case letters for blades.

  1. a^2=a\cdot a,\ a\wedge a = 0
  2. a\cdot A_r=(-1)^{r-1}A_r\cdot a,\ a\wedge A_r=(-1)^rA_r\wedge a
  3. aA_r=(a^T+a^\perp)A_r=a^T\cdot A_r+a^\perp\wedge A_r
  4. a\cdot(b\wedge c)=(a\cdot b)\,c-(a\cdot c)\,b
  5. a\cdot(b\cdot A_r)=(a\wedge b)\cdot A_r

For lack of time, I have to omit the proofs and explanations for now; I will make them up in a few days. As we can see, the inner product with a vector contracts the subspace to one perpendicular to it, while the outer product with a vector extends the subspace to one containing it. It should be emphasized that the geometric product can be defined between any two multivectors, which we will discuss in a following post. The geometric product itself does not have a single clear geometric picture, yet it collects different geometric facts together. (Property 4, for instance, is checked numerically in the sketch below.)
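Here is that small illustration (again reusing the ad hoc helpers gp, vector, add and scale from the sketch in section 2), checking property 4 for random vectors:

```python
# Continuing the sketch from section 2: check a . (b ^ c) = (a . b) c - (a . c) b
# for random vectors, using x ^ y = (xy - yx)/2 and a . B_2 = (a B_2 - B_2 a)/2.
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (vector(rng.normal(size=3)) for _ in range(3))

bc  = scale(add(gp(b, c), gp(c, b), -1.0), 0.5)          # b ^ c
lhs = scale(add(gp(a, bc), gp(bc, a), -1.0), 0.5)        # a . (b ^ c)

ab = gp(a, b).get((), 0.0)                               # the scalar a . b
ac = gp(a, c).get((), 0.0)                               # the scalar a . c
rhs = add(scale(c, ab), scale(b, ac), -1.0)              # (a . b) c - (a . c) b

print(lhs)
print(rhs)   # the two dictionaries agree up to floating point
```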

Vectors: Identifications and Distinctions

Vectors are everywhere and play a fundamental role in different branches of science. Although a vector has a formal definition, namely an element of a vector space, many still regard a vector as an array of numbers with a certain magnitude and direction. This classical view, on one hand, does not give a full picture of a true vector, and on the other hand attributes to it more than a vector should have. Sometimes intuition makes a concept easier to understand, and sometimes it makes different concepts look the same. In this post we will discuss a few sorts of “vectors” and show that, although they all satisfy the definition of a vector (which is why many do not distinguish them), they behave so differently under a change of basis that they deserve different names.

1. What is a vector

When an object is to be defined, it is usually necessary to give a background in which the object is unambiguous, meaningful and well defined. Just as in biology a species is classified by its domain, kingdom, phylum, class, order, family and genus, in a programming language a variable is defined within its environment, scope and type. The background confines the object to a proper extension, so that it is neither too narrow to carry enough information nor too general to possess the required properties. Here, a vector is nothing but an object in its background, called a vector space.

A vector space V over a field K is a specific algebraic structure characterized by eight axioms. The elements of V, called vectors, are denoted by u,v,w etc., and the elements of K, called scalars (or numbers), are denoted by a,b etc.

  • (u+v)+w=u+(v+w)
  • u+v=v+u
  • \exists 0 \in V s.t. 0+u=u for each u \in V
  • For each u \in V, \exists -u s.t. u+(-u)=0
  • 1u=u, where 1 is multiplicative identity of K
  • (ab)u=a(bu)=(ub)a=u(ba)
  • a(u+v)=au+av
  • (a+b)u=au+bu

From the perspective of universal algebra, a vector space has one 0-ary operation (the constant 0), one unary operation (additive inverse) and two binary operations (vector addition, scalar multiplication), together with several equations such as the commutative, associative and distributive laws. The first four axioms state nothing more than that V is an Abelian group. The fifth axiom, usually ignored by beginners, is necessary to bring the external scalar field into the collection of vectors. Together with the remaining axioms guaranteeing the compatibility of the scalar operations (addition, multiplication) with the vector operations (vector addition, scalar multiplication of vectors), this furnishes the vector space with the structure of a left K-module. Note that I include an extra rule a(bu)=(ub)a in the definition to display a very implicit yet natural identification between scalar multiplication from the left and from the right; namely, we do not distinguish a left K-module from a right K-module. The feasibility of this identification depends on the commutativity of multiplication in the field: otherwise ab \neq ba for some a,b, and, letting c=ab, the chain u(ba)=(ab)u=cu=uc=u(ab) yields a contradiction. As we have seen, even the most natural identification may fail when a simple condition does not hold. We should be careful with every intuition before it is rigorously verified from the axioms and established theorems. We will encounter several more natural identifications in the following sections, through which readers may gradually feel how unreliable intuition can be.

The abstract definition of a vector space is consistent with the classical view. If we picture vectors as arrows in Euclidean (flat) space (flatness is needed so that parallel transport preserves a vector), then the sum u+v can be viewed as the arrow starting from the initial point of u and terminating at the end point of v when the end point of u coincides with the initial point of v. Scalar multiplication can be viewed as a stretch (prolongation or contraction) of the vector, possibly reversing the direction of the arrow. Under this view, the associative law states that the result of vector addition depends only on the initial point of the first arrow and the end point of the last arrow, and the commutative law is simply the parallelogram law. There is a special arrow, the zero vector 0, which starts and terminates at the same point, and for each arrow there is an opposite arrow obtained by exchanging its initial and end points.

However, readers should be reminded that, in contrast to the classical view, a vector need not be associated with a magnitude until a norm is defined, nor with a direction relative to a fixed vector until a non-degenerate inner product is defined. Moreover, a vector is not necessarily an array of numbers called coordinates. For instance, it is easy to verify that the set of all single-variable polynomials of finite degree forms a vector space, in which vector addition is polynomial addition (note that whatever operation one adopts as vector addition must satisfy the axioms). This vector space cannot be coordinatized by arrays of any fixed finite size, although we can resolve it into a direct sum of countably many finite-dimensional subspaces, each of which has a coordinate representation for its vectors. To consider another example, the collection of all smooth functions defined on the real line forms a vector space in which vector addition is addition of functions. Now a real number, instead of a subset of the natural numbers, indexes the “components”. We call this kind of vector space (uncountably) infinite dimensional. When the smooth functions are restricted to analytic ones, we can represent them by power series, so the relevant index set becomes countably infinite (enumerable). [Q: What exactly is the dimension of the vector space of analytic functions on the reals?] We have seen that coordinates are not essential to a vector, yet they remain a very convenient way to express one.

Now we come to the premise of a vector having coordinates: a coordinate system. A basis, serving as the coordinate system, is a linearly independent set of vectors that spans the vector space; equivalently, every vector is a unique (finite) linear combination of basis vectors. Let us spend a little time establishing that a basis always exists, even for an infinite-dimensional space.

Theorem. Assuming Zorn's lemma, every vector space has a basis.

Proof: Let \mathscr L be the collection of linearly independent subsets of V, partially ordered by inclusion. For each chain C \subset \mathscr L, \bar C = \bigcup C is clearly an upper bound of C, provided \bar C \in \mathscr L, that is, provided the vectors in \bar C are linearly independent. To see this, observe that for any finite subset of vectors X = \{ v_1,\ldots,v_n \} \subset \bar C there is an element of the chain c \in C such that v_i \in c for all 1 \leq i \leq n; but c \in \mathscr L is a linearly independent set, hence so is X. By Zorn's lemma, there exists (at least) a maximal element \bar c \in \mathscr L. We claim \bar c spans the vector space. Suppose it does not; then take v\in V \setminus \mathop{span}(\bar c). The set \bar c \cup \{v\} is linearly independent and hence belongs to \mathscr L, contradicting the maximality of \bar c.    \square

The converse of the theorem, that the existence of a basis for every vector space implies the Axiom of Choice (AC), was proved by Andreas Blass in 1984. Thus AC is equivalent to the existence of bases.

Now we restrict to the finite-dimensional case. With the theorem, for each vector v \in V we can associate a unique coordinate tuple (v^1,\ldots,v^n) as an identifier with respect to a specific basis \mathscr B = \{e_1,\ldots,e_n\}, and write v=v^ie_i (here we adopt the Einstein summation convention). Obviously, the coordinates depend on the basis: the same vector may have different coordinates under different bases. So let us see what happens to the coordinates under a change of basis.

2. Dual vector: Contravariance vs. Covariance

Suppose we need to express a vector v=v^ie_i \in V with respect to a new basis \mathscr B' = \{e'_i\}_{1 \leq i \leq n}, where \{ a_j{}^i \} is the transformation matrix, namely e'_j=a_j{}^ie_i. In this basis, v=v'^je'_j. Since the vector itself does not change with the basis (a change of basis is essentially a passive transformation), we should have

v=v^ie_i=v'^je'_j=v'^ja_j{}^ie_i

and hence v^i=v'^ja_j{}^i, or

v'^j=a^j{}_iv^i

where a^j{}_i is the inverse of a_j{}^i. We find that the coordinates of a vector transform inversely to the change of basis; for this reason a vector in the ordinary sense is called a contravariant vector.

Now we consider dual vectors. For each vector u \in V define a functional u^* : V \to K, v \mapsto u^*(v) = \langle u,v \rangle. The set of all such functionals forms a vector space V^*, called the dual space of V, whose basis is the set of vectors \{e^{*j}\}_{1 \leq j \leq n} satisfying e^{*j}(e_i) = \langle e^{*j},e_i \rangle = \delta^j{}_i (when the e_i are arranged as the columns of a matrix, the e^{*j} are the rows of its inverse). A dual vector u^* can then be written as u^*=u_je^{*j}. If we change basis from \mathscr B to \mathscr B', writing e'^{*j} = a'^j{}_ie^{*i}, we still have

\delta^j{}_i = \langle e'^{*j}, e'_i \rangle = \langle a'^j{}_le^{*l}, a_i{}^ke_k \rangle = a'^j{}_la_i{}^k \langle e^{*l}, e_k \rangle = a'^j{}_la_i{}^k \delta^l{}_k = a'^j{}_ka_i{}^k

So a'^j{}_k=a^j{}_k is the inverse matrix of the change of basis, and e'^{*j} = a^j{}_ie^{*i}. Then,

u^* = u_ie^{*i} = u'_je'^{*j}=u'_ja^j{}_ie^{*i}

That is, u_i=u'_ja^j{}_i, equivalently,

u'_j=a_j{}^iu_i
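The two transformation laws are easy to verify numerically. The following numpy sketch (with an arbitrary invertible matrix a_j{}^i, purely illustrative) checks that vector components transform with the inverse matrix, dual components transform with the matrix itself, and the pairing u_iv^i stays invariant.

```python
# Sketch: v'^j = a^j_i v^i (contravariant) and u'_j = a_j^i u_i (covariant),
# with the vector itself and the pairing u_i v^i left unchanged.
import numpy as np

rng = np.random.default_rng(1)
n = 3
e = np.eye(n)                                 # old basis vectors e_i as rows
a = rng.normal(size=(n, n))                   # a[j, i] = a_j^i, so e'_j = a_j^i e_i
e_new = np.einsum('ji,ik->jk', a, e)          # rows are the new basis vectors e'_j

a_inv = np.einsum('ij->ji', np.linalg.inv(a))             # a_inv[j, i] = a^j_i
print(np.allclose(np.einsum('jk,ik->ji', a_inv, a), e))   # True: a^j_k a_i^k = delta^j_i

v = rng.normal(size=n)                        # contravariant components v^i
v_new = np.einsum('ji,i->j', a_inv, v)        # v'^j = a^j_i v^i (inverse transformation)
print(np.allclose(np.einsum('j,jk->k', v_new, e_new),
                  np.einsum('i,ik->k', v, e)))     # True: same geometric vector

u = rng.normal(size=n)                        # covariant components u_i
u_new = np.einsum('ji,i->j', a, u)            # u'_j = a_j^i u_i (co-varies with the basis)
print(np.allclose(u_new @ v_new, u @ v))      # True: the pairing <u*, v> is invariant
```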

As we have seen, the coordinates of a dual vector co-vary with the change of basis, and for this reason we call it a covariant vector. Common examples of dual vectors are row vectors and differential 1-forms. Many think row vectors are just transposes of column vectors and naturally identify the two. Differential 1-forms can also be identified with vectors through the musical isomorphisms. I will illustrate this natural identification with the gradient. Let U \subset \mathbb R^n be an open set on which a vector field v=v^i\frac{\partial}{\partial x^i} is defined. Suppose f: U \to \mathbb R is a differentiable function with df=\frac{\partial f}{\partial x^j}dx^j; then we have

\langle v, df \rangle =\displaystyle{\langle v^i\frac{\partial}{\partial x^i},\frac{\partial f}{\partial x^j}dx^j \rangle = v^i \frac{\partial f}{\partial x^j} \frac{\partial x^j}{\partial x^i} = v^i\frac{\partial f}{\partial x^i}} = \langle v, \nabla f \rangle

Note that the bracket on the left denotes the pairing of a vector with a dual vector, while the bracket on the right denotes the inner product of two vectors. We see that the differential of f is identified with the gradient of f. In mathematical language,

(df)^\sharp = \nabla f, \qquad (\nabla f)^\flat = df
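To make the identification concrete, here is a small numpy sketch with an arbitrarily chosen positive-definite metric g (the function f and the sample point are made up for illustration): the components of df are the partial derivatives, the gradient \nabla f=(df)^\sharp is obtained by raising the index with g^{ij}, the metric-free pairing \langle v, df\rangle coincides with g(v,\nabla f), and lowering the index of \nabla f returns df.

```python
# Sketch: df has components df/dx^j (covariant); grad f = (df)^sharp has
# components g^{ij} df/dx^j; the pairing <v, df> needs no metric and equals g(v, grad f).
import numpy as np

def f(x):
    return np.sin(x[0]) * x[1] + x[2] ** 2

def df(x):                                    # components of the differential
    return np.array([np.cos(x[0]) * x[1], np.sin(x[0]), 2 * x[2]])

rng = np.random.default_rng(2)
x0 = rng.normal(size=3)                       # a sample point
v = rng.normal(size=3)                        # contravariant components v^i

B = rng.normal(size=(3, 3))
g = B @ B.T + 3 * np.eye(3)                   # an arbitrary positive-definite metric g_ij
g_inv = np.linalg.inv(g)                      # g^{ij}

grad_f = g_inv @ df(x0)                       # (df)^sharp: raise the index with g^{ij}
pairing = v @ df(x0)                          # <v, df> = v^i df/dx^i (metric-free)
inner = v @ g @ grad_f                        # g(v, grad f) = g_ij v^i (grad f)^j
print(np.allclose(pairing, inner))            # True
print(np.allclose(g @ grad_f, df(x0)))        # True: (grad f)^flat = df
```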

3. Pseudovector: Hodge dual of vector

Think about this: a ball revolves counterclockwise around a fixed point on a table. If one sets a mirror vertically on the table, then the image of the ball in the mirror revolves clockwise. The angular momentum of the real ball, by the right-hand rule, points upward, so it should still point upward for the image, since the mirror does not flip up and down. However, applying the right-hand rule directly to the mirrored ball, we find its angular momentum pointing downward, which is unreasonable. It seems that angular momentum is neither a contravariant nor a covariant vector. If we are careful enough, it is not hard to notice how odd it is to use a vector for a planar motion in the first place: why not use a direct representation of the plane in which the ball moves? An ant walking around on the table would find it crazy to express angular momentum by an external vector it can neither touch nor see, and a creature living in a higher-dimensional space could not possibly indicate angular momentum with a single vector at all. The fact is, we find it so natural to implicitly identify a vector with the area element perpendicular to it only because we live in three dimensions!

This identification establishes another sort of dual concept, the Hodge dual. Consider two vectors u,\ v \in \mathbb R^3 and multiply them formally, denoting the product by \wedge.

u \wedge v=(\sum\limits_{i=1}^3u^ie_i)\wedge (\sum\limits_{j=1}^3v^je_j)=\sum\limits_{i=1}^3u^iv^ie_i \wedge e_i+\sum\limits_{i \neq j}u^iv^je_i \wedge e_j

If we require e_i \wedge e_j=-e_j \wedge e_i, then the expression reduces to

u \wedge v=(u^1v^2-u^2v^1) e_1\wedge e_2+(u^2v^3-u^3v^2)e_2 \wedge e_3+(u^3v^1-u^1v^3)e_3 \wedge e_1

We call this kind of quantity a bivector. Compared with the cross product u \times v, there is a natural identification,

e_1 \longleftrightarrow e_2 \wedge e_3, \qquad e_2 \longleftrightarrow e_3 \wedge e_1, \qquad e_3 \longleftrightarrow e_1 \wedge e_2

Note that the order does matter. Let \star:\Lambda^p(V) \to \Lambda^{n-p}(V) (here p=1 \text{ or } n-1) be the Hodge star operator mapping between the vector and pseudovector (bivector for n=3) spaces; then

\alpha \wedge \star \beta = \langle \alpha, \beta \rangle \omega

where \omega = \sqrt{|g|}\, e_1\wedge \cdots \wedge e_n is the volume form and g=\mathop{det} (\langle e_i, e_j \rangle) is the determinant of the metric matrix. We can use this identity to compute any Hodge dual, in fact for every 1 \leq p \leq n. Suppose \beta = \frac{1}{p!}\beta^{i_1 \ldots i_p}e_{i_1} \wedge \cdots \wedge e_{i_p} and \star \beta = \frac{1}{(n-p)!}\gamma^{i_{p+1} \ldots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n}, and take \alpha = e_{j_1} \wedge \cdots \wedge e_{j_p}; then

\begin{matrix}\alpha \wedge \star \beta & = & e_{j_1} \wedge \cdots \wedge e_{j_p} \wedge \frac{1}{(n-p)!} \gamma^{i_{p+1} \dots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n} \\ & = & \frac{1}{(n-p)!} \gamma^{i_{p+1} \ldots i_n} \epsilon_{j_1 \ldots j_p i_{p+1} \dots i_n} e_1 \wedge \cdots \wedge e_n \end{matrix}

\begin{matrix} \langle \alpha, \beta \rangle \omega & = & \langle e_{j_1}\wedge \cdots \wedge e_{j_p}, \frac{1}{p!}\beta^{i_1 \dots i_p}e_{i_1} \wedge \cdots \wedge e_{i_p} \rangle \omega \\ & = & \frac{1}{p!} \beta^{i_1 \dots i_p} \langle e_{j_1} \wedge \cdots \wedge e_{j_p}, e_{i_1} \wedge \cdots \wedge e_{i_p} \rangle \omega \\ & = & \frac{1}{p!}\beta^{i_1 \dots i_p}\delta_{j_1 \dots j_p,\ i_1 \ldots i_p}\omega \\ & = & \frac{1}{p!(n-p)!} \beta^{i_1 \dots i_p} \delta_{j_1 \dots j_p i_{p+1} \dots i_n,\ i_1 \dots i_p} {}^{i_{p+1} \dots i_n} \omega \\ & = & \frac{1}{p!(n-p)!} \beta^{i_1 \dots i_p} \epsilon_{j_1 \dots j_p i_{p+1} \dots i_n} \epsilon_{i_1 \dots i_p} {}^{i_{p+1} \dots i_n} \omega \end{matrix}

Identifying the two results, we have

\displaystyle{\gamma^{i_{p+1} \dots i_n} = \frac{1}{p!} \sqrt{|g|} \beta^{i_1 \dots i_p} \epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_n}}

\star \beta = \displaystyle{\frac{1}{p!(n-p)!}\sqrt{|g|}\beta^{i_1 \ldots i_p} \epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n}}
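For the Euclidean metric on \mathbb R^3 (so \sqrt{|g|}=1 and indices can be raised and lowered freely), this formula can be implemented directly with the Levi-Civita symbol. The sketch below (the helper names are made up for illustration) checks that the Hodge dual of the bivector u\wedge v is exactly the cross product u\times v, and that \star e_1=e_2\wedge e_3 as in the identification above.

```python
# Sketch of the Hodge star in Euclidean R^3 via the Levi-Civita symbol:
# (*B)^{i_{p+1}..i_n} = (1/p!) B^{i_1..i_p} eps_{i_1..i_p}{}^{i_{p+1}..i_n}.
import numpy as np
from itertools import permutations

n = 3
eps = np.zeros((n,) * n)
for perm in permutations(range(n)):
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    eps[perm] = (-1) ** inversions

def wedge(u, v):
    """u ^ v stored as the antisymmetric array (u ^ v)^{ij} = u^i v^j - u^j v^i."""
    return np.outer(u, v) - np.outer(v, u)

def hodge_bivector(B):
    """Dual of a bivector: (*B)^k = (1/2!) B^{ij} eps_{ijk}."""
    return 0.5 * np.einsum('ij,ijk->k', B, eps)

def hodge_vector(w):
    """Dual of a vector: (*w)^{jk} = w^i eps_{ijk}."""
    return np.einsum('i,ijk->jk', w, eps)

u, v = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 2.0])
print(hodge_bivector(wedge(u, v)))               # equals u x v ...
print(np.cross(u, v))                            # ... as this check confirms
print(hodge_vector(np.array([1.0, 0.0, 0.0])))   # the antisymmetric array of e2 ^ e3
```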

Remark that the Hodge dual is not an involution, so applying it twice does not simply give back the original element. To see this, assume \alpha \in \Lambda^p(V), \beta \in \Lambda^{n-p}(V); then

\alpha \wedge \star \star \beta = \langle \alpha, \star \beta \rangle \omega = \langle \beta \wedge \alpha, \omega \rangle \omega = (-1)^{p(n-p)} \alpha \wedge \beta \langle \omega, \omega \rangle = (-1)^{p(n-p)+s} \alpha \wedge \beta

where s is the number of minus signs in the signature of the metric. This identification by the Hodge dual, though it looks complicated to compute, is as natural as turning a ladder upside down: the first rung becomes the last and vice versa. Because the Levi-Civita symbol appearing in the Hodge dual is taken to have the same components in every basis, pseudovectors gain an extra minus sign under every improper rotation. That is why a bivector, and hence the angular momentum, is neither a contravariant nor a covariant vector!

We have seen three kinds of vectors with different transformation laws under a change of basis. As vectors, they all satisfy the eight axioms of a vector space; as concrete objects in a particular basis, they behave differently. Readers should recapitulate the three natural identifications and appreciate whoever tells them apart.

Einstein notation and generalized Kronecker symbol

1. Einstein summation convention

Einstein notation, or the Einstein summation convention, introduced by Albert Einstein in 1916, is simply a reduced form of the familiar summation notation \sum_{i=1}^n. For example, given two vectors x,y\in\mathbb{R}^n, we write the inner product \langle x,y\rangle=\sum_{i=1}^nx_iy_i in the new notation as \langle x,y\rangle=x^iy_i. At first glance there is nothing special here beyond omitting the summation sign (this is exactly how I felt when I first saw the notation), but I will show that this reduction brings much more than convenience. Moreover, it indicates the kind of object a component belongs to; specifically, it distinguishes the type of the tensor.

Before giving the formal statement of the convention, let's start with a few examples. We denote vectors of dimension n by lower-case letters x,y,\ldots and matrices of proper dimensions by capital letters A,B,\ldots.

  • inner product: \langle x,y\rangle=\sum_{i=1}^nx_iy_i=x^iy_i
  • bilinear form: A(x,y)=\sum_{i=1}^n\sum_{j=1}^nA_{ij}x_iy_j=A_{ij}x^iy^j
  • linear transformation: (Ax)_i=\sum_{j=1}^nA_{ij}x_j=(Ax)^i=A^i_jx^j
  • matrix multiplication: (AB)_{ij}=\sum_{k=1}^nA_{ik}B_{kj}=(AB)^i_j=A^i_kB^k_j
  • trace of matrix: \mathrm{tr} A=\sum_{i=1}^nA_{ii}=A^i_i

Several remarks are in order. First, we see that summation is taken only over indices that repeat, and the repeated indices always occur in pairs, one in the upper and the other in the lower position. Once the summation is taken, all possible values of the repeated index are included, so expressions like \sum_{i=1}^{n-1}x_iy_i or \sum_{i=1}^nx_iy_j cannot be written in this notation.

Second, all indices, repeated and non-repeated alike, must be consistent. Note that repeated indices disappear in the result (the left-hand side of the identity). We call such indices dummy, since they represent nothing in the result, which means we may replace a dummy index by any other allowable letter (one that does not conflict with the existing indices). That is, A^i_jx^j is equivalent to A^i_kx^k, but not to A^i_ix^i. Non-repeated indices, on the other hand, appear on both sides of the identity at the same position, both upper or both lower. Note that both superscripts and subscripts are indices rather than powers; for example, we always use x^2 to denote the second component of the vector x rather than x squared.

Third, it is easy to verify (left to readers) that each component, as a scalar, satisfies all the arithmetic laws of a field, i.e.

  • A^i_jx_iy^j=(A^i_jx_i)y^j=A^i_j(x_iy^j)\qquad(\text{Associative law})
  • x_iA^i_jy^j=A^i_jx_iy^j=A^i_jy^jx_i\qquad(\text{Commutative law})
  • A_{ij}(x^j+y^j)=A_{ij}x^j+A_{ik}y^k\qquad(\text{Distributive law})

With Einstein notation we can pay more attention to the algebraic computation than to checking consistency and deciding the appropriate operations between terms, because everything works out along the way without extra care. We can sometimes be surprised to find interesting identities which are not obvious in vector and matrix notation. For instance, the expressions \langle x,y\rangle_A, \langle x^TA,y\rangle, A:xy^T, x^TAy, \langle x, Ay \rangle are all naturally equal by the identities above, where \langle \cdot,\cdot\rangle_A means the inner product with respect to A and \cdot:\cdot means the inner product of matrices. We are now able to summarize and formally state the following.

Einstein summation convention: In an expression, summation is automatically taken over all values of a repeated index, which must occur in a pair, once in the upper and once in the lower position.
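For readers who like to compute, numpy's einsum mirrors the convention almost literally (though it does not distinguish upper from lower indices); the five examples above can be checked as follows.

```python
# Each line mirrors one of the five index expressions above; numpy.einsum sums
# automatically over every index letter that is repeated in its specification.
import numpy as np

rng = np.random.default_rng(3)
n = 4
x, y = rng.normal(size=n), rng.normal(size=n)
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))

print(np.allclose(np.einsum('i,i->', x, y), x @ y))             # x^i y_i
print(np.allclose(np.einsum('ij,i,j->', A, x, y), x @ A @ y))   # A_ij x^i y^j
print(np.allclose(np.einsum('ij,j->i', A, x), A @ x))           # A^i_j x^j
print(np.allclose(np.einsum('ik,kj->ij', A, B), A @ B))         # A^i_k B^k_j
print(np.allclose(np.einsum('ii->', A), np.trace(A)))           # A^i_i
```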

Readers may ask why we use both superscripts and subscripts to represent vectors (indeed, some authors do not require this). Roughly speaking, a single superscript labels a component of a column vector, while a single subscript labels a component of a row vector; the column vector space and the row vector space are dual to each other in the finite-dimensional case. Essentially, we adopt superscripts for contravariant components and subscripts for covariant components. Contravariance and covariance are a pair of dual concepts describing the different transformation laws of tensors under a change of basis of the underlying vector space. There are also mixed-variance tensors, e.g. linear transformations. But I have to put off the details of this topic, since it is worth a whole post.

To tell the truth, I did not find the notation useful at first, since it seemed too simple to deserve notice. This opinion gradually changed when I found many applications of it in matrix calculus. The derivative with respect to a vector or a matrix is not as simple as the derivative with respect to a scalar, because of the noncommutativity of matrices: multiplying matrices in different orders usually gives different results and even different shapes. With the aid of index notation, however, the order does not matter and the derivative looks just like the ordinary one we are familiar with. Then there is no need to learn and remember the strange rules of matrix calculus, and in some sense it is quite enjoyable to solve a “hard” question as if it were child's play.

Philosophically, nothing comes from nowhere. Einstein notation is not simply an abbreviation of summation; I think it captures the essence of linearity. When a linear operation is applied to a tensor, which is a multilinear object, linearity suggests that we only need to consider each term, or equivalently the general term, and the summation takes care of itself. This representation focuses on the micro view of a tensor, indicating explicitly its components and transformation law. Such a view regards a tensor as nothing but an array of numbers, which is too tight a costume for a genius tensor to show off its talents. I personally emphasize the geometric picture behind all mathematical concepts and theorems, which provides some kind of intuition and imagination. A tensor can instead be seen as a linear operator that is coordinate-free; then we can talk about its domain and range, its inverse, adjoint, spectrum, etc., in which case all indices are meaningless. Although it does not offer a geometric picture, Einstein notation has enough power from the algebraic perspective to surprise everyone once we introduce a single naive symbol, the Kronecker delta.

2. Kronecker delta & Levi-Civita symbol

We now introduce two symbols just … for fun. (Wait a second: the “Ci” in “Civita” is pronounced like “chee”.) The Kronecker delta symbol \delta is an indicator of whether two indices coincide.

\delta_{ij}=\delta^i_j=\delta^{ij}=[i=j]=\begin{cases} 1& \text{if } i=j \\ 0 & \text{if } i \neq j\end{cases}

where [\cdot] is the Iverson bracket, giving 1 if the condition holds and 0 otherwise. The Kronecker delta looks like an identity matrix and plays the role of replacing an index. For example, \delta^i_jx^j=x^i leaves x invariant and just replaces j by i, while \delta_{ij}x^j=x_i not only replaces the index but also lowers the superscript, which can be seen as transposing the vector. Note that \delta^i_i=n, where n is the dimension of the vector space.
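Both observations are one-liners with numpy (purely illustrative):

```python
# delta^i_j x^j = x^i (the delta just renames the index) and delta^i_i = n.
import numpy as np

n = 5
x = np.arange(1.0, n + 1)
delta = np.eye(n)
print(np.allclose(np.einsum('ij,j->i', delta, x), x))    # True
print(np.einsum('ii->', delta))                          # 5.0, the dimension n
```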

The Levi-Civita symbol \epsilon_{i_1\ldots i_n} is defined as the sign of the permutation \sigma=(i_1,\ldots,i_n), equivalently (-1)^p, where p is the number of inversions in \sigma. The symbol gives 0 if any two of the indices \{i_1,\ldots,i_n\} coincide.

\epsilon_{i_1\ldots i_n}=\epsilon^{i_1\ldots i_n}=\begin{cases}+1 & \text{if } (i_1,\ldots,i_n) \text{ is even permutation of } (1,\ldots,n) \\ -1 & \text{if } (i_1,\ldots,i_n) \text{ is odd permutation of } (1,\ldots,n) \\ 0 & \text{otherwise}\end{cases}

Readers should bear in mind that \epsilon_{i_1\ldots i_n} is not a tensor, because of its different transformation law; we call it a pseudo-tensor. An interesting exercise with the Levi-Civita symbol is to compute D=\epsilon^{i_1\ldots i_n}a^1_{i_1}\cdots a^n_{i_n}, where A=(a^i_j) is an n by n matrix. Of the n^n terms contained in the summation, only those whose n factors are taken from different rows and columns of A survive. Multiplying each by the sign of the permutation (i_1,\ldots, i_n), we find that D=\mathop{det}A. Surprising, right? As a direct application in vector algebra, we have (x\times y)_i=\epsilon_{ijk}x^jy^k, by the determinant rule for computing the cross product.
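Both facts are easy to confirm numerically; the sketch below (with an ad hoc levi_civita helper) builds \epsilon as an array and checks the determinant formula for a random 4\times 4 matrix and the cross-product formula in three dimensions.

```python
# Build eps_{i_1..i_n} as an array, then check
# eps^{i_1..i_n} a^1_{i_1} ... a^n_{i_n} = det A  and  (x x y)_i = eps_{ijk} x^j y^k.
import numpy as np
from itertools import permutations

def levi_civita(n):
    eps = np.zeros((n,) * n)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        eps[perm] = (-1) ** inversions
    return eps

rng = np.random.default_rng(4)

A = rng.normal(size=(4, 4))
eps4 = levi_civita(4)
det_via_eps = np.einsum('ijkl,i,j,k,l->', eps4, A[0], A[1], A[2], A[3])
print(np.allclose(det_via_eps, np.linalg.det(A)))        # True

eps3 = levi_civita(3)
x, y = rng.normal(size=3), rng.normal(size=3)
print(np.allclose(np.einsum('ijk,j,k->i', eps3, x, y), np.cross(x, y)))   # True
```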

Another interesting thing also comes from a computation. Let D=(D^i_j)=(\delta^{\mu(i)}_{\nu(j)}) be an n by n matrix, where \mu,\nu are permutations of order n, and compute the determinant \mathop{det}D. According to the above,

\mathop{det}D=\epsilon^{\sigma(1)\ldots\sigma(n)}D^1_{\sigma(1)}\cdots D^n_{\sigma(n)}=\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\mu(1)}_{\nu(\sigma(1))}\cdots \delta^{\mu(n)}_{\nu(\sigma(n))}

Note that \delta^{\mu(i)}_{\nu(\sigma(j))}=[\mu(i)=\nu(\sigma(j))]=[\nu^{-1}(\mu(i))=\sigma(j)]=\delta^{\nu^{-1}(\mu(i))}_{\sigma(j)}, and that \delta merely replaces an index. This leads to

\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\mu(1)}_{\nu(\sigma(1))}\cdots \delta^{\mu(n)}_{\nu(\sigma(n))}=\epsilon^{\sigma(1)\ldots\sigma(n)}\delta^{\nu^{-1}(\mu(1))}_{\sigma(1)}\cdots \delta^{\nu^{-1}(\mu(n))}_{\sigma(n)}=\epsilon^{\nu^{-1}(\mu(1))\ldots\nu^{-1}(\mu(n))}

By the definition of Levi-Civita symbol, it’s not hard to obtain

\epsilon^{\nu^{-1}(\mu(1))\ldots\nu^{-1}(\mu(n))}=\epsilon_{\nu^{-1}(1)\ldots\nu^{-1}(n)}\epsilon^{\mu(1)\ldots\mu(n)}

Intuitively, \mathop{det} D gives the sign of the permutation \tau=\begin{pmatrix}\nu(1) & \cdots & \nu(n) \\ \mu(1) & \cdots & \mu(n) \end{pmatrix}. It is also easy to check that \mathop{det} D=0 whenever \mu(i)=\mu(j) or \nu(i)=\nu(j) for some i\neq j (that is, when \mu or \nu is allowed to repeat a value). This pattern \tau is so common that it deserves a new symbol, the generalized Kronecker delta, defined as

\delta^{i_1\ldots i_m}_{j_1\ldots j_m}=\begin{cases}+1 & \text{if } (i_1,\ldots,i_m) \text{ is even permutation of } (j_1,\ldots,j_m) \\ -1 & \text{if } (i_1,\ldots,i_m) \text{ is odd permutation of } (j_1,\ldots,j_m) \\ 0 & \text{otherwise}\end{cases}

Note that the integer m does not have to equal n. When m=n, we have \delta^{i_1\ldots i_n}_{j_1\ldots j_n}=\epsilon^{i_1\ldots i_n}\epsilon_{j_1\ldots j_n}. When m<n, the trick is to add dummy indices and consider \delta^{i_1\ldots i_m k_{m+1} \ldots k_n}_{j_1 \ldots j_m k_{m+1} \ldots k_n}. By definition, since the last (n-m) upper and lower indices coincide, we only need to consider the permutations of the remaining m indices; for the same reason, the summation over the dummy indices produces (n-m)! copies of each such permutation. Therefore,

\displaystyle{\delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}=\frac{1}{(n-m)!} \delta^{i_1 \ldots i_m k_{m+1} \ldots k_n}_{j_1 \ldots j_m k_{m+1} \ldots k_n},\qquad \delta^{i_1 \ldots i_l k_{l+1} \ldots k_m}_{j_1 \ldots j_l k_{l+1} \ldots k_m}=\frac{(n-l)!}{(n-m)!} \delta^{i_1 \ldots i_l}_{j_1 \ldots j_l}}

In particular, we have \delta^{i_1 \ldots i_n}_{i_1 \ldots i_n}=n!. Let us see what role the generalized Kronecker delta plays. It no longer simply replaces indices; otherwise we would have \delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}=\delta^{i_1}_{j_1} \cdots \delta^{i_m}_{j_m}, which is obviously wrong. Let S^{i_1 \ldots i_m k_{m+1} \ldots k_n}=\delta^{i_1 \ldots i_m}_{j_1 \ldots j_m}T^{j_1 \ldots j_m k_{m+1} \ldots k_n}=m!\,T^{[i_1 \ldots i_m] k_{m+1} \ldots k_n}, where T^{[i_1 \ldots i_m] k_{m+1} \ldots k_n}=\frac{1}{m!}\sum_{\sigma\in\mathscr{S}(m)}\mathop{sgn}(\sigma) T^{\sigma(i_1) \ldots \sigma(i_m) k_{m+1} \ldots k_n}. Note that S^{i_1 \ldots i_m k_{m+1} \ldots k_n} changes sign when any two of the indices \{i_1,\ldots,i_m\} are interchanged: \delta anti-symmetrizes part or all of the components, up to a factorial factor!
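These relations can also be checked numerically for n=3, where \delta^{i_1i_2i_3}_{j_1j_2j_3}=\epsilon^{i_1i_2i_3}\epsilon_{j_1j_2j_3}; the sketch below verifies the trace n!, the contraction formula with one dummy pair, and the antisymmetrizing action on a 2-index tensor.

```python
# n = 3: the generalized delta as eps * eps, its full trace n!, the contraction
# formula with one dummy pair, and its antisymmetrizing action.
import numpy as np
from math import factorial
from itertools import permutations

n = 3
eps = np.zeros((n,) * n)
for perm in permutations(range(n)):
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    eps[perm] = (-1) ** inversions

delta3 = np.einsum('ijk,lmo->ijklmo', eps, eps)   # delta^{ijk}_{lmo} = eps^{ijk} eps_{lmo}
print(np.einsum('ijkijk->', delta3))              # n! = 6.0

# contract one index pair: delta^{ij}_{lm} = 1/(n-2)! * delta^{ijk}_{lmk}
delta2 = np.einsum('ijklmk->ijlm', delta3) / factorial(n - 2)

T = np.random.default_rng(5).normal(size=(n, n))
antisym = np.einsum('ijkl,kl->ij', delta2, T)     # delta^{ij}_{kl} T^{kl}
print(np.allclose(antisym, T - T.T))              # True: equals 2! T^{[ij]}
```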

I think it is the right time to end this post. These notations and symbols are quite simple and common in differential geometry and modern theoretical physics, and they are an inevitable step for further study. This is my first post; any comments, corrections or suggestions are greatly welcome.