Vectors: Identifications and Distinctions

Vectors are everywhere and play a fundamental role in many branches of science. Although a vector has a formal definition, namely an element of a vector space, many still regard a vector as an array of numbers with a magnitude and a direction. This classical view, on the one hand, does not give the full picture of a vector and, on the other hand, attributes to it more than a vector needs to have. Sometimes intuition makes a concept easier to understand, and sometimes it makes different concepts look the same. In this post, we discuss a few sorts of “vectors” and show that, although they all satisfy the definition of a vector (which is why many do not distinguish them), they behave so differently under a change of basis that they deserve different names.

1. What is a vector

When an object is to be defined, it is usually necessary to give a background in which the object is unambiguous, meaningful and well-defined. In biology, a species is defined according to its domain, kingdom, phylum, class, order, family and genus. In a programming language, a variable is defined according to its environment, domain and type. The background confines the object to a proper extent, so that it is neither too narrow to contain enough information nor too general to have the required properties. Here, a vector is nothing but an object in its background, called a vector space.

A vector space V over a field K is an algebraic structure characterized by the eight axioms below. An element of V, called a vector, is denoted by u, v, w, etc., and an element of the field K, called a scalar (or number), is denoted by a, b, etc.

  • (u+v)+w=u+(v+w)
  • u+v=v+u
  • \exists 0 \in V s.t. 0+u=u for each u \in V
  • For each u \in V, \exists -u s.t. u+(-u)=0
  • 1u=u, where 1 is multiplicative identity of K
  • (ab)u=a(bu)=(ub)a=u(ba)
  • a(u+v)=au+av
  • (a+b)u=au+bu

From the perspective of universal algebra, a vector space has one 0-ary operation (the constant 0), one unary operation (additive inverse) and two binary operations (vector addition and scalar multiplication), together with equations such as the commutative, associative and distributive laws. The first four axioms state nothing more than that V is an Abelian group. The fifth axiom, often overlooked by beginners, is necessary to bring an external scalar field into the collection of vectors. Together with the remaining axioms, which guarantee the compatibility of the scalar operations (addition, multiplication) with the vector operations (vector addition, multiplication of a vector by a scalar), it gives the vector space the structure of a left K-module. Note that I have included a new rule a(bu)=(ub)a in the definition to display a very implicit yet natural identification between scalar multiplication from the left and from the right; namely, we do not distinguish a left K-module from a right K-module. The feasibility of this identification depends on the commutativity of multiplication in the field. If instead ab \neq ba, then letting c=ab, the chain u(ba)=(ab)u=cu=uc=u(ab) would force u(ba)=u(ab), which contradicts ab \neq ba for any nonzero u. As we have seen, even the most natural identification may fail when a simple condition does not hold. We should be careful with every intuition until it has been rigorously verified from the axioms and established theorems. We will encounter more natural identifications in the following sections, in which readers may gradually feel how unreliable intuition can be.
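The eight axioms listed above can be spot-checked on a concrete model. The following is a minimal sketch, assuming K is the field of rationals and V is the set of pairs of rationals; the helper names (`add`, `scale`, `axioms_hold`) and the sample values are made up for illustration. A finite spot check like this is of course not a proof.

```python
from fractions import Fraction
from itertools import product

# Assumed model: V = Q^2 (pairs of rationals), K = Q.
ZERO = (Fraction(0), Fraction(0))

def add(u, v):
    """Vector addition, componentwise."""
    return (u[0] + v[0], u[1] + v[1])

def scale(a, u):
    """Multiplication of the vector u by the scalar a."""
    return (a * u[0], a * u[1])

def neg(u):
    """Additive inverse of u."""
    return scale(Fraction(-1), u)

def axioms_hold(vectors, scalars):
    """Return True if the eight axioms hold on the given samples."""
    for u, v, w in product(vectors, repeat=3):
        if add(add(u, v), w) != add(u, add(v, w)):   # associativity
            return False
        if add(u, v) != add(v, u):                   # commutativity
            return False
    for u in vectors:
        if add(ZERO, u) != u:                        # zero vector
            return False
        if add(u, neg(u)) != ZERO:                   # additive inverse
            return False
        if scale(Fraction(1), u) != u:               # 1u = u
            return False
    for a, b in product(scalars, repeat=2):
        for u, v in product(vectors, repeat=2):
            if scale(a * b, u) != scale(a, scale(b, u)):               # (ab)u = a(bu)
                return False
            if scale(a, add(u, v)) != add(scale(a, u), scale(a, v)):   # a(u+v) = au+av
                return False
            if scale(a + b, u) != add(scale(a, u), scale(b, u)):       # (a+b)u = au+bu
                return False
    return True

sample_vectors = [(Fraction(1), Fraction(2)),
                  (Fraction(-3, 2), Fraction(0)),
                  (Fraction(5), Fraction(-7, 3))]
sample_scalars = [Fraction(2), Fraction(-1, 3), Fraction(0)]
print(axioms_hold(sample_vectors, sample_scalars))
```

Exact rational arithmetic is used so that the equalities are tested exactly rather than up to floating-point error.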

The abstract definition of a vector space is consistent with the classical view. If we see vectors as arrows in Euclidean (flat) space (flatness is necessary to ensure that parallel transport preserves the vector), then the vector sum u+v can be viewed as the arrow starting from the initial point of u and terminating at the end point of v when the end point of u coincides with the initial point of v. Scalar multiplication can be viewed as stretching (lengthening or contracting) the vector, possibly reversing the direction of the arrow. Under this view, the associative law states that the result of vector addition depends only on the initial point of the first vector and the end point of the last vector. The commutative law is simply the parallelogram law. There is a special arrow, the zero vector 0, which starts and terminates at the same point. For each arrow, there is an opposite arrow that just exchanges the initial and end points.

However, readers should keep in mind how this differs from the classical view: a vector need not have a magnitude until a norm is defined, nor a direction relative to a fixed vector until a non-degenerate inner product is defined. Moreover, a vector is not necessarily an array of numbers, called coordinates. For instance, it is easy to verify that the set of all single-variable polynomials of finite degree forms a vector space, in which vector addition is polynomial addition (note that polynomial multiplication cannot serve as vector addition, since a polynomial of positive degree has no multiplicative inverse among polynomials). In this vector space, no array of fixed finite size can represent every polynomial. Nevertheless, we can decompose the vector space into a direct sum of countably many finite-dimensional subspaces, each of which has a coordinate representation for its vectors. As another example, the collection of all smooth functions defined on the real line forms a vector space, in which vector addition is function addition. Now a real number, instead of an element of a subset of the natural numbers, is used to index a “component”. We call this kind of vector space (uncountably) infinite dimensional. When smooth functions are restricted to analytic ones, we can represent them by power series, so each function is described by countably many coefficients; note, however, that a power series is an infinite sum rather than a finite linear combination, so this alone does not make the monomials a basis. [Q: What exactly is the dimension of the vector space of analytic functions on the real line?] We have seen that coordinates are not essential to a vector, yet they are still a very convenient way to express one.
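To make the polynomial example concrete, here is a small sketch that models a polynomial as a list of coefficients of whatever length it needs. The representation and the function names (`trim`, `poly_add`, `poly_scale`) are illustrative assumptions, not a standard API.

```python
from itertools import zip_longest

# Assumed model: a polynomial is a list [c0, c1, c2, ...] of coefficients,
# with no trailing zeros; the zero polynomial is the empty list.

def trim(p):
    """Drop trailing zero coefficients so each polynomial has one representation."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_add(p, q):
    """Vector addition: add coefficients degree by degree."""
    return trim([a + b for a, b in zip_longest(p, q, fillvalue=0)])

def poly_scale(a, p):
    """Scalar multiplication: multiply every coefficient by a."""
    return trim([a * c for c in p])

p = [1, 0, 2]            # 1 + 2x^2
q = [0, 3, -2, 0, 5]     # 3x - 2x^2 + 5x^4

print(poly_add(p, q))                   # [1, 3, 0, 0, 5]
print(poly_add(p, poly_scale(-1, p)))   # [] (the zero polynomial)
```

Each individual polynomial fits in a finite list, but the lists have no common bound on their length, which is the sense in which no fixed-size array of coordinates works for the whole space.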

Now we come to the premise of a vector having coordinates: a coordinate system. A basis, serving as a coordinate system, is a linearly independent set of vectors that spans the vector space; equivalently, each vector is a unique (finite) linear combination of basis vectors. We spend a little time here showing that a basis always exists, even for an infinite-dimensional space.

Theorem. Assuming Zorn’s lemma, every vector space has a basis.

Proof: Let \mathscr L be the collection of linearly independent subsets of V, partially ordered by inclusion. For each chain C \subset \mathscr L, \bar C = \bigcup C is clearly an upper bound of C. To apply Zorn’s lemma we need \bar C \in \mathscr L, that is, the vectors in \bar C must be linearly independent. To this end, observe that for any finite subset X = \{ v_1,\ldots,v_n \} \subset \bar C, each v_i lies in some member of the chain, and since C is totally ordered there is a single element c \in C such that v_i \in c for all 1 \leq i \leq n. But c \in \mathscr L is a linearly independent set, and so is X. By Zorn’s lemma, there exists (at least) one maximal element \bar c \in \mathscr L. We claim that \bar c spans the vector space. Suppose it does not, and take v\in V \setminus \mathop{span}(\bar c). Then \bar c \cup \{v\} \subset V is linearly independent and hence belongs to \mathscr L, contradicting the maximality of \bar c.    \square

The converse of the theorem, that the existence of a basis for every vector space implies the Axiom of Choice (AC), was proved by Andreas Blass in 1984. Thus AC is equivalent to the existence of bases.

Now we restrict to the finite-dimensional case. By the theorem, with each vector v \in V we can associate a unique coordinate tuple (v^1,\ldots,v^n) as an identifier with respect to a specific set of basis vectors \mathscr B = \{e_1,\ldots,e_n\}, and write v=v^ie_i (here we adopt the Einstein summation convention). Obviously, coordinates depend on the basis: a vector may have different coordinates under different bases. So let’s see what happens to the coordinates under a change of basis.
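As a small illustration that coordinates depend on the basis, the sketch below computes the coordinates of one and the same vector of the plane with respect to two different bases. It assumes a 2-dimensional space and solves v = c1*e1 + c2*e2 by Cramer's rule; the function name `coordinates` and the sample vectors are made up for the example.

```python
from fractions import Fraction

def coordinates(e1, e2, v):
    """Coordinates (c1, c2) such that v = c1*e1 + c2*e2, via Cramer's rule in 2D."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent, so they are not a basis")
    c1 = Fraction(v[0] * e2[1] - v[1] * e2[0], det)
    c2 = Fraction(e1[0] * v[1] - e1[1] * v[0], det)
    return (c1, c2)

v = (3, 1)

# Same vector, two bases, two different coordinate tuples.
in_standard = coordinates((1, 0), (0, 1), v)   # equals (3, 1)
in_skewed = coordinates((1, 1), (1, -1), v)    # equals (2, 1), since 2*(1,1) + 1*(1,-1) = (3,1)
print(in_standard, in_skewed)
```

The uniqueness of the coordinates corresponds to the nonzero determinant: if the two candidate basis vectors are linearly dependent, the function refuses to answer.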

2. Dual vector: Contravariance vs. Covariance

Suppose we need to express the vector v=v^ie_i \in V with respect to a new basis \mathscr B' = \{e'_i\}_{1 \leq i \leq n}, where \{ a_j{}^i \} is the transformation matrix, namely e'_j=a_j{}^ie_i. In this basis, v=v'^je'_j. Since the vector itself does not change with the basis (a change of basis is essentially a passive transformation), we should have

v^ie_i = v = v'^je'_j = v'^ja_j{}^ie_i

and hence v^i=v'^ja_j{}^i, or

v'^j=a^j{}_iv^i

where a^j{}_i is the inverse of a_j{}^i. We find that the coordinates of a vector change inversely to the transformation law of the basis. So in general, when we speak of a vector we are referring to a contravariant vector.
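The contravariant law can be checked numerically. The sketch below assumes a 2-dimensional space whose old basis is the standard one, stores the transformation matrix as `A[j][i]` = a_j{}^i (so row j lists the old components of e'_j), and uses a^j{}_i = (A^{-1})[i][j]. The matrix entries and helper names are arbitrary choices for illustration.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix with integer or rational entries."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[Fraction(m[1][1], det), Fraction(-m[0][1], det)],
            [Fraction(-m[1][0], det), Fraction(m[0][0], det)]]

def new_components(A, v):
    """v'^j = a^j_i v^i, where a^j_i = (A^{-1})[i][j]."""
    A_inv = inverse_2x2(A)
    return [sum(A_inv[i][j] * v[i] for i in range(2)) for j in range(2)]

def rebuild(A, v_new):
    """v^i = v'^j a_j^i: reassemble the old components from the new ones."""
    return [sum(v_new[j] * A[j][i] for j in range(2)) for i in range(2)]

A = [[2, 1],
     [1, 1]]          # e'_1 = 2 e_1 + e_2,  e'_2 = e_1 + e_2
v = [3, 5]            # old components v^i

v_new = new_components(A, v)      # equals [-2, 7]
print(v_new, rebuild(A, v_new))   # the rebuilt components equal the original [3, 5]

# Doubling every basis vector halves every component: components move
# "against" the basis, which is what contravariant means.
doubled = [[2, 0],
           [0, 2]]
print(new_components(doubled, v))  # equals [3/2, 5/2]
```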

Now we consider dual vectors. Consider linear functionals u^* : V \to K, v \mapsto u^*(v) = \langle u^*,v \rangle. The set of all such functionals forms a vector space V^*, called the dual space of V, whose basis is the set of vectors \{e^{*j}\}_{1 \leq j \leq n} such that e^{*j}(e_i) = \langle e^{*j},e_i \rangle = \delta^j{}_i; in matrix terms, it is the inverse of the basis matrix \mathscr B. A dual vector u^* can then be written as u^*=u_ie^{*i}. If we change basis from \mathscr B to \mathscr B', letting e'^{*j} = a'^j{}_ie^{*i}, we still have

\delta^j{}_i = \langle e'^{*j}, e'_i \rangle = \langle a'^j{}_le^{*l}, a_i{}^ke_k \rangle = a'^j{}_la_i{}^k \langle e^{*l}, e_k \rangle = a'^j{}_la_i{}^k \delta^l{}_k = a'^j{}_ka_i{}^k

So a'^j{}_k=a^j{}_k is the inverse matrix of the change of basis, and e'^{*j} = a^j{}_ie^{*i}. Then,

u^* = u_ie^{*i} = u'_je'^{*j}=u'_ja^j{}_ie^{*i}

That is, u_i=u'_ja^j{}_i, or equivalently,

u'_j=a_j{}^iu_i

As we have seen, the coordinates of a dual vector co-vary with the change of basis, so we call it a covariant vector. Common examples of dual vectors are row vectors and differential 1-forms. Many think of row vectors as just the transposes of column vectors and naturally identify the two. Differential 1-forms can also be identified with vectors through the musical isomorphism. I will illustrate this natural identification with the gradient. Let U \subset \mathbb R^n be an open set on which a vector field v=v^i\frac{\partial}{\partial x^i} is defined. Suppose f: U \to \mathbb R is a differentiable function with df=\frac{\partial f}{\partial x^j}dx^j; then we have

\langle v, df \rangle =\displaystyle{\langle v^i\frac{\partial}{\partial x^i},\frac{\partial f}{\partial x^j}dx^j \rangle = v^i \frac{\partial f}{\partial x^j} \frac{\partial x^j}{\partial x^i} = v^i\frac{\partial f}{\partial x^i}} = \langle v, \nabla f \rangle

Note that the bracket on the left denotes pairing of vector and dual vector while the right bracket denotes inner product of two vectors. We see that the differential of f is identified with the gradient of f. In mathematical language,

(df)^\sharp = \nabla f, \qquad (\nabla f)^\flat = df
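The identification of df with \nabla f is invisible in an orthonormal basis but shows up as soon as the basis is skewed. The sketch below is a 2-dimensional numerical check under stated assumptions: the old basis is orthonormal (metric equal to the identity), `A[j][i]` = a_j{}^i is an arbitrary invertible matrix, and the components chosen for df and v are made up. It transforms df covariantly and v contravariantly, checks that the pairing is unchanged, and then raises the index with the transformed metric.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix with integer or rational entries."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[Fraction(m[1][1], det), Fraction(-m[0][1], det)],
            [Fraction(-m[1][0], det), Fraction(m[0][0], det)]]

def transpose(m):
    return [[m[c][r] for c in range(2)] for r in range(2)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def matvec(m, x):
    return [sum(m[r][c] * x[c] for c in range(2)) for r in range(2)]

def pair(u, v):
    """Pairing u_i v^i of covariant with contravariant components."""
    return sum(u[i] * v[i] for i in range(2))

def covariant(A, u):
    """u'_j = a_j^i u_i."""
    return matvec(A, u)

def contravariant(A, v):
    """v'^j = a^j_i v^i, with a^j_i = (A^{-1})[i][j]."""
    return matvec(transpose(inverse_2x2(A)), v)

def sharp(g, u):
    """Raise the index with the inverse metric: (u^sharp)^i = g^{ij} u_j."""
    return matvec(inverse_2x2(g), u)

A = [[2, 1], [1, 1]]   # e'_j = a_j^i e_i (arbitrary invertible choice)
g = [[1, 0], [0, 1]]   # old basis assumed orthonormal
df = [1, 4]            # made-up covariant components of df
v = [3, 5]             # made-up contravariant components of v

df_new = covariant(A, df)                    # equals [6, 5]
v_new = contravariant(A, v)                  # equals [-2, 7]
g_new = matmul(matmul(A, g), transpose(A))   # g'_{jk} = a_j^i a_k^l g_{il}

grad_old = sharp(g, df)                      # equals [1, 4], same numbers as df
grad_new = sharp(g_new, df_new)              # equals [-3, 7], differs from df_new

print(pair(df, v), pair(df_new, v_new))      # both equal 23
print(grad_new == contravariant(A, grad_old))
```

In the orthonormal basis df and \nabla f have the same components, which is what makes the identification feel natural; in the skewed basis their components differ, yet the gradient still transforms as a contravariant vector and the pairing with v is unchanged.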

3. Pseudovector: Hodge dual of vector

Think about this: a ball revolves counterclockwise around a fixed point on a table. If one sets a mirror perpendicular to the table, then the virtual image of the ball in the mirror revolves clockwise. The angular momentum of the real ball, according to the right-hand rule, points upward, hence it should still point upward for the image, since the mirror does not exchange up and down. However, applying the right-hand rule directly to the virtual ball, we find its angular momentum pointing downward, which seems unreasonable. It seems that angular momentum is neither a contravariant nor a covariant vector. If we are careful enough, it is not hard to see that using a vector for a planar motion is slightly weird. Why not use a direct representation of the plane in which the ball moves? If an ant happens to walk around on the table, it must think it crazy to express angular momentum using an external vector that it can neither touch nor see. The same thought would occur to a monster living in a higher-dimensional space: how could a single vector possibly indicate angular momentum? The fact is, we find it so natural to implicitly identify a vector with an area element perpendicular to it because we live in three dimensions!

This identification establishes another sort of duality, the Hodge dual. Consider two vectors u,\ v \in \mathbb R^3 and multiply them formally, denoting the product by \wedge.

u \wedge v=(\sum\limits_{i=1}^3u^ie_i)\wedge (\sum\limits_{j=1}^3v^je_j)=\sum\limits_{i=1}^3u^iv^ie_i \wedge e_i+\sum\limits_{i \neq j}u^iv^je_i \wedge e_j

If we require e_i \wedge e_j=-e_j \wedge e_i (which in particular forces e_i \wedge e_i=0), then the expression reduces to

u \wedge v=(u^1v^2-u^2v^1) e_1\wedge e_2+(u^2v^3-u^3v^2)e_2 \wedge e_3+(u^3v^1-u^1v^3)e_3 \wedge e_1

We call this kind of quantity a bivector. Comparing it with the cross product u \times v, there is a natural identification,

e_1 \longleftrightarrow e_2 \wedge e_3, \qquad e_2 \longleftrightarrow e_3 \wedge e_1, \qquad e_3 \longleftrightarrow e_1 \wedge e_2
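The identification above can be checked componentwise. The following sketch, with illustrative function names and sample vectors, stores the bivector u \wedge v as the antisymmetric table of coefficients u^iv^j-u^jv^i and reads off the three entries that correspond to e_2 \wedge e_3, e_3 \wedge e_1 and e_1 \wedge e_2 (indices are zero-based in the code).

```python
def wedge(u, v):
    """Coefficients (u ^ v)[i][j] = u[i]*v[j] - u[j]*v[i], an antisymmetric 3x3 table."""
    return [[u[i] * v[j] - u[j] * v[i] for j in range(3)] for i in range(3)]

def cross(u, v):
    """Ordinary cross product in R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

u = [1, 2, 3]
v = [4, 5, 6]
B = wedge(u, v)

# e_1 <-> e_2^e_3, e_2 <-> e_3^e_1, e_3 <-> e_1^e_2 (written with zero-based indices)
from_bivector = [B[1][2], B[2][0], B[0][1]]

print(from_bivector)   # [-3, 6, -3]
print(cross(u, v))     # [-3, 6, -3]
```

Only three of the nine entries of the table are independent, which is why a bivector in three dimensions can be packed into something that looks like a vector.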

Note that the order does matter. Let \star:\Lambda^p(V) \to \Lambda^{n-p}(V) (p=1 \text{ or } n-1) be the Hodge star operator mapping between the vector and pseudovector (bivector for n=3) spaces; then

\alpha \wedge \star \beta = \langle \alpha, \beta \rangle \omega

where \omega = \sqrt{|g|} e_1\wedge \cdots \wedge e_n is the volume form and g=\mathop{det} (\langle e_i, e_j \rangle) is the determinant of the metric matrix. We can use this identity to compute any Hodge dual, for each 1 \leq p \leq n. Suppose \beta = \frac{1}{p!}\beta^{i_1 \ldots i_p}e_{i_1} \wedge \cdots \wedge e_{i_p} and \star \beta = \frac{1}{(n-p)!}\gamma^{i_{p+1} \ldots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n}, and take \alpha = e_{j_1} \wedge \cdots \wedge e_{j_p}; then

\begin{matrix}\alpha \wedge \star \beta & = & e_{j_1} \wedge \cdots \wedge e_{j_p} \wedge \frac{1}{(n-p)!} \gamma^{i_{p+1} \dots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n} \\ & = & \frac{1}{(n-p)!} \gamma^{i_{p+1} \ldots i_n} \epsilon_{j_1 \ldots j_p i_{p+1} \dots i_n} e_1 \wedge \cdots \wedge e_n \end{matrix}

\begin{matrix} \langle \alpha, \beta \rangle \omega & = & \langle e_{j_1}\wedge \cdots \wedge e_{j_p}, \frac{1}{p!}\beta^{i_1 \dots i_p}e_{i_1} \wedge \cdots \wedge e_{i_p} \rangle \omega \\ & = & \frac{1}{p!} \beta^{i_1 \dots i_p} \langle e_{j_1} \wedge \cdots \wedge e_{j_p}, e_{i_1} \wedge \cdots \wedge e_{i_p} \rangle \omega \\ & = & \frac{1}{p!}\beta^{i_1 \dots i_p}\delta_{j_1 \dots j_p,\ i_1 \ldots i_p}\omega \\ & = & \frac{1}{p!(n-p)!} \beta^{i_1 \dots i_p} \delta_{j_1 \dots j_p i_{p+1} \dots i_n,\ i_1 \dots i_p} {}^{i_{p+1} \dots i_n} \omega \\ & = & \frac{1}{p!(n-p)!} \beta^{i_1 \dots i_p} \epsilon_{j_1 \dots j_p i_{p+1} \dots i_n} \epsilon_{i_1 \dots i_p} {}^{i_{p+1} \dots i_n} \omega \end{matrix}

Equating the two results, we have

\displaystyle{\gamma^{i_{p+1} \dots i_n} = \frac{1}{p!} \sqrt{|g|} \beta^{i_1 \dots i_p} \epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_n}}

\star \beta = \displaystyle{\frac{1}{p!(n-p)!}\sqrt{|g|}\beta^{i_1 \ldots i_p} \epsilon_{i_1 \dots i_p}{}^{i_{p+1} \dots i_n} e_{i_{p+1}} \wedge \cdots \wedge e_{i_n}}
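In the simplest setting the formula can be turned into a few lines of code. The sketch below assumes an orthonormal basis of Euclidean R^n, so that \sqrt{|g|}=1 and raising or lowering indices changes nothing. A p-vector is stored as a dictionary from increasing index tuples I to the coefficient of e_I, and the dual of e_I is the complementary e_J with the sign of the permutation (I, J). This representation and the function names are illustrative choices, not the only possible ones.

```python
def perm_sign(seq):
    """Sign of a permutation of distinct integers, computed from its inversion count."""
    inversions = sum(1
                     for a in range(len(seq))
                     for b in range(a + 1, len(seq))
                     if seq[a] > seq[b])
    return -1 if inversions % 2 else 1

def hodge(beta, n):
    """Hodge dual of a p-vector in an orthonormal Euclidean basis of R^n.

    beta maps increasing index tuples I (zero-based) to the coefficient of e_I.
    """
    result = {}
    for I, coeff in beta.items():
        J = tuple(k for k in range(n) if k not in I)   # complementary indices
        value = perm_sign(I + J) * coeff               # Levi-Civita sign of (I, J)
        if value != 0:
            result[J] = value
    return result

# In R^3 (zero-based indices): star e_1 = e_2^e_3 and star e_2 = -e_1^e_3 = e_3^e_1.
print(hodge({(0,): 1}, 3))            # {(1, 2): 1}
print(hodge({(1,): 1}, 3))            # {(0, 2): -1}

# A general vector u = 2 e_1 - e_2 + 4 e_3 and its dual bivector.
u = {(0,): 2, (1,): -1, (2,): 4}
print(hodge(u, 3))                    # {(1, 2): 2, (0, 2): 1, (0, 1): 4}
```

Applying `hodge` twice to a vector in R^3 gives the vector back, while in R^2 it gives minus the vector, a sign that depends on p and n.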

Remark that the Hodge dual is not an involution in general: the Hodge dual of a Hodge dual returns the original element only up to a sign. To see this, assume \alpha \in \Lambda^p(V), \beta \in \Lambda^{n-p}(V); then

\alpha \wedge \star \star \beta = \langle \alpha, \star \beta \rangle \omega = \langle \beta \wedge \alpha, \omega \rangle \omega = (-1)^{p(n-p)} \alpha \wedge \beta \langle \omega, \omega \rangle = (-1)^{p(n-p)+s} \alpha \wedge \beta

where s is the number of negative eigenvalues of the metric (determined by its signature). This identification by the Hodge dual, though it looks complicated to compute, is natural, just like turning a ladder upside down: the first rung becomes the last and vice versa. Because the Levi-Civita symbol in the Hodge dual has the same components in every basis, pseudovectors gain an extra minus sign under every improper rotation. That’s why a bivector, and hence the angular momentum, is neither a contravariant nor a covariant vector!
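The mirror puzzle from the beginning of this section can be replayed in numbers. The sketch below assumes the table is the xy-plane, "up" is the z-axis and the mirror reflects the x-coordinate; the position and momentum values are made up. It compares reflecting the angular momentum as if it were an ordinary vector with recomputing it from the reflected position and momentum.

```python
def cross(u, v):
    """Ordinary cross product in R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def reflect_x(u):
    """Mirror perpendicular to the table: flips the x-component only (determinant -1)."""
    return [-u[0], u[1], u[2]]

r = [1, 0, 0]    # position of the ball (made-up value)
p = [0, 1, 0]    # momentum, counterclockwise seen from above

L = cross(r, p)                                       # [0, 0, 1], pointing up
L_mirrored_as_vector = reflect_x(L)                   # [0, 0, 1], still up
L_of_mirror_image = cross(reflect_x(r), reflect_x(p)) # [0, 0, -1], pointing down

print(L, L_mirrored_as_vector, L_of_mirror_image)
```

The two answers differ exactly by the determinant of the reflection, -1, which is the extra minus sign that a pseudovector picks up under an improper rotation.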

We have seen three kinds of vectors with different transformation laws under a change of basis. As vectors, they all satisfy the eight axioms of a vector space. As concrete objects in a particular basis, however, they behave differently. Readers should recapitulate the three natural identifications and appreciate whoever tells them apart.