Usually when people first study special or general relativity, they get introduced to the concept of covariant and contravariant coordinates. They learn that these two sets of coordinates are connected through the metric tensor $g_{ij}$ and that these coordinates are tensors themselves. As so often in physics, the notion and the way it is talked about stay obscure, and the connection to a proper mathematical treatment is left out. This post is a short introduction to covariant and contravariant coordinates in terms of linear algebra.

# Vectors

Every (finite-dimensional) vector space $V$ over the real numbers comes naturally with its dual vector space $W$ of linear maps to the real numbers, i.e.

$$ \varphi \in W :\Leftrightarrow \varphi: V \rightarrow \mathbb{R} ~ \text{ with } ~ \varphi(a \cdot v + u) = a \cdot \varphi(v) + \varphi(u) ~ ~ \forall v,u \in V ~ ~ \forall a \in \mathbb R$$

Let $B := \{e_1, e_2, \dots, e_n \}$ be a basis of $V$ then for $x \in V$ the expansion

$$ x = \sum_{i=1}^n X^i e_i $$

yields the so-called contravariant coordinates $X^i$ of $x$.

The dual basis to $B$ is defined as the basis $\tilde{B} = \{ \varepsilon^1, \varepsilon^2, \dots, \varepsilon^n \}$ of $W$ such that

$$ \varepsilon^i(e_j) = \delta^i_j $$

which defines $\tilde{B}$ completely, because every linear map is already fixed once its values on all basis vectors of $V$ are given. Here, $\delta^i_j$ is the Kronecker delta. Then for $y \in W$ the expansion

$$ y = \sum_{i=1}^n Y_i \varepsilon^i $$

yields the so-called covariant coordinates $Y_i$ of $y$.
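The two expansions can be checked on a small numerical example. The following is only a sketch under stated assumptions: $V = \mathbb{R}^2$, vectors are written in standard coordinates, linear functionals are represented as row vectors, and the basis $e_1 = (1,0)$, $e_2 = (1,1)$ as well as the helper names `inverse_2x2` and `apply` are made up for illustration.

```python
from fractions import Fraction

# Assumed example basis of V = R^2 (deliberately not orthonormal).
e = [(1, 0), (1, 1)]  # e_1, e_2

def inverse_2x2(m):
    """Invert a 2x2 matrix (list of rows) using exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(covector, vector):
    """Evaluate a linear functional (row vector) on a vector."""
    return sum(c * v for c, v in zip(covector, vector))

# Matrix whose columns are the basis vectors e_1, e_2.
E = [[e[0][0], e[1][0]],
     [e[0][1], e[1][1]]]

# The rows of E^{-1} represent the dual basis eps^1, eps^2,
# because (E^{-1} E)_{ij} = eps^i(e_j) = delta^i_j.
eps = inverse_2x2(E)

# Contravariant coordinates of x are X^i = eps^i(x).
x = (3, 2)
X_up = [apply(eps[i], x) for i in range(2)]

# Reassemble x from its coordinates: x = sum_i X^i e_i.
x_rebuilt = tuple(sum(X_up[i] * e[i][c] for i in range(2)) for c in range(2))
print(X_up, x_rebuilt)
```

Here $x = (3,2)$ has the contravariant coordinates $X^1 = 1$, $X^2 = 2$, since $x = 1 \cdot e_1 + 2 \cdot e_2$.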

If on the vector space $V$ there is an inner product $(~.~,~.~): V \times V \rightarrow \mathbb{R}$ (for instance the usual dot product) then there is a natural map $\tau: V \rightarrow W$ between $V$ and $W$:

Let $x \in V$ then

$$ \tau(x) = (x,~.~) \in W$$

i.e. for $v \in V$ we have

$$ \tau(x)(v) = (x,v) $$

Let’s define the following matrix $g$

$$ g_{i j} := (e_i, e_j) $$

which are the (covariant) components of the bilinear inner product map, also called the metric tensor.

In general the basis $B$ is not orthonormal, that is, in general $ g_{i j} \ne \delta_{ij} $ and therefore in general $\tau(e_i) \ne \varepsilon^i$.

Define the inverse matrix of $g_{ij}$ by (if it were not invertible, $B$ would not be a basis, because some $e_i$ would be a linear combination of the others)

$$ g^{ij} := (g^{-1})^{ij} $$

then we have

$$ \sum_{k=1}^n g^{ik} \tau(e_k)(e_j) = \sum_{k=1}^n g^{ik} (e_k,e_j) = \sum_{k=1}^n g^{ik} g_{kj} = \delta^i_j $$

So indeed we have found:

$$ \varepsilon^i = \sum_{k=1}^n g^{ik} \tau(e_k) $$
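As a quick numerical sanity check of this formula, here is a minimal sketch. It assumes $V = \mathbb{R}^2$ with the usual dot product as inner product, so that $\tau(e_k)$ is represented by the row vector $e_k$ itself; the basis and the helper names `dot` and `inverse_2x2` are illustrative assumptions.

```python
from fractions import Fraction

def dot(u, v):
    """Standard dot product, playing the role of the inner product (., .)."""
    return sum(a * b for a, b in zip(u, v))

def inverse_2x2(m):
    """Invert a 2x2 matrix (list of rows) using exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

e = [(1, 0), (1, 1)]  # assumed non-orthonormal basis e_1, e_2
n = len(e)

# Metric tensor g_ij = (e_i, e_j) and its inverse g^{ij}.
g = [[dot(e[i], e[j]) for j in range(n)] for i in range(n)]
g_inv = inverse_2x2(g)

# eps^i = sum_k g^{ik} tau(e_k), with tau(e_k) represented by the row vector e_k.
eps = [tuple(sum(g_inv[i][k] * e[k][c] for k in range(n)) for c in range(n))
       for i in range(n)]

# Check the defining property eps^i(e_j) = delta^i_j.
delta = [[dot(eps[i], e[j]) for j in range(n)] for i in range(n)]
print(g, eps, delta)
```

For this basis $g_{12} = 1 \ne 0$, and correspondingly $\tau(e_1) = (1,0)$ differs from $\varepsilon^1 = (1,-1)$.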

Let’s quickly recap what we have done so far.

- Every (real) vector space $V$ has a dual vector space $W$ attached to it
- The inner product $(~.~,~.~)$ defines a natural map $\tau: V \rightarrow W$
- The inverse metric tensor $g^{ik}$ can be used to map the images $\tau(e_k)$ of the basis vectors $e_k$ to the dual basis vectors $\varepsilon^i$

Now we have everything at hand to express the covariant (dual) coordinates $X_i$ in terms of the contravariant coordinates $X^i$ of a vector $x \in V$. Remember:

$$ x = \sum_{i=1}^n X^i e_i $$

Now map it to the dual vector $\tau(x)$ which is a representation of $x$ in the dual vector space $W$:

\begin{equation}
\begin{split}
\tau(x) & = (x, ~.~) = \left( \sum_{i=1}^n X^i e_i, ~.~ \right) = \sum_{i=1}^n X^i (e_i, ~.~) \\
& = \sum_{i=1}^n X^i \tau(e_i) = \sum_{i=1}^n \sum_{j=1}^n X^i \delta^j_i \tau(e_j) \\
& = \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n X^i g_{ik} g^{kj} \tau(e_j) \\
& = \sum_{k=1}^n \left( \sum_{i=1}^n X^i g_{ik} \right) \left( \sum_{j=1}^n g^{kj} \tau(e_j) \right) \\
& = \sum_{k=1}^n \underbrace{\left( \sum_{i=1}^n X^i g_{ik} \right)}_{=X_k} ~ \varepsilon^k
\end{split}
\end{equation}

And there you have it:

$$ X_k = \sum_{i=1}^n X^i g_{ik} $$

The covariant coordinates are given by the contraction of $X^i$ and $g_{ik}$. One can now show that the natural map $\tau$ is invertible and prove the analogous result

$$ X^i = \sum_{k=1}^n g^{ik} X_k $$

but this should be clear since $g_{ij}$ is invertible.
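Lowering and raising an index amounts to a matrix-vector product with $g_{ik}$ and $g^{ik}$ respectively. The sketch below assumes the metric $g$ of the example basis $e_1 = (1,0)$, $e_2 = (1,1)$ in $\mathbb{R}^2$ with the dot product; the function names `lower` and `raise_index` are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Invert a 2x2 matrix (list of rows) using exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed metric g_ij = (e_i, e_j) for e_1 = (1, 0), e_2 = (1, 1).
g = [[1, 1], [1, 2]]
g_inv = inverse_2x2(g)  # g^{ij}
n = len(g)

def lower(X_up):
    """X_k = sum_i X^i g_{ik}"""
    return [sum(X_up[i] * g[i][k] for i in range(n)) for k in range(n)]

def raise_index(X_down):
    """X^i = sum_k g^{ik} X_k"""
    return [sum(g_inv[i][k] * X_down[k] for k in range(n)) for i in range(n)]

X_up = [1, 2]          # contravariant coordinates of x = 1*e_1 + 2*e_2
X_down = lower(X_up)   # covariant coordinates: [3, 5]
print(X_down, raise_index(X_down))
```

Raising the lowered coordinates returns the original $X^i$, which is just the statement $g^{-1} g = 1$.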

The crucial thing to observe is that $X^i$ and $X_i$ are not some random numbers but two different things: the former are the coordinates of $x$ in $V$, while the latter are the coordinates of $\tau(x)$ in $W$.

We can now go ahead and define a scalar product of $W$ and $V$ by

$$ \langle ~.~,~.~ \rangle: W \times V \rightarrow \mathbb{R}, ~ ~ \langle y,x \rangle= y(x) $$

which enables us to have two different kinds of projections on a vector $x \in V$:

\begin{align}
& (e_i, x) = X_i \\
& \langle \varepsilon^i, x \rangle = X^i
\end{align}
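To make the two projections concrete, here is a small worked example. The choice $V = \mathbb{R}^2$ with the usual dot product, the basis $e_1 = (1,0)$, $e_2 = (1,1)$ and the vector $x = (3,2)$ are assumptions made purely for illustration; the dual basis vectors are written as row vectors acting via the dot product.

$$ g = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad g^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \quad \varepsilon^1 = (1,-1), \quad \varepsilon^2 = (0,1) $$

$$ X_1 = (e_1, x) = 3, \quad X_2 = (e_2, x) = 5, \quad X^1 = \langle \varepsilon^1, x \rangle = 1, \quad X^2 = \langle \varepsilon^2, x \rangle = 2 $$

Indeed $x = 1 \cdot e_1 + 2 \cdot e_2$, and the contraction $X_k = \sum_i X^i g_{ik}$ reproduces $X_1 = 1 + 2 = 3$ and $X_2 = 1 + 4 = 5$.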

So what’s the difference? $X_i$ is the orthogonal projection of $x$ onto $e_i$, scaled by the length of $e_i$; angles, and hence orthogonality, are defined via the inner product. $X^i$ is the length, measured in units of $e_i$, of one side of the rhomboid spanned around $x$ (with $x$ as its diagonal), each side being parallel to one of the basis vectors $e_i$.

The following figure is an illustration of this fact:

Similar to what we did with $\tau$ and the inner product, we can define a natural map $\tilde{\tau}$ on $V$ with the bilinear scalar product

\begin{equation}
\tilde{\tau}: V \rightarrow U, ~ ~ \tilde{\tau}(x) := \langle ~.~, x \rangle
\end{equation}

where $U$ is the bidual vector space to $V$ (which is the dual vector space to $W$, i.e. an element $u \in U$ maps a linear map on $V$ to a real number).

However, the coordinates of $\tilde{\tau}(x)$ expressed in the bidual basis (the dual basis to the $\varepsilon^i$) are

$$ \tilde{X}^i = \tilde{\tau}(x)(\varepsilon^i) = \langle \varepsilon^i,x \rangle = X^i $$

the same as the coordinates in $V$ (compare to the expression $X_i = \tau(x)(e_i) = (x,e_i)$).

It is natural to make the identification $V = U$, due to the trivial relation between the coordinates of $x$ and $\tilde{\tau}(x)$ and just omit $\tilde{\tau}$. Hence, every vector $x \in V$ is a linear map on $W$ to the real numbers just as every vector $y \in W$ is a linear map on $V$ to the real numbers.

# Tensors

As we have seen before, one can interpret any kind of vector as a linear map from its dual space to the real numbers. This can be generalized to the notion of so-called tensors.

Consider the following map:

$$ T: A_1 \times A_2 \times \dots \times A_m \rightarrow \mathbb{R} $$

where $A_i \in \{ V, W \}$. $T$ is an $m$-fold linear (multilinear) map on the vector space $V$ and its dual space $W$, i.e. it is linear in each of its $m$ arguments. It is called a tensor of rank $m$. Similarly, vectors are called tensors of rank $1$.

We define the tensor product of vectors $x \in V$ and $y \in W$ (or any other combination of $V$ and $W$) to be the tensor of rank 2

\begin{equation}
\begin{split}
& x \otimes y: W \times V \rightarrow \mathbb{R} \\
& x \otimes y (a,b) := x(a) \cdot y(b)
\end{split}
\end{equation}

One can easily imagine the generalization to tensors of rank other than $1$. Do note that the order of the factors in the tensor product matters, since for two vectors $x, y$ of the same space we have in general

$$ x \otimes y (a,b) = x(a) \cdot y(b) \ne y(a) \cdot x(b) = y \otimes x (a,b) $$
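The following sketch illustrates this in coordinates. It assumes that both factors are taken from $V$ (so that both orders have the same domain $W \times W$) and works directly with coordinate lists; the names `tensor_product` and `evaluate` and all numbers are made up for illustration.

```python
def tensor_product(X, Y):
    """Coordinates of x (tensor) y: (x ⊗ y)^{ij} = X^i Y^j."""
    return [[xi * yj for yj in Y] for xi in X]

def evaluate(t, A, B):
    """Evaluate a rank-2 tensor with coordinates t^{ij} on two covectors
    with covariant coordinates A_i and B_j."""
    return sum(t[i][j] * A[i] * B[j]
               for i in range(len(A)) for j in range(len(B)))

X = [1, 2]  # contravariant coordinates of x
Y = [3, 4]  # contravariant coordinates of y

xy = tensor_product(X, Y)  # [[3, 4], [6, 8]]
yx = tensor_product(Y, X)  # [[3, 6], [4, 8]], the transpose of xy

a = [1, 0]  # covariant coordinates of a
b = [0, 1]  # covariant coordinates of b

print(evaluate(xy, a, b))  # x(a) * y(b) = 1 * 4 = 4
print(evaluate(yx, a, b))  # y(a) * x(b) = 3 * 2 = 6
```

In coordinates, swapping the factors transposes the coordinate matrix, so $x \otimes y = y \otimes x$ only if that matrix is symmetric.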

Like any linear map, tensors are already completely defined by their values on the basis of their domain. These values are called coordinates. For instance, for $x = \sum_{i=1}^n X^i e_i$ we have $x(\varepsilon^j) = X^j$, the contravariant coordinates of $x$. In a similar way we have for a tensor of rank $4$

$$ T(e_i, e_j, \varepsilon^k, e_l) = T_{i j ~~~~ l}^{~~~ k} $$

the coordinates of that tensor. Hence we can write:

$$ T = \sum_{i,j,k,l=1}^n T_{i j ~~~~ l}^{~~~ k} ~ \cdot ~ \varepsilon^i \otimes \varepsilon^j \otimes e_k \otimes \varepsilon^l $$
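The same reconstruction can be tried on a simpler case. The sketch below uses a rank-2 tensor instead of the rank-4 tensor above, namely the inner product itself, whose coordinates are $T_{ij} = g_{ij}$. The space $\mathbb{R}^2$ with the dot product, the basis, and the name `T_from_coordinates` are assumptions for illustration.

```python
def dot(u, v):
    """Standard dot product, playing the role of the inner product (., .)."""
    return sum(a * b for a, b in zip(u, v))

e = [(1, 0), (1, 1)]     # assumed basis e_1, e_2
eps = [(1, -1), (0, 1)]  # its dual basis as row vectors: eps^i(e_j) = delta^i_j
n = len(e)

# Coordinates of the tensor: its values on the basis, T_ij = T(e_i, e_j).
T = [[dot(e[i], e[j]) for j in range(n)] for i in range(n)]

def T_from_coordinates(u, v):
    """Evaluate T = sum_ij T_ij eps^i ⊗ eps^j on the pair (u, v)."""
    return sum(T[i][j] * dot(eps[i], u) * dot(eps[j], v)
               for i in range(n) for j in range(n))

u, v = (3, 2), (-1, 4)
print(T_from_coordinates(u, v), dot(u, v))  # both equal 5
```

The expansion in the basis $\varepsilon^i \otimes \varepsilon^j$ reproduces the original bilinear map on arbitrary arguments, not just on basis vectors.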

One can lower and raise the indices in much the same way as it was done for vectors. For every tensor $T$ with coordinates $T_{i j ~~~~ l}^{~~~ k}$ there exists another tensor $\tilde{T}$ given by

$$ \tilde{T} = \sum_{i,j,k,l=1}^n \tilde{T}_{~~ j ~~~ l}^{i ~~ k} ~ \cdot ~ e_i \otimes \varepsilon^j \otimes e_k \otimes \varepsilon^l $$

where

$$ \tilde{T}_{~~ j ~~~ l}^{i ~~ k} = \sum_{q=1}^n g^{iq} \cdot T_{q j ~~~~ l}^{~~~ k} $$

are the new coordinates and hence $T(a,b,c,d) = \tilde{T}(\tau(a),b,c,d)$.
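The identity $T(a,\dots) = \tilde{T}(\tau(a),\dots)$ can be checked numerically. For brevity the sketch below uses a rank-2 tensor $T_{ij}$ rather than the rank-4 tensor above and raises its first index; the metric, the tensor coordinates and the vectors are arbitrary assumed values.

```python
# Assumed metric for the basis e_1 = (1, 0), e_2 = (1, 1) and its inverse.
g = [[1, 1], [1, 2]]        # g_ij
g_inv = [[2, -1], [-1, 1]]  # g^{ij}
n = 2

T = [[1, 2], [3, 4]]  # arbitrary coordinates T_ij of a rank-2 tensor on V x V

# Raise the first index: T~^i_j = sum_q g^{iq} T_qj.
T_raised = [[sum(g_inv[i][q] * T[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]

A_up = [1, 2]   # contravariant coordinates of a
B_up = [5, -1]  # contravariant coordinates of b

# Coordinates of tau(a): A_i = sum_k g_ik A^k.
A_down = [sum(g[i][k] * A_up[k] for k in range(n)) for i in range(n)]

# T(a, b) = sum_ij T_ij A^i B^j
lhs = sum(T[i][j] * A_up[i] * B_up[j] for i in range(n) for j in range(n))
# T~(tau(a), b) = sum_ij T~^i_j A_i B^j
rhs = sum(T_raised[i][j] * A_down[i] * B_up[j] for i in range(n) for j in range(n))
print(lhs, rhs)  # both equal 25
```

The factors $g^{iq}$ from raising the index and $g_{ip}$ from $\tau$ cancel, which is why both evaluations agree.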