Three interpretations of matrix products

70 points by nivter 2 years ago · 27 comments


mkl95 2 years ago

This site looks like a similar concept to https://immersivemath.com/ which is one of my favorite sites ever.

ribit 2 years ago

It is also interesting how this relates to hardware implementations. Nvidia and Intel AMX appear to be using dot product engines under the hood and do a matrix multiplication in a single instruction. Apple AMX and ARM SME use outer product engines and require multiple instructions to do a single matrix multiplication.

adrian_b 2 years ago

Nobody is using dot product engines, because dot product throughput is limited by the latency of fused multiply-add, instead of by the clock frequency.
Moreover, dot product throughput is limited by the memory read throughput.
Any matrix-matrix product implementation is best done based on tensor products of vectors, because each such product is composed of independent operations, so their latencies can be hidden. Moreover, a tensor product requires a number of multiplications equal to the product of the sizes of the operands, but a number of loads from memory equal to the sum of the sizes of the operands.
With enough registers to store the matrix result, it is easy to ensure that the product of the operand sizes is greater than the sum of the operand sizes, so that the throughput of the memory reads does not limit the attainable performance.
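To put rough numbers on the loads-versus-multiplications argument, here is a minimal Python sketch. The function names `dot_product_cost` and `tensor_product_cost` are hypothetical, and the cost model is an assumption: it only counts operand loads and multiplications and ignores everything else about real hardware.

```python
def dot_product_cost(n):
    """Loads and multiplies for a scalar product of two length-n vectors."""
    loads = 2 * n        # every element of both operands is read once
    multiplies = n       # one multiplication per element pair
    return loads, multiplies


def tensor_product_cost(m, n):
    """Loads and multiplies for a tensor product of vectors of length m and n."""
    loads = m + n        # sum of the operand sizes
    multiplies = m * n   # product of the operand sizes
    return loads, multiplies


# Multiplications per load grow with the tile size for the tensor product,
# but stay fixed at 1/2 for the scalar product.
for size in (2, 4, 8, 16):
    loads, multiplies = tensor_product_cost(size, size)
    print(size, loads, multiplies, multiplies / loads)
```

Under this model an 8x8 tile needs 16 loads for 64 multiplications, while a scalar product of two length-8 vectors needs the same 16 loads for only 8 multiplications.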
Where a matrix-matrix product is done by a single instruction, it normally also uses tensor products. Only when both the input operands and the result are stored in registers, the instruction could also be implemented by AXPY operations (where the fused multiply-add operations are also independent), but not by dot products (with dependent FMAs that prevent pipelining).
"AXPY" is a name that comes from the BLAS library and it refers to an operation fundamental in linear algebra, "A times vector X Plus vector Y". There are many cases when it is possible to choose between AXPY and scalar products. AXPY is normally the right choice, because it is composed of independent FMAs, which can be interleaved and pipelined.
- ribit 2 years ago
  
  Thank you, very insightful and makes perfect sense! I do wonder however why Nvidia and Intel chose not to expose an AXPY/outer product instruction if they use these kinds of operations under the hood. I can imagine them being useful in their own right. My best guess is that this gives them freedom to change the implementation details later on (e.g. the order of swizzles)?

bmacho 2 years ago

The first two interpretations are about matrix-vector products, and the third is about matrix-matrix product. This is a bit surprising, as I was expecting different interpretations of the same thing.

Anyway, this is basic stuff, but it's nice to see it formulated!

nivter (OP) 2 years ago

Originally I had planned to add a line or two explaining how a matrix-matrix product is just a _list_ of matrix-vector products but then dropped the idea to keep it solely focused on the interpretations. I will probably add it to make it all about matrix-matrix products.
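A minimal sketch of the "list of matrix-vector products" idea, in plain Python with matrices as lists of rows. The helper names `mat_vec` and `mat_mat` are made up for illustration and do not come from the article.

```python
def mat_vec(a, x):
    """Matrix-vector product: one dot product per row of a."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mat_mat(a, b):
    """Matrix-matrix product built column by column from mat_vec."""
    columns_of_b = list(zip(*b))                        # split B into its columns
    columns_of_c = [mat_vec(a, col) for col in columns_of_b]
    return [list(row) for row in zip(*columns_of_c)]    # reassemble columns into rows


print(mat_mat([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each column of the result is just `a` applied to the matching column of `b`, so either matrix-vector interpretation carries over to the matrix-matrix case one column at a time.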
adrian_b 2 years ago

The 3 interpretations correspond to the 3 possible choices for the innermost loop when you reorder the 3 nested loops of a matrix-matrix product (tensor product of 2 vectors, AXPY operation of 2 vectors and scalar product of 2 vectors).
For a matrix-vector product, where there are only 2 nested loops, only 2 of these choices are applicable (AXPY operation of 2 vectors and scalar product of 2 vectors).
For a vector-vector product, where there is a single loop, only 1 of these choices is applicable (scalar product of 2 vectors).
For numeric computations, where possible, AXPY is preferable to scalar product, and where possible, tensor product is preferable to AXPY. Therefore matrix-vector products should be done with AXPY and matrix-matrix products should be done with tensor products of vectors.
I strongly dislike the misuse of the term "outer product" for the tensor product of 2 vectors, like in the parent article, because this is ambiguous. The original definition of the outer product of 2 vectors (due to Grassmann) is related to the so-called vector product of 2 vectors and "outer" refers to the fact that one vector is multiplied by the component of the other vector that is orthogonal to it, so it points outwards. The tensor product, whose value is a matrix, is completely different. For 2 vector operands, there are 3 distinct products, the first is inner a.k.a. scalar, the second is outer product or (only in 3 dimensions) its variant "vector product" and the third is the tensor product.
The name "tensor product" is also historically and semantically incorrect, but at least it is not ambiguous.
The "tensor product" should have been named "Zehfuss product", after the mathematician who defined it in 1858 (about 60 years before the word "tensor" had any relationship with it). "Tensor" was originally a name for symmetric matrices, which correspond to affine transformations where a body is stretched in the directions of certain axes. The name was introduced by Hamilton, together with scalar (similarity affine transformation), vector (translation affine transformation) and versor (rotation affine transformation). For unknown reasons Einstein chose to use "tensor" with the meaning of "multi-dimensional array", breaking with tradition, and then everybody imitated him (due to the huge popularity of the Theory of Relativity among non-mathematicians and non-physicists after the end of WWI, which prompted some book editors to insert the word "tensor" everywhere in several mathematics books and advertise them as useful for understanding Einstein's theory).
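The loop-reordering point can be sketched in plain Python as three versions of the same triple loop. The function names are hypothetical, and the mapping of each ordering to a named product is one reading of the comment, not a statement about any particular BLAS implementation.

```python
def matmul_scalar(a, b):
    """k innermost: each c[i][j] is a scalar product of a row of A and a column of B."""
    m, p, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k in range(p):      # every step accumulates into the same c[i][j]
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_axpy(a, b):
    """i innermost: column j of C += b[k][j] * column k of A, i.e. an AXPY."""
    m, p, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for j in range(n):
        for k in range(p):
            for i in range(m):      # each step updates a different c[i][j]
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tensor(a, b):
    """k outermost: C += (column k of A) tensor (row k of B), one rank-1 update per k."""
    m, p, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for k in range(p):
        for i in range(m):          # the two inner loops together form
            for j in range(n):      # one tensor product of two vectors
                c[i][j] += a[i][k] * b[k][j]
    return c
```

All three run the same multiply-add and return the same matrix. What changes is which updates sit next to each other: a chain of dependent additions into one element, or a batch of independent updates to different elements.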

dionysus8 2 years ago

I like the fourth interpretation of matrix multiplication from geometric algebra, which brilliantly encapsulates geometric transformations. This approach shifts our focus from just numbers to the geometry of space, revealing how matrices can elegantly describe rotations, reflections, and scaling. It’s a vivid and intuitive perspective that brings matrix operations to life, especially in fields where visualizing these transformations is key, like in computer graphics or physics. It’s like watching math and geometry dance together!

eigenket 2 years ago

The rest of your comment doesn't really seem specific to geometric algebra. You're just interpreting matrix multiplication as composition of linear maps, which is indeed a very useful and sometimes intuitive perspective.
nivter (OP) 2 years ago

Yup, I consider this interpretation as a matrix being a function that takes in objects like a line, a circle or a convex shape and spits out objects like some other line, an ellipse or another convex shape. It is a level of abstraction where you no longer care _how_ matrix multiplication works - you mostly care about what a matrix does to geometric objects. I covered this aspect not in the above article but in a separate one: https://www.linearalgebraforprogrammers.com/la/3_mat_vec_mul
- eigenket 2 years ago
  
  I feel like matrix multiplication/linear maps are kinda the "wrong" things to be considering when you're talking about applying them to lines or shapes in euclidean space.
  For example there is no linear map which maps a line (or segment) through the origin to a parallel line (or segment of the same length) that doesn't pass through the origin, even though these are clearly just the same object shifted around a bit.
  A much more natural set of operations is (IMO) the affine transformations since then I can move things around as you expect. I find dealing with linear maps of lines or circles or polygons a bit unintuitive.
  - ReleaseCandidat 2 years ago
    
    > For example there is no linear map which maps a line (or segment) through the origin to a parallel line (or segment of the same length) that doesn't pass through the origin, even though these are clearly just the same object shifted around a bit.
    That's why god (well, projective geometrists) made homogeneous coordinates. Without them geometry isn't much fun when using linear algebra (as you have way too many special cases).
  - whiterknight 2 years ago
    
    Affine transforms for N dimensions can be represented as matrices/linear maps in dimension N+1
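As a rough illustration of the N+1 trick, here is a plain-Python sketch for the 2D case. `affine_as_matrix` and `apply_homogeneous` are hypothetical helper names, and the layout (translation in the last column, a trailing 1 appended to each point) is one common convention assumed here.

```python
def affine_as_matrix(linear, translation):
    """Pack x -> linear @ x + translation into one (N+1) x (N+1) matrix."""
    n = len(translation)
    top = [list(linear[i]) + [translation[i]] for i in range(n)]
    bottom = [0] * n + [1]
    return top + [bottom]


def apply_homogeneous(matrix, point):
    """Append 1 to the point, multiply by the matrix, then drop the trailing 1."""
    h = list(point) + [1]
    out = [sum(m_ij * h_j for m_ij, h_j in zip(row, h)) for row in matrix]
    return out[:-1]


# Pure translation by (3, -2): identity linear part plus an offset.
shift = affine_as_matrix([[1, 0], [0, 1]], [3, -2])
print(apply_homogeneous(shift, [0, 0]))  # [3, -2]
```

The origin moves to (3, -2), which no 2x2 matrix acting directly on R^2 can do, since a linear map always sends the origin to itself.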
    
    eigenket 2 years ago
    
    Sure, but that isn't what they're doing on that webpage. They're just directly applying linear maps to shapes in R^2.
    You can represent a wild variety of things by linear maps on suitably enlarged spaces.
    
    whiterknight 2 years ago
    
    Agreed. Context: This sub thread is discussing how to interpret matrices as an abstract thing that does something as opposed to the mechanics of multiplying
bmacho 2 years ago

You probably mean "linear algebra". Geometric algebra[0] is a very specific, also vastly different machinery.
[0] : https://en.wikipedia.org/wiki/Geometric_algebra
sva_ 2 years ago

3blue1brown's linear algebra 'course' is excellent on this https://www.3blue1brown.com/topics/linear-algebra

greatgib 2 years ago

Honestly I don't find this article to be very clear or understandable about matrix products. The animations are very confusing.

If you already know how to work with matrices you can probably follow it, but otherwise I'm quite sure that you would not.

ivansavz 2 years ago

You can also find some nice visualizations of matrix and vector operations in "The Art of Linear Algebra" by Kenji Hiranabe:

https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra/b...

rlupi 2 years ago

Gil Strang's lectures are great if you want a bit more depth.

https://youtu.be/nTwRjQ4xqUc?si=6jO6mO3M98_QX2Kx

adamnemecek 2 years ago

I really dig the integral transform one.

https://ncatlab.org/nlab/show/integral+transform

nivter (OP) 2 years ago

This was a joyful epiphany for me when I encountered continuous linear systems for the first time. Another corollary of this is when the kernel is δ(x-y), the resulting integral has the value f(x). I like to see it as a continuous version of expressing a vector as a sum of its components.
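Written out, the analogy might look like this. The symbols $K$, $f$, $A$, $v$ and $e_j$ are notation chosen here for illustration, not taken from the linked page.

```latex
% Matrix-vector product and its continuous counterpart
(Av)_i = \sum_j A_{ij}\, v_j
\qquad\longleftrightarrow\qquad
(Kf)(x) = \int K(x, y)\, f(y)\, dy

% Identity matrix (Kronecker delta) versus the Dirac delta kernel
\sum_j \delta_{ij}\, v_j = v_i
\qquad\longleftrightarrow\qquad
\int \delta(x - y)\, f(y)\, dy = f(x)

% The discrete statement is the expansion of v in the standard basis
v = \sum_j v_j\, e_j
```

In this reading the delta kernel plays the role of the identity matrix, and the integral reassembles $f$ from its "components" $f(y)$ just as the sum reassembles $v$ from the $v_j$.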
- adamnemecek 2 years ago
  
  Yeah, the sifting property is cool.

blagie 2 years ago

This is enough detail that:

- If you understand it, it's a nice visualization but you don't learn anything new

- If you don't, you won't understand it.

Going one click in brings up a paywall, with no pricing. You need to give up your email to get a price.

This feels like a not-very-good business model. This would make a lot more sense as either:

- A fully-baked business model, competitive with other paid resources

- An open-source project on github.

To be a fully-baked business model, it would need:

- Enough teaser content to get people hooked and for people to be able to reshare content

- Things to do (e.g. writing Python code), a place to do it (e.g. an online repl, like most other similar systems), and ways to evaluate it for correctness.

- Clear marketing / branding copy (who did it? what's the privacy policy? what's it cost? etc.)

As an open-source thing, it could slot into a community of similar projects which fill those gaps. It has very nice interactives, but it takes a "tell" rather than a "do" approach, which is helpful in context, but isn't adequate for learning by itself.

nivter (OP) 2 years ago

Thanks, this is very helpful feedback.
