C++ - Why is it faster to perform float by float matrix multiplication compared to int by int?
Having two int matrices A and B, with more than 1000 rows and 10K columns, I need to convert them to float matrices to gain a speedup (4x or more).

I'm wondering why this is the case. I realize there is a lot of optimization and vectorization, such as AVX, going on for float matrix multiplication. Yet there are also instructions for integers, such as AVX2 (if I'm not mistaken). And can't one make use of SSE and AVX for integers?

Why isn't there a heuristic underneath matrix algebra libraries such as NumPy or Eigen to capture this and perform integer matrix multiplication faster, like float?

About the accepted answer: while @sascha's answer is informative and relevant, @chatz's answer is the actual reason why int-by-int multiplication is slow, irrespective of whether BLAS integer matrix operations exist.
If you compile these two simple functions, which calculate the product (using the Eigen library),
#include <Eigen/Core>

int mult_int(const Eigen::MatrixXi& A, const Eigen::MatrixXi& B) {
    Eigen::MatrixXi C = A * B;
    return C(0,0);
}

int mult_float(const Eigen::MatrixXf& A, const Eigen::MatrixXf& B) {
    Eigen::MatrixXf C = A * B;
    return C(0,0);
}
using the flags -mavx2 -S -O3, you will
see very similar assembler code for the integer and the float version. The main difference is that vpmulld (the packed 32-bit integer multiply) has 2-3 times the latency and only 1/2 or 1/4 the throughput of vmulps (on recent Intel architectures).
Reference: the Intel Intrinsics Guide, where "throughput" means reciprocal throughput, i.e., how many clock cycles are used per operation if no latency occurs (somewhat simplified).