C++ - Why is it faster to perform float by float matrix multiplication compared to int by int?
Having two int matrices A and B, with more than 1000 rows and 10K columns, I need to convert them to float matrices to gain a speedup (4x or more).

I'm wondering why this is the case. I realize there is a lot of optimization and vectorization, such as AVX, going on for float matrix multiplication. Yet there are also instructions for integers, such as AVX2 (if I'm not mistaken). And can't one make use of SSE and AVX for integers?

Why isn't there a heuristic underneath matrix algebra libraries such as NumPy or Eigen to capture this and perform integer matrix multiplication faster, like float?

About the accepted answer: while @sascha's answer is informative and relevant, @chatz's answer is the actual reason why int-by-int multiplication is slow, irrespective of whether BLAS integer matrix operations exist.
If you compile these two simple functions, which calculate the product (using the Eigen library),
#include <Eigen/Core>

int mult_int(const Eigen::MatrixXi& A, const Eigen::MatrixXi& B) {
    Eigen::MatrixXi C = A * B;
    return C(0,0);
}

int mult_float(const Eigen::MatrixXf& A, const Eigen::MatrixXf& B) {
    Eigen::MatrixXf C = A * B;
    return C(0,0);
}
using the flags -mavx2 -S -O3, you will
see very similar assembler code for the integer and the float version. The main difference is that vpmulld (the packed 32-bit integer multiply) has 2-3 times the latency and only 1/2 or 1/4 the throughput of vmulps (on recent Intel architectures).
Reference: the Intel Intrinsics Guide, where "throughput" means reciprocal throughput, i.e., how many clock cycles are used per operation if no latency occurs (somewhat simplified).