Dot-product with SSE4

I am currently working on a new project where I need to unpack raw data quickly and efficiently. I also need to run several mathematical operations (FFTs, etc.) on batches of the data. Although the plan is to move much of this to the GPU in the future, for now I am trying to write functions that get the most out of the CPU. That means using SSE, and SSE4.1 adds a dedicated dot-product instruction, exposed through the intrinsic _mm_dp_ps. It multiplies two vectors of four floats element by element, and its 8-bit mask parameter controls both which of the four products are summed (upper four bits) and which elements of the result receive the sum (lower four bits); the remaining result elements are set to zero. Here's a simple example:

#include <stdio.h>
#include <smmintrin.h>   // SSE4.1 intrinsics (_mm_dp_ps); build with e.g. -msse4.1 on GCC/Clang
int main ()
{
    __m128 a ;
    __m128 b ;
    float x[4], y[4];
    x[0]= 1.0;   x[1]= 2.0;   x[2]= 3.0;   x[3]= 4.0;
    y[0]=-1.0;   y[1]=-2.0;   y[2]=-3.0;   y[3]=-4.0;
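    // load the data into SSE registers; x and y are plain arrays with no
    // guaranteed 16-byte alignment, so use the unaligned load
    a = _mm_loadu_ps(&x[0]);
    b = _mm_loadu_ps(&y[0]);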

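    // upper 4 bits (1111): multiply and sum all 4 element pairs
    // lower 4 bits (0001): store the sum in element 0, zero the others
    // 11110001 = 0xf1
    // the mask is encoded into the instruction, so it has to be a
    // compile-time constant; a literal is the safest choice in C
    __m128 res = _mm_dp_ps(a, b, 0xf1);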

    union { __m128 v; float f[4]; } uf; // a trick to access the 4 floats
    uf.v = a;
    printf("Original a: %f\t%f\t%f\t%f\n", uf.f[0],uf.f[1],uf.f[2],uf.f[3]);
    uf.v = b;
    printf("Original b: %f\t%f\t%f\t%f\n", uf.f[0],uf.f[1],uf.f[2],uf.f[3]);
    uf.v = res;
    printf("Result    : %f\t%f\t%f\t%f\n", uf.f[0],uf.f[1],uf.f[2],uf.f[3]);
    return 0;
}
