Most of the data I am dealing with is bit-sampled, i.e. each voltage reading is packed into one (or sometimes two) bits. However, several channels get merged into one data stream, and when the samples are written to disk the bits are usually ordered like ch0, ch1, ch2, ch3, ch0, ch1, ch2, ch3, … The problem is that to process such data, one needs to separate the stream into the original channels again. Since I have recently been working with SSE, I was wondering whether this technology can be used for such a purpose. Assume you have 4 channels, each sampled with one bit. For simplicity, we deal with a byte (char), which is the smallest quantity that is addressable from C/C++; with 4 channels, channel n then occupies bits n and n + 4 of each byte, counting from the least significant bit. The concept of how to split the data into four binary streams is written in Python below.
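To pin down the bit order described above, here is a minimal, unoptimized reference sketch that splits an arbitrary byte stream into one list of 1-bit samples per channel. It assumes, as the four-channel example does, that the first sample in each byte sits in the least significant bit (files written MSB-first would need the inner loop reversed). The name `deinterleave_bits` is hypothetical. Because it tracks a global bit index, it also works for channel counts that do not divide 8.

```python
def deinterleave_bits(data, n_channels):
    """Split interleaved 1-bit samples into per-channel lists of bits.

    Assumption: bit k of the stream (byte k // 8, bit k % 8 counted from the
    least significant bit) belongs to channel k % n_channels.
    """
    channels = [[] for _ in range(n_channels)]
    for i, byte in enumerate(data):
        for bit in range(8):
            stream_index = 8 * i + bit
            channels[stream_index % n_channels].append((byte >> bit) & 1)
    return channels


# One byte, 4 channels: bit 0 -> ch0 (1st sample), bit 5 -> ch1 (2nd sample)
print(deinterleave_bits(bytes([0b00100001]), 4))  # [[1, 0], [0, 1], [0, 0], [0, 0]]
```

This bit-by-bit loop is only meant as a correctness baseline to compare faster (masked or SIMD) versions against.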
```python
#!/usr/bin/env python3

# channel 0, all 4 possible values
x0 = [0b00000000, 0b00000001, 0b00010000, 0b00010001]
# channel 1, all 4 possible values
x1 = [0b00000000, 0b00000010, 0b00100000, 0b00100010]
# channel 2, all 4 possible values
x2 = [0b00000000, 0b00000100, 0b01000000, 0b01000100]
# channel 3, all 4 possible values
x3 = [0b00000000, 0b00001000, 0b10000000, 0b10001000]

# correct results of the 4 values
y = [0b00, 0b01, 0b10, 0b11]

# maskN1 selects the first sample of channel N (bit N),
# maskN2 selects the second sample of channel N (bit N + 4)
mask01 = 0b00000001
mask02 = 0b00010000
mask11 = 0b00000010
mask12 = 0b00100000
mask21 = 0b00000100
mask22 = 0b01000000
mask31 = 0b00001000
mask32 = 0b10000000

r0 = [0] * 4
r1 = [0] * 4
r2 = [0] * 4
r3 = [0] * 4

# this can be coded with SSE
for i in range(4):
    r0[i] = (x0[i] & mask01) + ((x0[i] & mask02) >> 3)
    r1[i] = (x1[i] & mask11) + ((x1[i] & mask12) >> 3)
    r2[i] = (x2[i] & mask21) + ((x2[i] & mask22) >> 3)
    r3[i] = (x3[i] & mask31) + ((x3[i] & mask32) >> 3)

# as the n-th channel is offset by n bits,
# the results need to be shifted as well
for i in range(4):
    r0[i] = r0[i] >> 0
    r1[i] = r1[i] >> 1
    r2[i] = r2[i] >> 2
    r3[i] = r3[i] >> 3

# output
for i in range(4):
    print("channel 0, result = %s, correct value = %s" % (bin(r0[i]), bin(y[i])))
    print("channel 1, result = %s, correct value = %s" % (bin(r1[i]), bin(y[i])))
    print("channel 2, result = %s, correct value = %s" % (bin(r2[i]), bin(y[i])))
    print("channel 3, result = %s, correct value = %s" % (bin(r3[i]), bin(y[i])))
```
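The two-pass structure above (mask, then shift by the channel number) can be rearranged. Channel n's masks are just channel 0's masks shifted left by n, and for non-negative integers `(x & (m << n)) >> n == (x >> n) & m`. So shifting the byte right by n first lets every channel reuse the channel-0 masks and the same `>> 3`. Below is a minimal sketch of that equivalent form, checked against the same test vectors. `extract_channel`, `MASK_LO` and `MASK_HI` are hypothetical names.

```python
MASK_LO = 0b00000001  # first sample of channel 0 (bit 0)
MASK_HI = 0b00010000  # second sample of channel 0 (bit 4)


def extract_channel(x, n):
    """Return the two samples of channel n (0..3) in byte x as a 2-bit value."""
    s = x >> n  # move channel n into the channel-0 bit positions
    return (s & MASK_LO) | ((s & MASK_HI) >> 3)  # bits 0 and 4 -> bits 0 and 1


# Same test vectors as the example above, one row per channel
x = [
    [0b00000000, 0b00000001, 0b00010000, 0b00010001],
    [0b00000000, 0b00000010, 0b00100000, 0b00100010],
    [0b00000000, 0b00000100, 0b01000000, 0b01000100],
    [0b00000000, 0b00001000, 0b10000000, 0b10001000],
]
y = [0b00, 0b01, 0b10, 0b11]
for n in range(4):
    for i in range(4):
        assert extract_channel(x[n][i], n) == y[i]
print("all channels OK")
```

The shift amount still depends on n, but it is now applied to the input rather than to the result. This matters once many bytes of the same channel are processed together.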
I am going to code this in C/C++ and SSE for several data sizes, sampling options and channel settings and see how things perform. Unfortunately, the final bit shift depends on the channel number: channel n needs a right shift by n, so the four channels of one byte cannot share a single shift. Thus there is likely not much chance of doing this in parallel with SSE.
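One way the channel-dependent shift might be sidestepped is to change the direction of parallelism. Instead of extracting all four channels from one byte, extract one channel from many bytes at once, so that the shift by n is identical for every lane. The sketch below imitates this in Python with SWAR ("SIMD within a register") on 8 bytes packed into a 64-bit integer. It assumes the same LSB-first layout as the Python example and is an illustration, not tested SSE code. `extract_channel_swar` is a hypothetical name. Shifting the whole word lets up to n ≤ 3 bits of the next byte spill into the top of the current byte, but the masks keep only bits 0 and 4 of each byte, so the spill is discarded.

```python
LO = 0x0101010101010101  # bit 0 of every byte: first sample of channel 0
HI = 0x1010101010101010  # bit 4 of every byte: second sample of channel 0


def extract_channel_swar(block, n):
    """Extract channel n (0..3) from 8 bytes at once (4 channels, LSB-first).

    Returns 8 values, one per input byte, each holding that byte's two
    samples of channel n in bits 0 and 1.
    """
    assert len(block) == 8 and 0 <= n <= 3
    word = int.from_bytes(block, "little")  # byte i -> bits 8*i .. 8*i+7
    s = word >> n  # one uniform shift for all 8 byte lanes
    r = (s & LO) | ((s & HI) >> 3)  # bit 4 -> bit 1 stays inside each byte
    return list(r.to_bytes(8, "little"))


block = bytes([0b00000000, 0b00000010, 0b00100000, 0b00100010, 0xFF, 0, 0, 0])
print(extract_channel_swar(block, 1))  # [0, 1, 2, 3, 3, 0, 0, 0]
```

With SSE2, the analogous steps would presumably be a 64-bit lane shift (`_mm_srli_epi64`, or `_mm_srl_epi64` for a run-time count) followed by `_mm_and_si128` and `_mm_or_si128`, covering 16 bytes per instruction. Packing the resulting 2-bit values into a dense output stream would still be a separate step.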