Most of the data I am dealing with is bit-sampled, i.e. each voltage reading is packed into one (or sometimes two) bits. However, several channels get merged into one data stream, and when the samples are written to disk, the bits are usually ordered like ch0, ch1, ch2, ch3, ch0, ch1, ch2, ch3, … The problem is that to process such data, one first needs to separate the stream into the original channels again. Since I have recently been working with SSE, I wondered whether this technology could be used for such a purpose. Assume you have 4 channels, each sampled with one bit. For simplicity, we deal with a single byte (char), which is the smallest quantity that is addressable from C/C++, so each byte holds two samples from every channel. The concept of how to split the data into four binary streams is written here in Python.
#!/usr/bin/python
# channel 0, all 4 possible values
x0 = [0b00000000, 0b00000001, 0b00010000, 0b00010001]
# channel 1, all 4 possible values
x1 = [0b00000000, 0b00000010, 0b00100000, 0b00100010]
# channel 2, all 4 possible values
x2 = [0b00000000, 0b00000100, 0b01000000, 0b01000100]
# channel 3, all 4 possible values
x3 = [0b00000000, 0b00001000, 0b10000000, 0b10001000]
# correct results of the 4 values
y = [0b00, 0b01, 0b10, 0b11]
# maskN1 selects the lower bit of channel N, maskN2 the upper bit
mask01 = 0b00000001
mask02 = 0b00010000
mask11 = 0b00000010
mask12 = 0b00100000
mask21 = 0b00000100
mask22 = 0b01000000
mask31 = 0b00001000
mask32 = 0b10000000
r0 = [0, 0, 0, 0]
r1 = [0, 0, 0, 0]
r2 = [0, 0, 0, 0]
r3 = [0, 0, 0, 0]
# this can be coded with SSE:
# pick both bits of a channel and move the upper one down by 3
# so that the two bits end up next to each other
for i in range(0, 4):
    r0[i] = (x0[i] & mask01) + ((x0[i] & mask02) >> 3)
    r1[i] = (x1[i] & mask11) + ((x1[i] & mask12) >> 3)
    r2[i] = (x2[i] & mask21) + ((x2[i] & mask22) >> 3)
    r3[i] = (x3[i] & mask31) + ((x3[i] & mask32) >> 3)
# as the n-th channel is offset by n bits,
# the results need to be shifted as well
for i in range(0, 4):
    r0[i] = r0[i] >> 0
    r1[i] = r1[i] >> 1
    r2[i] = r2[i] >> 2
    r3[i] = r3[i] >> 3
# output
for i in range(0, 4):
    print("channel 0, result = %s, correct value = %s"
          % (bin(r0[i]), bin(y[i])))
    print("channel 1, result = %s, correct value = %s"
          % (bin(r1[i]), bin(y[i])))
    print("channel 2, result = %s, correct value = %s"
          % (bin(r2[i]), bin(y[i])))
    print("channel 3, result = %s, correct value = %s"
          % (bin(r3[i]), bin(y[i])))
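The same mask-and-shift steps can also be wrapped into a small helper that splits an arbitrary byte instead of only the hand-picked test values. This is just a sketch: the name `split_byte` is made up for illustration, and it assumes the layout used in the script, i.e. four one-bit channels with channel n sitting in bit n and bit n+4 of each byte.

```python
def split_byte(x):
    """Split one byte holding 4 interleaved one-bit channels into 4 two-bit values.

    Assumed layout (as in the script): bit n and bit n+4 belong to channel n,
    and the lower of the two bits becomes the lower bit of the result.
    """
    result = []
    for n in range(4):
        low = x & (0b00000001 << n)    # lower bit of channel n
        high = x & (0b00010000 << n)   # upper bit of channel n
        # move the upper bit next to the lower one, then remove the channel offset
        result.append((low + (high >> 3)) >> n)
    return result

# channels 0 and 2 hold 0b01, channels 1 and 3 hold 0b10
print([bin(v) for v in split_byte(0b10100101)])
```

Looping over `split_byte` for every byte of a file would give the four separated streams, although in pure Python this will be slow for real data volumes.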
I am going to code this in C/C++ and SSE for several data sizes, sampling options and channel settings and see how things perform. Unfortunately, the last bit-shift depends on the channel number, so there is likely not much chance of doing this step in parallel with SSE.
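One idea that might reduce the problem, sketched here in Python and not tested with SSE, is to apply the channel-dependent shift to the input first. After that, every channel has its two bits in positions 0 and 4, so a single pair of masks and a single fixed shift by 3 serve all four channels. The names `MASK_LOW`, `MASK_HIGH` and `split_byte_shift_first` are hypothetical, and the bit layout is assumed to be the one used in the script.

```python
MASK_LOW = 0b00000001    # lower bit of a channel once its offset is removed
MASK_HIGH = 0b00010000   # upper bit of a channel once its offset is removed

def split_byte_shift_first(x):
    """Return the 4 two-bit channel values of byte x (channel n in bits n and n+4)."""
    result = []
    for n in range(4):
        s = x >> n   # the only channel-dependent step
        result.append((s & MASK_LOW) + ((s & MASK_HIGH) >> 3))
    return result

# compare against the mask-then-shift order for every possible byte
for x in range(256):
    expected = [((x & (1 << n)) + ((x & (16 << n)) >> 3)) >> n for n in range(4)]
    assert split_byte_shift_first(x) == expected
```

The shift by the channel number is still there, but with only four channels it amounts to four fixed shift amounts, and everything after it is identical for all channels. Whether that actually pays off in SSE remains to be measured.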