# 1  
Signal Vectorization


I have the following vectorized code:
long valor = 0, i = 0;

__m128i vsum, vecPi, vecCi, vecQCi;

vsum = _mm_set1_epi32(0);

int32_t * const pA = A->data;
int32_t * const pB = B->data;

int32_t sumDot[4];   /* a 128-bit store writes four 32-bit lanes */

/* vectorized part: four 32-bit products per iteration */
for ( ; i < SIZE-3; i += 4) {
    vecPi  = _mm_loadu_si128((__m128i *)&(pA)[i]);
    vecCi  = _mm_loadu_si128((__m128i *)&(pB)[i]);
    vecQCi = _mm_mullo_epi32(vecPi, vecCi);
    vsum   = _mm_add_epi32(vsum, vecQCi);
}

/* horizontal sum: after two hadds every lane holds the total */
vsum = _mm_hadd_epi32(vsum, vsum);
vsum = _mm_hadd_epi32(vsum, vsum);
_mm_storeu_si128((__m128i *)&(sumDot), vsum);

/* scalar tail for the remaining elements */
for ( ; i < SIZE; i++)
    valor += A->data[i] * B->data[i];

valor += sumDot[0];

However, the 32-bit products and their running sum overflow, so I need to handle those cases. Could you please help me with that?
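For reference, one possible way to avoid the overflow described above is to widen each product to 64 bits before accumulating. The sketch below is only an illustration, not a tested drop-in replacement: the function name `dot_i32_wide` and the `SSE41_TARGET` macro are made up for this example, and it assumes an x86 CPU with SSE4.1 and a GCC-compatible compiler (or compiling with `-msse4.1`). `_mm_mul_epi32` multiplies the low signed 32-bit half of each 64-bit lane, so elements 0 and 2 are handled directly and elements 1 and 3 are shifted down first. The 64-bit accumulator can still wrap for very long inputs with extreme values.

```cpp
#include <cstddef>
#include <cstdint>
#include <smmintrin.h>   // SSE4.1: _mm_mul_epi32

// Hypothetical helper macro: lets GCC-compatible compilers build this one
// function with SSE4.1 enabled even without -msse4.1 on the command line.
#if defined(__GNUC__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Dot product of two int32 arrays with 64-bit products and 64-bit sums.
SSE41_TARGET
int64_t dot_i32_wide(const int32_t *a, const int32_t *b, size_t n)
{
    __m128i acc = _mm_setzero_si128();   // two 64-bit partial sums
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        // Elements 0 and 2: signed 32x32 -> 64-bit products.
        __m128i even = _mm_mul_epi32(va, vb);

        // Elements 1 and 3: shift them into the low half of each
        // 64-bit lane, then multiply the same way.
        __m128i odd = _mm_mul_epi32(_mm_srli_epi64(va, 32),
                                    _mm_srli_epi64(vb, 32));

        acc = _mm_add_epi64(acc, _mm_add_epi64(even, odd));
    }

    // Horizontal sum of the two 64-bit lanes.
    int64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int64_t sum = lanes[0] + lanes[1];

    // Scalar tail, also widened before multiplying.
    for (; i < n; ++i)
        sum += (int64_t)a[i] * b[i];

    return sum;
}
```

With the names from the post, a call would look like `int64_t valor = dot_i32_wide(A->data, B->data, SIZE);`. Note that the result type is `int64_t` rather than `long`, since `long` is only 32 bits wide on some platforms.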


Last edited by bartus11; 03-28-2014 at 06:06 PM.. Reason: Please use [code][/code] tags.
# 2  
What compiler is this?
# 3  
icpc and g++; I use both.

---------- Post updated 03-29-14 at 11:29 AM ---------- Previous update was 03-28-14 at 04:52 PM ----------

