Issue 2228: Generate SIMD instructions for SIMD-like code patterns
7 people starred this issue and may be notified of changes.
Status:  New
Owner:  ----

Reported Jul 11, 2012
Turning typed array code like

function add_ps(a, b) {
  a[0] += b[0];
  a[1] += b[1];
  a[2] += b[2];
  a[3] += b[3];
}

function mul_ps(a, b) {
  a[0] *= b[0];
  a[1] *= b[1];
  a[2] *= b[2];
  a[3] *= b[3];
}

var vectors = new Float32Array(400000);
var adds = new Float32Array(4000);

for (var i=0; i<vectors.length; i+=4) {
  var a = new Float32Array(vectors.buffer, i*4, 4);
  for (var j=0; j<adds.length; j+=4) {
    var b = new Float32Array(adds.buffer, j*4, 4);
    add_ps(a, b);
    mul_ps(a, b);
  }
}
into something like the following C snippet:

float *vectors;
float *adds;

// ... initialize vectors and adds

__m128 a, b;

for (int i=0; i<vectors_length; i+=4) {
  a = _mm_load_ps(vectors+i);
  for (int j=0; j<adds_length; j+=4) {
    b = _mm_load_ps(adds+j);
    a = _mm_add_ps(a, b);
    a = _mm_mul_ps(a, b);
  }
  _mm_store_ps(vectors+i, a);
}
would be nice.
Jul 11, 2012
Project Member #1
Yes, that would be nice.

Unfortunately, due to the nature of JavaScript, there are very few situations where this is actually possible, and detecting them is pretty expensive. Remember that for JavaScript, compile time is run time, and even C/C++ compilers have a hard time applying this optimization automatically.

I don't think anyone will work on this in the foreseeable future, which is why I'm closing the issue. We do accept patches and can always reopen, though, if you want to figure out a way to do this!
Status: WorkingAsIntended
Jul 11, 2012
Project Member #2
This also requires a deeper integration with WebKit's typed arrays than we have now: currently we can't see through typed array constructors to understand how their backing stores are related. Here you need that, plus a basic escape analysis infrastructure accompanied by scalar replacement/object explosion, before you can even get to vectorization itself, which always boils down to pattern matching.

[That said I don't think a vectorization pass recognizing limited amount of cases will be very costly.]
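To make the aliasing point above concrete, here is a minimal sketch (variable names chosen to echo the original report) showing that a subarray view and its source array share one backing store. This sharing is exactly what the compiler would have to reason about before it could safely vectorize:

```javascript
// Two views over the same ArrayBuffer alias each other:
// a write through one view is observable through the other.
var vectors = new Float32Array(8);
var a = new Float32Array(vectors.buffer, 0, 4); // view of elements 0..3

a[0] = 42;
console.log(vectors[0]); // prints 42 -- the write is visible through both views
```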
Jul 23, 2012
If this bug requires other things to be fixed first, can we file those other things and make this one dependent on them instead of closing this one? Now that CPU clock speeds have plateaued, I have a hard time taking languages seriously that don't support parallel CPU features.

Regarding the runtime overhead of detection, I suggested in an old email thread that we do this for obvious for loops:

  // v, a and b are 32-bit typed arrays.
  for (i = 0; i < 4; ++i) v[i] = a[i] * b[i];

Gating the detection on a constant-bound for loop of 4 iterations, for example, seems like it would mostly solve the overhead problem, no? I guess this would make code pretty fugly without inline or macro support, but maybe it's a start.
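For illustration, a minimal sketch of what such a gated 4-iteration loop could lower to on an SSE-capable x86 target (the function name mul4 is made up for this example; it is not part of any proposed API):

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Hypothetical lowering of the loop
 *   for (i = 0; i < 4; ++i) v[i] = a[i] * b[i];
 * into one unaligned load per input, one packed multiply, one store. */
void mul4(float *v, const float *a, const float *b) {
  __m128 va = _mm_loadu_ps(a);
  __m128 vb = _mm_loadu_ps(b);
  _mm_storeu_ps(v, _mm_mul_ps(va, vb));
}
```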

Jul 24, 2012
Project Member #4
(No comment was entered for this change.)
Status: New
Labels: Type-FeatureRequest Priority-Low
Dec 4, 2013
Recently I wrote a program in C. During execution, data calculation is the bottleneck. I would like to solve the problem using SIMD, but I have run into some problems. Would you be willing to help me? My email is
Powered by Google Project Hosting