Benchmarking Vector API – Arrays, collections and data structures


105. Benchmarking Vector API

Benchmarking the Vector API can be accomplished via JMH. Let’s consider the Java arrays x, y, z, w, and k, each of 50,000,000 integers, and the following computation:

z[i] = x[i] + y[i];
w[i] = x[i] * z[i] * y[i];
k[i] = z[i] + w[i] * y[i];

So, the final result is stored in the Java array named k. Now, let’s consider the following benchmark, which contains four different implementations of this computation (with a mask, without a mask, unrolled, and plain scalar Java with arrays):

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@State(Scope.Benchmark)
@Fork(value = 1, warmups = 0,
      jvmArgsPrepend = {"--add-modules=jdk.incubator.vector"})
public class Main {
  private static final VectorSpecies<Integer> VS
    = IntVector.SPECIES_PREFERRED;
  …
  @Benchmark
  public void computeWithMask(Blackhole blackhole) {…}
  @Benchmark
  public void computeNoMask(Blackhole blackhole) {…}
  @Benchmark
  public void computeUnrolled(Blackhole blackhole) {…}
  @Benchmark
  public void computeArrays(Blackhole blackhole) {…}
}
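The plain scalar variant is the simplest of the four. Since the benchmark bodies are elided above, the following is only a minimal sketch of what such a baseline could look like, not the benchmark's actual code. The class name ScalarBaseline, the compute method, and the use of local variables in place of the z and w arrays are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical scalar baseline for the computation described above
final class ScalarBaseline {

    // Returns k, where k[i] = z + w * y[i], z = x[i] + y[i], w = x[i] * z * y[i].
    // z and w are kept as locals here (an assumption); the text stores them in arrays.
    static int[] compute(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        int[] k = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            int z = x[i] + y[i];
            int w = x[i] * z * y[i];
            k[i] = z + w * y[i];
        }
        return k;
    }

    public static void main(String[] args) {
        int[] k = compute(new int[]{1, 2, 3}, new int[]{4, 5, 6});
        System.out.println(Arrays.toString(k)); // prints [85, 357, 981]
    }
}
```

A small, deterministic baseline like this is also handy for checking that the vectorized variants produce the same k before comparing their timings.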

Running this benchmark on an Intel(R) Core(TM) i7-3612QM CPU @ 2.10GHz machine with Windows 10 produced the following results:

Figure 5.9 – Benchmark results

Overall, the implementations that use data-parallel capabilities perform best, with the highest throughput and the best average time.

106. Applying Vector API to compute FMA

In Java Coding Problems, First Edition, Chapter 1, Problem 38, we covered Fused Multiply Add (FMA). In a nutshell, FMA is the mathematical computation (a*b) + c, which is heavily exploited in matrix multiplications. Implementing FMA via the Vector API can be done via the fma(float b, float c) or fma(Vector<Float> b, Vector<Float> c) operation; the latter is the one used here. Let’s assume that we have the following two arrays:

float[] x = new float[]{1f, 2f, 3f, 5f, 1f, 8f};
float[] y = new float[]{4f, 5f, 2f, 8f, 5f, 4f};

Computing FMA(x, y) can be expressed as the following sequence: 4 + 0 = 4 → 10 + 4 = 14 → 6 + 14 = 20 → 40 + 20 = 60 → 5 + 60 = 65 → 32 + 65 = 97. So, FMA(x, y) = 97. Expressing this sequence via the Vector API can be done as in the following code:

public static float vectorFma(float[] x, float[] y) {
  int upperBound = VS.loopBound(x.length);
  FloatVector sum = FloatVector.zero(VS);
  int i = 0;
  for (; i < upperBound; i += VS.length()) {
    FloatVector xVector = FloatVector.fromArray(VS, x, i);
    FloatVector yVector = FloatVector.fromArray(VS, y, i);
    sum = xVector.fma(yVector, sum);
  }
  if (i <= (x.length - 1)) {
    VectorMask<Float> mask = VS.indexInRange(i, x.length);
    FloatVector xVector = FloatVector.fromArray(
      VS, x, i, mask);
    FloatVector yVector = FloatVector.fromArray(
      VS, y, i, mask);
    sum = xVector.fma(yVector, sum);
  }
  float result = sum.reduceLanes(VectorOperators.ADD);
  return result;
}

Have you noticed the code line sum = xVector.fma(yVector, sum)? Arithmetically, this is equivalent to sum = xVector.mul(yVector).add(sum), except that FMA rounds the result only once. The novelty here consists of the line:

float result = sum.reduceLanes(VectorOperators.ADD);

This is an associative cross-lane reduction operation (see Figure 5.6). Before this line, assuming that VS has four lanes, the sum vector looks as follows:

sum = [9.0, 42.0, 6.0, 40.0]

By applying reduceLanes(VectorOperators.ADD), we sum the values of this vector and reduce them to the final result, 97.0. Cool, right?!
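To see where the lane values come from, the accumulation can be imitated with plain arrays. This is only a sketch that mimics the main loop, the masked tail, and the final reduction of vectorFma; it does not use the Vector API itself. The class name LaneSimulation, the helper names, and the fixed lane count of 4 are assumptions (the real lane count is whatever VS.length() returns on the running machine).

```java
import java.util.Arrays;

// Hypothetical plain-Java imitation of the lane-wise FMA accumulation
final class LaneSimulation {

    // Assumed lane count; with the Vector API this would be VS.length()
    static final int LANES = 4;

    // Imitates the main loop and the masked tail of vectorFma
    static float[] accumulate(float[] x, float[] y) {
        float[] sum = new float[LANES];                  // plays the role of FloatVector.zero(VS)
        int upperBound = x.length - (x.length % LANES);  // plays the role of VS.loopBound(x.length)
        int i = 0;
        for (; i < upperBound; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                sum[lane] = Math.fma(x[i + lane], y[i + lane], sum[lane]);
            }
        }
        // Tail: only lanes that still map to array elements are updated.
        // Masked-off lanes load zeros, so fma(0, 0, sum) leaves them unchanged.
        for (int lane = 0; i + lane < x.length; lane++) {
            sum[lane] = Math.fma(x[i + lane], y[i + lane], sum[lane]);
        }
        return sum;
    }

    // Imitates reduceLanes(VectorOperators.ADD) by adding all lanes together
    static float reduceAdd(float[] lanes) {
        float result = 0f;
        for (float value : lanes) {
            result += value;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 5f, 1f, 8f};
        float[] y = {4f, 5f, 2f, 8f, 5f, 4f};
        float[] sum = accumulate(x, y);
        System.out.println(Arrays.toString(sum)); // prints [9.0, 42.0, 6.0, 40.0]
        System.out.println(reduceAdd(sum));       // prints 97.0
    }
}
```

The first pass fills the lanes with 4, 10, 6, and 40; the tail then updates only the first two lanes (5 + 4 = 9 and 32 + 10 = 42), which is why the last two lanes keep their values until the reduction.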
