103. Summing two arrays via Vector API
Summing two arrays is the perfect start for applying what we’ve learned in the preceding two problems. Let’s assume that we have the following Java arrays:
int[] x = new int[]{1, 2, 3, 4, 5, 6, 7, 8};
int[] y = new int[]{4, 5, 2, 5, 1, 3, 8, 7};
For computing z=x+y via the Vector API, we have to create two Vector instances and rely on the add() operation, z=x.add(y). Since the Java arrays hold integer scalars, we can use the IntVector specialization as follows:
IntVector xVector = IntVector.fromArray(
  IntVector.SPECIES_256, x, 0);
IntVector yVector = IntVector.fromArray(
  IntVector.SPECIES_256, y, 0);
In Java, an int occupies 4 bytes, that is, 32 bits. Since x and y each hold 8 integers, we need 8*32=256 bits to represent each of them as a vector. So, relying on SPECIES_256 is the right choice.
Next, we can apply the add() operation as follows:
IntVector zVector = xVector.add(yVector);
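As a sanity check on the arithmetic above and on the result to expect from add(), the same computation can be mirrored in plain Java. This is only a scalar reference sketch: the class and method names are made up for illustration, and it does not use the Vector API or show what the JVM executes internally.

```java
import java.util.Arrays;

// Plain-Java reference; no Vector API involved. All names are illustrative.
final class ScalarAddReference {

    // Number of int lanes that fit in a vector of the given size in bits.
    static int lanes(int vectorBits) {
        return vectorBits / Integer.SIZE; // Integer.SIZE is 32
    }

    // Element-by-element sum, mirroring what a lanewise add() produces.
    static int[] add(int[] x, int[] y) {
        int[] z = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] + y[i];
        }
        return z;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] y = {4, 5, 2, 5, 1, 3, 8, 7};
        System.out.println(lanes(256));                 // 8
        System.out.println(Arrays.toString(add(x, y))); // [5, 7, 5, 9, 6, 9, 15, 15]
    }
}
```

A 256-bit vector therefore has exactly one lane for each of the 8 integers in x and y.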
Done! It is now up to the JVM to generate the optimal set of instructions (data-parallel accelerated code) that will compute our addition. The result will be the vector [5, 7, 5, 9, 6, 9, 15, 15].
This was a simple case but not quite realistic. Who would employ parallel computational capabilities for summing two arrays of just a few elements?! In the real world, x and y may have far more than 8 elements. Most probably, x and y have millions of items and are involved in multiple calculation cycles. That is exactly when we can leverage the power of parallel computation.
But, for now, let’s assume that x and y are as follows:
x = {3, 6, 5, 5, 1, 2, 3, 4, 5, 6, 7, 8, 3, 6, 5, 5, 1, 2, 3,
4, 5, 6, 7, 8, 3, 6, 5, 5, 1, 2, 3, 4, 3, 4};
y = {4, 5, 2, 5, 1, 3, 8, 7, 1, 6, 2, 3, 1, 2, 3, 4, 5, 6, 7,
8, 3, 6, 5, 5, 1, 2, 3, 4, 5, 6, 7, 8, 2, 8};
If we apply the previous code (based on SPECIES_256), the result will be the same because our vectors can accommodate only the first 8 scalars and will ignore the rest. If we apply the same logic but use SPECIES_PREFERRED, then the result is unpredictable since the vector’s shape is specific to the current platform. However, we can intuit that we will accommodate the first n scalars (whatever n happens to be), but not all of them.
This time, we need to chunk the arrays and use a loop to traverse them and compute z_chunk = x_chunk + y_chunk. The result of summing two chunks is collected in a third array (z) until all chunks are processed. We define a method that starts as follows:
public static void sum(int x[], int y[], int z[]) {
…
But how big should a chunk be? The first challenge is the loop design. The loop should start from 0, but what are the upper bound and the step? Typically, the upper bound is the length of x, so 34. But using x.length is not exactly useful because 34 is not a multiple of the vector’s length, so the last chunk would not fill a whole vector. What we are looking for is the largest multiple of VLENGTH (the vector’s length) that is less than or equal to x.length. In our case, that is the largest multiple of 8 that is less than or equal to 34, so 32. This is exactly what the loopBound() method returns, and the step is the vector’s length, so we can write the loop as follows:
// VS256 is assumed to be a constant holding IntVector.SPECIES_256
int upperBound = VS256.loopBound(x.length);
for (int i = 0; i < upperBound; i += VS256.length()) {
  …
}
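The arithmetic behind this loop can be checked without the Vector API. The following plain-Java sketch assumes a vector length of 8 lanes and uses a hypothetical helper, loopBound(length, lanes), that mimics the documented behavior of the species' loopBound() method for non-negative lengths. It is an illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the loop bounds; all names are hypothetical.
final class LoopBoundSketch {

    // Largest multiple of lanes that is <= length (non-negative length assumed).
    static int loopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    // Start index of every full chunk that the loop would visit.
    static List<Integer> chunkStarts(int length, int lanes) {
        List<Integer> starts = new ArrayList<>();
        int upperBound = loopBound(length, lanes);
        for (int i = 0; i < upperBound; i += lanes) {
            starts.add(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        int length = 34; // x.length in the example above
        int lanes = 8;   // 256 bits / 32 bits per int
        System.out.println(loopBound(length, lanes));          // 32
        System.out.println(chunkStarts(length, lanes));        // [0, 8, 16, 24]
        System.out.println(length - loopBound(length, lanes)); // 2
    }
}
```

With 34 elements and 8 lanes, the loop visits four full chunks and leaves the last two elements (indexes 32 and 33) outside its range.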