Exploring DL4J – Ops & Executors

Ops, or operations, are simple blocks of code defining how a specific calculation or operation should occur. They are a core piece of ND4J, and therefore of Deeplearning4j.


Ops define calculations at an element-wise level. This aspect is central to both Nd4j and Dl4j. But before we get too deep into the specifics of Ops, let us examine the framework and its components.


Not all ops satisfy the element-wise specification, but those that don’t will have a flag to mark this: “passThrough”.

Structurally, the Op interfaces are arranged as follows:

  1. Ops
    1. TransformOp
    2. ScalarOp
    3. Accumulation
      1. LossFunction

There are base implementations of each of the interfaces above:

  • BaseOp
  • BaseTransformOp
  • BaseAccumulation
  • BaseScalarOp
  • BaseLossFunction

These provide foundational method implementations: they are abstract classes that implement a portion of the required methods and can then be extended to address specific problems.

TransformOps apply a function that transforms the input element by element, and include tanh, sigmoid, rectified linear (ReLU), and so forth. ScalarOps define either simple operations between each element and a scalar, or scalar comparison operations. Accumulations reduce values with functions such as Sum, Mean, Variance or StandardDev. Distance functions also fall under this category.
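To make the three categories concrete, here is a conceptual sketch in plain Java. It does not use ND4J at all; the class and method names (OpCategories, transform, scalar, accumulate) are hypothetical and only illustrate how the categories differ in shape: a transform maps each element to a new element, a scalar op combines each element with one scalar, and an accumulation folds many elements into a single value.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Conceptual sketch only: hypothetical names, not ND4J classes.
final class OpCategories {

    // Transform op: apply a function to every element (tanh, sigmoid, ReLU, ...).
    static double[] transform(double[] x, DoubleUnaryOperator f) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = f.applyAsDouble(x[i]);
        }
        return z;
    }

    // Scalar op: combine every element with a single scalar (add, multiply, compare, ...).
    static double[] scalar(double[] x, double s, DoubleBinaryOperator f) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = f.applyAsDouble(x[i], s);
        }
        return z;
    }

    // Accumulation: fold all elements into one value (sum, max, ...).
    static double accumulate(double[] x, double initial, DoubleBinaryOperator f) {
        double acc = initial;
        for (double v : x) {
            acc = f.applyAsDouble(acc, v);
        }
        return acc;
    }

    static double relu(double v) {
        return Math.max(0.0, v);
    }

    public static void main(String[] args) {
        double[] x = {-2.0, -1.0, 0.0, 1.0, 2.0};
        System.out.println(Arrays.toString(transform(x, OpCategories::relu)));            // [0.0, 0.0, 0.0, 1.0, 2.0]
        System.out.println(Arrays.toString(scalar(x, 0.0, (a, b) -> a > b ? 1.0 : 0.0))); // [0.0, 0.0, 0.0, 1.0, 1.0]
        System.out.println(accumulate(x, 0.0, Double::sum));                              // 0.0
    }
}
```

The point of the split is that each category only needs to describe what happens to one element (or one pair of values); the looping is left to whoever runs it, which is exactly the job ND4J hands to an executor.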

Getting Started

The op classes are not used directly by the programmer. Instead, an executor handles the actual use of the ops. Executors are written to take advantage of the system resources and distribute a task to whatever resources are available. In nearly all cases, we will access an executor via Nd4j.getExecutioner().


Before we talk ourselves in circles about the technicalities, let’s examine some code and output to get familiar with the basic premise of the Ops.

First we’ll create an INDArray full of values:

INDArray values = Nd4j.linspace(1, 20, 20).reshape(4,5);

Here we tell Nd4j to populate an array of 20 elements, linearly spanning from 1 to 20. Then we reshape our INDArray, making it a matrix with 4 rows and 5 columns.
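The following plain-Java sketch shows the resulting layout, assuming ND4J's default row-major ('c') ordering, in which consecutive values fill a row before moving on to the next one. The helpers linspace and reshape here are hypothetical stand-ins written for illustration, not ND4J methods.

```java
import java.util.Arrays;

// Plain-Java sketch (no ND4J) of linspace(1, 20, 20).reshape(4, 5),
// assuming row-major ordering. All names here are hypothetical.
final class LinspaceReshape {

    // n evenly spaced values from start to end, inclusive.
    static double[] linspace(double start, double end, int n) {
        double[] out = new double[n];
        double step = (n > 1) ? (end - start) / (n - 1) : 0.0;
        for (int i = 0; i < n; i++) {
            out[i] = start + i * step;
        }
        return out;
    }

    // View a flat buffer as rows x cols, filling each row before the next.
    static double[][] reshape(double[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException("shape does not match length");
        }
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = flat[r * cols + c];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] values = reshape(linspace(1, 20, 20), 4, 5);
        for (double[] row : values) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [1.0, 2.0, 3.0, 4.0, 5.0]
        // [6.0, 7.0, 8.0, 9.0, 10.0]
        // [11.0, 12.0, 13.0, 14.0, 15.0]
        // [16.0, 17.0, 18.0, 19.0, 20.0]
    }
}
```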


Now let’s do something with them!

Single Input Ops

Nd4j offers convenience methods that construct Ops and execute them with an Executor. For example, the INDArray interface gives us sum, mean, norm1, std and so on.

Under the hood, each of these methods constructs an Op, initializes it and executes it with an Executor. Here we will use the Accumulation Op Sum, first through the convenience methods and then through explicit setup and execution.

Recall the 4 x 5 array of values we created above.


Any INDArray lets us call sum on itself, passing the dimensions we'd like to sum across. We need to provide at least one value here; otherwise an exception is thrown. Recall that Nd4j numbers its dimensions as rows = 0, columns = 1, depth = 2, and so on.

Let's first sum across the rows (dimension 0), which means 1 + 6 + 11 + 16 for the first column, and so on, via

values.sum(0);

The resulting INDArray looks like this:

[ 34.00, 38.00, 42.00, 46.00, 50.00]

Likewise, to sum across the columns (dimension 1), we call

values.sum(1);

The resulting INDArray looks like this:

[ 15.00, 40.00, 65.00, 90.00]
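Both reductions can be checked by hand. This plain-Java sketch (no ND4J; sumAlongDim0 and sumAlongDim1 are hypothetical names) rebuilds the same 4 x 5 matrix in row-major order and reduces it along each dimension. Reducing along a dimension collapses that dimension and keeps the other.

```java
import java.util.Arrays;

// Plain-Java sketch of sum(0) and sum(1) on a 4 x 5 matrix; not ND4J code.
final class DimensionSums {

    // Reduce along dimension 0: collapse the rows, leaving one sum per column.
    static double[] sumAlongDim0(double[][] m) {
        double[] out = new double[m[0].length];
        for (double[] row : m) {
            for (int c = 0; c < row.length; c++) {
                out[c] += row[c];
            }
        }
        return out;
    }

    // Reduce along dimension 1: collapse the columns, leaving one sum per row.
    static double[] sumAlongDim1(double[][] m) {
        double[] out = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            for (double v : m[r]) {
                out[r] += v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] values = new double[4][5];
        for (int i = 0; i < 20; i++) {
            values[i / 5][i % 5] = i + 1;   // same layout as linspace(1, 20, 20).reshape(4, 5)
        }
        System.out.println(Arrays.toString(sumAlongDim0(values))); // [34.0, 38.0, 42.0, 46.0, 50.0]
        System.out.println(Arrays.toString(sumAlongDim1(values))); // [15.0, 40.0, 65.0, 90.0]
    }
}
```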

What if we want to sum the entire array? There are a few options. The first is to pass both dimensions to the sum call, as in array.sum(0,1). This works, but we have to be careful.

If we had a three-dimensional array, sum(0,1) would leave the third dimension unreduced, giving us a vector of sums rather than a single total. The INDArray interface also includes a method that returns a plain number, array.sumNumber(). Alternatively, we could call the static method Nd4j.sum(array) to sum all elements within the array.

Interestingly enough, both sumNumber() and this static method call array.sum(Integer.MAX_VALUE). Integer.MAX_VALUE is a key constant used by the Executor to compute over the entire array and return the result as a scalar INDArray.
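Here is a small plain-Java sketch of both points. Reducing dimensions 0 and 1 of a 3-D array still leaves one sum per entry of the third dimension, while a sentinel value of Integer.MAX_VALUE requests a single linear pass over everything. The class FullReduction and its sum method are hypothetical; only the Integer.MAX_VALUE convention comes from the text above.

```java
import java.util.Arrays;

// Hypothetical sketch: Integer.MAX_VALUE as a "reduce everything" sentinel.
final class FullReduction {

    static final int ALL_DIMENSIONS = Integer.MAX_VALUE;

    // Sum a 3-D array shaped [d0][d1][d2] over the requested dimensions.
    // This sketch supports only two requests: {ALL_DIMENSIONS} and {0, 1}.
    static double[] sum(double[][][] a, int... dims) {
        if (dims.length == 1 && dims[0] == ALL_DIMENSIONS) {
            double total = 0.0;
            for (double[][] plane : a) {
                for (double[] row : plane) {
                    for (double v : row) {
                        total += v;            // one linear pass over every element
                    }
                }
            }
            return new double[] {total};       // length-1 result, standing in for a scalar INDArray
        }
        if (Arrays.equals(dims, new int[] {0, 1})) {
            double[] out = new double[a[0][0].length];
            for (double[][] plane : a) {
                for (double[] row : plane) {
                    for (int k = 0; k < row.length; k++) {
                        out[k] += row[k];      // dimension 2 survives: one sum per k
                    }
                }
            }
            return out;
        }
        throw new UnsupportedOperationException("sketch handles {MAX_VALUE} and {0, 1} only");
    }

    public static void main(String[] args) {
        // A 2 x 2 x 3 array holding 1..12 in row-major order.
        double[][][] a = new double[2][2][3];
        for (int i = 0; i < 12; i++) {
            a[i / 6][(i / 3) % 2][i % 3] = i + 1;
        }
        System.out.println(Arrays.toString(sum(a, 0, 1)));           // [22.0, 26.0, 30.0]
        System.out.println(Arrays.toString(sum(a, ALL_DIMENSIONS))); // [78.0]
    }
}
```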

What if we don’t have a readily available way to call upon an Op? We can simply construct a new instance of that op and pass it to the Executor. Again, we’ll use sum, since we have the expected outcomes above.

Sum sum = new Sum(values);
INDArray columnSums = Nd4j.getExecutioner().exec(sum, 0);

The syntax is essentially the same: we pass in our INDArray object, values, to the Sum operator. Then we tell the executor to execute along dimension 0. Since Ops are only stateful for the duration of the execution call, we discard the Op when we are done. Thus, we could condense this to one line:

INDArray columnSums = Nd4j.getExecutioner().exec(new Sum(values), 0);

We can see that the Op is simply an object that defines the operation; it can't do anything further without the assistance of an Executor. The above call only scratches the surface of what Ops can do.
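The division of labour between an Op and an Executor can be sketched without ND4J. In this hypothetical example (SketchAccumulation, SketchSum and SketchExecutioner are illustrative names, not ND4J classes), the op holds its input and knows how to fold one element into a running value, while the executor owns the loop and decides which dimension to run along.

```java
import java.util.Arrays;

// Hypothetical sketch of the Op / Executor split; these are NOT ND4J's classes.
// The op describes the calculation and holds its input; the executor runs it.
abstract class SketchAccumulation {
    final double[][] x;                                // input held by the op, like new Sum(values)

    SketchAccumulation(double[][] x) {
        this.x = x;
    }

    abstract double initial();                         // starting value of the running result

    abstract double update(double acc, double value);  // fold one element into the running result
}

final class SketchSum extends SketchAccumulation {
    SketchSum(double[][] x) {
        super(x);
    }

    @Override
    double initial() {
        return 0.0;
    }

    @Override
    double update(double acc, double value) {
        return acc + value;
    }
}

final class SketchExecutioner {
    // Runs the op along dimension 0 (one result per column) or 1 (one result per row).
    static double[] exec(SketchAccumulation op, int dimension) {
        if (dimension != 0 && dimension != 1) {
            throw new IllegalArgumentException("sketch supports dimensions 0 and 1 only");
        }
        double[][] x = op.x;
        int rows = x.length, cols = x[0].length;
        int outer = (dimension == 0) ? cols : rows;
        int inner = (dimension == 0) ? rows : cols;
        double[] z = new double[outer];
        for (int o = 0; o < outer; o++) {
            double acc = op.initial();
            for (int i = 0; i < inner; i++) {
                double v = (dimension == 0) ? x[i][o] : x[o][i];
                acc = op.update(acc, v);
            }
            z[o] = acc;
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] values = new double[4][5];
        for (int i = 0; i < 20; i++) {
            values[i / 5][i % 5] = i + 1;
        }
        // The op is created, executed once and then discarded.
        System.out.println(Arrays.toString(exec(new SketchSum(values), 0))); // [34.0, 38.0, 42.0, 46.0, 50.0]
    }
}
```

Swapping in a different accumulation only means writing another small subclass; the executor's loop is untouched. This is the same separation that lets ND4J change how ops are executed without changing the ops themselves.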

Dual Input Operations – Euclidean

An Op has up to two input INDArrays (the second only when needed) and a single result buffer INDArray: x, y and z, respectively. In a previous tutorial, LBI: KNN, we used the Euclidean Op, which is a great example of a two-input Op. Let's take a look at how we would use a Euclidean Op. First, we'll create another set of values with the same shape.

INDArray values2 = Nd4j.linspace(2,21,20).reshape(4,5);

Now we'll create our Op, but this time we'll use Nd4j's OpFactory, a convenient way to construct various Ops. Throughout DL4J you'll see Ops being used once and discarded, so we'll follow the same pattern here.

If we happen to have access to an instantiated Op and want to construct another of the same type, we can simply ask for its name. Every Op reports its name as a String, which we can feed into the OpFactory.

Accumulation accum = Nd4j.getOpFactory().createAccum("euclidean", values, values2);

So we specify the Accumulation Op (Euclidean), and the two inputs we wish to work with. That’s about it when it comes to setting up the Op. Now we can pass our Distance operation into an executor and compute a result.

INDArray distanceResult = Nd4j.getExecutioner().exec(accum, 0);

[ 2.00, 2.00, 2.00, 2.00, 2.00]

Because the two arrays we created are offset by one (1 to 20 versus 2 to 21), the difference between them at any index is 1. Since we are calculating the Euclidean distance over the row dimension, we get an output vector of length 5: the size of the remaining dimension, which here is the columns. Summing the squared differences over the 4 rows gives sqrt(1 + 1 + 1 + 1) = sqrt(4) = 2 for each column, which is the output above.

We used dimension 0 for both Sum and Euclidean, and both produced 5 values, one per column. As stated before, specifying dimension 0 tells the executor to operate across the rows. In our case, we had a matrix with 4 rows and 5 columns. If we specify dimension 1 instead, we get an INDArray with 4 values, matching the size of the remaining dimension; each is sqrt(1 + 1 + 1 + 1 + 1) = sqrt(5), or about 2.24.

[ 2.24, 2.24, 2.24, 2.24]
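Both results can be reproduced with a short plain-Java sketch of a two-input accumulation. The class and method names are hypothetical and no ND4J calls are involved. The two arrays mirror values and values2, and z is the result buffer described earlier.

```java
import java.util.Arrays;

// Plain-Java sketch of a two-input accumulation (x, y) reduced along one dimension.
final class EuclideanAlongDimension {

    // dimension 0: one distance per column (reduce over rows); dimension 1: one per row.
    static double[] euclidean(double[][] x, double[][] y, int dimension) {
        int rows = x.length, cols = x[0].length;
        if (dimension != 0 && dimension != 1) {
            throw new IllegalArgumentException("sketch supports dimensions 0 and 1 only");
        }
        int outer = (dimension == 0) ? cols : rows;
        int inner = (dimension == 0) ? rows : cols;
        double[] z = new double[outer];                // result buffer, the "z" of the op
        for (int o = 0; o < outer; o++) {
            double sumOfSquares = 0.0;
            for (int i = 0; i < inner; i++) {
                int r = (dimension == 0) ? i : o;
                int c = (dimension == 0) ? o : i;
                double d = x[r][c] - y[r][c];
                sumOfSquares += d * d;
            }
            z[o] = Math.sqrt(sumOfSquares);
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] values = new double[4][5];
        double[][] values2 = new double[4][5];
        for (int i = 0; i < 20; i++) {
            values[i / 5][i % 5] = i + 1;   // 1..20
            values2[i / 5][i % 5] = i + 2;  // 2..21
        }
        System.out.println(Arrays.toString(euclidean(values, values2, 0))); // [2.0, 2.0, 2.0, 2.0, 2.0]
        System.out.println(Arrays.toString(euclidean(values, values2, 1))); // four values of sqrt(5), about 2.236
    }
}
```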

Now, what exactly happens when we call upon our trusty executor? There are two general flows the executor follows. If an operation is executed with the key constant Integer.MAX_VALUE, the default implementation does a linear loop over all the values for the op. If the executor is executing over dimensions, it uses a parallel executor; the default implementation makes use of Java's fork/join framework. When an op is called upon for a dimension, the stock Ops all have code similar to this:

public Op opForDimension(int index, int dimension) {
    // Slice x along the requested dimension
    INDArray xAlongDimension = x.vectorAlongDimension(index, dimension);
    // Two-input op: slice y the same way and build a new op over both slices
    if (y() != null)
        return new BatchLoss(xAlongDimension, y.vectorAlongDimension(index, dimension), xAlongDimension.length(), lossFunction);
    // Single-input op: build a new op over the x slice only
    return new BatchLoss(xAlongDimension, lossFunction);
}

In other ops, BatchLoss is replaced by that op's own class. This effectively creates a copy of the original op restricted to a single vector along the given dimension, which allows the vectors to be processed in parallel.
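The per-dimension splitting can be sketched with Java's own fork/join framework. This is a simplified illustration under assumed names (ForkJoinSketch, VectorSum, vectorAlongDim0, all hypothetical), not ND4J's executor. Each column becomes its own small sum task, the analogue of the op returned by opForDimension, and the tasks are forked and then joined. The whole-array path is just one linear loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Simplified illustration of the two execution paths; names are hypothetical, not ND4J's.
final class ForkJoinSketch {

    // Sub-op over a single vector, playing the role of the op returned by opForDimension.
    static final class VectorSum extends RecursiveTask<Double> {
        private final double[] vector;

        VectorSum(double[] vector) {
            this.vector = vector;
        }

        @Override
        protected Double compute() {
            double total = 0.0;
            for (double v : vector) {
                total += v;
            }
            return total;
        }
    }

    // Whole-array path (the Integer.MAX_VALUE case): one linear loop, no splitting.
    static double sumAll(double[][] m) {
        double total = 0.0;
        for (double[] row : m) {
            for (double v : row) {
                total += v;
            }
        }
        return total;
    }

    // Copy of column 'index' of a row-major matrix, i.e. one vector along dimension 0.
    static double[] vectorAlongDim0(double[][] m, int index) {
        double[] col = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            col[r] = m[r][index];
        }
        return col;
    }

    // Per-dimension path: one sub-op per column, forked in parallel and then joined.
    static double[] sumAlongDim0(double[][] m, ForkJoinPool pool) {
        return pool.invoke(new RecursiveTask<double[]>() {
            @Override
            protected double[] compute() {
                List<VectorSum> subOps = new ArrayList<>();
                for (int c = 0; c < m[0].length; c++) {
                    VectorSum op = new VectorSum(vectorAlongDim0(m, c));
                    op.fork();                     // schedule the sub-op asynchronously
                    subOps.add(op);
                }
                double[] z = new double[subOps.size()];
                for (int c = 0; c < z.length; c++) {
                    z[c] = subOps.get(c).join();   // wait for each column's result
                }
                return z;
            }
        });
    }

    public static void main(String[] args) {
        double[][] values = new double[4][5];
        for (int i = 0; i < 20; i++) {
            values[i / 5][i % 5] = i + 1;
        }
        System.out.println(Arrays.toString(sumAlongDim0(values, ForkJoinPool.commonPool()))); // [34.0, 38.0, 42.0, 46.0, 50.0]
        System.out.println(sumAll(values));                                                  // 210.0
    }
}
```

Because each sub-op only sees its own vector, the sub-ops share no mutable state, so they can run on any available worker thread in any order. This is what makes the per-dimension copies cheap to parallelize.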


We will conclude here, but there is plenty more nuanced discussion to be had about Ops and Executors. The key takeaway is that Ops and Executors are a central part of the Nd4j framework, and that they are designed with distributed computing in mind. They are small units of computation meant to be combined to accomplish complex goals. As long as you build on them, you will be able to swap out the backend and get the most out of your Nd4j experience. I encourage you to poke around in the Ops and Executors yourself to get a deeper understanding of their inner workings.

Lastly, I want to thank Chris Nicholson for his additional help, and for nudging me to finally get this published.

