Thursday 1 October 2015

OpenCL lambda enqueue

Just had a thought on an alternative api for CLCommandQueue in zcl. No, this has nothing to do with lambda calculus in OpenCL.

An inconvenience in the current api is that all the enqueue functions take a lot of arguments, many of which are typically default values. This can be addressed using function overloading, but that just adds more inconvenience as there are also simply a lot of functions to overload. A related issue is that extensions can add entry points which, in object-oriented terms, would live on the queue object, but placing them there doesn't necessarily fit.

And finally, new compound operations need to be placed elsewhere but still fit the same semantic model of enqueuing a task to a specific queue.

So the thought is instead to use Java's lambda expressions to create queueable objects which know how to run themselves, and then at least the waiters/events parameter overload can be handled in one place.

So rather than:

// some compound task
  public void runop(CLCommandQueue q, CLImage src, CLImage dst,
      CLEventList waiters, CLEventList events) {
     ... enqueue one or more jobs ...
  }
  public void runop(CLCommandQueue q, CLImage src, CLImage dst) {
     runop(q, src, dst, null, null);
  }
  public void runop(CLCommandQueue q, CLImage src, CLImage dst,
      CLEventList event) {
     runop(q, src, dst, null, event);
  }

// usages
 runop(q, src, dst, waiters, events);
 runop(q, src, dst, events);
 runop(q, src, dst);

I can do:

// the interface
interface CLTask {
  public void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);
}

// the creation (only one required)
 public CLTask of(CLImage src, CLImage dst) {
   return (q, w, e) -> {
     ... enqueue one or more jobs ...
   };
 }

// usages
 q.run(op.of(src, dst));
 q.run(op.of(src, dst), events);
 q.run(op.of(src, dst), waiters, events);
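
For illustration, here is a minimal sketch of how the run() overloads on the queue might look. The classes below are hypothetical stand-ins (the real zcl CLCommandQueue and CLEventList wrap native OpenCL objects), strings stand in for images, and the log list exists only so the example prints something.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSketch {
    // Hypothetical stand-in for the zcl event list type.
    static class CLEventList {}

    interface CLTask {
        void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);
    }

    // Hypothetical stand-in for the zcl queue; it only records what was enqueued.
    static class CLCommandQueue {
        final List<String> log = new ArrayList<>();

        // The waiters/events defaults are handled here, once, for every task.
        void run(CLTask task) {
            run(task, null, null);
        }

        void run(CLTask task, CLEventList events) {
            run(task, null, events);
        }

        void run(CLTask task, CLEventList waiters, CLEventList events) {
            task.enqueue(this, waiters, events);
        }
    }

    // The task factory: only one method per operation is needed.
    static CLTask of(String src, String dst) {
        return (q, w, e) -> q.log.add(src + "->" + dst
                + " waiters=" + (w != null) + " events=" + (e != null));
    }

    public static void main(String[] args) {
        CLCommandQueue q = new CLCommandQueue();
        CLEventList waiters = new CLEventList();
        CLEventList events = new CLEventList();

        q.run(of("src", "dst"));
        q.run(of("src", "dst"), events);
        q.run(of("src", "dst"), waiters, events);

        q.log.forEach(System.out::println);
    }
}
```

The three overloads exist once on the queue rather than once per operation, which is where the method count saving comes from.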

This could extend throughout the rest of the api so that for example a CLBuffer would provide its own read task factories:

  public class CLBuffer {

    public CLTask ofRead(byte[] target) {
      return (q, w, e) -> {
        q.enqueueReadBuffer(this, true, 0, target.length, target, 0, w, e);
      };
    }
  }

// usages
  q.run(buffer.ofRead(target));
  q.run(buffer.ofRead(target), events);
  q.run(buffer.ofRead(target), waiters, events);

vs

// typical usage (without overloading)
  q.enqueueReadBuffer(buffer, true, 0, target.length, target, 0, null, null);
  q.enqueueReadBuffer(buffer, true, 0, target.length, target, 0, null, events);
  q.enqueueReadBuffer(buffer, true, 0, target.length, target, 0, waiters, events);

I think this would provide a way to add the convenience of overloading without a method count explosion. But the real question is whether it would actually improve the api in any meaningful way or merely make it different. Probably at this point it's a tentative yes on that one for many of the same reasons lambdas are convenient such as encapsulation and reuse.

There are some issues of resolving state at point-of-execution and threads but these are already an issue with OpenCL code to some extent and definitely with lambdas in general.
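
To make the point-of-execution issue concrete, here is a small sketch using hypothetical stand-in classes (not the real zcl types) and a made-up Settings holder. A lambda captures the reference to an object, so any field it reads is resolved when the task is enqueued rather than when the task was created, unless the value is copied up front.

```java
import java.util.ArrayList;
import java.util.List;

public class LateBindingSketch {
    // Hypothetical stand-ins for the zcl types.
    static class CLEventList {}

    static class CLCommandQueue {
        final List<String> log = new ArrayList<>();

        void run(CLTask task) {
            task.enqueue(this, null, null);
        }
    }

    interface CLTask {
        void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);
    }

    // Hypothetical mutable host-side state used by a task.
    static class Settings {
        int radius = 1;
    }

    // Captures the Settings reference: radius is read when the task is enqueued.
    static CLTask blur(Settings s) {
        return (q, w, e) -> q.log.add("blur radius=" + s.radius);
    }

    // Copies the value at creation time: later changes are not seen.
    static CLTask blurSnapshot(Settings s) {
        final int radius = s.radius;
        return (q, w, e) -> q.log.add("blur radius=" + radius);
    }

    public static void main(String[] args) {
        Settings s = new Settings();
        CLTask late = blur(s);
        CLTask early = blurSnapshot(s);

        s.radius = 5; // changed after both tasks were created

        CLCommandQueue q = new CLCommandQueue();
        q.run(late);
        q.run(early);
        System.out.println(q.log); // prints [blur radius=5, blur radius=1]
    }
}
```

Which behaviour is wanted depends on the task, so a factory would need to decide (and probably document) whether it snapshots its arguments.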

One could keep going:

// the interface
interface CLTask {
  public void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);

  public default void on(CLCommandQueue q) {
    enqueue(q, null, null);
  }
  public default void on(CLCommandQueue q, CLEventList w, CLEventList e) {
    enqueue(q, w, e);
  }
}

// usage
 buffer.ofRead(target).on(q);

Despite this having the benefit of layering in isolation above the base api I think it starts to get a little absurd and turns into "much of a muchness" deckchair shuffling.

Although this addition is probably useful:

// the interface
interface CLTask {
  public void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);

  public default CLTask andThen(CLTask after) {
     return (q, w, e) -> {
        enqueue(q, w, e);
        after.enqueue(q, w, e);
     };
  }
}

// usage
 q.run( buffer1.ofRead(target).andThen(buffer2.ofWrite(target)) );
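
As a rough check of how andThen composes, here is a self-contained sketch with hypothetical stand-in classes (again, not the real zcl types) where each task just records its name. The composed task enqueues its parts in call order and, as in the version above, passes the same waiters and events lists to both.

```java
import java.util.ArrayList;
import java.util.List;

public class AndThenSketch {
    // Hypothetical stand-ins for the zcl types.
    static class CLEventList {}

    static class CLCommandQueue {
        final List<String> log = new ArrayList<>();

        void run(CLTask task) {
            task.enqueue(this, null, null);
        }
    }

    interface CLTask {
        void enqueue(CLCommandQueue q, CLEventList w, CLEventList e);

        // Returns a task that enqueues this task, then 'after', on the same queue.
        default CLTask andThen(CLTask after) {
            return (q, w, e) -> {
                enqueue(q, w, e);
                after.enqueue(q, w, e);
            };
        }
    }

    // Hypothetical task that only records its name on the queue's log.
    static CLTask named(String name) {
        return (q, w, e) -> q.log.add(name);
    }

    public static void main(String[] args) {
        CLCommandQueue q = new CLCommandQueue();
        q.run(named("read").andThen(named("write")).andThen(named("kernel")));
        System.out.println(q.log); // prints [read, write, kernel]
    }
}
```

Note this only fixes the order in which commands are enqueued. On an in-order queue that is enough, but on an out-of-order queue the second task would presumably still need to wait on an event produced by the first.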

Actually I didn't really intend it as an outcome, but this also becomes a lot more usable if the resources in question are automatically reclaimable via gc as per my last post. Whole state and work spaces can be retained and reused through nothing more than a CLTask reference.

I think I've convinced myself of the utility now, but either way it takes very little code to try it.
