## Sunday, May 30, 2010

### Sub-array indexing

One of the things I have been missing in CL was convenient indexing for sub-arrays. Languages like R and Octave all have convenient sub-array indexing. I have experimented with various designs before (some of them I made public, eg AFFI and xarray), but none of them were entirely satisfactory and I kept searching for new solutions.

This is primarily an issue of syntactic convenience: I want to be able to select elements from arrays without writing (nested) loops. I also want to be able to copy elements from arrays to other arrays, similarly selected. Speed is welcome, but at this stage it is secondary: so far, profiling indicates that these operations comprise a tiny fraction of execution time in my programs, but having the ability to capture operations using (sub)arrays is an enormous timesaver and an invaluable semantic abstraction.

## Introduction to sub

My latest attempt at tackling this problem is the generic function sub in my cl-num-utils library (BTW, this library is the successor of my earlier cl-numlib, which is now deprecated, all useful functionality will end up in other libraries, eventually). The syntax is as follows: (sub object &rest ranges). Each range is either a fixnum (selecting a single index), (cons start end) (selecting indexes in that range, excluding end as is usually done for CL library functions), t for all indexes, and finally, a vector of fixnums for anything else.

Let's see some examples:

CLNU> (defparameter *a* #2A((1 2 3 4) (5 6 7 8) (9 10 11 12)))
*A*
CLNU> *a*
#2A((1 2 3 4) (5 6 7 8) (9 10 11 12))
CLNU> (sub *a* 1 t) ; second row
#(5 6 7 8)
CLNU> (sub *a* '(1 . 2) t) ; second row as matrix
#2A((5 6 7 8))


Notice how a single fixnum will drop that dimension (making a vector from a matrix), while selecting the same row with a cons doesn't.

For all other cases, you can use vectors:

CLNU> (sub *a* t #(0 3)) ; first and last columns
#2A((1 4) (5 8) (9 12))
CLNU> (sub *a* t #(3 0)) ; last and first columns
#2A((4 1) (8 5) (12 9))


You can also use negative numbers for indexes: if (minusp index), it will select the column (- dimensions index):

CLNU> (sub *a* -1 t) ; last row
#(9 10 11 12)
CLNU> (sub *a* #(-2 -1) #(-2 -1)) ; bottom right 2x2 matrix
#2A((7 8) (11 12))


Finally, 0 in a cons corresponds to the largest possible index in that dimension:

CLNU> (sub *a* '(-2 . 0) t) ; last two rows
#2A((5 6 7 8) (9 10 11 12))


## Setting subarrays

sub can also be used with setf:

CLNU> *a*
#2A((1 2 3 4) (5 6 7 8) (9 10 11 12))
CLNU> (setf (sub *a* 1 t) #(-1 -2 -3 -4)) ; replace second row
#(-1 -2 -3 -4)
CLNU> *a*
#2A((1 2 3 4) (-1 -2 -3 -4) (9 10 11 12))
CLNU> (setf (sub *a* 2 t) (map 'vector #'+ (sub *a* 0 t) (sub *a* 1 t)))
#(0 0 0 0)
CLNU> *a*
#2A((1 2 3 4) (-1 -2 -3 -4) (0 0 0 0))


Note that the ranks have to be equal, make sure that you keep/drop dimensions as needed.

## Macros for implementing sub

In cl-num-utils, you also find the macro with-range-indexing, which preprocesses the range arguments and implements an internal counter for row-major indexing. For example, sub for arrays is implemented as

(defmethod sub ((array array) &rest ranges)
(declare (optimize debug (speed 0)))
(with-range-indexing (ranges (array-dimensions array) next-index
:end? end?
:range-dimensions dimensions)
(let ((result (make-array (coerce dimensions 'list)
:element-type
(array-element-type array))))
(iter
(until end?)
(for result-index :from 0)
(setf (row-major-aref result result-index)
(row-major-aref array (next-index))))
result)))


The macro takes the range specification as the ranges argument, and also needs the original dimensions of the array. next-index should be the name of the function that will be used to query the next index: it increments the internal counter and delivers the next index. The internal counter is an array of fixnums, and the index is calculated with an affine mapping, but the algorithm allows to update the sum only for the indexes which actually changed. All this is hidden by the macro. end? is a boolean, which can be used to query when all the elements have been traversed. dimensions will be bound to the dimensions of the result, after allowing for dimension droppings.

This macro is pretty versatile: I used it in LLA to implement (sub dense-matrix-like ...), ((setf sub) array dense-matrix-like ...) and ((setf sub) dense-matrix-like array ...) methods, even though LLA is column-major. The trick is to swap dimensions.

## Monday, May 17, 2010

### Upgraded array element types and pinned arrays (updated)

I am planning to do a major rewrite of my LLA library in the near future. LLA, which stands for Lisp Linear Algebra, uses BLAS and LAPACK to perform operations on matrices (and vectors), ranging from matrix multiplication to singular value decompositions. LLA is experimental, and its aim is to provide semantics which recognize special matrix types both as inputs and outputs. Maybe I will blog about it sometime, but that's not the focus of this post.

Currently, LLA uses Lisp arrays, but wrapped in a class which has a slot signifying the element type of the arrays. This is because it is meant to be portable, and not all implementations have upgraded array element types for every type LAPACK recognizes (see below). Also, LLA matrix representation is currently column major, and Lisp arrays are row-major.

However, I realized that I can use Lisp arrays in more places - LAPACK handles row-major arrays just fine (as transposed column-major arrays). Besides, having a native Lisp array is nice, eg one can just use functions like REDUCE or REPLACE, without dealing with a special type. However, there is a price to pay: when LAPACK routines are called, the array element type has to be detectable. For implementations which upgrade element types use by LLA to themselves or something specific, this is a cheap operation. For implementations that don't, it requires an extra pass.

Another issue is whether your implementation has what SBCL calls "pinned" arrays. Pinned arrays can make their contents available to foreign routines directly with negligible overhead for a limited duration (or maybe indefinitely, but I prefer the first).

I asked for implementation-specific details on c.l.l, and got the following information:

ImplementationMachineSDCSCDI32I64pinned?
SBCL (1.0.38)64-bit******yes
Lispworks (6.0.1)32-bit**TT**?
Lispworks Personal Edition (5.1.1)32-bit**TT*T?
Clozure CL (1.4, 1.6)32-bit**TT*Tyes
Clozure CL (1.6)64-bit**TT**yes
ECL (10.3.1)32-bit**TTTTyes
ABCL (0.13.0)64-bitTTTTTT?
Allegco CL Enterprise Edition (8.1 & 8.2, Linux and Windows)32-bit*****T?
Allegco CL Enterprise Edition (8.1 & 8.2, Linux and Windows)64-bit******?
CLISP (2.48)bothTTTTTT?
CMUCL (19e)32-bit*****Tyes
Corman Common Lisp (3.01, Windows)?**TTTT?

The interpretation is this: S, D, CS and CD correspond to (complex) single and double float. I32 and I64 are signed-byte types of 32 and 64 bits. * means that the element type is upgraded to itself, otherwise the table shows the upgraded type.

The following table gives some details on the pinning mechanism when the implementation has it:

ImplementationPinning constructNotes
SBCLsb-sys:with-pinned-objects with sb-sys:vector-sappins object only where GC granularity allows
Clozure CLccl:with-pointer-to-ivectordisables GC
ECL?conservative GC, data can be used as long as the object is alive

So LLA is likely to have the following optimization model: it will be blazingly fast (basically the speed of LAPACK, with a tiny bit of overhead) on implementations which support all upgraded types and pinning, and a bit slower on other ones. If your implementation supports upgrading some of the above element types to themselves, those arrays will require no element type detection so they will be faster. I64 is only needed on 64-bit machines.

The motivation behind this decision is that if you are doing serious numerical work, your implementation should support all relevant array element types and also pinning. SBCL does, and since I am using that, the new version of LLA will take advantage of all its nice facilities — it already does, but currently it only runs on SBCL so that was natural. LLA will be fully functional on other implementations, but may be slower.

Please keep sending in information for the implementations you don't see in the table above. I am especially interested in which implementations support pinning. You can use this code snipped to generate output on array element type upgrading:

(flet ((check-upgraded (type)