Sunday, December 20, 2009

A tour of xarray: part 1

A tour of xarray

I frequently come across situations which require taking slices from arrays, and sometimes I need other high-level array manipulations, such as index permutations or projections to row- or column major. In an earlier attempt called affi, I handled these things using affine mappings, but now I realize that those are not general enough, and my need for these manipulations to be super-efficient was mostly imaginary. I started working on a library, which evolved to xarray, which is able to do these things (and much more—see the next post!) for any kind of array-like object, provided that they have the required minimal interface, defined as a few CLOS method. This blog post is the first of a series which is meant to give a tour of xarray.

The minimal interface

Xarray handles objects which can be addressed using a rectilinear coordinate system. No assumptions are made about the storage model, and if you want some class to be xarray-compatible, just define methods xdims, which gives a list of dimensions, and xref (also (setf xref), if your object is modifiable, but this is not required for basic functionality), which is a generic function pretty much like aref. Finally, there is the generic function xelttype to query element type. This is all you need to provide.

Based on xdims, reasonable defaults are provided for xrank, which returns an integer for the number of dimensions, xsize, which returns the total number of elements in the object, and (xdim object i), which is defined as (nth (xdims object) i) by default. Feel free to define these, though, if there is a "natural" way to do this for your objects (like it is for arrays, below).

To demonstrate how easy it is to do this for any kind of object, here are the definitions from array.lisp, which defines these methods for Common Lisp arrays:

(defmethod xelttype ((object array))
  (array-element-type object))

(defmethod xrank ((object array))
  (array-rank object))

(defmethod xdims ((object array))
  (array-dimensions object))

(defmethod xdim ((object array) axis-number)
  (array-dimension object axis-number))

(defmethod xsize ((object array))
  (array-total-size object))

(defmethod xref ((object array) &rest subscripts)
  (apply #'aref object subscripts))

(defmethod (setf xref) (value (object array) &rest subscripts)
  (setf (apply #'aref object subscripts) value))

In case a specialized method is not provided for an object, the default fallback methods (in atoms.lisp) just handle objects as atoms, ie arrays of rank 0. After you define the methods xdims, xref and xelttype—and nothing more!—for your class, you can use xarray's views. The (somewhat sloppy) terminology of xarray is to call classes which provide a minimal interface xrefable.

Views

Technically, views are functions mapping a set of rectilinear indexes to a (sub)set of an xrefable object called the ancestor. Practically, they are objects that walk and quack like arrays, but are usually a thin layer on other array-like objects.

Let's see some examples. I will use a Common Lisp array as a starting point, but of course these examples would work with any object as long as it is xrefable.

Slices

Slicing is the most obvious view that you can have for an array:

CL-USER> (defparameter *a* (make-array '(3 4) :initial-contents
                              '((1 2 3 4)
                                (5 6 7 8)
                                (9 10 11 12))))
CL-USER> (slice *a* 2 :all)
#<SLICE-VIEW 
#(9 10 11 12)  {B1025A1}>

Instead of an array, you get a slice-view object. Print-object for views usually just prints the elements as if they were an array, even if the ancestor isn't (this was simple to do, fancy printing may be introduced later at some point, but don't hold your breath). 2 selects the elements of same index along the 0th dimension, while :all selects all elements along the 1st.

You can get an array using take:

CL-USER> (take t (slice *a* 2 :all))
#(9 10 11 12)

The first argument tells take to return an object similar to the ancestor—we will tackle the concept of similarity and object creation in the next post about xarray.

For now, notice how slice dropped a dimension when you asked for a single index. You can avoid that by making it a list:

CL-USER>
(slice *a* '(2) :all)
#<SLICE-VIEW 
#2A((9 10 11 12))  {B9EBF59}>

You can drop unit dimensions with (drop object).

Lists of two elements select ranges (inclusive), which can also reverse elements, for example, ='(2 0)= would select the third, second and first elements along that dimensions. Negative integers count from the largest index (which is -1; think of -i as (- (xdim object d) i)) backwards, and :rev reverses all elements (so it is equivalent to \'(0 -1)). Finally, a vector just picks individual indexes (which can repeat, occur in any order, be negative following the conventions above, etc):

CL-USER> (slice *a* -1 :all)
#<SLICE-VIEW 
#(9 10 11 12)  {AC4FF31}>
CL-USER> (slice *a* :rev '(1 3))
#<SLICE-VIEW 
#2A((10 11 12) (6 7 8) (2 3 4))  {BAC83F9}>
CL-USER> (slice *a* '2 #(1 1 3))
#<SLICE-VIEW 
#(10 10 12)  {BB66859}>

Permutations

A permutation permutes the dimensions of an xrefable object:

CL-USER> (take t (permutation *a* 1 0))
#2A((1 5 9) (2 6 10) (3 7 11) (4 8 12))

Transposing is a special case of permutations, with indexes 1 and 0 (as above). Of course, you can combine views:

CL-USER> (slice (permutation *a* 1 0) :rev :rev)
#<SLICE-VIEW 
#2A((12 8 4) (11 7 3) (10 6 2) (9 5 1))  {AAAC849}>

Miscellaneous views

Two other special views are column-major-projection and flat views. The first one provides a projection of an object as if the elements of the two were arranged in column-major order (note that this may not be the actual storage model, as xarray does not make assumptions the latter but uses xref; however, for some special objects faster methods may be provided):

CL-USER> (column-major-projection *a* 6 2)
#<COLUMN-MAJOR-PROJECTION-VIEW 
#2A((1 3) (5 7) (9 11) (2 4) (6 8) (10 12))  {BE28921}>

Flat views just return an object with a single dimension, with the order of elements unspecified. This is advantageous if the object has a natural representation in some storage model, and we want to perform operations where the order of elements does not matter (eg sum the elements).

Semantics of views

(setf xref), when provided, sets elements in the ancestor of an object: if that is another view, the original ancestor (ie the object which is not a view) is modified. (Thus views always share structure. If you want to avoid this, use take to create a new object.

All the operations shown above are generic functions, you are free to specialize them. These methods

  1. don't have to return subclasses of view, they are free to return any xrefable object as long as it shares structure,
  2. are allowed to combine views.

For example, some CL array slices with contiguous row-major indexes could be implemented as displaced arrays, or a combination of a transpose and a slice can be folded into one view if the object supports that. Currently, these efficiency hacks are not implemented as I don't really need them, but the infrastructure is there.

Next time, I will address object creation and some generic operations on xrefable objects.

5 comments:

  1. Tamas,

    I have been working on a system called "grid structured data" which looks similar in intent to your xarrays, and it is based on your affi system. It is not published yet, but it is getting close. I am in the process of breaking up GSLL so that it will not only be easier to base it on GSD, but that other foreign array package interfaces can be built easier.

    Perhaps we should look into merging our systems?

    Liam

    ReplyDelete
  2. You should really look at how NumPy implements this. I applaud the idea, but you're going to need a much richer interface to get decent speed out of such a framework. (NumPy essentially macro-expands key algorithms out for each UPGRADED-ARRAY-ELEMENT-TYPE; you could do the same thing much easier in CL by defining "generic" methods and then specializing on "interesting" xarray types--arrays of floats, doubles, fixnums are the ones that come immediately to mind.)

    ReplyDelete
  3. Liam: I am interested. Let me know when you make sources public. Xarray is experimental, and I am open to incorporating stuff, or merging. However, I am under the illusion that it is very general: is there anything that you are missing from it? It was written partially with GSLL (which I like very much) in mind and I was hoping that GSLL could one day provide a conforming interface.

    AFFI is a dead end, especially you want to do things like

    (slice object '(1 3 3 2) '(4 2 -1))

    BTW, like the slice DSL especially :-)

    ReplyDelete
  4. Nathan: thanks for the comment, but I am not going down that route. Currently, my goal is to develop a nice interface/DSL that I like using, instead of doing things manually and being dazzled by how fast my hand-optimized code is, then realizing that I wasted a couple of hours writing it.

    Xarray generic operations do not have to be superfast, and I am quite satisfied with the speed of the current (unoptimized) version. I want the framework to be optimizable, but that is a minor concern at this point, since I am still developing the interface to be useful and intuitive. If I did optimizations at this stage, I would have a lot of extra baggage that would prevent change. Quite a few CL array packages fell into that trap and got nowhere.

    Also, I don't think that the NumPy experience is super-relevant to CLers. Pythonistas have to fight an insane language/implementation to get any appreciable speed, and are forced to commit strange hacks in the process. CL is much faster to start with. Still, if you find that something concrete is very inefficient, let me know and I will see what I can do about it.

    ReplyDelete
  5. This structure is pretty interesting, that helps you to calculate sums or any other operations of the rows and columns. generic viagra

    ReplyDelete