A tour of xarray
I frequently come across situations which require taking slices from
arrays, and sometimes I need other high-level array manipulations,
such as index permutations or projections to row- or column major. In
an earlier attempt called affi
, I handled these things using affine
mappings, but now I realize that those are not general enough, and my
need for these manipulations to be super-efficient was mostly
imaginary. I started working on a library, which evolved to xarray,
which is able to do these things (and much more—see the next post!)
for any kind of array-like object, provided that they have the
required minimal interface, defined as a few CLOS method. This blog
post is the first of a series which is meant to give a tour of
xarray
.
The minimal interface
Xarray
handles objects which can be addressed using a rectilinear
coordinate system. No assumptions are made about the storage model,
and if you want some class to be xarray-compatible, just define
methods xdims
, which gives a list of dimensions, and xref
(also
(setf xref)
, if your object is modifiable, but this is not required
for basic functionality), which is a generic function pretty much like
aref
. Finally, there is the generic function xelttype
to query
element type. This is all you need to provide.
Based on xdims
, reasonable defaults are provided for xrank
, which
returns an integer for the number of dimensions, xsize
, which
returns the total number of elements in the object, and (xdim object i)
, which is defined as (nth (xdims object) i)
by default. Feel
free to define these, though, if there is a "natural" way to do this
for your objects (like it is for arrays, below).
To demonstrate how easy it is to do this for any kind of object, here
are the definitions from array.lisp
, which defines these methods for
Common Lisp arrays:
(defmethod xelttype ((object array)) (array-element-type object)) (defmethod xrank ((object array)) (array-rank object)) (defmethod xdims ((object array)) (array-dimensions object)) (defmethod xdim ((object array) axis-number) (array-dimension object axis-number)) (defmethod xsize ((object array)) (array-total-size object)) (defmethod xref ((object array) &rest subscripts) (apply #'aref object subscripts)) (defmethod (setf xref) (value (object array) &rest subscripts) (setf (apply #'aref object subscripts) value))
In case a specialized method is not provided for an object, the
default fallback methods (in atoms.lisp
) just handle objects as
atoms, ie arrays of rank 0. After you define the methods xdims
,
xref
and xelttype
—and nothing more!—for your class, you can
use xarray
's views. The (somewhat sloppy) terminology of xarray
is to call classes which provide a minimal interface xrefable.
Views
Technically, views are functions mapping a set of rectilinear indexes to a (sub)set of an xrefable object called the ancestor. Practically, they are objects that walk and quack like arrays, but are usually a thin layer on other array-like objects.
Let's see some examples. I will use a Common Lisp array as a starting point, but of course these examples would work with any object as long as it is xrefable.
Slices
Slicing is the most obvious view that you can have for an array:
CL-USER> (defparameter *a* (make-array '(3 4) :initial-contents '((1 2 3 4) (5 6 7 8) (9 10 11 12)))) CL-USER> (slice *a* 2 :all) #<SLICE-VIEW #(9 10 11 12) {B1025A1}>
Instead of an array, you get a slice-view
object. Print-object
for views usually just prints the elements as if they were an array,
even if the ancestor isn't (this was simple to do, fancy printing may
be introduced later at some point, but don't hold your breath). 2
selects the elements of same index along the 0th dimension, while
:all
selects all elements along the 1st.
You can get an array using take
:
CL-USER> (take t (slice *a* 2 :all)) #(9 10 11 12)
The first argument tells take
to return an object similar to the
ancestor—we will tackle the concept of similarity and object
creation in the next post about xarray
.
For now, notice how slice dropped a dimension when you asked for a single index. You can avoid that by making it a list:
CL-USER> (slice *a* '(2) :all) #<SLICE-VIEW #2A((9 10 11 12)) {B9EBF59}>
You can drop unit dimensions with (drop object)
.
Lists of two elements select ranges (inclusive), which can also
reverse elements, for example, ='(2 0)= would select the third,
second and first elements along that dimensions. Negative integers
count from the largest index (which is -1
; think of -i
as (- (xdim object d) i)
) backwards, and :rev
reverses all elements (so
it is equivalent to \'(0 -1)
). Finally, a vector just picks
individual indexes (which can repeat, occur in any order, be negative
following the conventions above, etc):
CL-USER> (slice *a* -1 :all) #<SLICE-VIEW #(9 10 11 12) {AC4FF31}> CL-USER> (slice *a* :rev '(1 3)) #<SLICE-VIEW #2A((10 11 12) (6 7 8) (2 3 4)) {BAC83F9}> CL-USER> (slice *a* '2 #(1 1 3)) #<SLICE-VIEW #(10 10 12) {BB66859}>
Permutations
A permutation permutes the dimensions of an xrefable object:
CL-USER> (take t (permutation *a* 1 0)) #2A((1 5 9) (2 6 10) (3 7 11) (4 8 12))
Transposing is a special case of permutations, with indexes 1
and 0
(as above). Of course, you can combine views:
CL-USER> (slice (permutation *a* 1 0) :rev :rev) #<SLICE-VIEW #2A((12 8 4) (11 7 3) (10 6 2) (9 5 1)) {AAAC849}>
Miscellaneous views
Two other special views are column-major-projection
and flat
views. The first one provides a projection of an object as if the
elements of the two were arranged in column-major order (note that
this may not be the actual storage model, as xarray does not make
assumptions the latter but uses xref
; however, for some special
objects faster methods may be provided):
CL-USER> (column-major-projection *a* 6 2) #<COLUMN-MAJOR-PROJECTION-VIEW #2A((1 3) (5 7) (9 11) (2 4) (6 8) (10 12)) {BE28921}>
Flat
views just return an object with a single dimension, with the
order of elements unspecified. This is advantageous if the object has
a natural representation in some storage model, and we want to perform
operations where the order of elements does not matter (eg sum the
elements).
Semantics of views
(setf xref)
, when provided, sets elements in the ancestor of an
object: if that is another view, the original ancestor (ie the
object which is not a view) is modified. (Thus views always share structure. If you want to avoid this, use take
to create a new
object.
All the operations shown above are generic functions, you are free to specialize them. These methods
-
don't have to return subclasses of
view
, they are free to return any xrefable object as long as it shares structure, - are allowed to combine views.
For example, some CL array slices with contiguous row-major indexes could be implemented as displaced arrays, or a combination of a transpose and a slice can be folded into one view if the object supports that. Currently, these efficiency hacks are not implemented as I don't really need them, but the infrastructure is there.
Next time, I will address object creation and some generic operations on xrefable objects.
Tamas,
ReplyDeleteI have been working on a system called "grid structured data" which looks similar in intent to your xarrays, and it is based on your affi system. It is not published yet, but it is getting close. I am in the process of breaking up GSLL so that it will not only be easier to base it on GSD, but that other foreign array package interfaces can be built easier.
Perhaps we should look into merging our systems?
Liam
You should really look at how NumPy implements this. I applaud the idea, but you're going to need a much richer interface to get decent speed out of such a framework. (NumPy essentially macro-expands key algorithms out for each UPGRADED-ARRAY-ELEMENT-TYPE; you could do the same thing much easier in CL by defining "generic" methods and then specializing on "interesting" xarray types--arrays of floats, doubles, fixnums are the ones that come immediately to mind.)
ReplyDeleteLiam: I am interested. Let me know when you make sources public. Xarray is experimental, and I am open to incorporating stuff, or merging. However, I am under the illusion that it is very general: is there anything that you are missing from it? It was written partially with GSLL (which I like very much) in mind and I was hoping that GSLL could one day provide a conforming interface.
ReplyDeleteAFFI is a dead end, especially you want to do things like
(slice object '(1 3 3 2) '(4 2 -1))
BTW, like the slice DSL especially :-)
Nathan: thanks for the comment, but I am not going down that route. Currently, my goal is to develop a nice interface/DSL that I like using, instead of doing things manually and being dazzled by how fast my hand-optimized code is, then realizing that I wasted a couple of hours writing it.
ReplyDeleteXarray generic operations do not have to be superfast, and I am quite satisfied with the speed of the current (unoptimized) version. I want the framework to be optimizable, but that is a minor concern at this point, since I am still developing the interface to be useful and intuitive. If I did optimizations at this stage, I would have a lot of extra baggage that would prevent change. Quite a few CL array packages fell into that trap and got nowhere.
Also, I don't think that the NumPy experience is super-relevant to CLers. Pythonistas have to fight an insane language/implementation to get any appreciable speed, and are forced to commit strange hacks in the process. CL is much faster to start with. Still, if you find that something concrete is very inefficient, let me know and I will see what I can do about it.
This structure is pretty interesting, that helps you to calculate sums or any other operations of the rows and columns. generic viagra
ReplyDelete