Wednesday, July 20, 2011

LLA reorganization almost finished, new version on Github

I have almost completed the reorganization of LLA and merged it to the main branch on Github (sorry, no installation via Quicklisp until the library is fully stable, for the moment you have to check it out from Github). Make sure that you get the latest dependencies, most importantly cl-num-utils and let-plus.

This version of LLA boasts a reorganized interface: most importantly, now it works with plain vanilla Common Lisp arrays:

LLA> (defparameter *a* #2A((1 2) (3 4)))
*A*
LLA> (mm t *a*) ; same as A^T A, but LLA recognizes that the result is Hermitian
#<HERMITIAN-MATRIX 
  10.00000        *
  14.00000 20.00000>
LLA> (elements *)  ; Lisp arrays inside all special matrices
#2A((10.0d0 0.0d0) (14.0d0 20.0d0))
LLA> (svd *a* :thin)
#S(SVD
   :U #2A((-0.40455358483375703d0 0.9145142956773045d0)
          (-0.9145142956773045d0 -0.40455358483375703d0))
   :D #<DIAGONAL 
  5.46499       .
        . 0.36597>
   :VT #2A((-0.5760484367663207d0 -0.817415560470363d0)
           (-0.817415560470363d0 0.5760484367663208d0)))
LLA> (elements (svd-d *)) ; Lisp arrays again
#(5.464985704219043d0 0.3659661906262576d0)

LLA uses clever tricks to avoid transposing when it can. Row-major and column-major representations are transposes of each other, and instead of calculating an SVD as $$A=UDV^T$$ LLA justs calculates $$A^T={V^T}^T D^T U^T$$ and returns \(U\) as \(V^T\) and vice versa. Of course transposing cannot be avoided all the time, but it is not needed often and should not impose a large performance penalty. So LLA can work with row-major Common Lisp arrays natively — in fact, that's the only array representation it supports.

This version of LLA has a reorganized DSL for BLAS/LAPACK calls which makes it really easy to wrap Fortran functions in general, not just LAPACK/BLAS functions for which it has extensions. For example, this is how Cholesky factorization is implemented:

(defmethod cholesky ((a hermitian-matrix))
  (let+ ((a (elements a))
         ((a0 a1) (array-dimensions a)))
    (assert (= a0 a1))
    (lapack-call ("potrf" (common-float-type a)
                          (make-instance 'cholesky :left-square-root 
                                         (make-lower-triangular-matrix l)))
                 #\U (&integer a0) (&array a :output l) (&integer a0) &info)))

The first two lines are just bookkeeping: the array A containing the elements of the hermitian (symmetric) matrix (in the lower triangle) is extracted, and its dimensions are examined. LAPACK-CALL figures out whether to use SPOTRF, DPOTRF, CPOTRF, ZPOTRF from the element type of the matrix, allocates memory for atoms (recall that Fortran takes pointers) using the macros, and automatically examines the INFO parameter at &info to catch exceptions. The #\U in the code suggests that storage is upper-triangular, even though in CL we store elements of the hermitian matrix in the lower triangle: this is because we avoid transposing. I find that it is really easy to write wrappers to Fortran code using this macro family (there is also a BLAS-CALL and a LAPACK-CALL-W/QUERY that queries workspace sizes), and it should be easy to write a FORTRAN-CALL for general Fortran code, or even a wrapper for C functions that handle arrays. Let me know if you are interested in any of these, I would be happy to add them or even factor out this part of LLA into another library.

By the way, contrary to what my previous post on LLA said, you no longer need the C wrappers in MKL or CLAPACK, so ATLAS or something similar is just fine. The README has instructions on how to set up these libraries (unless you want to compile your own, a single apt-get install or similar is enough to satisfy LLA's dependencies) and configure custom library locations.

Even though LLA is now perfectly usable and all the unit tests work just fine, there are still some things left to do:

Write the array implementation-specific sharing macros.
LLA uses a few macros to make arrays available for foreign functions, and all the wrappers are built around these. The idea is to avoid copying if the implementation supports it, and otherwise fall back to copying. I decided to focus my efforts on getting LLA to work again first, and this means that currently all operations fall back to copying, which should entail a slight performance penalty. I will of course remedy this at some point, first for SBCL and ECL and then for the other distributions, but I am waiting for the rest of LLA to stabilize before I start fiddling with low-level optimizations.
Eigenvalues/eigenvectors are not yet implemented.
LAPACK's handling of eigenvalues and especially eigenvectors is a mess, and I need to dust off the code that picks out paired eigenvectors. It should be fairly easy, but I prefer not to add functionality without thorough testing, so currently I am postponing this until someone needs it (just drop me a line).

Also, even though I wrote a lot of unit tests, it is of course possible (and probable) that LLA has bugs. Please report them on Github.

I would like to thank all users of LLA for helpful suggestions and for the gentle nudging I received during LLA's reorganization. I believe that LLA fills an important niche, and I will continue to work on it.

No comments:

Post a Comment