Tamas K Papp's blog

Tuesday, January 7, 2014

Simple scripting in Common Lisp

I know enough Bash to read simple scripts, but I am always having a hard time when I am trying to write a script that does something that is more complicated than a one-liner that substitutes arguments into a template. I can eventually figure things out, but I have never bothered to develop my Bash skills, programming in it for me is error-prone and time-consuming.

So recently I have started experimenting with scripting in Common Lisp. It turns out to be very, very easy and convenient if you already know CL. I usually start scripting with a skeleton that looks like this:

(cl:defpackage #:script
  (:use #:cl)
  (:export #:run))

(cl:in-package :script)

;;; code comes here

I run things from SLIME and experiment with the REPL. When I think that I am approaching a solution, I write the main function run:

(defun run ()
  ...)

Then I use the following Makefile to compile it with cl-launch:

myscript: myscript.lisp
        cl-launch --lisp sbcl --file myscript.lisp --dump '!'   \
        --output myscript -i '(script:run)'

You can of course also include ASDF libraries, check the help of cl-launch.

For interfacing with the environment (pathnames, files, OS, etc), I find the UIOP compatibility library really helpful. It is included with ASDF, so most likely it is already on your computer.

Tuesday, December 10, 2013

cl-data-frame pretty printed column summaries

I have been refining printed summaries of data frames for my Common Lisp library cl-data-frame. I found that the following approach works best for me for quick eyeballing of data before any processing or analysis:

Real numbers should be summarized by their range and the three quartiles (25%, 50%, 75%). This provides enough information to assess the variation and the "typical" values of the data.
All other values should be summarized by their count and frequency. This is ideal for categorical data (called "factor" in R), and also for various encodings of missing data.
When the column has both numbers and non-numbers, print both of the above. However, when it has very few distinct numbers, don't use quartiles for numbers, just print the frequencies.

For example,

(dframe:df :a #(nil nil nil 1 1 2 3 "missing" "missing"))

prints as

#<CL-DATA-FRAME:DATA-FRAME (1 x 7)
  :A 3 (43%) x NIL, 2 (29%) x 1, 1 (14%) x 2, 1 (14%) x 3>

while

(dframe:df :a (concatenate 'vector
                           #(nil nil nil "missing" "missing")
                           (clnu:numseq 0 100 :by 1/100)))

prints as

#<CL-DATA-FRAME:DATA-FRAME (1 x 10006)
  :A 10001 reals, min=0, q25=24.9975, q50=50, q75=75.0025, max=100;
     3 (0%) x NIL, 2 (0%) x "missing">

A more realistic example with a dataset I am currently working on that has both numeric, categorical, and missing ("") data:

#<CL-DATA-FRAME:DATA-FRAME (20 x 1082)
  :IDHH 1082 reals, min=7, q25=1351, q50=2548, q75=4073, max=5434
  :IDPERS bits, ones: 1082 (100%)
  :AGE 1082 reals, min=25, q25=41.17526, q50=49.492752, q75=59.814816, max=63
  :GENDER 832 (77%) x "weiblich", 250 (23%) x "maennlich"
  :MARITAL 515 (48%) x "geschieden",
           331 (31%) x "ledig",
           153 (14%) x "verwitwet",
           83 (8%) x "dauernd getrennt lebend"
  :SCHOOL 419 (39%) x "mittlere reife, realschulabschluss",
          338 (31%) x "volksschul-/hauptschulabschluss",
          217 (20%) x "abitur (hochschulreife)",
          87 (8%) x "fachoberschule, fachabitur",
          13 (1%) x "keine angabe",
          8 (1%) x "schule ohne abschluss verlassen"
  :WTTYPN 756 (70%) x "montag bis freitag",
          172 (16%) x "samstag",
          154 (14%) x "sonntag"
  :TMW 1082 reals, min=0, q25=0, q50=4.151436, q75=105, max=740
  :HOURS_MAINJOB 1082 reals, min=0, q25=0, q50=3.0288463, q75=9.53125, max=690
  :HOURS_ADDJOB 1082 reals, min=0, q25=0, q50=1.388621, q75=17.039537, max=450
  :THP 1082 reals, min=0, q25=141.15384, q50=263.13727, q75=386.55173, max=760
  :JUKIGR 664 (61%) x "anderer wert/trifft nicht zu oder kein wert vorhanden",
          165 (15%) x "10 bis unter 15",
          104 (10%) x "6 bis unter 10",
          80 (7%) x "18 bis unter 27",
          53 (5%) x "15 bis unter 18",
          16 (1%) x "27 und älter"
  :LEISURE 1082 reals, min=70, q25=437.74194, q50=601.8919, q75=752.069, max=960
  :USUAL_HOURS 1082 reals, min=300, q25=25138.87, q50=82259.11, q75=99997.82,
               max=99999
  :HHTYPE 664 (61%) x 1, 418 (39%) x 2
  :WORK bits, ones: 285 (26%)
  :MAINWAGE 1078 reals, min=0, q25=0, q50=36.24454, q75=153.93013, max=4100;
            4 (0%) x ""
  :ADDWAGE 1063 reals, min=0, q25=0, q50=6.7724867, q75=34.89418, max=1250;
           19 (2%) x ""
  :WAGE 1059 reals, min=0, q25=0, q50=16.293531, q75=49.222637, max=4100;
        23 (2%) x ""
  :NONWAGE_INCOME 1082 reals, min=0, q25=619.7917, q50=935.9649, q75=1293.9656,
                  max=4800>

This is the first time I used CL's pretty printer, so there might be a few bugs in there. The code is in the repository, you also need to update cl-num-utils.

Sunday, March 31, 2013

Deprecated libraries removed

I had some Common Lisp libraries on Github that I have abandonned a while ago, mostly because I have either

found a better library by someone else,
decided to follow alternative approaches, or
write a better version with an incompatible API.

These libraries have been undergoing some quiet bit-rot for a while which makes them unusable without minor or not so minor fixes, for which I have no time; also, there are better libraries out there for the same purpose.

Having abandonned Common Lisp libraries out there is confusing and goes against the principle of consolidating Common Lisp Libraries, so I decided to remove some repositories. In particular, I have removed

cl-2d, which didn't see any new development in the last two years. My recommended replacement is cl-flexplot, which uses PGF (a LaTeX package) as a backend.
cl-text-tables was simply abandonned, I recommend the excellent cl-csv instead.
cl-numlib is superseded by cl-num-utils.

The latter two suggested replacements are in Quicklisp.

If, for some strange reason, anyone is interested in the code of the dead libraries, just write me an e-mail. But I would prefer if these libraries stayed dead, as there are much better replacements out there.

It is very likely that I will remove other deprecated libraries in the future.

Monday, March 25, 2013

LLA update

Lisp Linear Algebra was updated today. This is a major update, so I increased the version number to 0.2. Compared to the previous version, all of the changes are internal and do not affect the already existing API, with the exception of an experimental destructive BLAS interface, which was contributed by Gábor Melis.

Gábor also made the necessary changes to the internals (which were not designed for destructive updates before), and re-enabled the array pinning interface for SBCL. This means that if you arrays have the same float type, and it is one of the four supported float types (single-float, double-float, and their complex counterparts), SBCL will not copy your arrays unless it has to (eg because they would be overwritten or need to be transposed). I am grateful for Gábor's contributions.

The destructive BLAS interface is still experimental, so the API may be refined in future versions.

Wednesday, February 20, 2013

updates to Common Lisp libraries

In the past month I updated many of my Common Lisp libraries. Here is a brief summary of the changes:

cl-num-utils has been growing by accretion for a long time, which did not lead to the cleanest code. I decided to reorganize the library and split off functionally independent parts which may be useful on their own:
- functions that operate on Common Lisp arrays have been moved to array-operations,
- functions that handle array slices are now in cl-slice,
- data frames are in cl-data-frame (which is still experimental, but works fine).
The first two are meant to have very few dependencies, and in particular do not depend on cl-num-utils. I hope that cl-slice will serve as a basis for a Common Lisp API for slicing arrays and array-like objects.

Parts of cl-num-utils have been disabled for now because I didn't have time to update their unit tests and I don't like to release untested code. If you need any of those, please open an issue on Github.
lla had to be rewritten a bit since it depends on cl-num-utils. I moved special matrices from the former to the latter since they do not require foreign functions per se, and moved array stacking code to array-operations.
cl-random also depends on cl-num-utils and had to be rewritten a bit. Again, not all functionality is enabled yet, feel free to open an issue if you need something urgently. I will re-enable everything in due time, but I would like to think more about testing multivariate distributions.
Minor changes were made to cl-rmath and let-plus.

Monday, January 7, 2013

announcing trivial-project-pathname

I have been copy-pasting the same code over and over and decided to wrap it up in trivial-project-pathname, which is a very simple Common Lisp library for resolving pathnames relative to some directory, usually the sources.

The typical use case looks like this: I am working on, say, some data analysis and I have the following directory structure:


data/	where data files are kept
latex/	LaTeX sources
lisp/	Common Lisp code, including ASDF system definition
plots/	plots (potentially generated from Lisp)

I want to be able to write something like

(analyze-data-and-save-plot (project-path "mydata.csv" :data)
                            (project-path "myplot.pdf" :plots))

and have the data read from data/mydata.csv and the plot saved to plots/myplot.pdf.

This is how I would do it using this library:

(project-pathname:define project-path (:asdf "myproject" '(:relative :up))
  (:data "data/")
  (:plots "plots/"))

This finds the directory for the .asd file, takes its parent directory, and then defines a function that maps symbols to subdirectories.

Unless otherwise specified, NIL, the default for the optional parameter, would map to the base directory. You don't even need to use subdirectories, you can just

(project-pathname:define project-path2 (:asdf "myproject"))
;;; and then use
(project-path2 "my-data-file.csv")

You can also use *load-truename* to define *base-directory* in the .asd file as recommended by Zach Beane and then define

(project-pathname:define project-path (:directory *base-directory*))

Wednesday, November 7, 2012

array-operations resurrected

After reading a few papers on array languages, I decided to resurrect my array-operations library. Nope, this does not make Common Lisp into APL or Nial, but it does provide a concise DSL for array operations which should integrate well into Common Lisp.

The github page provides a quick tour of the library. The code is (or should be) portable, ANSI-conformant Common Lisp, and comes with unit tests for all functions.

The library previously available under this name (which has been dormant for 3 years) is still available, but it is deprecated.

Wednesday, October 17, 2012

cl-num-utils and utf8

I find Unicode math symbols very useful for mixing a bit of math and text — even though they don't even begin to approach the flexibility of LaTeX, all editors and browsers should be able to render them by now. When writing Lisp code, I only use them in docstrings. For example, cl-num-utils contains comments like this:

(defclass interval/infinite-left ()
  ()
  (:documentation "Left endpoint is -∞."))

Anton Vodonosov has recently reported that this causes a problem when the default external format is not utf8, also suggesting that I include an :encoding clause in the system definition. I have just pushed this change and I hope that this fixes the issue for everyone.

I appreciate the work Anton and others have put into cl-test-grid. I usually use a single implementation (the latest released SBCL), but I strive to make my code compatible with all implementations. This is much easier to achieve with cl-test-grid.

Wednesday, October 3, 2012

Chebyshev polynomials in cl-num-utils

I just pushed an update to the cl-num-utils repository, merging a branch which implements function approximation with Chebyshev polynomials.

For example,

(defun myfun (x)
  "Function that we want to approximate."
  (expt x -2))

(defparameter *myapprox* (chebyshev-approximate #'myfun (interval 2 10) 8)
  "Chebyshev approximation for MYFUN on the interval [2,10], using 8 Chebyshev
  polynomials.")

(myfun 5d0)            ; 0.04d0
(funcall *myapprox* 5) ; 0.04018023309369832d0

Approximation with Chebyshev polynomials is very fast and stable because it does not require a matrix inversion (Chebyshev polynomials are orthogonal), and thus it is a very nice procedure if you want to approximate a "smooth" function that is expensive to evaluate.

I also extended the interval code in cl-num-utils, now it handles open and infinite endpoints. This extension is experimental and not all functions understand it at the moment, but you can use my new extended-reals library to denote infinites, for example

(chebyshev-approximate #'sqrt (interval 0 (xr:inf)) 10)

would approximate $x\mapsto\sqrt{x}$ on $[0,\infty)$. I will blog about extended-reals soon.

Tuesday, October 2, 2012

updates to let-plus

I have just pushed some updates to the let-plus Github repository. The most important change is the handling of ignored values — let+ used to ignore nil, for example

(let+ (((a nil b) '(1 2 3)))   ; OLD SYNTAX, NO LONGER VALID
  ...)

would ignore the second element of the list. The implementation of let+ would simply replace nil with gensyms in the AST and declare them ignored. This caused some unexpected problems, for example in

(let+ (((a nil &optional (c nil c?)) '(1 2 3))) ; OLD SYNTAX, NO LONGER VALID
  ...)

nil is a value, not a placeholder for a variable. I could have extended let+ to work handle this, but that would have required destructuring the lambda lists internally instead of simply relying on destructuring-bind. Being lazy, I decided to change the syntax. From now on, users should use &ign to ignore variables. For example,

(let+ ((#(a &ign b) (vector 1 2 3)))
  (list a b)) ; => (1 3), no warning about the middle element not being used

The change is incompatible with the previous version (nil is no longer understood as being ignored and will most likely cause an error during macroexpansion), so you need to update your code if you are using let+. I am sorry about any inconvenience this may cause.

I also extended let+ to understand nesting in more places (when applicable, mostly for read-only forms). For example,

(let+ (((&slots-r/o ((&complex a b) bar))
        (make-instance 'foo :bar (complex 1 2))))
  (list 1 2)) ; => (1 2)

As usual, bug reports are welcome.

Thursday, January 26, 2012

cl-cairo2: new maintainer, new license

cl-cairo2 was one of the first Common Lisp libraries I wrote, but I haven't been using it much for the last year or so (currently I am experimenting with cl-pdf as a backend for my new plotting library, which will be released soon). I have been pretty busy with research, so I didn't have time to merge (and test) patches, also, I didn't even contemplate updating the library to make use of the latest version of cairo. So when Ryan Pavlik contacted me about adding compatibility with cl-freetype2, I asked him whether he wants to take over as a maintaner. He kindly agreed, so I have transferred the repository to him, and he already merged a lot of patches.

One last thing that I wanted to fix before handing the repository over is the license. Originally, the library was licensed under the GPL — in retrospect, I think that

I was very naive about software licenses, and didn't really understand what GPL means in the context if Common Lisp (I still don't claim that I do :-P), and
I didn't realize that there are a lot of commercial applications in the Common Lisp world, whose authors are wary of depending on GPL'ed libraries.

Consequently, I received many complaints about the license of the library, and decided to change it. I picked the Boost Software License, and contacted all who contributed to the library for permission. All contributors approved the change, so now the library has a simple, permissive non-copyleft free software license. However, it is always possible that I missed someone, so if you contributed to cl-cairo2 in the past but didn't hear from me regarding the license change, please get in touch (eg via the Github issue tracker).

I would like to thank (in alphabetical order) Ala'a Mohammad Alawi, Jay Bromley, Pau Fernández, Johann Korndoerfer, Peter Mikula, and especially Kei Suzuki (who did the last major reorganization) for their contributions to the library (again, I apologize if I missed anyone). I would also like to thank Ryan for taking over — I am convinced that the library is in good hands.

Tuesday, November 1, 2011

Color schemes to prevent eye fatigue

I have been sitting in front of the computer quite a bit in the past few weeks, and I noticed that my eyes get very tired by the end of the day (I used to get back pain too, but stretching helped a lot). After a bit of reading I learned that the maximum-contrast white-on-black-based color scheme I was using is in fact extremely stressful for the eyes, and I started looking for something designed for the specific purpose of reducing eye fatigue. After checking out Zenburn for a few days, I started using Solarized by Ethan Schoonover. I find it really nice and relaxing, here is a screenshot of Emacs editing Common Lisp code:

New and updated libraries

Lately I have been trying to be more organized about updates to my libraries: I always try to keep the master branch on Github in a consistent state for all of them, and merge the development branches for all libraries at the same time. I have been doing that yesterday, here is a brief summary of the changes.

cl-num-utils mostly received minor fixes: I cleaned up the accumulator code a bit, and added a few minor convenience functions. The package now exports median, which causes a symbol conflict with alexandria:median so you need to shadow if you are using both. Maybe I should ask the developers of Alexandria to make mean, variance and median generic.

LLA underwent a medium-scale rewrite: I sanitized the type system (internal types are no longer exposed), cleaned up object creation functions (CLO replaced with HERMITIAN, LOWER, etc), and wrote specialized routines for array sharing which are about 30 times faster than they used to be — this of course doesn't mean that LLA is now 30x faster, just that they overhead for sharing arrays got a lot smaller. Since many operations are $O(n^\alpha), \alpha \geq 2$, this is mostly noticeable for small arrays. Note that this is still generic CFFI-based code, preliminary tests indicate that implementation-specific tricks could yield array copying code that is 2-5x faster, but currently I have neither the time nor the inclination to write this, because I would rather wait for other parts of LLA to settle down before I micro-optimize. However, if you find that array copying overhead is critical for you, please let me know and I will see what I can do — this rewrite was prompted by the benchmarks sent by Robert Brown.

Realizing that implementing special functions in Common Lisp is perhaps not the best use of my time, I decided to use a foreign library for this purpose in cl-random. I chose R's standalone math library, with an SWIG-generated wrapper (it took about 30 minutes to have this library running, SWIG is amazing). I made the result available as a small standalone library called cl-rmath in case anyone else wants to use it. I really appreciate that the R developers made this available as a standalone library, it makes my life much easier. Besides these changes, I only made small changes to cl-random, for example, the univariate normal now has the syntax (r-normal mean variance) instead of the standard deviation, to be more consistent with the multivariate case.

Finally, it looks like cl-2d refuses to die, so I fixed some of the minor bugs that were the result of other libraries moving on. I am working on a replacement library, but I will probably keep cl-2d updated until the new library is ready. It amazes me how many people are using cl-2d: I get questions and bug reports all the time. It is not a bad library, but I wrote it when I was a CL newbie and I hope that I can write a much better one — we shall see. In any case, if you are not using cl-2d, I would recommend that you don't start now.

Tuesday, September 6, 2011

Is anyone using cl-2d?

I wrote a Common Lisp plotting library called cl-2d in 2009, and I have been using it for my own graphs ever since. I am currently working on a library that implements the same functionality but with a much more friently interface and a different engine, using LaTeX and pgf to generate the plots instead of cairo. This gives me a lot of useful features out of the box, including very nicely formatted formulas if I want to use them in annotations, and it is reasonably fast for my purposes.

I plan to remove the existing cl-2d repository from Github, put up a new one (it is a complete rewrite from scratch, there isn't much sense in carrying around the old source), and recycle the name (for both the package and the library).

I don't think that there is anyone out there using the current version of this library, but in case you are, please let me know soon.

Friday, September 2, 2011

Persistent misconceptions about Lisp

I am in the process of implementing (yet another) graphing library Common Lisp, one that would conform to my requirements better (I will blog about it later on). In order to understand some issues better, I decided to read The Grammar of Graphics by Leland Wilkinson. The book is very well written and clearly organized, and I think it will turn out to be very useful for what I am doing. However, I encountered a couple of misconceptions about Lisp in the Introduction. On page 4, the author claims that

Some languages, like Simula, Smalltalk, and Java, make it easy to implement objects and difficult to implement procedures. Others, like C++, enable development of both objects and procedures. And others, like C, Pascal, Ada, BASIC, Algol, Lisp, APL, and FORTRAN, make it difficult (but not impossible) to develop objects.

O RLY? Most users of CLOS find both objects and "procedures" very easy to use, and at least on par with other OO languages. Moreover, I don't understand how Lisp ended up in the company of C, which doesn't have any OO whatsoever.

On page 5, we are told that

Lisp is ideal for manipulating lists of words and symbols because it is a list processing language.

Strictly speaking, this is true, but Lisp is much more than a list processing language: it is also used to tackle numerical problems and usually does a very good job. Compiled code can be very fast, especially with implementations like SBCL, and Lisp has a very nice foreign function interface library.

That said, from what I have read so far, the book appears to be excellent, and the author is clearly an expert on the topic. Statements about programming languages are tangential to the book in any case, so I don't understand why the author found it necessary to talk about a language that he is not so familiar with.

Thursday, July 21, 2011

Library configuration in Common Lisp: an example

I have been dodging the issue of configuration for LLA for a while, but eventually I had to come up with a solution for the revised version that I pushed to Github yesterday.

LLA users may need to configure two things: the location and name of libraries that they want to use, and whether LLA should use 64-bit integers to interface with BLAS/LAPACK (this is rarely needed, even on 64-bit platforms, as many implementations still use 32-bit integers). In the previous versions of LLA, I tried to do this with read-time conditionals (based on trivial-features) but this was not satisfactory as it is impossible to figure out the list of libraries from nothing but the platform information — for example, a Linux user could use ATLAS or MKL.

I read Maintaining Portable Lisp Programs by Christophe Rhodes and also asked on cl-pro, where I got a lot of good suggestions and decided to follow the advice of Pascal Costanza. This is how configuration works at the moment for LLA:

The user may define a variable *lla-configuration* in the CL-USER package. The variable should contain a plist of configuration options, eg :libraries. The variable does not need to be bound, and it can be NIL or incomplete: the user only needs to deal with configuration when he wants to override the defaults of the library. LLA makes an effort to come up with sensible platform-specific defaults.
When loaded/compiled, LLA checks whether cl-user::*lla-configuration* is bound, and if it is, it uses the corresponding values. If the variable is not bound or doesn't contain the desired property, a default value is used instead.

Internally, some features are implemented by pushing symbols like lla::int64 to *features*. I have do admit that I didn't even consider the possibility of package qualifiers in read-time conditionals before I read the appropriate sections of the Hyperspec, but in retrospect it makes perfect sense as it helps to avoid name clashes. LLA also removes its own symbols from *features* if they don't belong there: this means that you can reload the library with a different configuration without restating your CL image.

Wednesday, July 20, 2011

LLA reorganization almost finished, new version on Github

I have almost completed the reorganization of LLA and merged it to the main branch on Github (sorry, no installation via Quicklisp until the library is fully stable, for the moment you have to check it out from Github). Make sure that you get the latest dependencies, most importantly cl-num-utils and let-plus.

This version of LLA boasts a reorganized interface: most importantly, now it works with plain vanilla Common Lisp arrays:

LLA> (defparameter *a* #2A((1 2) (3 4)))
*A*
LLA> (mm t *a*) ; same as A^T A, but LLA recognizes that the result is Hermitian
#<HERMITIAN-MATRIX 
  10.00000        *
  14.00000 20.00000>
LLA> (elements *)  ; Lisp arrays inside all special matrices
#2A((10.0d0 0.0d0) (14.0d0 20.0d0))
LLA> (svd *a* :thin)
#S(SVD
   :U #2A((-0.40455358483375703d0 0.9145142956773045d0)
          (-0.9145142956773045d0 -0.40455358483375703d0))
   :D #<DIAGONAL 
  5.46499       .
        . 0.36597>
   :VT #2A((-0.5760484367663207d0 -0.817415560470363d0)
           (-0.817415560470363d0 0.5760484367663208d0)))
LLA> (elements (svd-d *)) ; Lisp arrays again
#(5.464985704219043d0 0.3659661906262576d0)

LLA uses clever tricks to avoid transposing when it can. Row-major and column-major representations are transposes of each other, and instead of calculating an SVD as $$A=UDV^T$$ LLA justs calculates $$A^T={V^T}^T D^T U^T$$ and returns $U$ as $V^T$ and vice versa. Of course transposing cannot be avoided all the time, but it is not needed often and should not impose a large performance penalty. So LLA can work with row-major Common Lisp arrays natively — in fact, that's the only array representation it supports.

This version of LLA has a reorganized DSL for BLAS/LAPACK calls which makes it really easy to wrap Fortran functions in general, not just LAPACK/BLAS functions for which it has extensions. For example, this is how Cholesky factorization is implemented:

(defmethod cholesky ((a hermitian-matrix))
  (let+ ((a (elements a))
         ((a0 a1) (array-dimensions a)))
    (assert (= a0 a1))
    (lapack-call ("potrf" (common-float-type a)
                          (make-instance 'cholesky :left-square-root 
                                         (make-lower-triangular-matrix l)))
                 #\U (&integer a0) (&array a :output l) (&integer a0) &info)))

The first two lines are just bookkeeping: the array A containing the elements of the hermitian (symmetric) matrix (in the lower triangle) is extracted, and its dimensions are examined. LAPACK-CALL figures out whether to use SPOTRF, DPOTRF, CPOTRF, ZPOTRF from the element type of the matrix, allocates memory for atoms (recall that Fortran takes pointers) using the macros, and automatically examines the INFO parameter at &info to catch exceptions. The #\U in the code suggests that storage is upper-triangular, even though in CL we store elements of the hermitian matrix in the lower triangle: this is because we avoid transposing. I find that it is really easy to write wrappers to Fortran code using this macro family (there is also a BLAS-CALL and a LAPACK-CALL-W/QUERY that queries workspace sizes), and it should be easy to write a FORTRAN-CALL for general Fortran code, or even a wrapper for C functions that handle arrays. Let me know if you are interested in any of these, I would be happy to add them or even factor out this part of LLA into another library.

By the way, contrary to what my previous post on LLA said, you no longer need the C wrappers in MKL or CLAPACK, so ATLAS or something similar is just fine. The README has instructions on how to set up these libraries (unless you want to compile your own, a single apt-get install or similar is enough to satisfy LLA's dependencies) and configure custom library locations.

Even though LLA is now perfectly usable and all the unit tests work just fine, there are still some things left to do:

Write the array implementation-specific sharing macros.: LLA uses a few macros to make arrays available for foreign functions, and all the wrappers are built around these. The idea is to avoid copying if the implementation supports it, and otherwise fall back to copying. I decided to focus my efforts on getting LLA to work again first, and this means that currently all operations fall back to copying, which should entail a slight performance penalty. I will of course remedy this at some point, first for SBCL and ECL and then for the other distributions, but I am waiting for the rest of LLA to stabilize before I start fiddling with low-level optimizations.
Eigenvalues/eigenvectors are not yet implemented.: LAPACK's handling of eigenvalues and especially eigenvectors is a mess, and I need to dust off the code that picks out paired eigenvectors. It should be fairly easy, but I prefer not to add functionality without thorough testing, so currently I am postponing this until someone needs it (just drop me a line).

Also, even though I wrote a lot of unit tests, it is of course possible (and probable) that LLA has bugs. Please report them on Github.

I would like to thank all users of LLA for helpful suggestions and for the gentle nudging I received during LLA's reorganization. I believe that LLA fills an important niche, and I will continue to work on it.

Thursday, July 14, 2011

Why I only buy e-books from indie publishers

Nikodemus Siivola blogged about the $2 surcharge Amazon extracts from Kindle customers outside the US. I use my Kindle every day, and I buy e-books to read on a regular basis, spending around $10-30 a month (depending on how much time I have for reading). I have owned a Kindle for almost a year now, but I didn't know about this surcharge: I guess the reason for this is that none of the money I spent on e-books went to Amazon.

There are two reasons for this: their non-competitive pricing and DRM. First, I am reluctant to pay more for an e-book than I would for a paperback, yet I frequently saw e-books on Amazon that cost more than their paperback versions. I understand that Amazon is trying to extract economic rents using its strong position in this market, but I would feel like an idiot if I participated in this scheme.

Regarding DRM: I like my Kindle very much, but I think that in the long run e-book readers will become even more of a commodity (like laptops or cell phones), and I don't wish my e-book collection to be tied to Amazon. Call me old-fashioned, but I do want to own the content I pay for, especially if I paid almost as much as I would pay for a hardcopy. So I only buy DRM-free books.

I mostly read literature on my Kindle. I buy quite a bit of sci-fi from Baen books — they also have a lot of books available for free, knowing full well that good sci-fi is addictive for some people anyway, but they don't exploit this: most of their books sell for $5-6. I haven't head a chance to check out Calibre's DRM-free book store, but it looks very promising.

I think that unless Amazon changes its pricing policy and considers offering DRM-free content, there is very little chance that I will buy anything from them. If you know some indie authors or publishers who sell reasonably priced DRM-free e-books, I would be interested in hearing about them.

Tuesday, July 12, 2011

LinkedIn: an obnoxious spam engine

I have to admit that I don't see the point of so-called social networking sites, but I recognize the possibility that some people may find them useful, and I am just not one of them. Nevertheless, I still fail to see why random people I have never met want to add me as a contact. I get invitations from these people on LinkedIn almost daily, and to make it worse, LinkedIn then sends me reminders about them. Note that I am not even registered on their damn site. I don't have a profile, and I have no intention of creating one.

Which presents another problem: I couldn't find a way of opting out from these invitations without creating a profile. Clever, isn't it? They send you spam, and then they are trying to sucker people into registering to stop spam. I could just start marking these messages spam, but I think that my spam filter would take a while to learn this, and in any case, I don't want to add noise to the Bayesian model just to create false positives for relevant e-mail. The cleanest solution I found is putting the whole linkedin.com domain on my blacklist.

I want to emphasize that this is a rare distinction: with spam filters getting so clever (thank you, Gmail!), I haven't had to manually blacklist a sender in years. Congratulations are in order: dear LinkedIn, you are a really obnoxious engine for spam! I sincerely hope that you vanish to oblivion soon.

Monday, July 11, 2011

Goodbye Ubuntu, hello Debian

My road to Ubuntu

I have started using Linux around 1998. I remember that I wanted to try out several distributions before settling on the one the "best" one, but I had a dial-up connection (28kbps, if I remember correctly) and downloading all of them seemed infeasible, so I mail-ordered a set of CDs that contained Red Hat, Slackware and Debian. I think I decided on Debian because I liked the clean and transparent package management system, especially APT which seemed way ahead of other distributions at that time. Linux became my primary OS within a few months: even though it didn't have a fraction of the tools that are available today, it had everything I needed.

A lot happened in the Linux world since then. I don't know when (if?) Linux "conquered" the desktop, but in 2003 it became mature enough that I installed Linux (Debian, of course) on my parent's desktop. Neither of my parents are very computer-savvy, but they also found Linux much nicer than Windows. I think what mattered the most to them is its stability: I would update the software once in a while, test that everything was working (and fix it when it didn't), but in the meantime, the system was rock-solid. This was important as I no longer lived in the same city, so I couldn't just drop by to fix computer problems on a short notice, which I had to do with Windows.

When they got another computer (I think it was around 2005), I replaced Debian with Ubuntu, figuring that it would provide a more polished user experience, and I was pleased with the result. I installed Ubuntu on all computers that family members asked me to help with, and in 2009, I installed Ubuntu on a new laptop I got. It was a bit more polished than Debian, especially when it came to setup things like printers and network drives. I knew how to do those things via the relevant config files, but I appreciated the GUI.

Why I moved back to Debian

I think that I like Linux because it allows me to concentrate on my work: I came to accept that sometimes setting things up required a bit of tweaking, but once I configured something, it would work exactly the way I liked it. I welcome changes if they make sense (eg a feature extension), but I don't see the point of gratuitous ones. In retrospect, I should have considered the moving of window buttons a warning sign, especially when Mark Shuttleworth refused to move them back despite popular demand. The fix was easy, but the whole move was totally unnecessary and resulted in a lot of confusion: until I fixed it (I really did give it a try for a few days), my desktop just felt like a car with the gas and brake pedals interchanged.

The controversy surrounding Unity was the real eye-opener for me. I dist-upgraded Ubuntu on my parents current computer (a perfectly good desktop for office work, around 2 years old) and was informed that the system doesn't have the resources to run the damn thing. I tried it out on my laptop briefly, and realized that all that glitz and glitter is unnecessary and a waste of resources (most importantly battery power), and Ubuntu was trying to force yet another gratuitous UI change on its users. I backed up my data, wiped my HD, and within an hour, I was back to good old Debian. Which still rocks.

Why Debian is perfect for me

In order to answer that question, here is a sample of the desktop software that I use (of course I use a lot of other programs, eg Git or SBCL, but there is no purpose in enumerating all of them):

desktop environment: XFCE (with some Gnome applets in the panel)
e-mail: mutt, offlineimap, msmtp
shell: zsh, running under screen in yakuake (with the Tango color scheme — I got used to that from Gnome)
editing: emacs, TeX Live, AUCTeX, org-mode, …
browser: chrome
password manager: Revelation
newsfeeds: Liferea

This is a pretty eclectic mix with no overall GUI concept — for example, it has programs from Gnome and KDE — but I have come to like each of these programs and they work very well for me. Debian testing has pretty recent releases for all of them, and more importantly, the idea of Debian developers forcing any particular choice on how I set up my desktop never even comes up at any point. They provide excellent software, take care of packaging new releases and fixing bugs, but I doubt that any of them would even entertain the idea of pushing some wacky desktop du jour concept on their users. I think that they have better things to do with their time, including maintaining the best Linux distro out there, and that's the way I like it.

The icing on the cake

Compared to my previous desktop (which was plain vanilla Gnome, Ubuntu/Maverick), the above setup appears to be much less resource intensive. I have plenty of RAM and CPU power, but I noticed that my laptop gives me about 20% more battery time than it used to. I cannot help but love Debian. It is good to be back!