(defblog exordium) Emacs and Lisp musings

ELisp crash course

This is the second article in a series about Lisp; it assumes you read the first one.

If you use Emacs but don’t know Lisp, you are missing a lot: Emacs is infinitely customizable with Emacs Lisp. This post is an introduction to ELisp, hopefully giving you enough basics to write useful functions. Today we will mostly focus on the language itself, as opposed to the gazillion of Emacs-specific APIs for editing text.

My goal is not to review every function of the language: it would take a book to do so. My goal instead is to give a good high-level overview of Elisp. If you find yourself looking for a function or variable, you can browse the Emacs elisp site or you can use M-x apropos which displays anything that matches a given string.

Let’s start with…

The Basic Types

Strings are double quoted and can contain newlines. Use backslash to escape double quotes:

"This is an \"Emacs verse\"
Forgive it for being so terse"

A string is a sequence of characters. The syntax for a character is ?x: question mark followed by character. Some need to be escaped, like for example ?\(, ?\) and ?\\.

There are many functions operating on strings, like for example:

(length "foo")              ; returns 3 (also works with lists)
(concat "foo" "bar" "baz")  ; returns a new string "foobarbaz"
(string= "foo" "foo")       ; string comparison
(substring "foobar" 0 3)    ; returns "foo"
(upcase "foo")              ; returns "FOO"

Note that none of these functions have any side effect, as it is the case with most functions in Lisp - they are pure functions. They create a new object and return it.

Integers have 29 bits of precision (I don’t know why) and doubles have 64 bits. Binary starts with “#b”, octal with “#o” and hexadecimal with “#x”.

The most useful data structure in Lisp is a list, but the language also has arrays, hash tables, and objects. An array is called a vector, and you can create one like so: [ "the" "answer" "is" 42]. Like lists, they can contain objects of various types. You use spaces to separate the values; comas are part of the Lisp syntax but they are used for something else as we will soon see.

Quote

The quote is a special character in the Lisp syntax that prevents an expression from being evaluated. For instance:

'a               ; => a
'(a b c)         ; => (a b c) equivalent to (list 'a 'b 'c)

The quote prevents the evaluation of the symbol “a” on the first line, and the list on the second line, otherwise they would be considered as a variable and a function call respectively.

The backquote is like a quote, except that any element preceded by a coma is evaluated. The backquote is very handy for defining macros, e.g. functions that generate code. For example:

`(1 ,(+ 1 1) 3)  ; => (1 2 3)

Variables

Lisp is a dynamically-typed language, like Ruby or Python and unlike Java or C++. You don’t need to declare the type of a variable, and a variable can hold objects of different types over time.

We already saw in the previous post how to declare a global variable with defvar and set it with setq. Another way to use variables is function parameters:

(defun add (x y)
  (+ x y))

(message "%s + %s = %s" 1 2 (add 1 2)) ; prints "1 + 2 = 3"

Here we define a function add with 2 arguments, which returns the sum of its arguments. Then we call it. message is an Emacs function similar to C’s printf: it prints a message in the mini-buffer and in the messages buffer1.

Every time you call add, Lisp creates new bindings to hold the values of x and y within the scope of the function call. A single variable can have multiple bindings at the same time; for example the parameters of a recursive function are rebound for each call of the function.

The let form declares local variables. The syntax is (let (variable*) body) where each variable is either a variable name, or a list (variable-name value). Variables declared with no value are bound to nil. For example:

(let ((x 1)
      y)
  (message "x = %s, y = %s" x y)) ; prints "x = 1, y = nil"

The scope of the variable bindings is the body of the let form. After the let, the variables refer to whatever, if anything, they referred to before the call to let. You can bind the same variable multiple times:

(defun foo (x)
  (message "x = %d" x)       ; x = 1
  (let ((x 2))
    (message "x = %d" x)     ; x = 2
    (let ((x 3))
      (message "x = %d" x))  ; x = 3
    (message "x = %d" x))    ; x = 2
  (message "x = %d" x))      ; x = 1

;; Check the Messages buffer to see the results
(foo 1)

Note that let binds variables in parallel and not sequentially. That means that you cannot declare a variable whose value depends on another variable declared in the same let. For example this is wrong:

(let ((x 1)
      (y (* x 10)))
  (message "x = %s, y = %s" x y)) ; error: variable x is void

There are two ways to fix the code above: you could use a second let within the first, or you could replace let with let*: it binds variables sequentially, one after the other. The key to understand that is to remember that the origin of Lisp is the Lamda Calculus, where everything is a function call. The first let form above is equivalent to calling an anonymous function like this:

;; equivalent to (let ((x 1) y) (message ...))
;; prints "x = 1, y = nil"
((lambda (x y)
   (message "x = %s, y = %s" x y))
 1 nil)

Here we define a lambda (anonymous) function with 2 arguments, and we call it with the values of the arguments. The syntax of a lambda is (lambda (arguments*) body), and we call it like any other function by putting it in a second pair of parentheses with the arguments.

The equivalent of a let* requires multiple function calls:

;; equivalent to (let* ((x 10) (y x)) (message ...))
;; prints "x = 1, y = 10"
((lambda (x)
   ((lambda (y)
      (message "x = %s, y = %s" x y))
    (* x 10)))
 1)

The first lambda binds x to 1 and the second lambda binds y to x * 10.

Conditions

In ELisp and Common Lisp, nil is used to mean false, and everything that is not nil is true, including the constant t which means true. Therefore a symbol is true, a string is true and a number is true (even 0). nil is the same as () and it is considered good taste to use the former when you mean false (or void) and the latter when you mean empty list. Note that Clojure and Scheme treat boolean logic differently: for them the empty list and false are different things.

Let’s start with simple boolean functions. not returns the negation of its argument, so that (not t) returns nil and vice versa. Like most functions in Lisp, and and or can take any number of arguments. and returns the value of the last argument that is true, or nil if it finds an argument that is not true. or returns the value of the first argument that is true, or nil if none of them are true. For example:

(and 0 'foo)      ; => 'foo => true
(and 0 nil 'foo)  ; => nil  => false
(or 0 nil 'foo)   ; => 0    => true

You can compare for equality using = for numbers, string= for strings or eq for same address in memory. There is also a generic equal function that tests if the objects are equal no matter what type they are, so that’s the only one you need to remember.

(if then else*) is a special form that is equivalent to C’s ternary operator ?:. It must have at least a then form and can only have one. It may have one or more else forms. It returns the value of the then form or the value of the last else form. For example:

(defun max (x y)
  (if (> x y) x y))

(max 1 2)  ; => 2

If you just want a then or an else, it is better to use when and unless because they can have multiple then or else forms. They return the value of the last form or nil. Here is an example:

;; Predicate (-p) testing if today is a Friday.
;; current-time-string returns a string like "Fri Jul 10 10:52:44 2015".
;; string-prefix-p is a predicate testing for a string prefix.
(defun today-is-friday-p ()
  (string-prefix-p "Fri" (current-time-string)))

(when (today-is-friday-p)
  (message "Yay Friday!")
  (message "I love Fridays"))

(unless (today-is-friday-p)
  (message "It's not Friday")
  (message "Oh well"))

Finally cond is like a super-charged version of C’s switch/case: it chooses between an arbitrary number of alternatives. Its arguments are a collection of clauses, each of them being a list. The car of the clause is the condition, and the cdr is the body to be executed if the condition is true (the body can have as many forms as you like). cond executes the body of the first clause for which the condition is true. For example:

(defun guess-what-type (x)
  "Guesses the type of x"
  (cond ((numberp x)
         (message "x is a number"))
        ((stringp x)
         (message "x is a string")
         (when (> (length x) 10)
           (message "It has more than 10 characters")))
        ((listp x)
         (message "x is a list"))
        ((symbolp x)
         (message "x is a symbol"))
        (t
         (message "I don't now what x is"))))

(guess-what-type 10)                ; x is a number
(guess-what-type "hello")           ; x is string
(guess-what-type '(a b))            ; x is a list
(guess-what-type nil)               ; x is a list
(guess-what-type 'foo)              ; x is a symbol
(guess-what-type :foo)              ; x is a symbol
(guess-what-type (current-buffer))  ; I don't know what x is

The code above uses predicates like numberp which returns t if the argument is a number. The function current-buffer returns a buffer object which is neither a number, string, list or symbol (it is an instance of a class). Notice the last clause: the condition is t which is obviously always true. This is the “otherwise” clause guaranteed to fire if everything else above has failed.

Loops

The simplest loop is a while:

(let ((i 10))
  (while (> i 0)
    (message "%d" i)
    (setq i (- i 1))))  ; you can also use macro (decf i)

dotimes takes a variable and a count, and sets the variable from 0 to count - 1:

(dotimes (i 5) (message "%d" i))

dolist takes a variable and a list, and sets the variable to each item in the list:

(dolist (i '(a b c)) (message "%s" i))

If you need anything more complicated, take a look at the documentation of the loop macro. This is a very powerful macro with lot of options that takes an (almost) English sentence as argument and generates what you mean. For example, a C “for” loop can be expressed like so:

(loop for i from 1 to 10
      do (message "i = %d" i))

Another example is the following code which iterates over a “plist” (property list) which is a collection like (key1 value1 key2 value2) using cddr to move by 2 items at a time and skipping the properties where the key is an even number:

(let ((l '(1 "one" 2 "two" 3 "three")))
  (loop for (key value) on l by #'cddr
        unless (evenp key)
        do (message "%d is %s in English" key value)))
;; 1 is one in English
;; 3 is three in English

Elisp also has exceptions, try/catch/finally and anything else you would expect.

Functions

Lisp uses several keywords for declaring arguments within a defun.

  • &optional introduces optional arguments, which if not specified are bound to nil. For example (defun foo (a b &optional c d) ...) makes c and d optional.
  • &rest takes all remaining arguments and concatenates them into a list. For example the signature of the list function is simply (&rest objects).
  • &key introduces a keyword argument, that is an optional argument specified by a keyword with a default value of your choice. For example:
(defun foo (&key (x 0) (y 0))
  (list x y))

(foo)            ; => (0 0)
(foo :x 1)       ; => (1 0)
(foo :y 2)       ; => (0 2)
(foo :x 1 :y 2)  ; => (1 2)

Functions are first class objects in Lisp. You can store them in a variable and call them later. For example:

(setq f #'list)       ; shortcut for (function list).
(funcall f 'a 'b 'c)  ; => (a b c)

The syntax #'foo is sugar for (function foo) which returns the definition of the function stored in symbol foo. It basically returns a pointer to the code. funcall calls the function with a given list of arguments. Note that Emacs is very tolerant and (setq f 'list) (e.g. setting f to the symbol “list”) will also work.

apply works like funcall but it applies the function to a list of arguments:

(apply #'+ '(1 2 3))  ; => 6

An interesting example of using apply is mapcar which applies a function to each element of a list and returns a list of the results:

(mapcar #'(lambda (x) (* 10 x))
        '(1 2 3))  ; => (10 20 30)

Interactive functions

Let’s use our fresh knowledge to do something useful.

Sometimes I want to include a separator in a comment, e.g. a sequence of dashes or tilde that fills up the rest of the line until the 80 character column (the fill-column variable defines that limit). For example, if I type “// Begin of test” I want a magic key to do this:

// Begin of test~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Elisp functions must be declared “interactive” if you want to call then using Meta-x or bind them to a key. You do this declaration by calling the interactive special form (it’s not a function) as the first form in the body of your function.

(defun insert-separator ()
  "Inserts a sequence of ~ at the end of the line, up to the
fill column"
  (interactive)
  (end-of-line)
  (let ((num-chars (- fill-column (current-column))))
    (dotimes (i num-chars)
      (insert ?~))))

;; Bind this function to C-c ~
(global-set-key [(control c)(~)] #'insert-separator)

end-of-line move the cursor to the end of the line, as you probably guessed. The let form calculates the number of characters to insert before it reaches the end of the line using the variable fill-column (which should be set to 79) and the current-column function which returns the cursor’s column number. The insert function inserts a character or string at the position of the cursor. Finally global-set-key binds the function to a key chord. Note that this is a simple implementation; it might be more efficient to create a string with n characters using (make-string num-chars ?~).

Let’s write another one. Suppose you work in an organization that has created its own code style, and suppose that said code style proclaims that lines longer than 80 characters are a cardinal sin. Believe me, such code styles do exist. So let’s write an interactive function that will find the next “long” line in the current buffer, from the position of the cursor. It could look like this2:

(defun goto-long-line (len)
  "Go to the first line that is at least LEN characters long.
Use a prefix arg to provide LEN.
Plain `C-u' (no number) uses `fill-column' as LEN."
  (interactive "P")
  (setq len (or len fill-column))
  (let ((start-line (line-number-at-pos))
        (len-found  0)
        (found      nil))
    (while (and (not found) (not (eobp)))
      (forward-line 1)
      (setq found (< len (setq len-found
                               (- (line-end-position) (point))))))
    (if found
        (when (called-interactively-p 'interactive)
          (message "Line %d: %d chars" (line-number-at-pos) len-found))
      ;; Compiler-happy equivalent to (goto-line start-line):
      (goto-char (point-min))
      (forward-line (1- start-line))
      (message "Not found"))))

This interactive function takes a numeric argument which is the max length of lines. The “P” string in the call to interactive specifies that we use an argument (in raw form; see the documentation of interactive for details). Either the user invokes this function with M-x goto-long-line, in which case the argument len is set to nil, or she invokes the function with C-u 7 9 M-x goto-long-line, in which case the argument len is set to 79 (for instance). The first setq line is used to set a default value to len: either it is the number that the user specified or it is the value of variable fill-column.

Without going into too much details, the rest of the code is a while loop until we have found a line or we reached the end of the buffer (predicate eobp). At each step we go down one line (forward-line) and we check the length of the line. Note that the Emacs function point returns the position of the cursor as an offset into the file (the current character number if you will). Our function is designed to be called both interactively and within a program, so it tests how we are called using predicate called-interactively-p before deciding to print a message or not. point-min returns the position of the first character in the buffer (should be 1) and goto-char goes to a given character position.

Note that sometimes the compiler complains when you call a function that is designed to be used interactively in your code (these functions are marked as such using a property). Usually the warning says you should use another function, supposedly more efficient because doing less tests.


That’s it for today. Lots more to come. Stay tuned!

  1. A right click in the mini-buffer pops up the message buffer. That’s a nice trick for debugging if you have a lot of traces. 

  2. It is defined in Exordium