This is the second article in a series about Lisp; it assumes you read
the first one.
If you use Emacs but don’t know Lisp, you are missing a lot: Emacs is
infinitely customizable with Emacs Lisp. This post is an introduction to ELisp,
hopefully giving you enough basics to write useful functions. Today we will
mostly focus on the language itself, as opposed to the gazillion
Emacs-specific APIs for editing text.
My goal is not to review every function of the language: it would take a book
to do so. My goal instead is to give a good high-level overview of Elisp. If
you find yourself looking for a function or variable, you can browse the
Emacs elisp site
or you can use M-x apropos which displays anything that matches a given
string.
Let’s start with…
The Basic Types
Strings are double quoted and can contain newlines. Use backslash to escape
double quotes:
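A minimal sketch (the variable name greeting and the text are made up for illustration):

```elisp
;; A string can span several lines; \" escapes a double quote.
(setq greeting "She said \"hello\"
and then left.")
```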
A string is a sequence of characters. The syntax for a character is ?x:
question mark followed by the character. Some characters need to be escaped,
for example ?\(, ?\) and ?\\.
There are many functions that operate on strings, for example:
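A few of the built-in string functions, as a quick sketch (the sample strings are arbitrary):

```elisp
(concat "foo" "bar")            ; => "foobar"
(substring "hello world" 0 5)   ; => "hello"
(upcase "hello")                ; => "HELLO"
(length "hello")                ; => 5
(split-string "a b c")          ; => ("a" "b" "c")
(string-to-number "42")         ; => 42
(string ?a ?b)                  ; => "ab", a string built from characters
```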
Note that none of these functions has any side effect, as is the case with
most functions in Lisp: they are pure functions that create a new object and
return it.
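Integer precision depends on the Emacs build: old 32-bit versions had 29-bit
integers because a few bits of each machine word are reserved for type tags,
64-bit builds have 62-bit fixnums, and Emacs 27 and later also support
arbitrary-precision integers. Floating-point numbers are 64-bit doubles.
Binary starts with “#b”, octal with “#o” and hexadecimal with “#x”.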
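For instance (the values are arbitrary):

```elisp
#b1010                 ; binary,      => 10
#o17                   ; octal,       => 15
#xff                   ; hexadecimal, => 255
(+ 1 2.5)              ; integer + float => 3.5
most-positive-fixnum   ; the largest fixnum on this build
```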
The most useful data structure in Lisp is a list, but the language also has
arrays, hash tables, and objects. An array is called a vector, and you can
create one like so: [ "the" "answer" "is" 42]. Like lists, they can contain
objects of various types. You use spaces to separate the values; commas are part
of the Lisp syntax but they are used for something else as we will soon see.
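A short sketch of working with a vector (the variable name v is made up):

```elisp
(setq v ["the" "answer" "is" 42])   ; vectors evaluate to themselves
(aref v 3)                          ; => 42, indexes start at 0
(length v)                          ; => 4
(vectorp v)                         ; => t
```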
Quote
The quote is a special character in the Lisp syntax that prevents an expression
from being evaluated. For instance:
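A minimal illustration, with a symbol on the first line and a list on the second:

```elisp
'a          ; => a, the symbol itself and not the value of a variable
'(+ 1 2)    ; => (+ 1 2), a list of three elements and not a function call
(+ 1 2)     ; => 3, without the quote the list is evaluated
```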
The quote prevents the evaluation of the symbol “a” on the first line, and the
list on the second line, otherwise they would be considered as a variable and a
function call respectively.
The backquote is like a quote, except that any element preceded by a comma is
evaluated. The backquote is very handy for defining macros, i.e. functions that
generate code. For example:
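Here is a small sketch. The variable n and the macro my-unless are made up for illustration; the macro also uses ,@ which splices a list into the generated code:

```elisp
(setq n 42)
`(the answer is ,n)          ; => (the answer is 42)
`(sum ,(+ 1 2) (+ 1 2))      ; => (sum 3 (+ 1 2))

(defmacro my-unless (condition &rest body)
  "Hypothetical macro: evaluate BODY only if CONDITION is nil."
  `(if ,condition nil ,@body))

(macroexpand '(my-unless (> 1 2) (message "ok")))
;; => (if (> 1 2) nil (message "ok"))
```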
Variables
Lisp is a dynamically-typed language, like Ruby or Python and unlike Java or
C++. You don’t need to declare the type of a variable, and a variable can hold
objects of different types over time.
We already saw in the previous post how to declare a global variable with
defvar and set it with setq. Another way to use variables is function
parameters:
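A minimal version of such a function could look like this (the message text is made up):

```elisp
(defun add (x y)
  "Return the sum of X and Y."
  (+ x y))

(message "1 + 2 = %d" (add 1 2))   ; prints "1 + 2 = 3"
```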
Here we define a function add with 2 arguments, which returns the sum of its
arguments. Then we call it. message is an Emacs function similar to C’s
printf: it prints a message in the mini-buffer and in the messages
buffer [1].
Every time you call add, Lisp creates new bindings to hold the values of
x and y within the scope of the function call. A single variable can have
multiple bindings at the same time; for example the parameters of a recursive
function are rebound for each call of the function.
The let form declares local variables. The syntax is (let (variable*) body)
where each variable is either a variable name, or a list (variable-name
value). Variables declared with no value are bound to nil. For example:
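A small sketch with arbitrary names and values:

```elisp
(let ((x 1)       ; x is bound to 1
      (y 2)       ; y is bound to 2
      z)          ; z has no value, so it is bound to nil
  (list x y z))   ; => (1 2 nil)
```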
The scope of the variable bindings is the body of the let form. After the
let, the variables refer to whatever, if anything, they referred to before
the call to let. You can bind the same variable multiple times:
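For instance, with made-up values:

```elisp
(let ((x 1))
  (let ((x 2))
    (message "inner x = %d" x))   ; prints "inner x = 2"
  (message "outer x = %d" x))     ; prints "outer x = 1"
```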
Note that let binds variables in parallel and not sequentially. That means
that you cannot declare a variable whose value depends on another variable
declared in the same let. For example this is wrong:
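A sketch of the problem, assuming x has no global value. The condition-case wrapper is only there to show the error instead of dropping into the debugger:

```elisp
(condition-case err
    (let ((x 1)
          (y (* x 10)))   ; wrong: the x bound just above is not visible here
      (+ x y))
  (void-variable
   (message "Error: %S" err)))   ; prints "Error: (void-variable x)"
```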
There are two ways to fix the code above: you could use a second let within
the first, or you could replace let with let*: it binds variables
sequentially, one after the other. The key to understanding that is to remember
that the origin of Lisp is the
Lambda Calculus, where
everything is a function call. The first let form above is equivalent to
calling an anonymous function like this:
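As a sketch, here is a let with two variables next to the equivalent lambda call (the values are arbitrary):

```elisp
(let ((x 1)
      (y 2))
  (+ x y))                 ; => 3

((lambda (x y) (+ x y))    ; same thing: x and y are the parameters
 1 2)                      ; => 3
```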
Here we define a lambda (anonymous) function with 2 arguments, and we call it
with the values of the arguments. The syntax of a lambda is (lambda
(arguments*) body), and we call it like any other function by putting it in a
second pair of parentheses with the arguments.
The equivalent of a let* requires multiple function calls:
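A sketch of a let* next to its nested-lambda equivalent:

```elisp
(let* ((x 1)
       (y (* x 10)))       ; fine with let*: x is already bound
  (+ x y))                 ; => 11

((lambda (x)
   ((lambda (y) (+ x y))
    (* x 10)))             ; evaluated while x is bound to 1
 1)                        ; => 11
```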
The first lambda binds x to 1 and the second lambda binds y to x * 10.
Conditions
In ELisp and Common Lisp, nil is used to mean false, and everything that is
not nil is true, including the constant t which means true. Therefore a
symbol is true, a string is true and a number is true (even 0). nil is the
same as () and it is considered good taste to use the former when you mean
false (or void) and the latter when you mean empty list. Note that Clojure and
Scheme treat boolean logic differently: for them the empty list and false are
different things.
Let’s start with simple boolean functions. not returns the negation of its
argument, so that (not t) returns nil and vice versa. Like most functions in
Lisp, and and or can take any number of arguments. and evaluates its arguments
in order and returns the value of the last one if all of them are true, or nil
as soon as it finds one that is not. or returns the value of the first argument
that is true, or nil if none of them is true. For example:
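A few sample calls:

```elisp
(not t)          ; => nil
(not nil)        ; => t
(and 1 2 3)      ; => 3
(and 1 nil 3)    ; => nil
(or nil 2 3)     ; => 2
(or nil nil)     ; => nil
```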
You can compare for equality using = for numbers, string= for strings or
eq for same address in memory. There is also a generic equal function that
tests if the objects are equal no matter what type they are, so that’s the only
one you need to remember.
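A quick comparison of these functions. Note the last line: equal also compares types, so an integer is never equal to a float:

```elisp
(= 1 1.0)                       ; => t
(string= "abc" "abc")           ; => t
(eq 'foo 'foo)                  ; => t
(eq (list 1 2) (list 1 2))      ; => nil, two different objects
(equal (list 1 2) (list 1 2))   ; => t, same contents
(equal 1 1.0)                   ; => nil
```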
(if condition then else*) is a special form that is equivalent to C’s ternary
operator ?:. It must have exactly one then form, and it may have zero or more
else forms. It returns the value of the then form or the value of the last
else form. For example:
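A minimal sketch with an arbitrary condition:

```elisp
(if (> 3 2)
    "yes"            ; the single then form
  (message "no")     ; first else form
  "no")              ; last else form
;; => "yes"
```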
If you just want a then or an else, it is better to use when and unless
because they can have multiple then or else forms. They return the value of
the last form or nil. Here is an example:
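For instance (conditions and messages are made up):

```elisp
(when (> 3 2)
  (message "3 is greater than 2")
  'greater)                 ; => greater

(unless (> 3 2)
  (message "never printed")
  'smaller)                 ; => nil, the body is skipped
```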
Finally cond is like a super-charged version of C’s switch/case: it chooses
between an arbitrary number of alternatives. Its arguments are a collection of
clauses, each of them being a list. The car of the clause is the condition,
and the cdr is the body to be executed if the condition is true (the body can
have as many forms as you like). cond executes the body of the first clause
for which the condition is true. For example:
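A sketch of a cond with type predicates. The function name my-type-name is hypothetical:

```elisp
(defun my-type-name (x)
  "Return a string describing the type of X."
  (cond ((numberp x) "number")
        ((stringp x) "string")
        ((listp x)   "list")
        ((symbolp x) "symbol")
        (t           "something else")))

(my-type-name 42)                 ; => "number"
(my-type-name "foo")              ; => "string"
(my-type-name '(1 2))             ; => "list"
(my-type-name 'foo)               ; => "symbol"
(my-type-name (current-buffer))   ; => "something else"
```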
The code above uses predicates like numberp which returns t if the argument
is a number. The function current-buffer returns a buffer object which is
neither a number, a string, a list nor a symbol (buffers are a separate
primitive type). Notice the last clause: the condition is t which is obviously
always true. This is the “otherwise” clause guaranteed to fire if everything
else above has failed.
Loops
The simplest loop is a while:
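For instance, counting from 0 to 2:

```elisp
(let ((i 0))
  (while (< i 3)
    (message "i = %d" i)
    (setq i (1+ i))))
```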
dotimes takes a variable and a count, and sets the variable from 0 to count -
1:
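A minimal sketch:

```elisp
(dotimes (i 3)
  (message "i = %d" i))   ; prints i = 0, i = 1 and i = 2
```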
dolist takes a variable and a list, and sets the variable to each item in the
list:
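A minimal sketch with a made-up list:

```elisp
(dolist (word '("the" "answer" "is"))
  (message "%s" word))   ; prints each word in turn
```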
If you need anything more complicated, take a look at the documentation of the
loop macro (named cl-loop in current Emacs versions, where it is part of the
cl-lib library). This is a very powerful macro with lots of options that takes
an (almost) English sentence as argument and generates what you mean. For
example, a C “for” loop can be expressed like so:
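A sketch using cl-loop from cl-lib; the bounds and step are arbitrary:

```elisp
(require 'cl-lib)

;; Equivalent of: for (i = 0; i < 10; i += 2) { ... }
(cl-loop for i from 0 below 10 by 2
         do (message "i = %d" i))

;; The same iteration, collecting the values into a list.
(cl-loop for i from 0 below 10 by 2
         collect i)                     ; => (0 2 4 6 8)
```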
Another example is the following code which iterates over a “plist” (property
list) which is a collection like (key1 value1 key2 value2) using cddr to
move by 2 items at a time and skipping the properties where the key is an even
number:
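A sketch of such a loop with cl-loop and a made-up plist:

```elisp
(require 'cl-lib)

(cl-loop for (key value) on '(1 "one" 2 "two" 3 "three") by #'cddr
         unless (cl-evenp key)          ; skip properties with an even key
         collect (list key value))      ; => ((1 "one") (3 "three"))
```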
Elisp also has exceptions (signal and condition-case), an equivalent of try/finally (unwind-protect), non-local exits (catch and throw) and anything else you would expect.
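As a rough sketch, catching an error is done with condition-case, and unwind-protect plays the role of a finally block:

```elisp
(condition-case err
    (/ 1 0)                        ; signals an arith-error
  (arith-error
   (message "Caught: %S" err)
   'recovered))                    ; => recovered

(unwind-protect
    (message "body")
  (message "cleanup always runs"))
```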
Functions
Lisp uses several keywords for declaring arguments within a defun.
&optional introduces optional arguments, which if not specified are bound
to nil. For example (defun foo (a b &optional c d) ...) makes c and d
optional.
&rest takes all remaining arguments and collects them into a list. For
example the signature of the list function is simply (&rest objects).
&key introduces a keyword argument, that is an optional argument specified
by a keyword with a default value of your choice. In Emacs Lisp, &key is only
understood by cl-defun from the cl-lib library, not by plain defun. For example:
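A sketch of the three keywords. The function names are hypothetical, and the &key example relies on cl-defun from cl-lib:

```elisp
(require 'cl-lib)

(defun my-opt (a &optional b)
  (list a b))
(my-opt 1)              ; => (1 nil)
(my-opt 1 2)            ; => (1 2)

(defun my-rest (a &rest others)
  (list a others))
(my-rest 1 2 3)         ; => (1 (2 3))

(cl-defun my-key (&key (width 80) height)
  (list width height))
(my-key)                ; => (80 nil)
(my-key :height 24)     ; => (80 24)
```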
Functions are first class objects in Lisp. You can store them in a variable and
call them later. For example:
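A minimal sketch (the variable names f and g are arbitrary):

```elisp
(setq f #'list)
(funcall f 1 2 3)               ; => (1 2 3)

(setq g (lambda (x) (* x x)))   ; an anonymous function works too
(funcall g 4)                   ; => 16
```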
The syntax #'foo is sugar for (function foo), which refers to the function
named by symbol foo. In Emacs Lisp it evaluates to the symbol itself, but it
tells the byte compiler that foo is meant as a function. funcall calls the
function with the given arguments. This is why (setq f 'list) (i.e. setting f
to the symbol “list”) also works.
apply works like funcall but it applies the function to a list of
arguments:
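For instance:

```elisp
(apply #'+ '(1 2 3))      ; => 6, same as (+ 1 2 3)
(apply #'max 1 '(5 3))    ; => 5, leading arguments can be given separately
```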
Another function that takes a function as argument is mapcar, which applies a
function to each element of a list and returns a list of the results:
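For instance:

```elisp
(mapcar #'1+ '(1 2 3))                     ; => (2 3 4)
(mapcar #'upcase '("a" "b"))               ; => ("A" "B")
(mapcar (lambda (x) (* x x)) '(1 2 3))     ; => (1 4 9)
```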
Interactive functions
Let’s use our fresh knowledge to do something useful.
Sometimes I want to include a separator in a comment, i.e. a sequence of dashes
or tildes that fills up the rest of the line until the 80 character column (the
fill-column variable defines that limit). For example, if I type “// Begin of
test” I want a magic key to do this:
Elisp functions must be declared “interactive” if you want to call them using
M-x or bind them to a key. You do this declaration by calling the
interactive special form (it’s not a function) as the first form in the body of
your function.
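A minimal sketch of such a command. The function name my-fill-to-end and the key C-c f are assumptions made up for illustration:

```elisp
(defun my-fill-to-end ()
  "Insert tildes from the end of the line up to `fill-column'."
  (interactive)
  (end-of-line)
  (let ((num-chars (- fill-column (current-column))))
    ;; dotimes does nothing if num-chars is zero or negative.
    (dotimes (_ num-chars)
      (insert ?~))))

;; Hypothetical key binding; pick any key you like.
(global-set-key (kbd "C-c f") #'my-fill-to-end)
```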
end-of-line moves the cursor to the end of the line, as you probably
guessed. The let form calculates the number of characters to insert before it
reaches the end of the line using the variable fill-column (which should be
set to 79) and the current-column function which returns the cursor’s column
number. The insert function inserts a character or string at the position of
the cursor. Finally global-set-key binds the function to a key chord. Note
that this is a simple implementation; it might be more efficient to create a
string with n characters using (make-string num-chars ?~).
Let’s write another one. Suppose you work in an organization that has created
its own code style, and suppose that said code style proclaims that lines
longer than 80 characters are a cardinal sin. Believe me, such code styles do
exist. So let’s write an interactive function that will find the next “long”
line in the current buffer, from the position of the cursor. It could look like
this:
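A sketch of what such a function could look like. Going back to the top of the buffer when nothing is found and the wording of the messages are assumptions:

```elisp
(defun goto-long-line (len)
  "Move to the next line longer than LEN characters.
LEN defaults to `fill-column'.  Return non-nil if a line was found."
  (interactive "P")
  ;; Raw prefix argument: nil without C-u, otherwise convert to a number.
  (setq len (if len (prefix-numeric-value len) fill-column))
  (let ((found nil))
    (while (and (not found) (not (eobp)))
      (forward-line 1)
      (when (bolp)                      ; we really moved to a new line
        (end-of-line)
        (setq found (> (- (point) (line-beginning-position)) len))))
    (if found
        (when (called-interactively-p 'interactive)
          (message "Line %d is longer than %d characters"
                   (line-number-at-pos) len))
      (goto-char (point-min))
      (when (called-interactively-p 'interactive)
        (message "No line longer than %d characters" len)))
    found))
```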
This interactive function takes a numeric argument which is the max length of
lines. The “P” string in the call to interactive specifies that we use an
argument (in raw form; see the documentation of interactive for
details). Either the user invokes this function with M-x goto-long-line, in
which case the argument len is set to nil, or she invokes the function with
C-u 7 9 M-x goto-long-line, in which case the argument len is set to 79 (for
instance). The first setq line is used to set a default value to len:
either it is the number that the user specified or it is the value of variable
fill-column.
Without going into too much detail, the rest of the code is a while loop
that runs until we have found a long line or reached the end of the buffer
(predicate eobp). At each step we go down one line (forward-line) and we check the
length of the line. Note that the Emacs function point returns the position
of the cursor as an offset into the file (the current character number if you
will). Our function is designed to be called both interactively and within a
program, so it tests how we are called using predicate called-interactively-p
before deciding to print a message or not. point-min returns the position of
the first character in the buffer (should be 1) and goto-char goes to a given
character position.
Note that sometimes the byte compiler complains when your code calls a function
that is designed to be used interactively (these functions are marked as
such using a property). Usually the warning says you should use another
function, supposedly more efficient because it does fewer checks.
That’s it for today. Lots more to come. Stay tuned!
[1] A right click in the mini-buffer pops up the messages buffer. That’s a nice trick for debugging if you have a lot of traces.
Lisp is the second-oldest high-level programming language still in use,
created right after Fortran in the late fifties by John McCarthy, one of the
founders of the discipline of Artificial Intelligence.
Lisp was very popular during the boom of AI; it even had its own
hardware which I had the
privilege to work with. Lisp pioneered many, if not most, of the concepts
and programming paradigms used in modern languages, including homoiconicity,
first class functions, garbage collection, aspect-oriented programming, you
name it [1]. Several language creators have said that Lisp was a
major source of inspiration for them (hello
Java,
Ruby and
JavaScript). A colleague of mine once said, and I am paraphrasing, that
every programming language ever invented either tries to be a better Fortran or
a degraded version of Lisp, i.e. some kind of Lisp for the masses. But don’t
think that Lisp is history; modern Lisps like Clojure are state-of-the-art
programming languages.
Lisp is a programmable programming language. What does that mean? It means
that you can change Lisp (dynamically, even) to be what you want. So for
example if you are writing a text editor, you can turn Lisp into a language for
writing text editors. You will never find yourself wishing the language
supported some feature that would make your life easier; you can just add the
feature yourself.
Lisp achieves that by putting data and code at the same level: data can be used
(evaluated, compiled) as code, and vice versa. Whereas C provides naive
text-substitution macros and C++ provides brain-dead templates and template
meta-programming weirdness, Lisp macros give you full access to the power of
Lisp at compilation time. Basically you can tell the compiler “execute this
code, and use the result as the code to be included in the program”.
Developing in Lisp is easy: you can use a REPL (read-eval-print loop) to play
with your code as you write it. The story of the
Deep Space 1 probe is an
interesting anecdote about how useful a REPL can be: “Debugging a program
running on a $100M piece of hardware that is 100 million miles away is an
interesting experience”. Lisp can also be very fast, close to C-level
performance according to some benchmarks. It had to be because it is so old
(think about the kind of hardware they had in the sixties).
Dialects
There have been countless dialects of Lisp in history. The ones that are
relevant today are:
Common Lisp: the ANSI standard, which has many implementations such as SBCL
(Steel Bank Common Lisp, a high-performance native compiler). Common Lisp
includes one of the best object-oriented languages I’ve ever seen: CLOS (Common
Lisp Object System).
Scheme: a Lisp with a minimalist design philosophy. It is the programming
language used in the textbook
SICP. Guile
and Racket are popular implementations.
Clojure: a very modern language that runs on the JVM or a JavaScript
runtime. Clojure brings in a modern Lisp syntax, pure functions, software
transactional memory, and many other cool things.
Emacs Lisp: unfortunately the worst Lisp out there. It uses dynamic
binding [2] by default, it is single-threaded, and it is
super slow. The only good thing you can say about it is “well at least it’s a
Lisp”. Finding a replacement is a hot topic today in the Emacs community.
Getting started
The simplest way to get a taste of Lisp is just to fire up Emacs. There are two
ways to interact with ELisp:
M-x ielm (inferior emacs lisp mode) gives you a REPL similar to say irb
or python. Use C-UP and C-DOWN to repeat commands you typed
earlier. For example, try to evaluate () (nil).
You can use any ELisp buffer such as the scratch buffer. C-j evaluates the
lisp expression before the cursor and inserts the result where the cursor
is. M-C-x evaluates the current form and prints the result in the
mini-buffer (also in messages). You can also use functions like M-x
eval-region.
Give it a try: open the scratch buffer and type "hello world". Then evaluate
with M-C-x (meta control x): the mini-buffer should display the string. This
works because strings, like numbers, are objects that evaluate to themselves.
If you want a real hello world, type this and evaluate again [3]:
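A minimal version:

```elisp
(print "hello world")
```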
This is what Lisp calls a symbolic expression (s-expression or sexp). It calls
the function print passing a string as parameter (you can also use format
which is similar to C’s printf). Lisp uses prefix (Polish) notation for function
calls; for example (1 + 2 + 3) * 5 is written in Lisp as (* (+ 1 2 3) 5).
The mini-buffer should display the string “hello world” twice: one is the
printed text and the other is the returned value, which is also the printed
text. Every function returns a value which is normally the value of the last
form that was evaluated.
If you wanted the code above to return nil (which in ELisp and Common Lisp
means void, false, and the empty list) you could do this:
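A minimal sketch:

```elisp
(progn
  (print "hello world")
  nil)                    ; => nil, the value of the last form
```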
A progn is a block (list of sexp) which evaluates each form in sequence and
returns the value of the last form. It is named like that because of another
function prog1 which does the same thing but returns the value of the first
form. Run this code again with M-C-x: the mini-buffer should display the
string that was printed and the return value nil.
That’s nice but our hello world should really be a function. So let’s define
one:
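A sketch of such a function (the documentation string is made up):

```elisp
(defun hello-world ()
  "Print a greeting and return nil."   ; optional documentation string
  (print "hello world")                ; comments begin with a semicolon
  nil)

(hello-world)
```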
defun is followed by the name of the function, the list of arguments (an
empty list here), an optional documentation string, and the forms. The body of
a function is an implicit progn. Note that comments begin with a semicolon.
Working with Lists
The basic data structure in Lisp is a singly linked list. The syntax of a list
is exactly the same as a sexp. For example let’s declare a variable l
containing a list of integers:
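For instance (the documentation string is made up):

```elisp
(defvar l '(1 2 3)
  "A list of integers.")
```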
defvar is followed by a variable name and a value (and an optional
documentation string). Notice the quote character before the value: this is
syntactic sugar for (quote (1 2 3)) which means “don’t evaluate this”. The
quote is needed because otherwise Lisp would try to call a function named “1”
with parameters 2 and 3.
If you wanted to use the result of a function call as value instead, you could
use the list function, which creates a list containing its arguments:
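A small sketch; the variable name squares is hypothetical:

```elisp
(defvar squares (list (* 1 1) (* 2 2) (* 3 3))
  "The squares of 1, 2 and 3.")
;; squares is (1 4 9): each argument was evaluated before list was called.
```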
Here we don’t use the quote because we want the list form to be
evaluated. There is no need to quote the numbers because a number evaluates to
itself.
l is a symbol, which is an object in memory with a unique name. A symbol
has a name, and possibly a value, a function definition and a property
list. Our function hello-world above is also a symbol; it has no value but it
has a function definition. There is a special kind of symbol called a keyword
which has just a name and evaluates to itself; keywords start with a colon like
:foo (they are like interned strings).
The value of l is a list containing 3 cells or cons (for construct), each
made of 2 pointers: a pointer to the value and a pointer to the next cell (or
nil).
Function car returns the value of the first pointer, and cdr the value of
the second pointer (they are named like that for historical reasons [4]):
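For instance, on a literal list:

```elisp
(car '(1 2 3))         ; => 1
(cdr '(1 2 3))         ; => (2 3)
(car (cdr '(1 2 3)))   ; => 2
(cdr '(3))             ; => nil, the last cell points to nil
```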
You can create a cons using the function that has the same name; its parameters
are the car and the cdr.
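A short sketch on literal values:

```elisp
(cons 0 '(1 2 3))   ; => (0 1 2 3), a new cell in front of the list
(cons 1 nil)        ; => (1), a one-element list
```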
The call to cons returns a new list starting with 0 and pointing to the first
cons of l. You can verify that l and l2 share the same tail with function
eq, which returns t (true) if its arguments are the same Lisp object:
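A self-contained sketch of that check, using local variables with the same names:

```elisp
(let* ((l (list 1 2 3))
       (l2 (cons 0 l)))
  (eq (cdr l2) l))      ; => t, the tail of l2 is the very same object as l
```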
Of course Lisp has plenty of functions to manipulate lists. Here is how to
reverse our list:
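A minimal sketch, assuming l holds a list as defined earlier (the defvar is a no-op if it already does):

```elisp
(reverse '(1 2 3))     ; => (3 2 1), a new list; the argument is unchanged

(defvar l '(1 2 3))
(setq l (reverse l))   ; l now refers to the reversed list
```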
setq sets a variable to a new value (the name stands for “set quoted”: it is like set, but the variable name is not evaluated).
That’s it for today. Lots more to come. Stay tuned!
[1] And almost OOP. Alan Kay, who invented Smalltalk and coined Object Oriented Programming, said that “Lisp is the greatest single programming language ever designed”.
[2] As opposed to lexical binding. If you define a local variable x in function foo and then call function bar, bar will see the value of x even if you don’t pass it as parameter. A long time ago people thought it was a good idea for performance reasons.
[3] If you make a mistake and end up in the debugger, just press q to exit.
[4] CAR and CDR come from the IBM 704, the machine on which Lisp was first implemented: they stand for “contents of the address part of the register” and “contents of the decrement part of the register”, the two halves of a machine word that held the two pointers of a cons. In Common Lisp you can also use FIRST and REST if you prefer (cl-first and cl-rest in Emacs Lisp’s cl-lib).
This is the first article in a series about Org mode.
Org mode is a killer feature of Emacs. Some people use Emacs just for that
mode. It can do many things including organizing notes, project planning, web
publishing and literate programming. You can even write your Emacs
configuration in Org mode and publish it: here is
an example (to make it work,
you only need a tiny init.el that loads the Org file and runs the embedded
Lisp code).
The only bad thing about Org mode is that it is not universal, because it is
very tied to Emacs. There are plugins for Vim and Sublime Text for instance,
but they only cover a fraction of the features that the real thing
provides. This is the reason why Markdown is more popular than Org while being
objectively inferior. That said, more and more sites understand Org files
(GitHub certainly does).
Let’s get started.
Outlines
An Org file is a plain text file with headlines, text, and some additional
information such as tags and timestamps. A headline starts with a series of
asterisks. The more asterisks there are, the deeper the headline is.
For example, you can create a file with extension “.org” and with this content:
You can make Emacs render this nicely with the
org bullet extension, which masks the
asterisks and displays Unicode bullets instead:
When typing the text above, use M-RET (meta + return) to create a new
headline at the same level as the one above it, or a first-level headline if
the document does not have headlines yet. Use M-LEFT and M-RIGHT to promote
or demote a headline, i.e. change its level. You can also move a headline and
all the text under it up and down using M-UP and M-DOWN.
Finally the TAB key collapses or expands headlines. When a headline is
collapsed, its content is replaced with an ellipsis like so:
S-TAB (shift + tab) collapses or expands everything.
Lists and checkboxes
If you prefer, you can also create hierarchies using lists. For example:
The same keys work with lists, e.g. use M-RET to create a new list item. You
can also change the style of your list using S-LEFT and S-RIGHT. For
example, change the list to use numbers:
Notice that if you move an item up and down with M-UP / M-DOWN, the numbers
are automatically updated.
To create an item with a checkbox, use S-M-RET (shift meta return). Toggle
a checkbox using C-c C-c.
TODOs
An alternative to checkboxes is TODO items in headlines:
Type S-M-RET (shift meta return) to create a new headline that starts with a
TODO. Change the state of a TODO into DONE or vice versa using S-LEFT and
S-RIGHT.
You can add more states to TODO and
DONE. Exordium uses this code to
add the WORK and WAIT states:
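A sketch of what such a setting could look like. This is not the exact Exordium code, and the order of the states is an assumption:

```elisp
;; States before "|" need action; states after it are considered done.
(setq org-todo-keywords
      '((sequence "TODO" "WORK" "WAIT" "|" "DONE")))
```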
You can also specify the states on a per-file basis by adding a line like this
at the beginning of the file (save and reopen to make it work):
The vertical bar separates the TODO keywords (states that need action) from the
DONE states (which need no further action). They are displayed with different
colors.
Markup
Org’s markup syntax is more intuitive than Markdown’s (IMO):
You can make the images display inline using this code in your configuration
(reopen the Org file to make it work):
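One way to do it, as a sketch that assumes the standard Org option org-startup-with-inline-images is what you want:

```elisp
;; Show images inline when an Org file is opened.
(setq org-startup-with-inline-images t)
```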
Tables
Finally the pièce de résistance: type this text:
Then hit TAB and see what happens. Voilà! The table will automatically
resize itself as you tab and shift-tab to move between cells.
I’ve used Eclipse for many years. What always bothered me about it is that it
forces you to use the mouse all the time, even for things like switching
between buffers [1], which is a very common operation: add an
argument to the definition of a function, switch to the file where it is
called, and change the function call. In fact my Emacs configuration sets up a
very easy key for
switching between the 2 most recently used buffers.
One of the many great things about Emacs (and Vim as well) is that you can do
everything you need without ever using the mouse, and in fact without even
requiring a GUI. This is a killer feature compared to most IDEs because menus
and mice are slow. If your hands don’t have to leave the home row, you can
change text almost as fast as you think.
You can get more productive if you know how to move the cursor quickly within a
buffer. There are several clever extensions
for that, but here we’ll review a few built-in keys. Note that I’m using the
arrow keys because they are easy to remember [2].
Beginning and end
These four keys are a must:
| Key binding | Description |
|-------------|-------------|
| C-a | Go to the beginning of the line. |
| C-e | Go to the end of the line. |
| M-< | Go to the beginning of the buffer. |
| M-> | Go to the end of the buffer. |
Move by words
| Key binding | Description |
|-------------|-------------|
| C-right | Move forward one word (right-word). |
| C-left | Move backward one word (left-word). |
Any major mode may have its own definition of what a word is (it is defined in
the mode’s syntax table).
Unfortunately these keys are not symmetrical: moving right then left does not
necessarily bring you back where you started. For programming, I found it
useful to define these extra keys for moving by semantic units rather than
words:
You can pass a numeric argument to these commands to move by more than a single
word. The Universal numeric argument prefix lets you pass a number to a
command, and the prefix is C-u followed by the number followed by the
command. For example C-u 3 C-right moves forward 3 words.
Move by paragraphs
I use these all the time:
| Key binding | Description |
|-------------|-------------|
| C-up | Move up one paragraph (backward-paragraph). |
| C-down | Move down one paragraph (forward-paragraph). |
Move by defuns
You can move to the beginning or end of a class or function almost the same way
you move to the beginning or end of the line, except that the prefix is M-C-:
| Key binding | Description |
|-------------|-------------|
| M-C-a | Go to the beginning of a class or function. |
| M-C-e | Go to the end of a class or function. |
Repeat to go to the next or previous class/function.
Move by s-expression
This is very handy for Lisp. In Lisp, an s-expression (symbolic expression or
sexp) is an atom or a list. For other programming languages, Emacs also
considers strings and blocks between curly braces or square brackets. Moving by
sexp is similar to moving by word, only the prefix is M-C-:
| Key binding | Description |
|-------------|-------------|
| M-C-right | Move forward one sexp (forward-sexp). |
| M-C-left | Move backward one sexp (backward-sexp). |
| M-C-d | Move down a sexp. |
| M-C-u | Move up a sexp. |
| M-C-n | Move to the next sexp at the same nested level. |
| M-C-p | Move to the previous sexp at the same nested level. |
Give it a try, it is more useful than you think.
One more thing
You can use M-x view-lossage to assess your productivity with Emacs: this
function displays the last 300 keys you have pressed. If it shows the same key
repeated many times, you are probably doing it wrong.
[1] Eclipse has a shortcut key that displays a menu of the open files, but it is slow and cumbersome.
[2] Touch-type purists prefer to use other keys like C-f and C-b.
Well. Finally got around to putting this website together. What’s neat, it is
written in Markdown using Emacs and built with
Jekyll. All it takes to publish it is to push the git
repo.
This will be a blog about Emacs. It will contain concise posts about how to
improve your productivity and how to program Emacs to make it your perfect
editor.