Mercury tidbits - dependent types and file io

Note: this was originally posted as two separate parts, 1 week apart, and has been compressed for posterity

I just started learning a functional/logic language called Mercury, which has features that make it feel (at least to my initial impressions) like a mix between Prolog and Haskell. It has all the features that make it a viable Prolog, but it also adds static typing (with full type inference) and purity (all side effects are dealt with by passing around the state of the world). Since I recently was interested in learning Prolog, but had no desire to give up static typing or purity, Mercury seemed like a neat thing to learn.

While it is not very well known, the language has been around for over 15 years, and has a high quality self-hosting compiler.

Getting to play around with logic/declarative programming is interesting (and indeed the main reason why I’m interested in learning it), but what seems even more interesting with Mercury is how they have incorporated typing to the logic programming (which, unless I’m mistaken, is a new thing). As a tiny code example:

:- pred head(list(T), T).
:- mode head(in,    out) is semidet.
:- mode head(in(non_empty_list), out) is det.
head(Xs, X) :- Xs = [X | _].

The first line says that this is a predicate (logic statement) that has two parts, the first is a list of some type T (it is polymorphic), the second is an item of type T.

The fourth line should be familiar to a prolog programmer, but briefly, the right side says that Xs is defined as X cons’d to an unnamed element. head can be seen as defining a relationship between Xs and X, where the specifics are that Xs is a list that has as it’s first element X.

Now with regular prolog, only the fourth line would be necessary, and that definition allows some interesting generalization. Because head([1,2,3],Y) will bind Y to 1, while head([1,2,3],1) will be true (or some truthy value), and if head(X,Y) were used in a set of other statements, together they would only yield a result if X (wherever it was bound, or unified, to a value) had as it’s first value Y, whatever Y was.

Since Mercury is statically typed, it adds what it calls modes to predicates, which specify whether a certain argument (that’s probably not the right word!) is going to be given, or whether it is going to be figured out by the predicate. The other thing it has is specifications about whether the predicate is deterministic. There are a couple options, but the two that are relevant to this example are det, which means fully deterministic, for every input there is exactly one output, and semidet, which means for some inputs there is an output, for others there is not (ie, the unification fails). These allow the compiler to do really interesting things, like tell you if you are not covering all of the possible cases if you declare something as det (whereas the same code, as semidet, would not cause any errors).

What is fascinating about this predicate head is that it has two modes defined, one being the obvious head that we know from Haskell etc:

:- mode head(in,    out) is semidet.

Which states that the first argument is the input (the list) and the second is the output (the element), and it is semidet because for an empty list it will fail. The next is more interesting:

:- mode head(in(non_empty_list), out) is det.

This says for an input that is a non_empty_list (defined in the standard libraries, and included below), the second argument is the output, and this is det, ie fully deterministic. What this basically means is that failure is incorporated into the type system, because something that is semidet can fail, but something that is det cannot (neat!). Now the standard modes are defined (something like):

:- mode in == (ground >> ground).
:- mode out == (free    >> ground).

Ground is a something that is bound, and the >> is showing what is happening before and after the unification (the analog to function application). So something of mode in will be bound before and after, whereas something of mode out will not be bound before (that’s what free means) and it will be bound afterwards. That’s pretty straightforward.

What get’s more interesting is something like non_empty_list, where inst stands for instantiation state, ie one of various states that a variable can be in (with ground and free being the most obvious ones):

:- inst non_empty_list == bound([ground | ground]).

What this means is that a non_empty_list is defined as one that has a ground element cons’d to another ground element. (I don’t know the syntax well enough to explain what bound means in this context, but it seems straightforward). What this should allow you to do is write programs that operate on things like non-empty-lists, and have the compiler check to make sure you are never using an empty list where you shouldn’t. Pretty cool!

Obviously you can write data types in Haskell that also do not allow an empty list, like:

data NonEmptyList a = NonEmptyList a [a]

And could build functions to convert between them and normal lists, but the fact that it is so easy to build this kind of type checking on top of existing types with Mercury is really fascinating.

This is (obviously) just scratching the surface of Mercury (and the reason all of this stuff actually works is probably more due to the theoretical underpinnings of logic programming than anything else), but seeing the definition of head gave me enough of an ‘aha!’ moment that it seemed worth sharing.

If any of this piqued your interest, all of it comes out of the (wonderful) tutorial provided at the Mercury Project Documentation page. If there are any inaccuracies (which there probably are!) send them to [email protected].


Note: this is the beginning of the second post

The language that I’ve been learning recently is a pure (ie, side-effect free) logic/functional language named Mercury. There is a wonderful tutorial (PDF) available, which explains the basics, but beyond that, the primary documentation is the language reference (which is well written, but reasonably dense) and Mercury’s standard library reference (which is autogenerated and includes types and source comments, nothing else).

Doing I/O in a pure language is a bit of a conundrum - Haskell solved this by forcing all I/O into a special monad that keeps track of sequencing (and has a mythical state of the world that it changes each time it does something, so as not to violate referential transparency). Mercury has a simpler (though equivalent) approach - every predicate that does IO must take an world state and must give back a new world state. Old world states can not be re-used (Mercury’s mode system keep track of that), and so the state of the world is manually threaded throughout the program. A simple example would be:

main(IO_0,IO_final) :- io.write_string("Hello World!",IO_0,IO_1),
                       io.nl(IO_1,IO_final).

Where the first function consumes the IO_0 state and produces IO_1 (while printing “Hello World!”) and the second function consumes IO_1 and produces IO_final (while printing a newline character).

Of course, manually threading those could become pretty tedious, so they have a shorthand, where the same code above could be written as:

main(!IO) :- io.write_string("Hello World!",!IO),
             io.nl(!IO).

This is just syntax sugar, and can work with any parameters that are dealt with in the same way (and naming it IO for io state is just convention). It definitely makes dealing with I/O more pleasant.

The task that I set was to figure out how to read in a file. This is not covered in the tutorial, and I thought it would be a simple matter of looking through the library reference for the io library. One of the first predicates looks promising:

:- pred io.read_file(io.maybe_partial_res(list(char))::out,
                     io::di,
                     io::uo) is det.

But on second thought, something seems to be missing. The second and third parameters are the world states (the type is io, the mode di stands for destructive-input, meaning the variable cannot be used again, uo means unique output, which means that no other variable in the program can have that value), and the first one is going to be the contents of the file itself. But where is the file name?

The comment provides the necessary pointer:

% Reads all the characters from the current input stream until
% eof or error.

Hmm. So all of these functions operate on whatever the current input stream is. How do we set that? io.set_input_stream looks pretty good:

% io.set_input_stream(NewStream, OldStream, !IO):
% Changes the current input stream to the stream specified.
% Returns the previous stream.
%
:- pred io.set_input_stream(io.input_stream::in,
                            io.input_stream::out,
                            io::di, io::uo) is det.

But even better is io.see, which will try to open a file and if successful, will set it to the current stream (the alternative is to use io.open_input and then io.set_input_stream):

% io.see(File, Result, !IO).
% Attempts to open a file for input, and if successful,
% sets the current input stream to the newly opened stream.
% Result is either 'ok' or 'error(ErrorCode)'.
%
:- pred io.see(string::in, io.res::out, io::di, io::uo) is det.

With that in mind, let’s go ahead and implement a predicate to read files (much like I was expecting to find in the standard library, and what I put into a module of similar utilities I’ve started, titled, in tribute to Haskell, prelude):

:- pred prelude.read_file(string::in,
                          maybe(string)::out,
                          io::di,io::uo) is det.
prelude.read_file(Path,Contents,!IO) :-
  io.see(Path,Result,!IO),
  ( Result = ok,
    io.read_file_as_string(File,!IO),
    io.seen(!IO),
    (
      File = ok(String),
      Contents = yes(String)
    ;
      File = error(_,_),
      Contents = no
    )
  ;
    Result = error(_),
    Contents = no
  ).

To walk through what this code is doing, the type says that this is a predicate that does I/O (that’s what the last two arguments are for), that it takes in a string (the path) and give out a maybe(string), and that this whole thing is deterministic (ie, it always succeeds, which is accomplished by wrapping the failure into the return type: either yes(value) or no).

The first line tries to open the file at the path and bind it as the current input stream. I then pattern match on the results of that - if it failed, just bind Contents (the return value) to no. Otherwise, we try to read the contents out of the file and then close the file and set the input stream to the default one again (that is what the predicate io.seen does). Similarly we handle (well, really don’t handle, at least not well) reading the file failing. If it succeeds, we set the return type to the contents of the file.

What is interesting about this code is that while it is written in the form of logical statements, it feels very much like the way one does I/O in Haskell - probably a bit of that is my own bias (as a Haskell programmer, I am likely to write everything like I would write Haskell code, kind of how my python code always ends up with lambda’s and maps in it), but it also is probably a function of the fact that doing I/O in a statically type pure language is going to always be pretty similar - lots of dealing with error conditions, and not much else!

Anyhow, this was just a tiny bit of code, but it is a predicate that is immediately useful, especially when trying to use Mercury for random scripting tasks (what I often do with new languages, regardless of their reputed ability for scripting).