Sunday, January 03, 2010

Explaining Haskell IO without Monads

This tutorial explains how to perform IO in Haskell, without attempting to give any understanding of monads. We start with the simplest example of IO, then build up to more complex examples. You can either read the tutorial to the end, or stop at the end of any section - each additional section will let you tackle new problems. We assume basic familiarity with Haskell, such as the material covered in chapters 1 to 6 of Programming in Haskell by Graham Hutton.

IO Functions

In this tutorial I use four standard IO functions:


  • readFile :: FilePath -> IO String -- read in a file

  • writeFile :: FilePath -> String -> IO () -- write out a file

  • getArgs :: IO [String] -- get the command line arguments, from the module System.Environment

  • putStrLn :: String -> IO () -- write out a string, followed by a new line, to the console



Simple IO

The simplest useful form of IO is to read a file, do something, then write out a file.


main :: IO ()
main = do
    src <- readFile "file.in"
    writeFile "file.out" (operate src)

operate :: String -> String
operate = ... -- your code here


This program gets the contents of file.in, runs the operate function on it, then writes the result to file.out. The main function contains all the IO operations, while operate is entirely pure. When writing operate you do not need to understand any details of IO. This pattern of IO was sufficient for my first two years of programming Haskell.
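
To make the pattern concrete, here is a minimal sketch with one possible operate. The choice of upper-casing the text is purely an assumption for illustration, as is the extra first line of main, which creates (and overwrites) a small file.in so the sketch can be run on its own:

```haskell
import Data.Char (toUpper)

main :: IO ()
main = do
    -- Illustration only: create a small sample input so the sketch runs on its own.
    writeFile "file.in" "hello, world\n"
    src <- readFile "file.in"
    writeFile "file.out" (operate src)

-- Hypothetical choice of 'operate': upper-case every character.
operate :: String -> String
operate = map toUpper
```

Note that operate never mentions IO; it can be tested on plain strings without touching any files.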

Action List

If the pattern described in Simple IO is insufficient, the next step is a list of actions. A main function can be written as:


main :: IO ()
main = do
    x1 <- expr1
    x2 <- expr2
    ...
    xN <- exprN
    return ()


The main function starts with do, then has a sequence of xI <- exprI statements, and ends with return (). Each statement has a pattern on the left of the arrow (often just a variable), and an expression on the right. The types are related: if exprI has type IO a, then xI has type a. If the expression is not of type IO, then you must write xI <- return (exprI). The return function takes a value, and wraps it in the IO type.
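
As a small sketch of how the two sides of each arrow relate, the comments below give the type of every expression and of the variable bound from it. The helper countMessage is a hypothetical name introduced only for this illustration:

```haskell
import System.Environment (getArgs)

-- Hypothetical pure helper, used only for this illustration.
countMessage :: Int -> String
countMessage n = "Number of arguments: " ++ show n

main :: IO ()
main = do
    args <- getArgs                    -- getArgs :: IO [String], so args :: [String]
    n    <- return (length args)       -- length args :: Int is pure, so wrap it with return
    _    <- putStrLn (countMessage n)  -- putStrLn ... :: IO (), so the result is just ()
    return ()
```

In each statement the arrow strips one IO from the type on the right, which is why a pure expression has to be wrapped with return first.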

As a simple example we can write a program that gets the command line arguments, reads the file given by the first argument, operates on it, then writes out to the file given by the second argument:


import System.Environment (getArgs)

main :: IO ()
main = do
    [arg1,arg2] <- getArgs
    src <- readFile arg1
    res <- return (operate src)
    _ <- writeFile arg2 res
    return ()


As before, operate is a pure function. The first line after the do uses a pattern match to extract the command line arguments. The second line reads the file specified by the first argument. The third line uses return to wrap a pure value. The fourth line provides no useful result, so we ignore it by writing _ <-.

Simplifying IO

The action list pattern is very rigid, and people usually simplify the code using the following three rules:


  1. _ <- x can be rewritten as x.

  2. If the penultimate line doesn't have a binding arrow (<-) and is of type IO (), then the return () can be removed.

  3. x <- return y can be rewritten as let x = y (provided you don't reuse variable names).



With these rules we can rewrite our example as:


main :: IO ()
main = do
    [arg1,arg2] <- getArgs
    src <- readFile arg1
    let res = operate src
    writeFile arg2 res


Nested IO

So far only the main function has been of type IO, but we can create other IO functions, to wrap up common patterns. For example, we can write a utility function to print nice looking titles:


title :: String -> IO ()
title str = do
    putStrLn str
    putStrLn (replicate (length str) '-')
    putStrLn ""


We can use this title function multiple times within main:


main :: IO ()
main = do
    title "Hello"
    title "Goodbye"


Returning IO Values

The functions we've written so far have all been of type IO (), which lets us perform IO actions, but not give back interesting results. To give back the value x, we write return x as the final line of the do block. Unlike the imperative language return statement, this return must be on the final line.


readArgs :: IO (String,String)
readArgs = do
    xs <- getArgs
    let x1 = if length xs > 0 then xs !! 0 else "file.in"
    let x2 = if length xs > 1 then xs !! 1 else "file.out"
    return (x1,x2)


This function returns the first two command line arguments, or supplies default values if fewer arguments are given. We can now use this in the main program from before:


main :: IO ()
main = do
    (arg1,arg2) <- readArgs
    src <- readFile arg1
    let res = operate src
    writeFile arg2 res


Now, if fewer than two arguments are given, the program will use default file names instead of crashing.

Optional IO

So far we've only seen a static list of IO statements, executed in order. Using if, we can choose what IO to perform. For example, if the user enters no arguments we can tell them:


main :: IO ()
main = do
    xs <- getArgs
    if null xs then do
        putStrLn "You entered no arguments"
     else do
        putStrLn ("You entered " ++ show xs)


For optional IO you make the final statement of the do block an if, then under each branch continue the do. The only subtle point is that the else must be indented by one more space than the if. This caveat is widely considered to be a bug in the definition of Haskell, but for the moment, the extra space before the else is required. (Haskell 2010 has since relaxed this rule, so compilers that implement it also accept an else aligned with the if.)

Break Time

If you've gone from understanding no IO to this point in the tutorial, I suggest you take a break (a cookie is recommended). The IO presented above is all that imperative languages provide, and is a useful starting point. Just as functional programming provides much more powerful ways of working with functions by treating them as values, it also allows IO to be treated as values, which we explore in the rest of the tutorial.

Working with IO Values

The next stage is to work with IO as values. Until now, all IO statements have been executed immediately, but we can also create variables of type IO. Using our title function from above we can write:


main :: IO ()
main = do
    let x = title "Welcome"
    x
    x
    x


Instead of running the IO with x <-, we have placed the IO value in the variable x, without running it. The type of x is IO (), so we can now write x on a line to execute the action. By writing the x three times we perform the action three times.

Passing IO Arguments

We can also pass IO values as arguments to functions. In the previous example we ran the IO action three times, but how would we run it fifty times? We can write a function that takes an IO action, and a number, and runs the action that number of times:


replicateM_ :: Int -> IO () -> IO ()
replicateM_ n act = do
    if n == 0 then do
        return ()
     else do
        act
        replicateM_ (n-1) act


This definition makes use of optional IO to decide when to stop, and recursion to continue performing the IO. We can now rewrite the previous example as:


main :: IO ()
main = do
    let x = title "Welcome"
    replicateM_ 3 x


In an imperative language the replicateM_ function is built in as a for statement, but the flexibility of Haskell allows us to define new control flow statements - a very powerful feature. The replicateM_ function defined in Control.Monad is like ours, but more general, and can be used instead.
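
As a minimal sketch of using the library version instead of our own definition, the program below imports replicateM_ from Control.Monad and reuses the title function from earlier. The titles and counts are arbitrary choices for illustration:

```haskell
import Control.Monad (replicateM_)

title :: String -> IO ()
title str = do
    putStrLn str
    putStrLn (replicate (length str) '-')
    putStrLn ""

main :: IO ()
main = do
    replicateM_ 3 (title "Welcome")  -- performs the title action three times
    replicateM_ 0 (title "Never")    -- a count of zero performs no IO at all
```

One difference worth knowing: the library version also does nothing for a negative count, whereas the definition above only stops when n reaches exactly 0, so it would never terminate if given a negative number.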

IO in Structures

We've seen IO values being passed as arguments, so it's natural that we can also put IO in structures such as lists and tuples. The function sequence_ takes a list of IO actions, and executes each action in turn:


import Prelude hiding (sequence_) -- avoid a clash with the standard sequence_

sequence_ :: [IO ()] -> IO ()
sequence_ xs = do
    if null xs then do
        return ()
     else do
        head xs
        sequence_ (tail xs)


If there are no elements in the list then sequence_ stops, with return (). If there are elements in the list then sequence_ gets the first action (with head xs) and executes it, then calls sequence_ on the remaining actions. As before, sequence_ is available in Control.Monad, but in a more general form. It is now simple to rewrite replicateM_ in terms of sequence_:


replicateM_ :: Int -> IO () -> IO ()
replicateM_ n act = sequence_ (replicate n act)
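
The list passed to sequence_ does not have to come from replicate. As a small sketch, using the standard sequence_ that the Prelude already exports, the program below builds a list of three different actions and runs it twice. The strings are arbitrary and chosen only for illustration:

```haskell
main :: IO ()
main = do
    -- Building the list performs no IO; it only stores three actions.
    let actions = [putStrLn "first", putStrLn "second", putStrLn "third"]
    sequence_ actions            -- runs the actions in list order
    sequence_ (reverse actions)  -- the list is an ordinary value, so it can be reversed first
```

Because the actions are plain values, any list function (reverse, take, filter on an accompanying flag, and so on) can be used to decide which IO happens and in what order.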


Pattern Matching

A much more natural definition of sequence_, rather than using null/head/tail, is to make use of Haskell's pattern matching. If there is exactly one statement in a do block, you can remove the do. Rewriting sequence_ we can eliminate the do after the equals sign, and the do after the then keyword.


sequence_ :: [IO ()] -> IO ()
sequence_ xs =
    if null xs then
        return ()
    else do
        head xs
        sequence_ (tail xs)


Now we can replace the if with pattern matching, without needing to consider the IO:


sequence_ :: [IO ()] -> IO ()
sequence_ [] = return ()
sequence_ (x:xs) = do
    x
    sequence_ xs


Final Example

As a final example, imagine we wish to perform some operation on every file given at the command line. Using what we have already learnt, we can write:


import System.Environment (getArgs)

main :: IO ()
main = do
    xs <- getArgs
    sequence_ (map operateFile xs)

operateFile :: FilePath -> IO ()
operateFile x = do
    src <- readFile x
    writeFile (x ++ ".out") (operate src)

operate :: String -> String
operate = ...
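
The combination of map followed by sequence_ is common enough that the Prelude provides it as a single function, mapM_. The sketch below is the same program written that way. The upper-casing operate and the helper outName are assumptions made only so that the example is complete:

```haskell
import Data.Char (toUpper)
import System.Environment (getArgs)

main :: IO ()
main = do
    xs <- getArgs
    mapM_ operateFile xs  -- same effect as sequence_ (map operateFile xs)

operateFile :: FilePath -> IO ()
operateFile x = do
    src <- readFile x
    writeFile (outName x) (operate src)

-- Hypothetical helper: the output file name used for a given input file.
outName :: FilePath -> FilePath
outName x = x ++ ".out"

-- Hypothetical stand-in for the elided 'operate': upper-case the input.
operate :: String -> String
operate = map toUpper
```

Run with no arguments, the program does nothing; run with file names, it writes one .out file next to each input.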


IO Design

A Haskell program usually consists of an outer IO shell calling pure functions. In the previous example main and operateFile are part of the IO shell, while operate and everything it uses are pure. As a general design principle, keep the IO layer small. The IO layer should concisely perform the necessary IO, then delegate to the pure part. Use of explicit IO in Haskell is necessary, but should be kept to a minimum - pure Haskell is where the beauty lies.

Where to go now

You should now be equipped to do all the IO you need. To become more proficient I recommend any of the following:


  • Write lots of Haskell code.

  • Read chapters 8 and 9 of Programming in Haskell by Graham Hutton. You should expect to spend about 6 hours thinking and contemplating on sections 8.1 to 8.4 (I recommend going to a hospital A&E department with a minor injury).

  • Read Monads as Containers, an excellent introduction to monads.

  • Look at the documentation on the monad laws, and find where I've used them in this tutorial.

  • Read through all the functions in Control.Monad, try to define them, and then use them when writing programs.

  • Implement and use a state monad.

14 comments:

okagawa said...

First of all, I think this is a good tutorial on Haskell IO.
As I just finished reading "Programming in Haskell", this article was great for me.

I'm afraid this is beyond the scope of this tutorial, but let me ask a question about the type signature of sequence_.
In your article, the type signature of sequence_ was "[IO ()] -> IO ()". I tried to change it to the generic form, and found that it has to be "(Monad m) => [m a] -> m ()", as it is in the Prelude's sequence_.
It looks like "m a" is the generic form of "IO ()" as the argument type and "m ()" is the generic form of "IO ()" as the return type.
My confusion is why they have to be different, and what that means.

If possible, please let me know about it, or give me a pointer to the relevant literature.

Thanks,

Neil Mitchell said...

okagawa: My definition of sequence_ is:

[IO ()] -> IO ()

To get to the standard definition you make two changes.

1) sequence_ works on any monad, so you can change from IO to m, where m is a monad.

Monad m => [m ()] -> m ()

2) sequence_ takes a list of actions, and runs them. But the actions could return results, they just get discarded. So actually, we could allow any type of actions.

Monad m => [m a] -> m ()

Another way to see it is that my sequence_ is just the standard sequence_ with m set to IO, and a set to ().

okagawa said...

Thank you for your immediate reply!
Your explanation is clear and concise.

Maciej Piechotka said...

Next simplification rule:
If there is only one line, the do can be omitted.

I.e. do action -> action

Neil Mitchell said...

Maciej: That rule is listed in the section "Pattern Matching" - it makes it slightly too confusing to introduce it at the same time as the other rules, as it's not so interesting usually.

kowey said...

Nice, Neil. It's worth thinking about merging this into the wikibook, or barring that, granting something at least as permissive as the Creative Commons Attribution/Share-Alike license.

The wikibook chapter on IO tries to achieve the same effect, that is, focus on getting people to be able to do IO without worrying about monads. But it is much clumsier. I basically took YAHT and tried to refocus it on practice and threw in a lot of rambling nonsense along the way.

Your post, on the other hand, a nice catalogue of patterns, is quite usable.

Also, it would be nice to see a real "Haskell IO without Monads" post one day :-)

Neil Mitchell said...

Eric: I hereby license this post under the "Creative Commons Attribution/Share-Alike license"

I'm not going to have time to merge it in to the wiki book, but you're welcome to steal what you want. Looking at the wikibook it's a lot longer, I like my tutorials short and punchy, but not everyone has the same preferences.

Alexey Romanov said...

In "Action List", I think it would be good to say explicitly how the types of xI and exprI are related.

beroal said...

The only subtle point is that the else must be indented by one more space than the if. This caveat is widely considered to be a bug in the definition of Haskell, but for the moment, the extra space before the else is required.
"case … of" is more general and meshes nicely with indentation. :)

In the chapter "Optional IO" it's natural to consider Control.Monad.when .

seriousken said...

Now you have to tell the Hospital story.

Neil Mitchell said...

Alexey: I'll try and make that revision in a future version.

beroal: Beginners (and me) prefer if, although case would neatly solve the indentation problem. However, GHC is also going to solve the indentation problem shortly, so less of an issue. Putting when in optional IO is a good idea, but you really need IO as a value first. I'll think about how to include when.

seriousken: It was icy, I slipped on the ice, fell over and hit my elbow. Nothing too serious, but a good chunk of flesh missing.

beroal said...

Beginners (and me) prefer if
Sorry to get off topic… I think that a programmer should get used to "case … of" from the very beginning. In Haskell "case … of" is a primitive construct, whereas in C, Pascal and other imperative languages "if" is the primitive construct. Invariable use of "if" may lead to the wrong belief that "if" is special in Haskell. Personally I always fall back to core Haskell when the rules of syntactic sugar are not clear to me, or seem tricky or complex.

Anirudh Ramesh Iyer said...

Thanks a lot. This is the best post I have read in the past 2 days about how to pass a string from readFile to other functions without worrying about "IO String".
