Monday, August 23, 2010

CmdArgs Example

Summary: A simple CmdArgs parser is incredibly simple (just a data type). A more advanced CmdArgs parser is still pretty simple (a few annotations). People shouldn't be using getArgs even for quick one-off programs.

A few days ago I went to the Haskell Hoodlums meeting - an event in London aimed towards people learning Haskell. The event was good fun, and very well organised - professional quality Haskell tuition for free by friendly people - the Haskell community really is one of Haskell's strengths! The final exercise was to write a program that picks a random number between 1 and 100, then has the user take guesses, with higher/lower hints. After writing the program, Ganesh suggested adding command line flags to control the minimum/maximum numbers. It's not too hard to do this directly with getArgs:


import System.Environment

main = do
    [min,max] <- getArgs
    -- the Int annotations tell read which type to parse
    print (read min :: Int, read max :: Int)


Next we discussed adding an optional limit on the number of guesses the user is allowed. It's certainly possible to extend the getArgs variant to take in a limit, but things are starting to get a bit ugly. If the user enters the wrong number of arguments they get a pattern match error. There is no help message to inform the user which flags the program takes. While getArgs is simple to start with, it doesn't have much flexibility, and handles errors very poorly. However, for years I used getArgs for all one-off programs - I found the other command line parsing libraries (including GetOpt) added too much overhead, and always required referring back to the documentation. To solve this problem I wrote CmdArgs.
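To make the point about getArgs concrete, here is a rough sketch of what the hand-written version might look like once the optional limit and some error reporting are added. It uses only base; the names parseArgs, build and number are made up for this illustration, and the error messages are just one possible wording:

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: parse "MIN MAX [LIMIT]" from the raw argument list,
-- returning an error message instead of failing with a pattern match error.
parseArgs :: [String] -> Either String (Int, Int, Maybe Int)
parseArgs [lo, hi]      = build lo hi Nothing
parseArgs [lo, hi, lim] = build lo hi (Just lim)
parseArgs xs            =
    Left ("Expected MIN MAX [LIMIT], got " ++ show (length xs) ++ " argument(s)")

-- Convert each argument in turn, stopping at the first one that fails.
build :: String -> String -> Maybe String -> Either String (Int, Int, Maybe Int)
build lo hi lim = do
    lo'  <- number "MIN" lo
    hi'  <- number "MAX" hi
    lim' <- traverse (number "LIMIT") lim
    Right (lo', hi', lim')

-- Parse one integer, naming the offending argument on failure.
number :: String -> String -> Either String Int
number what s = maybe (Left (what ++ " is not a number: " ++ s)) Right (readMaybe s)
```

A main function would pass the result of getArgs to parseArgs and print the Left message on failure. Even this small sketch still has no help output and no named flags, which is the overhead the rest of this post is about avoiding.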

A Simple CmdArgs Parser

To start using CmdArgs we first define a record to capture the information we want from the command line:


data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)


For our number guessing program we need a minimum, a maximum, and an optional limit. The deriving clause is required to operate with the CmdArgs library, and provides some basic reflection capabilities for this data type. Once we've written this data type, a CmdArgs parser is only one function call away:


{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)

main = do
    x <- cmdArgs $ Guess 1 100 Nothing
    print x


Now we have a simple command line parser. Some sample interactions are:


$ guess --min=10
Guess {min = 10, max = 100, limit = Nothing}

$ guess --min=10 --max=1000
Guess {min = 10, max = 1000, limit = Nothing}

$ guess --limit=5
Guess {min = 1, max = 100, limit = Just 5}

$ guess --help
The guess program

guess [OPTIONS]

  -? --help        Display help message
  -V --version     Print version information
     --min=INT
     --max=INT
  -l --limit=INT
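
The flag names in the help output match the field names of the record. As a rough illustration of the reflection that the deriving clause makes available, the sketch below recovers the field names and field types at runtime using only Data.Data from base. It is not a description of how CmdArgs is implemented, and the names fieldNames and fieldTypes are made up for this example:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int}
    deriving (Data, Typeable, Show)

-- Hypothetical helper: the record's field names, read off the constructor.
fieldNames :: [String]
fieldNames = constrFields (toConstr (Guess 1 100 Nothing))

-- Hypothetical helper: the type of each field, obtained by visiting
-- every immediate child of the value with gmapQ.
fieldTypes :: [String]
fieldTypes = gmapQ (\field -> show (typeOf field)) (Guess 1 100 Nothing)
```

Here fieldNames evaluates to ["min","max","limit"] and fieldTypes to ["Int","Int","Maybe Int"], which is enough information to derive a flag name and an argument type for each field.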


Adding Features to CmdArgs

Our simple CmdArgs parser is probably sufficient for this task. I doubt anyone will be persuaded to use my guessing program without a fancy iPhone interface. However, CmdArgs provides all the power necessary to customise the parser, by adding annotations to the input value. First, we can modify the parser to make it easier to add our annotations:


guess = cmdArgsMode $ Guess {min = 1, max = 100, limit = Nothing}

main = do
    x <- cmdArgsRun guess
    print x


We have changed Guess to use record syntax for constructing the values, which helps document what we are doing. We've also switched to using cmdArgsMode/cmdArgsRun (cmdArgs is just a composition of those two functions) - this helps avoid any problems with capturing the annotations when running repeatedly in GHCi. Now we can add annotations to the guess value:


guess = cmdArgsMode $ Guess
    {min = 1 &= argPos 0 &= typ "MIN"
    ,max = 100 &= argPos 1 &= typ "MAX"
    ,limit = Nothing &= name "n" &= help "Limit the number of choices"}
    &= summary "Neil's awesome guessing program"


Here we've specified that min/max must be at argument position 0/1, which more closely matches the original getArgs parser - this means the user is always forced to enter a min/max (they could be made optional with the opt annotation). For the limit we've added a name annotation to say that we'd like the flag -n to map to limit, instead of using the default -l. We've also given limit some help text, which will be displayed with --help. Finally, we've given a different summary line to the program.

We can now interact with our new parser:


$ guess
Requires at least 2 arguments, got 0

$ guess 1 100
Guess {min = 1, max = 100, limit = Nothing}

$ guess 1 100 -n4
Guess {min = 1, max = 100, limit = Just 4}

$ guess -?
Neil's awesome guessing program

guess [OPTIONS] MIN MAX

  -? --help        Display help message
  -V --version     Print version information
  -n --limit=INT   Limit the number of choices


The Complete Program

For completeness, here is the complete program. I think for this program the most suitable CmdArgs parser is the simpler one initially written, which I have used here:


{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Random
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable)

main = do
    Guess{..} <- cmdArgs $ Guess 1 100 Nothing
    answer <- randomRIO (min,max)
    game limit answer

game (Just 0) answer = putStrLn "Limit exceeded"
game limit answer = do
    putStr "Have a guess: "
    guess <- fmap read getLine
    if guess == answer then
        putStrLn "Awesome!!!1"
     else do
        putStrLn $ if guess > answer then "Too high" else "Too low"
        game (fmap (subtract 1) limit) answer


(The code in this post can be freely reused for any purpose, unless you are porting it to the iPhone, in which case I want 10% of all revenues.)

29 comments:

Bryan O'Sullivan said...

Neil, I think that CmdArgs is very nice, and you've done a service for the community by liberating us from System.Console.GetOpt :-)

Any chance you'd be willing to write a blog post about the internals? I haven't looked at all, but from the outside it all looks mysteriously magical.

Christopher Done said...

Nice one! I've been wanting a proper generic arguments library. Some great stuff coming out of Data.Data! I discovered the Typeable-based Text.JSON.Generic recently too! What's next?

Neil Mitchell said...

Bryan and ctnd: I've written the internals on top of a new generics wrapper on top of Data.Data, which I'm going to roll in to Uniplate (as Data.Generics.Any). Once I've done that (and blogged about it) then I'll write something on how CmdArgs works.

Paolo Losi said...

Neil, thanks for your work! I'm a happy CmdArgs user since 0.1.
I think that the strategy that you are using with command line options could be effectively used for configuration file handling as well.

In Python we're using a library that declaratively defines the configuration items for a module (think of user/dbname/pass/hosts parameters for a module that exposes a DB connection pool singleton).

We're using YAML to map the hierarchy of modules and "overwrite" the default config parameters defined in the code.

The YAML config file location is then specified via the command line or an environment variable.

This could be, more or less, translated to Haskell by one of the following options:

1) building a config variable that refers to sub configurations:

data AppConfig = AppConfig { dbPool :: DbPoolConfig ...

appConfig = AppConfig PoolModule. dbPoolConfig ...

2) mapping yaml hierarchy with haskell module names (probably via TH)

What do you think about it?

Unknown said...

Typo in the complete program at the end of the post: NumberGuess should be Guess.

This seems like a great library! A reasonable balance of concision and clarity. But... the &= operator gives me heebie-jeebies! I guess you must be using something like unsafePerformIO under the hood there? I suppose it doesn't work for records with strict fields for the same reason?

Have you considered adding an Applicative interface? I think this can be made to work:

guess = cmdArgsMode $ (Guess <$>
1 &= argPos 0 &= typ "MIN" <*>
100 &= argPos 1 &= typ "MAX" <*>
Nothing &= name "n" &= help "Limit the number of choices"
) &= summary "Neil's awesome guessing program"
-- where
data Annotated a
data Annotation
instance Applicative Annotated
(&=) :: Annotated a -> Annotation -> Annotated a
cmdArgsMode :: Data a => Annotated a -> CmdArgsMode a

This certainly isn't as nice as your current syntax (I guess some form of applicative record syntax would be necessary to fix that) but it should be possible to make it pure (and to give people a choice of the two interfaces).

Neil Mitchell said...

Paolo: There's certainly scope for doing similar things with configuration files. The next version of CmdArgs will expose the capture module that I use to get annotations, so hopefully the next version will make it easy to reuse parts of CmdArgs. I think this could be done both using CmdArgs style techniques, or using Template Haskell - I'd favour Data.Data, but these things are personal preference.

rsmith: Typo fixed, thanks! In a case of coincidence, I was working on a pure variant on the train journey to work. It's not as neat as the impure variant, but it's not that far off. It doesn't use applicative at all, but as a brief example:

cmdArgsMode $ record Guess
    [min := 1 += argPos 0 += typ "MIN"
    ,max := 100 += argPos 1 += typ "MAX"
    ,limit := Nothing += name "n" += help "Limit the number of choices"]
    += summary "Neil's awesome guessing program"

In my implementation, I expose both += and &=, so people import one module, and choose which variant they use.

(I think for yours it would have to be pure 1 &= ann &= ann, rather than 1 &= ann &= ann.)

Tomáš Jakl said...

Nice library!

I think the help output is a little confusing:

$ guess -?
Neil's awesome guessing program

guess [OPTIONS] MIN MAX

  -? --help        Display help message
  -V --version     Print version information
  -n --limit=INT   Limit the number of choices


From the previous example, the -n argument can come after the MIN MAX args, so maybe this would be better:

guess [OPTIONS] MIN MAX [-nX|--limit=X]

Drp said...

Neil, very nice, I quite like cmdargs - thanks for sharing it. Just a quick couple of questions:

a) is it possible to have boolean flags that can be disabled. eg. a boolean flag "--debug", with a corresponding "--no-debug"?

b) does cmdargs support repeated options. eg. multiple "-v" flags to increase the verbosity level

Cheers,

Drp said...

Sorry, one more question... What happened to the "flag" annotation from 0.1?
Cheers,

Neil Mitchell said...

JackLee: Most programs have lots of flags, and in that situation putting the flags on one line doesn't scale. I think the help output for single mode programs is fairly consistent with what other programs do, so it hopefully won't confuse users too much. In addition, it does specify a way that works, so the user will still be able to figure it out.

Drp:

a) --debug=no will do it.

b) CmdArgs only supports -v to be verbose as its standard verbosity setting. However, you don't need to use the built-in verbosity: {verbosity :: [Bool]} will let you specify multiple -v options and count which ones you have.

c) flag => name, help => text, empty => opt. I renamed lots of flags, and now some are shared (for example name works for both modes and flags).
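
As a small sketch of the counting step in (b): assuming each -v on the command line contributes one True to a [Bool] field, the verbosity level is just the number of True entries. The name verbosityLevel is made up for this example, and the code uses only the Prelude:

```haskell
-- Hypothetical helper: count how many times the flag was given,
-- assuming each occurrence of -v adds one True to the list.
verbosityLevel :: [Bool] -> Int
verbosityLevel = length . filter id
```

With that assumption, three -v flags would give a list of three True values and a level of 3, while no flags would give the empty list and a level of 0.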

Drp said...

Excellent Neil, thanks! Regarding the renaming of flag=>name, I think the haddock under "explicit" still mentioned "flag". Having read through the docs, I can't seem to find the equivalent of "unknownFlags" in 0.3, did I just miss it?

Neil Mitchell said...

Drp: You are right, I had used flag under explicit - I've fixed that in the darcs version. There is no equivalent of unknown flags - I didn't know anyone was using it. What were you using it for? If you had a particular use case I'll think about adding it back.

Drp said...

Neil: I have a 'spawner' program that I use to run, and when necessary restart, a separate daemon process. I was using 'unknownFlags' and 'args' in the cmdargs specification of spawner to capture the arguments to run the daemon program with.

I could work around this by using a String flag to accept the arguments to pass to the daemon process. Even better, would be support for something like ghc's "+RTS", or the common "--" (don't parse past here) in unix utilities.

Neil Mitchell said...

Drp: -- is an excellent suggestion, and I've just checked it in to the darcs version. Now:

program -f -- -b

Is treated as one -f flag, and one args value -b.
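
For anyone who needs the same behaviour before upgrading, the "--" convention can be sketched by hand on the raw argument list. This is only an illustration using the Prelude, not the code inside CmdArgs, and splitDoubleDash is a made-up name:

```haskell
-- Hypothetical helper: split the arguments at the first "--".
-- Everything before it is left for flag parsing; everything after it
-- is passed through untouched, even if it looks like a flag.
splitDoubleDash :: [String] -> ([String], [String])
splitDoubleDash xs = case break (== "--") xs of
    (before, _ : after) -> (before, after)
    (before, [])        -> (before, [])
```

For the command line above, splitDoubleDash ["-f","--","-b"] gives (["-f"],["-b"]). Only the first "--" is treated specially, so a later "--" stays in the second list.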

Drp said...

Neil: That would be perfect. I'll await your next release before I upgrade from 0.1

Thanks again,

Linus said...

Hello, Haskell newbie here.

Could you post a quick real-world example of a multi-mode program? I can copy-paste your single-mode examples and get things working, but I am lost how to handle multi-mode argument processing using CmdArgs, beyond just printing the parsed arguments (as in the example at http://community.haskell.org/~ndm/darcs/cmdargs/cmdargs.htm).

Here is my problem:

myModes = cmdArgsMode $ modes [mode1,mode2]
main = do
    args <- cmdArgsRun myModes
    ...?

In your single-mode example, you did:

main = do
Guess{..} <- cmdArgs $ Guess 1 100 Nothing

What's the multi-mode equivalent of this?

Linus said...

After some non-intuitive googling, I figured out the answer: http://stackoverflow.com/questions/2973284/type-conditional-controls-in-haskell

Then I can just do:

data Greeter =
    Mode1 {command :: String}
    |
    Mode2 {command :: String}
    deriving (Data,Typeable,Show,Eq)

...

main = do
    args <- cmdArgsRun myModes
    case args of
        (Mode1 _) -> ...
        (Mode2 _) -> ...
        etc.

where Mode1 and Mode2 are the value constructors of the Greeter type.

Hopefully this will help other Haskell newbies...

Neil Mitchell said...

Linus: Glad you worked it out - that is indeed correct. You can also do a slight variation to get:


main = do
    args <- cmdArgsRun myModes
    case args of
        Mode1{..} -> ...
        Mode2{..} -> ...

Now the fields such as command will be available on the right of the -> arrows.
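
A self-contained sketch of that pattern, leaving CmdArgs out so it can be tried on its own: the Greeter type is Linus's, while the describe function and its output strings are made up for this illustration. In a real program the Greeter value would come from cmdArgsRun myModes.

```haskell
{-# LANGUAGE RecordWildCards #-}

data Greeter
    = Mode1 {command :: String}
    | Mode2 {command :: String}
    deriving (Show, Eq)

-- Hypothetical dispatcher: Mode1{..} brings the field `command`
-- into scope on the right-hand side of each equation.
describe :: Greeter -> String
describe Mode1{..} = "mode1 runs " ++ command
describe Mode2{..} = "mode2 runs " ++ command
```

For example, describe (Mode1 "ls") gives "mode1 runs ls".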

Linus said...

Thanks, using the record wild cards (i.e., Mode1{..} -> ...) to get at the components of Mode1 from the right side of the -> arrow is very handy indeed.

Other thoughts: I get the feeling that CmdArgs.Explicit's functions are required to get functionality like GNU's getopt()/getopt_long() family of functions (option flags that require no arguments vs. option flags that require arguments, etc.), but the documentation for CmdArgs.Explicit is difficult to understand.

The reason I say this is because I'm trying to get CmdArgs to print an error if, e.g., the "command" option in my example above is given an empty string, like this:

./myprogram -c ""
(this should abort with an error)

And I also want to create my own flags that do not take any arguments (just like the built-in --help or --verbose flags).

I'm not sure how to do these two things without getting involved with CmdArgs.Explicit, which seems a bit scary at the moment...

Neil Mitchell said...

Linus: I write lots of large parsers, and I never use the explicit version - the implicit version is more than enough.

To define a parser that errors out on the empty string, just define a normal parser, and then afterwards check the string and throw an error - CmdArgs won't deal with it for you, but it's trivial to do yourself.

To define a flag that takes no options just give it the type Bool - True will indicate it was passed, False will indicate it wasn't.
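
The check-afterwards approach for the empty string could look something like the sketch below. It runs after parsing, on the already-parsed field value; requireNonEmpty is a made-up name, the error wording is just an example, and only the Prelude is used:

```haskell
-- Hypothetical post-parse check: reject an empty value for a named flag.
-- The first argument is only used to build the error message.
requireNonEmpty :: String -> String -> Either String String
requireNonEmpty flag value
    | null value = Left ("--" ++ flag ++ " must not be empty")
    | otherwise  = Right value
```

The main function can apply this to each String field after the parser has run, print the Left message and exit if there is one, and carry on with the Right value otherwise.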

Linus said...

Ah, thank you for your responses!

I just realized that I could make a function that checks if a given string is empty, and just generalize it to take on the Mode1, Mode2, etc. values to check if any of the String types in there are empty. For some reason I kept thinking that I had to manually create a new if/then/else statement for every single String-taking flag...

As for using a Bool type for a non-argument flag --- wow, why didn't I think of that...

Thank you for your responses, and for CmdArgs!

Linus said...

Hello again, I've run into another wall...

How can I customize the output of --version?

--version seems to take on whatever is in (&= summary "string"), or if no summary, then it defaults to "The xxx program"

I currently have
&= summary
defined with the program name, version, and copyright info, and I would like --version to just show only the version.

I would also like to prevent the extra newline at the end of --version's output...

My current layout is:

myModes :: Mode (CmdArgs MyProgram)
myModes = cmdArgsMode $ modes [modeA,modeB,modeC]
    &= verbosity
    &= summary ("myprogram v" ++ _MYPROG_VERSION ++ ", " ++ _COPYRIGHT)
    &= help "blah blah blah"

Neil Mitchell said...

Linus: That wasn't possible in cmdargs-0.6.7, but in the just released 0.6.8 it is. You can do:

MyMode &= versionArg [summary "Summary only for Version"] &= summary "Summary only for Help"

That should solve your problem. The problem of outputting two newlines after the version was fixed in 0.6.7.

Linus said...

@Neil: Ah, thanks. It works exactly as you instructed.

Petr Novotnik said...

Neil, many thanks for CmdArgs! Also, thank you for the blogging about it.

I just wondered ... when using `enum' I get one command line flag for every enumerated value, i.e.:

./app --help
...
--text Generate nicely formatted text (default)
--confluence Generate confluence wiki markup
...

I was wondering whether cmdargs does/could also support the following on an `enum':

./app --help
...
--format=(text|confluence) Specify the output format
...

I somehow fail to tell cmdargs to use the latter style :-/ Could it be, that this is not supported?

Neil Mitchell said...

Petr: cmdargs does support both. If you do:

data MyEnum = A | B | C deriving (Eq,Show,Data,Typeable)

data MyRecord = MyRecord {field :: MyEnum} deriving (Show,Data,Typeable)

Then if you use:

val = MyRecord {field = A}

cmdargs will automatically give you a flag named --field which takes values A/B/C.
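
To give a feel for how a string such as B can be turned back into a constructor of an enumeration, here is a sketch using only Data.Data from base. It is an illustration of the kind of reflection involved, not necessarily what cmdargs does internally, and the names enumNames and parseEnum are made up for this example:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data MyEnum = A | B | C deriving (Eq, Show, Data, Typeable)

-- Hypothetical helper: the constructor names of MyEnum, via reflection.
enumNames :: [String]
enumNames = map showConstr (dataTypeConstrs (dataTypeOf (undefined :: MyEnum)))

-- Hypothetical helper: look a constructor up by its exact name and
-- build the corresponding value; Nothing if the name is unknown.
parseEnum :: String -> Maybe MyEnum
parseEnum s = fromConstr <$> readConstr (dataTypeOf (undefined :: MyEnum)) s
```

Here enumNames is ["A","B","C"], parseEnum "B" is Just B, and parseEnum "Z" is Nothing. The lookup is case-sensitive in this sketch.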

Petr Novotnik said...

Neil, this is wonderful! Many thanks.

Nathan Collins said...

Neil, is it possible to make CmdArgs suggest that the user try "-h/--help" (or whatever the program's help options are) when the wrong number of arguments is passed?

E.g. in your 'guess' example you get

$ guess
Requires at least 2 arguments, got 0

but would it be possible to get something like

$ guess
Requires at least 2 arguments, got 0
Run guess -h for usage help.

instead?

Neil Mitchell said...

Nathan: Yes, that's a good idea - there's a bug tracking the implementation here: http://code.google.com/p/ndmitchell/issues/detail?id=418

Hopefully it will be done in the next version.