Neil Mitchell's Blog (Haskell etc): cmdargs

Showing posts with label cmdargs. Show all posts

Sunday, July 05, 2020

Automatic UI's for Command Lines with cmdargs

Summary: Run cmdargs-browser hlint and you can fill out arguments easily.

The Haskell command line parsing library cmdargs contains a data type that represents a command line. I always thought it would be a neat trick to transform that into a web page, to make it easier to explore command line options interactively - similar to how the custom-written wget::gui wraps wget.

I wrote a demo to do just that, named cmdargs-browser. Given any program that uses cmdargs (e.g. hlint), you can install cmdargs-browser (with cabal install cmdargs-browser) and run:

cmdargs-browser hlint

And it will pop up:

As we can see, the HLint modes are listed on the left (you can use lint, grep or test), the possible options on the right (e.g. normal arguments and --color) and the command line it produces at the bottom. As you change mode or add/remove flags, the command line updates. If you hit OK it then runs the program with the command line. The help is included next to the argument, and if you make a mistake (e.g. write foo for the --color flag) it tells you immediately. It could be more polished (e.g. browse buttons for file selections, better styling) but the basic concepts works well.

Technical implementation

I wanted every cmdargs-using program to support this automatic UI, but also didn't want to increase the dependency footprint or compile-time overhead for cmdargs. I didn't want to tie cmdargs to this particular approach to a UI - I wanted a flexible mechanism that anyone could use for other purposes.

To that end, I built out a Helper module that is included in cmdargs. That API provides the full power and capabilities on which cmdargs-browser is written. The Helper module is only 350 lines.

If you run cmdargs with either $CMDARGS_HELPER or $CMDARGS_HELPER_HLINT set (in the case of HLint) then cmdargs will run the command line you specify, passing over the explicit Mode data type on the stdin. That Mode data type includes functions, and using a simplistic communication channel on the stdin/stdout, the helper process can invoke those functions. As an example, when cmdargs-browser wants to validate the --color flag, it does so by calling a function in Mode, that secretly talks back to hlint to validate it.

At the end, the helper program can choose to either give an error message (to stop the program, e.g. if you press Cancel), or give some command lines to use to run the program.

Future plans

This demo was a cool project, which may turn out to be useful for some, but I have no intention to develop it further. I think something along these lines should be universally available for all command line tools, and built into all command line parsing libraries.

Historical context

All the code that makes this approach work was written over seven years ago. Specifically, it was my hacking project in the hospital while waiting for my son to be born. Having a little baby is a hectic time of life, so I never got round to telling anyone about its existence.

This weekend I resurrected the code and published an updated version to Hackage, deliberately making as few changes as possible. The three necessary changes were:

jQuery deprecated the live function replacing it with on, meaning the code didn't work.
I had originally put an upper bound of 0.4 for the transformers library. Deleting the upper bound made it work.
Hackage now requires that all your uploaded .cabal files declare that they require a version of 1.10 or above of Cabal itself, even if they don't.

Overall, to recover a project that is over 7 years old, it was surprisingly little effort.

Sunday, May 08, 2011

CmdArgs is not dangerous

Summary: CmdArgs can be used purely, and even if you choose to use it in an impure manner, you don't need to worry.

As a result of my blog post yesterday, a few people have said they have been put off using CmdArgs. There are three reasons why you shouldn't be put off using CmdArgs, even though it has some impure code within it.

1: You can use CmdArgs entirely purely

You do not need to use the impure code at all. The module System.Console.CmdArgs.Implict provides two ways to write annotated records. The first way is impure, and uses cmdArgs and &=. The second way is pure, and uses cmdArgs_ and +=. Both ways have exactly the same power. For example, you can write either of:


sample = cmdArgs $
    Sample{hello = def &= help "World argument" &= opt "world"}
    &= summary "Sample v1"

sample = cmdArgs_ $
    record Sample{} [hello := def += help "World argument" += opt "world"]
    += summary "Sample v1"

The first definition is impure. The second is pure. Both are equivalent. I prefer the syntax of the first version, but the second version is not much longer or uglier. If the impurities scare you, just switch. The Implicit module documentation describes four simple rules for converting between the two methods.

2: You do not need to use annotated records at all

Notice that the above module is called Implicit, CmdArgs also features an Explicit version where you create a data type representing your command line arguments (which is entirely pure). For example:


arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
     [flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
     ,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
     ,flagHelpSimple (("help",""):)]
     where upd msg x v = Right $ (msg,x):v

Here you construct modes and flags explicitly, and pass functions which describe how to update the command line state. Everything in the Implicit module maps down to the Explicit module. In addition, you can use the GetOpt compatibility layer, which also maps down to the Explicit parser.

Having written command line parsers with annotated records, I'd never go back to writing them out in full. However, if you want to map your own command line parser description down to the Explicit version you can get help messages and parsing for free.

3: I fight the optimiser so you don't have to

Even if you choose to use the impure implicit version of CmdArgs, you don't need to do anything, even at high optimisation levels. I have an extensive test suite, and will continue to ensure CmdArgs programs work properly - I rely on it for several of my programs. While I find the implicit impure nicer to work with, I am still working on making a pure version with the same syntax, developing alternative methods of describing the annotations, developing quasi-quoters etc.

Saturday, May 07, 2011

CmdArgs - Fighting the GHC Optimiser

Summary: Everyone using GHC 7 should upgrade to CmdArgs 0.7. GHC's optimiser got better, so CmdArgs optimiser avoidance code had to get better too.

Update: Has this post scared you off CmdArgs? Read why CmdArgs is not dangerous.

CmdArgs is a Haskell library for concisely specifying the command line arguments, see this post for an introduction. I have just released versions 0.6.10 and 0.7 of CmdArgs, which are a strongly recommended upgrade, particular for GHC 7 users. (The 0.6.10 is so anyone who specified 0.6.* in their Cabal file gets an automatic upgrade, and 0.7 is so people can write 0.7.* to require the new version.)

Why CmdArgs is impure

CmdArgs works by annotating fields to determine what command line argument parsing you want. Consider the data type:


data Opts = Opts {file :: FilePath}

By default this will create a command line which would respond to myprogram --file=foo. If instead we want to use myprogram foo we need to attach the annotation args to the file field. We can do this in CmdArgs with:


cmdArgs $ Opts {file = "" &= args}

However, CmdArgs does not have to be impure - you can equally use the pure variant of CmdArgs and write:


cmdArgs_ $ record Opts{} [file := "" += args]

Sadly, this code is a little more ugly. I prefer the impure version, but everything is supported in both versions.

I am still experimenting with other ways of writing an annotated record, weighing the trade-offs between purity, safety and the syntax required. I have been experimenting with pure variants tonight, and hope in a future release to make CmdArgs pure by default.

GHC vs CmdArgs

Because CmdArgs uses unsafe and untracked side effects, GHC's optimiser can manipulate the program in ways that change the semantics. A standard example where GHC's optimiser can harm CmdArgs is:


Opts2 {file1 = "" &= typFile, file2 = "" &= typFile}

Here the subexpression "" &= typFile is duplicated, and if GHC spots this duplication, it can use common sub-expression eliminate to transform the program to:


let x = "" &= typFile in Opts2 {file1 = x, file2 = x}

Unfortunately, because CmdArgs is impure, this program attaches the annotation to file1, but not file2.

This optimisation problem happens in practice, and can be eliminated by writing {-# OPTIONS_GHC -fno-cse #-} in the source file defining the annotations. However, it is burdensome to require all users of CmdArgs to add pragmas to their code, so I investigated how to reduce the problem.

Beating GHC 6.10.4

The key function used to beat GHC's optimiser is &=. It is defined as:


(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn x y

In order to stop CSE, I use two tricks. Firstly, I mark &= as INLINE, so that it's definition ends up in the annotations - allowing me to try and modify the expression so it doesn't become suitable for CSE. For GHC 6.10.4 I then made up increasingly random expressions, with increasingly random pragmas, until the problem went away. The end solution was:


{-# INLINE (&=) #-}
(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn (id_ x) (id_ y)

{-# NOINLINE const_ #-}
const_ :: a -> b -> b
const_ f x = x

{-# INLINE id_ #-}
id_ :: a -> a
id_ x = const_ (\() -> ()) x

Beating GHC 7.0.1

Unfortunately, after upgrading to GHC 7.0.1, the problem happened again. I asked for help, and then started researching GHC's CSE optimiser (using the Hoogle support for GHC I added last weekend - a happy coincidence). The information I found is summarised here. Using this information I was able to construct a much more targeted solution (which I can actually understand!):


{-# INLINE (&=) #-}
(&=) x y = addAnn x (unique x)

{-# INLINE unique #-}
unique x = case unit of () -> x
    where unit = reverse "" `seq` ()

As before, we INLINE the &=, so it gets into the annotations. Now we want to make all annotations appear different, even though they are the same. We use unique which is equivalent to id, but when wrapped around an expression causes all instances to appear different under CSE. The unit binding has the value (), but in a way that GHC can't reduce, so the case does not get eliminated. GHC does not CSE case expressions, so all annotations are safe.

How GHC will probably defeat CmdArgs next

There are three avenues GHC could explore to defeat CmdArgs:

Reorder the optimisation phases. Performing CSE before inlining would defeat everything, as any tricks in &= would be ignored.

Better CSE. Looking inside case would defeat my scheme.

Reduce unit to (). This reduction could be done in a number of ways:
- Constructor specialisation of reverse should manage to reduce reverse to "", which then seq can evaluate, and then eliminate the case. I'm a little surprised this isn't already happening, but I'm sure it will one day.
- Supercompilation can inline recursive functions, and inlining reverse would eliminate the case.
- If GHC could determine reverse "" was total it could eliminate the seq without knowing it's value. This is somewhat tricky for reverse as it isn't total for infinite lists.

Of these optimisations, I consider the reduction of unit to be most likely, but also the easiest to counteract.

How CmdArgs will win

The only safe way for CmdArgs to win is to rewrite the library to be pure. I am working on various annotation schemes and hope to have something available shortly.

Monday, August 23, 2010

CmdArgs Example

Summary: A simple CmdArgs parser is incredibly simple (just a data type). A more advanced CmdArgs parser is still pretty simple (a few annotations). People shouldn't be using getArgs even for quick one-off programs.

A few days ago I went to the Haskell Hoodlums meeting - an event in London aimed towards people learning Haskell. The event was good fun, and very well organised - professional quality Haskell tutition for free by friendly people - the Haskell community really is one of Haskell's strengths! The final exercise was to write a program that picks a random number between 1 and 100, then has the user take guesses, with higher/lower hints. After writing the program, Ganesh suggested adding command line flags to control the minimum/maximum numbers. It's not too hard to do this directly with getArgs:


import System.Environment

main = do
    [min,max] <- getArgs
    print (read min, read max)

Next we discussed adding an optional limit on the number of guesses the user is allowed. It's certainly possible to extend the getArgs variant to take in a limit, but things are starting to get a bit ugly. If the user enters the wrong number of arguments they get a pattern match error. There is no help message to inform the user which flags the program takes. While getArgs is simple to start with, it doesn't have much flexibility, and handles errors very poorly. However, for years I used getArgs for all one-off programs - I found the other command line parsing libraries (including GetOpt) added too much overhead, and always required referring back to the documentation. To solve this problem I wrote CmdArgs.

A Simple CmdArgs Parser

To start using CmdArgs we first define a record to capture the information we want from the command line:


data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)

For our number guessing program we need a minimum, a maximum, and an optional limit. The deriving clause is required to operate with the CmdArgs library, and provides some basic reflection capabilities for this data type. Once we've written this data type, a CmdArgs parser is only one function call away:


{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable,Show)

main = do
    x <- cmdArgs $ Guess 1 100 Nothing
    print x

Now we have a simple command line parser. Some sample interactions are:


$ guess --min=10
NumberGuess {min = 10, max = 100, limit = Nothing}

$ guess --min=10 --max=1000
NumberGuess {min = 10, max = 1000, limit = Nothing}

$ guess --limit=5
NumberGuess {min = 1, max = 100, limit = Just 5}

$ guess --help
The guess program

guess [OPTIONS]

  -? --help       Display help message
  -V --version    Print version information
     --min=INT
     --max=INT
  -l --limit=INT

Adding Features to CmdArgs

Our simple CmdArgs parser is probably sufficient for this task. I doubt anyone will be persuaded to use my guessing program without a fancy iPhone interface. However, CmdArgs provides all the power necessary to customise the parser, by adding annotations to the input value. First, we can modify the parser to make it easier to add our annotations:


guess = cmdArgsMode $ Guess {min = 1, max = 100, limit = Nothing}

main = do
    x <- cmdArgsRun guess
    print x

We have changed Guess to use record syntax for constructing the values, which helps document what we are doing. We've also switched to using cmdArgsMode/cmdArgsRun (cmdArgs which is just a composition of those two functions) - this helps avoid any problems with capturing the annotations when running repeatedly in GHCi. Now we can add annotations to the guess value:


guess = cmdArgsMode $ Guess
    {min = 1   &= argPos 0 &= typ "MIN"
    ,max = 100 &= argPos 1 &= typ "MAX"
    ,limit = Nothing &= name "n" &= help "Limit the number of choices"}
    &= summary "Neil's awesome guessing program"

Here we've specified that min/max must be at argument position 0/1, which more closely matches the original getArgs parser - this means the user is always forced to enter a min/max (they could be made optional with the opt annotation). For the limit we've added a name annotation to say that we'd like the flag -n to map to limit, instead of using the default -l. We've also given limit some help text, which will be displayed with --help. Finally, we've given a different summary line to the program.

We can now interact with our new parser:


$ guess
Requires at least 2 arguments, got 0

$ guess 1 100
Guess {min = 1, max = 100, limit = Nothing}

$ guess 1 100 -n4
Guess {min = 1, max = 100, limit = Just 4}

$ guess -?
Neil's awesome guessing program

guess [OPTIONS] MIN MAX

  -? --help       Display help message
  -V --version    Print version information
  -n --limit=INT  Limit the number of choices

The Complete Program

For completeness sake, here is the complete program. I think for this program the most suitable CmdArgs parser is the simpler one initially written, which I have used here:


{-# LANGUAGE DeriveDataTypeable, RecordWildCards #-}

import System.Random
import System.Console.CmdArgs

data Guess = Guess {min :: Int, max :: Int, limit :: Maybe Int} deriving (Data,Typeable)

main = do
    Guess{..} <- cmdArgs $ Guess 1 100 Nothing
    answer <- randomRIO (min,max)
    game limit answer

game (Just 0) answer = putStrLn "Limit exceeded"
game limit answer = do
    putStr "Have a guess: "
    guess <- fmap read getLine
    if guess == answer then
        putStrLn "Awesome!!!1"
     else do
        putStrLn $ if guess > answer then "Too high" else "Too low"
        game (fmap (subtract 1) limit) answer

(The code in this post can be freely reused for any purpose, unless you are porting it to the iPhone, in which case I want 10% of all revenues.)

Monday, August 16, 2010

CmdArgs 0.2 - command line argument processing

I've just released CmdArgs 0.2 to Hackage. CmdArgs is a library for defining and parsing command lines. The focus of CmdArgs is allowing the concise definition of fully-featured command line argument processors, in a mainly declarative manner (i.e. little coding needed). CmdArgs also supports multiple mode programs, as seen in darcs and Cabal. For some examples of CmdArgs, please see the manual or the Haddock documentation.

For the last month I've been working on a complete rewrite of CmdArgs. The original version of CmdArgs did what I was hoping it would - it was concise and easy to use. However, CmdArgs 0.1 had lots of rough edges - some features didn't work together, there were lots of restrictions on which types of fields appear where, and it was hard to add new features. CmdArgs 0.2 is a ground up rewrite which is designed to make it easy to maintain and improve in the future.

The CmdArgs 0.2 API is incompatible with CmdArgs 0.1, for which I apologise. Some of the changes you will notice:

Several of the annotations have changed name (text has become help, empty has become opt).

Instead of writing annotations value &= a1 & a2, you write value &= a1 &= a2.

Instead of cmdArgs "Summary" [mode1], you write cmdArgs (mode1 &= summary "Summary"). If you need a multi mode program use the modes function.

All the basic principles have remained the same, all the old features have been retained, and translating parsers should be simple local tweaks. If you need any help please contact me.

Explicit Command Line Processor

The biggest change is the introduction of an explicit command line framework in System.Console.CmdArgs.Explicit. CmdArgs 0.1 is an implicit command line parser - you define a value with annotations from which a command line parser is inferred. Unfortunately there was not complete separation between determining what parser the user was hoping to define, and then executing it. The result was that even at the flag processing stage there were still complex decisions being made based on type. CmdArgs 0.2 has a fully featured explicit parser that can be used separately, which can process command line flags and display help messages. Now the implicit parser first translates to the explicit parser (capturing the users intentions), then executes it. The advantages of having an explicit parser are substantial.

It is possible to translate the implicit parser to an explicit parser (cmdArgsMode), which makes testing substantially easier. As a result CmdArgs 0.2 has far more tests.

I was able to write the CmdArgs command line itself in CmdArgs. This command line is a multiple mode explicit parser, which has many sub modes defined by implicit parsers.

The use of explicit parsers also alleviates one of the biggest worries of users, the impurity. Only the implicit parser relies on impure functions to extract annotations. In particular, you can create the explicit parser once, then rerun it multiple times, without worrying that GHC will optimise away the annotations.

Once you have an explicit parser, you can modify it afterwards - for example programmatically adding some flags.

The explicit parser better separates the internals of the program, making each stage simpler.

The introduction of the explicit parser was a great idea, and has dramatically improved CmdArgs. However, there are still some loose ends. The translation from implicit to explicit is a multi-stage translation, where each stage infers some additional information. While this process works, it is not very direct - the semantics of an annotation are hard to determine, and there are ordering constrains on these stages. It would be much better if I could concisely and directly express the semantics of the annotations, which is something I will be thinking about for CmdArgs 0.3.

GetOpt Compatibility

While the intention is that CmdArgs users write implicit parsers, it is not required. It is perfectly possible to write explicit parsers directly, and benefit from the argument processing and help text generation. Alternatively, it is possible to define your own command line framework, which is then translated in to an explicit parser. CmdArgs 0.2 now includes a GetOpt translator, which presents an API compatible with GetOpt, but operates by translating to an explicit parser. I hope that other people writing command line frameworks will consider reusing the explicit parser.

Capturing Annotations

As part of the enhanced separation of CmdArgs, I have extracted all the code that captures annotations, and made it more robust. The essential operation is now &= which takes a value and attaches an annotation. (x &= a) &= b can then be used to attach the two annotations a and b (the brackets are optional). Previously the capturing of annotations and processing of flags was interspersed, meaning the entire library was impure - now all impure operations are performed inside capture. The capturing framework is not fully generic (that would require module functors, to parameterise over the type of annotation), but otherwise is entirely separate.

Consistentency

One of the biggest changes for users should be the increase in consistency. In CmdArgs 0.1 certain annotations were bound to certain types - the args annotation had to be on a [String] field, the argPos annotation had to be on String. In the new version any field can be a list or a maybe, of any atomic type - including Bool, Int/Integer, Double/Float and tuples of atomic types. With this support it's trivial to write:


data Sample = Sample {size :: Maybe (Int,Int)}

Now you can run:


$ sample
Sample {size = Nothing}

$ sample --size=1,2
Sample {size = Just (1,2)}

$ sample --size=1,2 --size=3,4
Sample {size = Just (3,4)}

Hopefully this increase in consistency will make CmdArgs much more predictable for users.

Help Output

The one thing I'm still wrestling with is the help output. While the help output is reasonably straightforward for single mode programs, I am still undecided how multiple mode programs should be displayed. Currently CmdArgs displays something I think is acceptable, but not optimal. I am happy to take suggestions for how to improve the help output.

Conclusion

CmdArgs 0.1 was an experiment - is it possible to define concise command line argument processors? The result was a working library, but whose potential for enhancement was limited by a lack of internal separation. CmdArgs 0.2 is a complete rewrite, designed to put the ideas from CmdArgs 0.2 into practical use, and allow lots of scope for further improvements. I hope CmdArgs will become the standard choice for most command line processing tasks.

Saturday, January 09, 2010

Using .ghci files to run projects

I develop a reasonable number of different Haskell projects, and tend to switch between them regularly. I often come back to a project after a number of months and can't remember the basics - how to load it up, how to test it. Until yesterday, my technique was to add a ghci.bat file to load the project, and invoke the tests with either :main test or :main --test. For some projects I also had commands to run hpc, or perform profiling. Using .ghci files, I can do much better.

All my projects are now gaining .ghci files in the root directory. For example, the CmdArgs project has:


:set -w -fwarn-unused-binds -fwarn-unused-imports
:load Main

let cmdTest _ = return ":main test"
:def test cmdTest

The first line sets some additional warnings. I usually develop my projects in GHCi with lots of useful warnings turned on. I could include these warnings in the .cabal file, but I'd rather have them when developing, and not display them when other people are installing.

The second line loads the Main.hs file, which for CmdArgs is the where I've put the main tests.

The last two lines define a command :test which when invoked just runs :main test, which is how I run the test for CmdArgs.

To load GHCi with this configuration file I simply change to the directory, and type ghci. It automatically loads the right files, and provides a :test command to run the test suite.

I've also converted the HLint project to use a .ghci file. This time the .ghci file is slightly different, but the way I load/test my project is identical:


:set -fno-warn-overlapping-patterns -w -fwarn-unused-binds -fwarn-unused-imports
:set -isrc;.
:load Main

let cmdTest _ = return ":main --test"
:def test cmdTest

let cmdHpc _ = return $ unlines [":!ghc --make -isrc -i. src/Main.hs -w -fhpc -odir .hpc -hidir .hpc -threaded -o .hpc/hlint-test",":!del hlint-test.tix",":!.hpc\\hlint-test --help",":!.hpc\\hlint-test --test",":!.hpc\\hlint-test src --report=.hpc\\hlint-test-report.html +RTS -N3",":!.hpc\\hlint-test data --report=.hpc\\hlint-test-report.html +RTS -N3",":!hpc.exe markup hlint-test.tix --destdir=.hpc",":!hpc.exe report hlint-test.tix",":!del hlint-test.tix",":!start .hpc\\hpc_index_fun.html"]
:def hpc cmdHpc

The first section turns on/off the appropriate warnings, then sets the include path and loads the main module.

The second section defines a command named :test, which runs the tests.

The final section defines a command named :hpc which runs hpc and pops up a web browser with the result. Unfortunately GHC requires definitions entered in a .ghci file to be on one line, so the formatting isn't ideal, but it's just a list of commands to run at the shell.

Using a .ghci file for all my projects has a number of advantages:

I have a consistent interface for all my projects.

Typing :def at the GHCi prompt says which definitions are in scope, and thus which commands exist for this project.

I've eliminated the Windows specific .bat files.

The .ghci file mechanism is quite powerful - I've yet to explore it fully, but could imagine much more complex commands.

Update: For a slight improvement on this technique see this post.

Thanks to Gwern Branwen for submitting a .ghci file for running HLint, and starting my investigation of .ghci files.

Neil Mitchell's Blog (Haskell etc)