Sunday, May 22, 2011
Hoogle talk from TFP 2011 [PDF]
Last week I went to TFP 2011, and gave a talk on Hoogle entitled "Finding Functions from Types". The slides are now available online. These slides give some information about how the type searching works in Hoogle, and I intend to write further details in the future.
Labels:
hoogle
Sunday, May 08, 2011
CmdArgs is not dangerous
Summary: CmdArgs can be used purely, and even if you choose to use it in an impure manner, you don't need to worry.
As a result of my blog post yesterday, a few people have said they have been put off using CmdArgs. There are three reasons why you shouldn't be put off using CmdArgs, even though it has some impure code within it.
1: You can use CmdArgs entirely purely
You do not need to use the impure code at all. The module System.Console.CmdArgs.Implicit provides two ways to write annotated records. The first way is impure, and uses cmdArgs and &=. The second way is pure, and uses cmdArgs_ and +=. Both ways have exactly the same power. For example, you can write either of:
sample = cmdArgs $
    Sample{hello = def &= help "World argument" &= opt "world"}
    &= summary "Sample v1"

sample = cmdArgs_ $
    record Sample{} [hello := def += help "World argument" += opt "world"]
    += summary "Sample v1"
The first definition is impure. The second is pure. Both are equivalent. I prefer the syntax of the first version, but the second version is not much longer or uglier. If the impurities scare you, just switch. The Implicit module documentation describes four simple rules for converting between the two methods.
2: You do not need to use annotated records at all
Notice that the above module is called Implicit. CmdArgs also features an Explicit version, where you create a data type representing your command line arguments (which is entirely pure). For example:
arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
    [flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
    ,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
    ,flagHelpSimple (("help",""):)]
    where upd msg x v = Right $ (msg,x):v
Here you construct modes and flags explicitly, and pass functions which describe how to update the command line state. Everything in the Implicit module maps down to the Explicit module. In addition, you can use the GetOpt compatibility layer, which also maps down to the Explicit parser.
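To make the purity concrete, here is a minimal sketch of driving the mode above from a complete program. It assumes the cmdargs package is installed. The argument list passed to process is made up for illustration. The process function takes a mode and a list of arguments and returns either an error message or the final state, without touching the real command line, while processArgs reads the actual arguments.

```haskell
module Main (main) where

import System.Console.CmdArgs.Explicit

-- The mode from the post: the state is a list of (name, value) pairs,
-- starting empty, and every flag prepends a pair to it.
arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
    [flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
    ,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
    ,flagHelpSimple (("help",""):)]
    where upd msg x v = Right $ (msg,x):v

main :: IO ()
main = do
    -- Pure parse of a fixed, illustrative argument list.
    print (process arguments ["--greeting=Hi", "input.txt"])
    -- Parse the real command line (exits with a message on a parse error).
    xs <- processArgs arguments
    print xs
```

Because process is an ordinary function, the parser can be exercised from tests or GHCi without running the program with real arguments.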
Having written command line parsers with annotated records, I'd never go back to writing them out in full. However, if you want to map your own command line parser description down to the Explicit version you can get help messages and parsing for free.
3: I fight the optimiser so you don't have to
Even if you choose to use the impure implicit version of CmdArgs, you don't need to do anything, even at high optimisation levels. I have an extensive test suite, and will continue to ensure CmdArgs programs work properly, as I rely on it for several of my programs. While I find the impure implicit version nicer to work with, I am still working on making a pure version with the same syntax, developing alternative methods of describing the annotations, developing quasi-quoters etc.
Saturday, May 07, 2011
CmdArgs - Fighting the GHC Optimiser
Summary: Everyone using GHC 7 should upgrade to CmdArgs 0.7. GHC's optimiser got better, so the optimiser-avoidance code in CmdArgs had to get better too.
Update: Has this post scared you off CmdArgs? Read why CmdArgs is not dangerous.
CmdArgs is a Haskell library for concisely specifying command line arguments; see this post for an introduction. I have just released versions 0.6.10 and 0.7 of CmdArgs, which are a strongly recommended upgrade, particularly for GHC 7 users. (The 0.6.10 release is so anyone who specified 0.6.* in their Cabal file gets an automatic upgrade, and 0.7 is so people can write 0.7.* to require the new version.)
Why CmdArgs is impure
CmdArgs works by annotating fields to determine what command line argument parsing you want. Consider the data type:
data Opts = Opts {file :: FilePath}
By default this will create a command line which would respond to myprogram --file=foo. If instead we want to use myprogram foo we need to attach the annotation args to the file field. We can do this in CmdArgs with:
cmdArgs $ Opts {file = "" &= args}
However, CmdArgs does not have to be impure - you can equally use the pure variant of CmdArgs and write:
cmdArgs_ $ record Opts{} [file := "" += args]
Sadly, this code is a little more ugly. I prefer the impure version, but everything is supported in both versions.
I am still experimenting with other ways of writing an annotated record, weighing the trade-offs between purity, safety and the syntax required. I have been experimenting with pure variants tonight, and hope in a future release to make CmdArgs pure by default.
GHC vs CmdArgs
Because CmdArgs uses unsafe and untracked side effects, GHC's optimiser can manipulate the program in ways that change the semantics. A standard example where GHC's optimiser can harm CmdArgs is:
Opts2 {file1 = "" &= typFile, file2 = "" &= typFile}
Here the subexpression "" &= typFile is duplicated, and if GHC spots this duplication, it can use common sub-expression elimination (CSE) to transform the program to:
let x = "" &= typFile in Opts2 {file1 = x, file2 = x}
Unfortunately, because CmdArgs is impure, this program attaches the annotation to file1, but not file2.
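The effect of sharing can be reproduced without CmdArgs at all. The sketch below is not the CmdArgs implementation: annotate is a hypothetical stand-in that hides a counter increment behind unsafePerformIO, and separate and shared are illustrative names for the program before and after the transformation. It only uses base.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter recording how many times the hidden side effect has run.
{-# NOINLINE effectCount #-}
effectCount :: IORef Int
effectCount = unsafePerformIO (newIORef 0)

-- Hypothetical stand-in for an impure annotation function: it returns its
-- value unchanged, but bumps the counter as an untracked side effect.
{-# NOINLINE annotate #-}
annotate :: String -> String -> String
annotate _label value = unsafePerformIO $ do
    modifyIORef' effectCount (+ 1)
    return value

-- Two separate calls. The labels differ, so the two calls are not
-- common sub-expressions.
separate :: String -> (String, String)
separate v = (annotate "file1" v, annotate "file2" v)

-- One shared call: the shape the CSE transformation produces.
shared :: String -> (String, String)
shared v = let x = annotate "file" v in (x, x)

-- Reset the counter, force both fields, and report how often the effect ran.
countEffects :: (String, String) -> IO Int
countEffects (a, b) = do
    writeIORef effectCount 0
    _ <- evaluate (length a + length b)
    readIORef effectCount

main :: IO ()
main = do
    countEffects (separate "") >>= print  -- 2 when run unoptimised
    countEffects (shared "") >>= print    -- 1 when run unoptimised
```

The two pairs are indistinguishable as pure values, but the side effect runs once per thunk, so the shared version records only one annotation. The counts in the comments assume an unoptimised run (for example with runghc); at higher optimisation levels GHC is free to rearrange such code.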
This optimisation problem happens in practice, and can be eliminated by writing {-# OPTIONS_GHC -fno-cse #-} in the source file defining the annotations. However, it is burdensome to require all users of CmdArgs to add pragmas to their code, so I investigated how to reduce the problem.
Beating GHC 6.10.4
The key function used to beat GHC's optimiser is &=. It is defined as:
(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn x y
In order to stop CSE, I use two tricks. Firstly, I mark &= as INLINE, so that its definition ends up in the annotations, allowing me to try and modify the expression so it doesn't become suitable for CSE. For GHC 6.10.4 I then made up increasingly random expressions, with increasingly random pragmas, until the problem went away. The end solution was:
{-# INLINE (&=) #-}
(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn (id_ x) (id_ y)
{-# NOINLINE const_ #-}
const_ :: a -> b -> b
const_ f x = x
{-# INLINE id_ #-}
id_ :: a -> a
id_ x = const_ (\() -> ()) x
Beating GHC 7.0.1
Unfortunately, after upgrading to GHC 7.0.1, the problem happened again. I asked for help, and then started researching GHC's CSE optimiser (using the Hoogle support for GHC I added last weekend - a happy coincidence). The information I found is summarised here. Using this information I was able to construct a much more targeted solution (which I can actually understand!):
{-# INLINE (&=) #-}
(&=) x y = addAnn x (unique y)
{-# INLINE unique #-}
unique x = case unit of () -> x
    where unit = reverse "" `seq` ()
As before, we INLINE the &=, so it gets into the annotations. Now we want to make all annotations appear different, even though they are the same. We use unique which is equivalent to id, but when wrapped around an expression causes all instances to appear different under CSE. The unit binding has the value (), but in a way that GHC can't reduce, so the case does not get eliminated. GHC does not CSE case expressions, so all annotations are safe.
How GHC will probably defeat CmdArgs next
There are three avenues GHC could explore to defeat CmdArgs:
- Reorder the optimisation phases. Performing CSE before inlining would defeat everything, as any tricks in &= would be ignored.
- Better CSE. Looking inside case would defeat my scheme.
- Reduce unit to (). This reduction could be done in a number of ways:
  - Constructor specialisation of reverse should manage to reduce reverse "" to "", which seq can then evaluate, eliminating the case. I'm a little surprised this isn't already happening, but I'm sure it will one day.
  - Supercompilation can inline recursive functions, and inlining reverse would eliminate the case.
  - If GHC could determine reverse "" was total it could eliminate the seq without knowing its value. This is somewhat tricky for reverse as it isn't total for infinite lists.
Of these optimisations, I consider the reduction of unit to be most likely, but also the easiest to counteract.
How CmdArgs will win
The only safe way for CmdArgs to win is to rewrite the library to be pure. I am working on various annotation schemes and hope to have something available shortly.
Sunday, May 01, 2011
Searching GHC with Hoogle
Summary: Hoogle can now search the GHC source code. There are also lots of small improvements in the latest version.
A few weeks ago Ranjit Jhala asked me for help getting Hoogle working on the GHC documentation. As a result of this conversation, I've now released Hoogle 4.2.3, and upgraded the Hoogle web tool.
For GHC developers
You can search the GHC documentation using the standard Hoogle website, for example: llvm +ghc
To search within a package simply write +package in your search query. The ghc package on Hoogle includes all the internals for GHC.
If you want to search using the console, you can install Hoogle and generate the GHC package database with:
cabal update
cabal install hoogle
hoogle data default ghc
You can now perform searches with:
hoogle +ghc llvm
For all Hoogle users
The new release of Hoogle contains a number of small enhancements:
- The web server has been upgraded to Warp. I'll write a blog post shortly on the move to Warp - but generally it's been a very positive step.
- Some of the snippets of documentation have been fixed, where the markup was interpreted wrongly.
- There is only an expand button next to the documentation if there is more information to expand.
- Some iPad integration, so you can now add it to your home page with a nice icon.
- Work on a deployment script to automate uploading a new version to the web server, which will allow for more frequent updates (until now it took over 2 hours to deploy a new version).
- Updates as some web resources moved around, particularly the Haskell Platform cabal file.
The theory behind Hoogle
I'll be talking about the theory behind type searching in Hoogle at Trends in Functional Programming 2011 in Madrid in a few weeks' time. It's not too late to register.
Labels:
hoogle