Summary: For certain hints HLint needs to determine if a Haskell expression is atomic. I wrote a generic method to generate expressions and test if they are atomic.
With HLint, if you write a statement such as:
main = print ("Hello")
You get the hint:
Sample.hs:1:14: Warning: Redundant bracket
Found:
("Hello")
Why not:
"Hello"
One of ways HLint figures out if brackets are redundant is if the expression inside the brackets is "atomic" - if you never have to bracket it in any circumstances. As an example, a literal string is atomic, but an if
expression is not. The isAtom
function from haskell-src-exts-util
has a list of the types of expression which are atomic, but the Exp
type from haskell-src-exts
has 55 distinct constructors, and I don't even know what many of them do. How can we check the isAtom
function is correct?
One approach is to use human thought, and that's the approach used until now, with reasonable success. However, I've recently written a script which solves the problem more permanently, generating random expressions and checking that isAtom
gives the right value. In this post I'm going to outline a few features of how that script works. There are basically three steps:
1) Generate a type-correct Exp
The first step is to generate a random Exp
which follows the type definition. Fortunately the Data
class in Haskell lets us generate values. We define:
mkValue :: forall a . Data a => Int -> IO a
mkValue depth
| Just x <- cast "aA1:+" = randomElem x
| Just x <- cast [-1 :: Int, 1] = randomElem x
| Just x <- cast [-1 :: Integer, 1] = randomElem x
| AlgRep cs <- dataTypeRep $ dataTypeOf (undefined :: a) =
if depth <= 0 then throwIO LimitReached else fromConstrM (mkValue $ depth - 1) =<< randomElem cs
Here we are saying that given a depth
, and a result type a
, we generate a value of type a
. Note that the a
argument is the result, but we don't pass anything in of type a
. The first three lines of the body follow the pattern:
| Just x <- cast [list_of_element] = randomElem x
This tries to convert list_of_element
to [a]
by using runtime type information. If it succeeds, we pick a random element from the list. If it doesn't we continue onwards.
The final case uses dataTypeRep
/dataTypeOf
to get a list of the constructors of a
. Note that we don't have a value of a
, so we make one up using undefined :: a
- but that's OK because dataTypeOf
promises not to look at its argument. Given a list of constructors, we pick one at random, and then call fromConstrM
- which says how to create a value of the right constructor, using some argument to fill in all the fields. We pass mkValue
as that argument, which causes us to recursively build up random values.
One immediate problem is what if we are building a [Int]
and the random generator often picks (:)
? We'll take a very long time to finish. To solve this problem we keep a depth
counter, decrement it in every recursive call, and when it runs out, throwIO
an exception and give up.
2) Generate a parsing Exp
Now we've got a valid Exp
value, but just because an Exp
can be represented in the AST doesn't mean it corresponds to Haskell fragment. As an example, consider Var (UnQual (Ident "Test"))
. That's a valid value of type Exp
, but if you pretty print it you get Test
, and if you parse it back you'll get Con (UnQual (Ident "Test"))
- variables must start with a leading lower-case letter.
To ignore invalid expressions we try pretty printing then parsing the expression, and ignore all expressions which don't roundtrip.
3) Determine if the Exp
is atomic
Now we've got a valid Exp
, which we know the user could have typed in as a source program, we need to figure out if isAtom
is correct. To do that we see if given expression x
whether self-application roundtrips, i.e. x x
. As a positive example, foo
(a variable) roundtrips as foo foo
being foo
applied to itself. However, if b then t else f
when applied to itself gives if b then t else f if b then t else f
, which parses back more like if b then t else f (if b then t else f)
, and is not atomic.
Putting it all together
Now we've got a random expression, and we know if the atomicity agrees with what we were expecting, we can report any differences. That approach has identified many additional patterns to match, but it's not perfect, in particular:
- Most values either exceed the depth limit or fail to roundtrip. For 10,000
if
expressions I typically get 1 or 2 which roundtrip properly. For non-if
expressions it's usually 100 or so. The advantage of random testing is that throwing more time at a problem solves such issues without thinking too hard.
- For some expressions, e.g.
ParComp
, I've never managed to get a valid value created. Perhaps haskell-src-exts
can't parse it, or perhaps it requires constants I don't have in my hardcoded list - none of these were particularly common examples.
haskell-src-exts
has a bug where -1
is pretty printed as (-1)
, which is then parsed as a paren and -1
. That fails step 2, so we don't test with negative literals. As it happens, non-negative literals are atomic, but negative literals aren't, so we need to take care.
- There are some patterns which appear to roundtrip successfully on their own, but not when surrounded by brackets, but secretly are just very weird. For example
do rec\n []
parses successfully, but with source positions that are error values, and when applied to itself pretty prints incorrectly. There's at least one haskell-src-exts
bug here.
- The program appears to leak progressively more memory. I solved that by running slices of it at a time, and didn't look too hard. I've seen cases of blowup in
Data
constructors when recursing, so it could be that. but needs investigating.
As a result of all this work a future HLint will spot unnecessary brackets for 20 more types of expression, 8 more types of pattern and 7 more types of type.