Summary: I added a degenerate foldr to map rule in the new version of HLint, here I describe how it works.
I've just released HLint 2.0.10, which includes a rule to recognise uses of foldr
that should really be map
. As an example:
foldr (\curr acc -> (+1) curr : acc) []
Can be rewritten as:
map (\curr -> (+1) curr)
Which is much more readable (and then subsequently HLint will suggest map (+1)
, which is vastly clearer than the initial foldr
). The change required to HLint was to add a rule to the hlint.yaml
saying:
- warn: {lhs: "foldr (\\c a -> x : a) []", rhs: "map (\\c -> x)"}
You can read this statement as saying if you see foldr (\c a -> x : a) []
, suggest map (\c -> x)
as a warning. The HLint matching engine then applies that template to every subexpression in your program. In the rest of the post I'll talk through the steps HLint performs.
Step 1: Unification
The first step is to try unifying the template foldr (\c a -> x : a) []
against the users subexpression, namely foldr (\curr acc -> (+1) curr : acc) []
. HLint is trying to find assignments for the single-letter variables in the template (namely c
, a
and x
) which cause it to match the subexpression. Unification proceeds top-down, and if it finds anything concrete that does not match (e.g. the user had written foldl
) then it fails. In this case the unification succeeds with the bindings:
c
=curr
(from the first argument to the lambda)a
=acc
(from the second argument to the lambda)x
=(+1) curr
(from before the cons)a
=acc
(from after the cons)
An example of a subexpression that would have failed unification is foldl (\curr acc -> (+1) curr : acc) []
.
Step 2: Validity
The next step is to check that any value which has been bound more than once is equal in all bindings. In our case only a
has been used twice, and it always binds to acc
, so the unification is valid.
An example of a subexpression that would have failed validity is foldr (\curr acc -> (+1) curr : xs) []
.
Step 3: Substitution
Now we've got some bindings, we can substitute them into the RHS, namely map (\c -> x)
. We replace c
and x
using the bindings above. Note that a
isn't mentioned on the RHS, so we don't use it. After substitution we get:
map (\curr -> (+1) curr)
Step 4: Free variable check
Consider the expression foldr (\curr acc -> f acc : acc) []
. Using the rules above we'd end up with map (\curr -> f acc)
, which is terrible, since we've gone from referring to a locally bound acc
to whatever acc
is in scope (if any). To solve that, we check that the result doesn't introduce any new free variables:
(freeVars result \\ freeVars hintRuleRHS) `isSubsetOf` freeVars original
Specifically any free variables introduced in the result, which weren't in the RHS (excluding the fake unification variables), must have been in the original subexpression.
With that, for foldr
, we're done. There are a handful of other steps that apply in some cases.
Step A: Dot expansion in the template
If you write a hint map f (map g x) ==> map (f . g) x
then HLint notices that also implies the rule map f . map g ==> map (f . g)
and adds it. As a result, you shouldn't write your HLint rules in point-free style.
Step B: Dot/dollar expansion in the subexpression
When matching a subexpression HLint will expand f $ x
and (f . g) x
if doing so results in a match. These operators are used commonly enough that they are often treated more like brackets than functions.
Step C: Scope matching
When unifying qualified function names, HLint uses the active imports to guess whether they match. If you have import qualified Data.Vector as V
then the subexpression V.length
will unify with Data.Vector.length
. Since HLint doesn't have complete import information it uses a few heuristics to figure out matching.
Step D: Scope moving
Similarly to scope matching on the LHS of a rule, after matching, HLint tries to requalify any necessary values on the RHS. As an example, assuming we are producing Data.Vector.null
, if we know about import qualified Data.Vector as V
then we suggest V.null
.
Full code
To see the full code and all supporting definitions go to the HLint source, which defines matchIdea
- here I show a gently simplified version. Given scope information, a rule (LHS and RHS) and a subexpression, we optionally produce a resulting expression after substitution.
matchIdea :: Scope -> HintRule -> Exp_ -> Maybe Exp_
matchIdea s HintRule{..} original = do
u <- unifyExp hintRuleLHS original
u <- validSubst u
-- need to check free vars before unqualification, but after subst (with e)
-- need to unqualify before substitution (with res)
let result = substitute u hintRuleRHS
guard $ (freeVars result Set.\\ Set.filter (not . isUnifyVar) (freeVars hintRuleRHS))
`Set.isSubsetOf` freeVars original
-- check no unexpected new free variables
return result
Funny enough
ReplyDelete$ git clone https://github.com/ndmitchell/hlint
$ hlint hlint
...
32 hints
$ git clone https://github.com/ndmitchell/hlint
ReplyDelete$ cd hlint
$ hlint src
No hints
And I check that in the test suite :-)
For info, when I repeat the experiment you did I get 33 hints. There are 41 hints inside non-code locations (e.g. misc, dist, generated test files), and I ignore 7 hints using the .hlint.yaml file in the root. I'm now curious why I don't get 48 hints with your experiment...
I now know what "hlint ." has more errors - because I turn on lots of additional things as errors in the .hlint.yaml file which your experiment doesn't use. That explains all the differences in numbers.
ReplyDeleteNice article by the way. Thanks
ReplyDelete