Summary: I wrote a script to list all operators on Hackage, and which packages they are used by.
In GHC 7.10 the
*> operator will be moving into the Prelude, which means the Shake library will have to find an alternative operator (discussion on the mailing list). In order to pick a sensible operator, I wanted to list all operators in all Hackage packages so I could be aware of clashes.
- See the list of all exported operators on Hackage.
- The most exported operator is
!, which is exported from 175 packages.
- The package with most exported operators is plumbers which has 931 operators.
Note that exported operators is more than just those defined by the package, e.g. Shake exports the
Eq class, so
== is counted as being exported by Shake. However, in most cases, operators exported by a package are defined by that package.
Producing the file
First I downloaded the Hoogle databases from Hackage, and extracted them to a directory named
hoogle. I then ran:
ghc --make Operators.hs && operators hoogle operators.txt
operators.txt above. The code for
import Control.Exception.Extra import Control.Monad import Data.List.Extra import System.Directory.Extra import System.Environment import System.FilePath import System.IO.Extra main = do [dir,out] <- getArgs files <- listFilesRecursive dir xs <- forM files $ \file -> do src <- readFileUTF8' file `catch_` \_ -> readFile' file `catch_` \_ -> return "" return [("(" ++ takeWhile (/= ')') x ++ ")", takeBaseName file) | '(':x <- lines src] writeFileUTF8 out $ unlines [unwords $ a : nub b | (a,b) <- groupSort $ concat xs]
This code relies on the normal packages distributed with GHC, plus the extra package.
The script is pretty simple. I first get two arguments, which is where to find the extracted files, and where to write the result. I then use
listFilesRecursive to recursively find all extracted files, and
forM to loop over them. For each file I read it in (trying first UTF8, then normal encoding, then giving up). For each line I look for
( as the first character, and form a list of
After producing the list, I use
groupSort to produce
[(operator-name, [package])] then
writeFileUTF8 to produce the output. Running the script takes just over a minute on my ancient computer.
Writing the code
Writing the code to produce the operator list took about 15 minutes, and I made some notes as I was going.
- I started by loading up ghcid for the file with the command line
ghcid -t -c "ghci Operators.hs". Now every save immediately resulted in a list of warnings/errors, and I never bothered opening the file in
ghci, I just compiled it to test.
- I started by inserting
take 20 filesso I could debug the script faster and could manually check the output was plausible.
- At first I wrote
takeBaseName srcrather than
takeBaseName file. That produced a lot of totally incorrect output, woops.
- At first I used
readFileto suck in the data and
putStrto print it to the console. That corrupted Unicode operators, so I switched to
- After switching to
writeFileUTF8I found a rather serious bug in the extra library, which I fixed and added tests for, then made a new release.
- After trying to search through the results, I added
)around each operator to make it easier to search for the operators I cared about.
To calculate the stats of most exported operator and package with most operators I wrote two lines of code - how would you write such code? Hint: both my lines involved