$ ghci
GHCi, version 6.8.2: http://www.haskell.org/ghc/ :? for help
Loading package base ... linking ... done.
-- load some useful packages
Prelude> :m Text.HTML.TagSoup Text.HTML.Download Data.List Data.Char Data.Maybe
Prelude Data.Maybe Data.Char Data.List Text.HTML.Download Text.HTML.TagSoup>
-- ouch, that prompt is a bit long - we can use :set prompt to shorten it
-- side note: I actually supplied the patch for set prompt :)
:set prompt "Meep> "
-- let's download the list of packages
Meep> src <- openURL "http://hackage.haskell.org/packages/archive/pkg-list.html"
... src scrolls past the screen ...
-- parse the file, dropping everything before the packages
Meep> let parsed = dropWhile (~/= "<h3>") $ parseTags src
-- grab the list of packages
Meep> let packages = sort [x | a:TagText x:_ <- tails parsed, a ~== "<a href>"]
-- now we can query the list of packages
Meep> length packages
648
Meep> length $ filter (all isLower) packages
320
Meep> length $ filter ('_' `elem`) packages
0
Meep> length $ filter ('-' `elem`) packages
165
Meep> length $ filter (any isUpper . dropWhile isUpper) packages
100
Meep> length $ filter (isPrefixOf "hs" . map toLower) packages
47
Meep> length $ filter (any isDigit) packages
37
Meep> reverse $ sort $ map (\(x:xs) -> (1 + length xs,x)) $ group $ sort $ concat packages
[(484,'e'),(374,'a'),(346,'r'),(336,'s'),(335,'t'),(306,'i'),(272,'l'),(248,'c'),
 (247,'n'),(240,'o'),(227,'p'),(209,'h'),(185,'-'),(171,'m'),(159,'d'),(126,'g'),
 (112,'b'),(96,'u'),(87,'y'),(78,'k'),(76,'f'),(74,'x'),(58,'S'),(53,'H'),(35,'w'),
 (33,'v'),(29,'q'),(29,'L'),(27,'A'),(26,'F'),(24,'D'),(23,'C'),(22,'T'),(16,'P'),
 (16,'M'),(16,'I'),(16,'G'),(13,'B'),(12,'W'),(12,'3'),(12,'2'),(10,'O'),(9,'R'),
 (9,'1'),(8,'z'),(8,'j'),(8,'E'),(7,'X'),(7,'U'),(7,'N'),(6,'Y'),(6,'V'),(5,'J'),
 (4,'Q'),(4,'5'),(4,'4'),(3,'Z'),(3,'8'),(3,'6'),(1,'9')]
We can see that about half of the packages (320 of 648) are entirely lowercase, plenty use upper case, 100 have a capital after the leading run of capitals (roughly, CamelCase), 47 start with "hs", 37 contain a digit, none use "_", but 165 use "-". The final query counts how often each character appears in Hackage package names, and rather unsurprisingly, the ranking roughly follows the frequency of letters in English.
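For anyone who wants to rerun these queries outside an interactive session, here is a minimal sketch of them as ordinary functions. It needs only base, so the download step is left out. The names `samplePackages`, `countWhere`, `hasInnerUpper` and `charFrequencies` are made up for illustration, and the sample list is a handful of hypothetical entries rather than the real Hackage list.

```haskell
import Data.Char (isDigit, isLower, isUpper, toLower)
import Data.List (group, isPrefixOf, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical sample data; the session above used the real package list.
samplePackages :: [String]
samplePackages =
  ["HTTP", "hscolour", "tagsoup", "bytestring-mmap", "HaXml", "utf8-string"]

-- Count the package names that satisfy a predicate.
countWhere :: (String -> Bool) -> [String] -> Int
countWhere p = length . filter p

-- True when a capital appears after the leading run of capitals,
-- which is the rough CamelCase test used in the session.
hasInnerUpper :: String -> Bool
hasInnerUpper = any isUpper . dropWhile isUpper

-- True when the name starts with "hs", ignoring case.
startsWithHs :: String -> Bool
startsWithHs = isPrefixOf "hs" . map toLower

-- Character frequencies across all names, most common first.
-- Equivalent to the 'reverse $ sort' pipeline in the session.
charFrequencies :: [String] -> [(Int, Char)]
charFrequencies names =
  sortBy (comparing Down)
    [(length cs, c) | cs@(c : _) <- group (sort (concat names))]
```

Loaded into GHCi, `countWhere (all isLower) samplePackages` gives 2 on this sample and `countWhere hasInnerUpper samplePackages` gives 1, since only HaXml has a capital after its leading one. The other queries follow the same shape, for example `countWhere (any isDigit)` or `countWhere ('-' `elem`)`.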
TagSoup and GHCi make a potent combination for obtaining and playing with webpages.
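The comprehension that grabs the package names leans on a small trick: `tails` yields every suffix of the tag list, and a pattern in a list comprehension silently skips the suffixes it does not match. Here is a rough sketch of that idiom on its own, using a hypothetical `Token` type as a stand-in for TagSoup's tags so that it runs with base alone. `Token`, `textAfterOpen` and `sampleTokens` are illustrative names, not part of TagSoup.

```haskell
import Data.List (tails)

-- Hypothetical stand-in for a parsed tag stream; not TagSoup's own type.
data Token = Open String | Text String | Close String
  deriving (Eq, Show)

-- Collect the text that immediately follows each opening tag with the
-- given name. Suffixes that do not start with 'Open' then 'Text' fail the
-- pattern match and are dropped without an error.
textAfterOpen :: String -> [Token] -> [String]
textAfterOpen name toks =
  [x | Open n : Text x : _ <- tails toks, n == name]

-- A tiny made-up document: one heading followed by two links.
sampleTokens :: [Token]
sampleTokens =
  [ Open "h3", Text "Packages", Close "h3"
  , Open "a", Text "tagsoup", Close "a"
  , Open "a", Text "HTTP", Close "a"
  ]
```

On this sample, `textAfterOpen "a" sampleTokens` evaluates to `["tagsoup","HTTP"]`. The session's version does the same thing with TagSoup's `~==` in place of the name comparison.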