Summary: Using the technique from the previous post, here are three space leaks I found.
Every large Haskell program almost inevitably contains space leaks. This post examines three space leaks I found while experimenting with a space-leak detection algorithm. The first two space leaks have obvious causes, but I remain mystified by the third.
Hoogle leak 1
The motivation for looking at space leak detection tools was that Hoogle 5 kept suffering space leaks. Since Hoogle 5 is run on a virtual machine with only 1Gb of RAM, a space leak will often cause it to use the swap file for the heap, and destroy performance. I applied the detection techniques to the hoogle generate
command (which generates the databases), which told me that writeDuplicates
took over 100K of stack. The body of writeDuplicates
is:
xs <- return $ map (second snd) $ sortOn (fst . snd) $ Map.toList $
Map.fromListWith (\(x1,x2) (y1,y2) -> (min x1 y1, x2 ++ y2))
[(s,(p,[t])) | (p,(t,s)) <- zip [0::Int ..] xs]
storeWrite store TypesDuplicates $ jaggedFromList $ map (reverse . snd) xs
return $ map fst xs
I don't expect readers to understand the purpose of the code, but it is interesting to consider if you can spot the space leak, and if you'd have realised so while writing the code.
In order to narrow down the exact line, I inserted evaluate $ rnf ...
between each line, along with print
statements. For example:
print "step 1"
evaluate $ rnf xs
print "step 2"
xs <- return $ map (second snd) $ sortOn (fst . snd) $ Map.toList $
Map.fromListWith (\(x1,x2) (y1,y2) -> (min x1 y1, x2 ++ y2))
[(s,(p,[t])) | (p,(t,s)) <- zip [0::Int ..] xs]
evaluate $ rnf xs
print "step 3"
storeWrite store TypesDuplicates $ jaggedFromList $ map (reverse . snd) xs
print "step 4"
let res = map fst xs
evaluate $ rnf res
print "step 5"
return res
(Debugging tip: always use print
for debugging and never for real code, that way getting rid of all debugging output is easy.) It failed after printing step 2
, but before printing step 3
. Pulling each subexpression out and repeating the evaluate
/rnf
pattern I reduced the expression to:
Map.fromListWith (\(x1,x2) (y1,y2) -> (min x1 y1, x2 ++ y2)) xs
The fromListWith
function essentially performs a foldl
over values with duplicate keys. I was using Data.Map.Strict
, meaning it the fold was strict, like foldl'
. However, the result is a pair, so forcing the accumulator only forces the pair itself, not the first component, which contains a space leak. I effectively build up min x1 (min x1 (min x1 ...
in the heap, which would run faster and take less memory if reduced eagerly. I solved the problem with:
Map.fromListWith (\(x1,x2) (y1,y2) -> (, x2 ++ y2) $! min x1 y1) xs
After that the stack limit could be reduced a bit. Originally fixed in commit 102966ec, then refined in 940412cf.
Hoogle leak 2
The next space leak appeared in the function:
spreadNames (reverse . sortOn snd -> xs@((_,limit):_)) =
check $ f (99 + genericLength xs) maxBound xs
where
check xs | all (isCon . snd) xs && length (nubOrd $ map snd xs) == length xs = xs
| otherwise = error "Invalid spreadNames"
-- I can only assign values between mn and mx inclusive
f :: Word16 -> Word16 -> [(a, Int)] -> [(a, Name)]
f !mn !mx [] = []
f mn mx ((a,i):xs) = (a, Name real) : f (mn-1) (real-1) xs
where real = fromIntegral $ max mn $ min mx ideal
ideal = mn + floor (fromIntegral (min commonNameThreshold i) * fromIntegral (mx - mn) / fromIntegral (min commonNameThreshold limit))
I had already added !
in the definition of f
when writing it, on the grounds it was likely a candidate for space leaks (an accumulating map), so was immediately suspicious that I hadn't got it right. However, adding bang patterns near real
made no difference, so I tried systematically reducing the bug.
Since this code isn't in IO
, the evaluate
technique from the previous leak doesn't work. Fortunately, using seq
works, but is a bit more fiddly. To check the argument expression (reverse . sortOn
) wasn't leaking I made the change:
spreadNames (reverse . sortOn snd -> xs@((_,limit):_)) =
rnf xs `seq` trace "passed xs" (check $ f (99 + genericLength xs) maxBound xs)
I was slightly worried that the GHC optimiser may break the delicate seq
/trace
due to imprecise exceptions, but at -O0
that didn't happen. Successive attempts at testing different subexpressions eventually lead to genericLength xs
, which in this case returns a Word16
. The definition of genericLength
reads:
genericLength [] = 0
genericLength (_:l) = 1 + genericLength l
Alas, a very obvious space leak. In addition, the base
library provides two rules:
{-# RULES
"genericLengthInt" genericLength = (strictGenericLength :: [a] -> Int);
"genericLengthInteger" genericLength = (strictGenericLength :: [a] -> Integer);
#-}
If you use genericLength
on Int
or Integer
then it is replaced with a strict version without a space leak - but on Word16
the space leak remains. To solve this space leak I replaced genericLength xs
with fromIntegral (length xs)
in commit 12c46e93, which worked. After that change, the Hoogle test suite can be run with 1Kb of stack - a test that has been added to the continuous integration.
Shake leak
After solving the space leak from the original post, I was then able to run the entire test suite with 1Kb stack on my Windows machine. I made that an RTS option to the Cabal test suite, and my Linux continuous integration started failing. Further experimentation on a Linux VM showed that:
- The entire test failed at 50K, but succeeded at 100K.
- The excessive stack usage could be replicated with only two of the tests - the
tar
test followed by thebenchmark
test. Thetar
test is incredibly simple and likely any of the tests before beforebenchmark
would have triggered the issue. - The tests succeeded in 1K if running
benchmark
followed bytar
.
The initial assumption was that some CAF was being partially evaluated or created by the first test, and then used by the second, but I have yet to find any evidence of that. Applying -xc
suggested a handful of possible sites (as Shake catches and rethrows exceptions), but the one that eventually lead to a fix was extractFileTime
, defined as:
extractFileTime x = ceiling $ modificationTimeHiRes x * 1e4
And called from:
getFileInfo x = handleBool isDoesNotExistError (const $ return Nothing) $ do
s <- getFileStatus $ unpackU_ x
return $ Just (fileInfo $ extractFileTime s, fileInfo $ fromIntegral $ fileSize s)
There is a small (constant sized) space leak here - the result does not force extractTime
, but returns a pair containing thunks. In fact, getFileStatus
from the unix
library allocates a ForeignPtr
to store s
, so by not forcing the pair we cause the ForeignPtr
to live much longer than would be otherwise required. The fix from commit 2ee36a99 is simple:
getFileInfo x = handleBool isDoesNotExistError (const $ return Nothing) $ do
s <- getFileStatus $ unpackU_ x
a <- evaluate $ fileInfo $ extractFileTime s
b <- evaluate $ fileInfo $ fromIntegral $ fileSize s
return $! Just $! (a, b)
Afterwards the entire Shake test suite can be run in 1K. Since getFileInfo
is different on Windows vs Linux, I understand why the space leak doesn't occur on Windows. What I still don't understand is:
- How does running one test first cause the space leak in the second test?
- How does what looks like a small space leak result in over 49K additional stack space?
- Is the fact that
ForeignPtr
is involved behind the scenes somehow relevant?
I welcome any insights.
5 comments:
Is there any haskell tutorial for Java guys?
Gopi: I Google'd "haskell tutorial for java programmers" and up came a few - the first three all seem perfectly plausible for someone coming from Java (https://wiki.haskell.org/Hitchhikers_guide_to_Haskell, https://wiki.haskell.org/Haskell_Tutorial_for_C_Programmers, http://learnyouahaskell.com/introduction) - I'd pick whichever seems to follow your preferred style most closely.
I'd love to hear what opinions you have about what (if anything) these memory issues mean for the use of Haskell in mainstream software engineering. I have only used Haskell a very little bit in my own projects and have developed the highly unscientific opinion that lazy evaluation is a very bad fit with big, messy, committee-built projects that exist all over the place.
If you've editorialized on this elsewhere, links would be great.
Ben: I haven't editorialized anywhere other than in the pub. I work on a huge project with lots of Haskell. Purity and strong typing are really useful for large messy projects, since they stop it getting too messy too quickly.
Laziness doesn't as much either way, but given a preference, I'd prefer it rather than not, since it makes things more reusable. The main downside of laziness was space leaks, and performance (to a much smaller extent), in every other way it's better than strictness (in my opinion). This technique takes space leaks from a huge problem to a minor niggle, so in my opinion, makes the choice of laziness now much easier.
Thank you Niel for the article. It helps me to solve my problem easily.
Post a Comment