Neil Mitchell's Blog (Haskell etc)

<h2>Working on build systems full-time at Meta (2022-05-04)</h2>
<p>
<em>Summary: I joined Meta 2.5 years ago to work on build systems. I’m enjoying it.</em>
</p>
<p>
I joined <a href="https://about.facebook.com/">Meta</a> over two years ago when an opportunity arose to work on build systems full time. I started the <a href="https://ndmitchell.com/#shake_01_oct_2010">Shake build system at Standard Chartered</a> over 10 years ago, and then wrote an <a href="https://shakebuild.com/">open source version</a> a few years later. Since then, I’ve always been dabbling in build systems, at both <a href="https://neilmitchell.blogspot.com/2021/09/reflecting-on-shake-build-system.html">moderate</a> and <a href="https://neilmitchell.blogspot.com/2021/09/small-project-build-systems.html">small</a> scale. I really enjoyed writing the <a href="https://ndmitchell.com/#shake_21_apr_2020">Build Systems a la Carte paper</a>, and as a result, started to appreciate some of the <a href="https://bazel.build/">Bazel</a> and <a href="https://buck.build/">Buck</a> design decisions. I was involved in the <a href="https://www.tweag.io/posts/2019-10-09-bazel-cabal-stack.html">Bazel work at Digital Asset</a>, and after that decided that there was still lots of work to be done on build systems. I did some work on <a href="https://shakebuild.com/cloud">Cloud Shake</a>, but the fact that I wasn’t working on it every day, and that I wasn’t personally using it, made it hard to productionize. A former colleague now at Meta reached out and invited me for breakfast — one thing led to another, and I ended up at Meta working on build systems full time.
</p>
<h2>What I’ve learnt about build systems</h2>
<p>
The biggest challenge at Meta is the scale. When I joined they already used the <a href="https://buck.build/">Buck build system</a>, which had been developed at Meta. Looking at the first milliseconds after a user starts an incremental build is illustrative:
</p>
<ul>
<li>With Shake, it starts the process, loads the database into memory, walks the entire graph calling <code>stat</code> on each input and runs any build actions.</li>
<li>With Buck, it connects to a running daemon, talks to a running file watcher (<a href="https://facebook.github.io/watchman/">Watchman</a> in the case of Buck) and uses reverse dependencies to jump to the required actions.</li>
</ul>
<p>
For Shake, on repos with 100K files, that process might take ~0.5s, but it is <em>O(n)</em>. If you increase to 10M files, it takes 50s, and your users will revolt. With Buck, the overhead is proportional to the number of <em>changed</em> files, which is usually a handful.
</p>
<p>
While Shake is clearly infeasible at the scale of Meta, Buck was also starting to show its age, and I’ve been working with others to <a href="https://developers.facebook.com/blog/post/2021/07/01/future-of-buck/">significantly improve Buck</a>, borrowing lessons from everywhere, including Shake. Buck also addresses problems that Shake doesn’t, such as how to cope with multi-configuration builds (e.g. building for x86 and ARM simultaneously), having a separate file and target namespace and effective use of remote execution and caching.
</p>
<p>
We expect that the new version of Buck will be released open source soon, at which point I’ll definitely be talking more about the design and engineering trade-offs behind it.
</p>
<h2>What's different moving from finance to tech</h2>
<p>
My career to date has been in finance, so working at Meta is a very different world. Below are a few things that stand out (I believe most of these are common to other big tech companies too, but Meta is my first one).
</p>
<p>
<strong>Engineering career ladder:</strong> In finance the promotion path for a good engineer is to become a manager of engineers, then a manager of managers, and so on up. In my previous few roles I was indeed managing teams, which included setting technical direction and doing coding. At Meta, managers look after people, and help set the team direction. Engineers look after code and services, and set the technical direction. But importantly, you can be promoted as an engineer, without gaining direct reports, and the opportunities and compensation are equivalent to those for managers. There are definitely aspects of management that I like (e.g. mentoring, career growth, starting collaborations), and happily all of these are things engineers can still engage in.
</p>
<p>
<strong>Programmer-centric culture:</strong> In finance the company is often built around traders and sales people. In tech, the company is built around programmers, which is visible in the culture. There are hardware vending machines, free food, free ice cream, minimal approvals. They’ve done a very good job of providing a relaxing and welcoming environment (with open plan offices, but I don’t mind that aspect). The one complaint I had was that Meta used to have a pretty poor work from home policy, but that policy has since been completely rewritten and is now very good.
</p>
<p>
<strong>Reduced hierarchy:</strong> I think this may be more true of Meta than of other tech companies, but there is very minimal hierarchy. Programmers are all just programmers, not “senior” or “junior”. I don’t have the power to tell anyone what to do, but in a slightly odd way, my manager doesn’t have that power either. If I want someone to tackle a bug, I have to justify that it is a worthwhile thing to do. One consequence of that is that the ability to form relationships and influence people is much more important. Another consequence that I didn’t foresee is that working with people in different teams is very similar to working with people in your team, since exactly the same skills apply. I can message any engineer at Meta about random ideas and possible collaborations, and everyone is happy to talk.
</p>
<p>
<strong>Migration is harder:</strong> In previous places I worked, if we needed 100 files moved to a new version of a library, someone got told to do it, and they went away and spent a lot of time doing it. At Meta that’s a lot harder — firstly, it’s probably 100K files due to the larger scale, and secondly, telling someone they must do something is a lot less effective. That means there is a greater focus on automation (automatically editing the files), compatibility (doesn’t require editing the files) and benefits (ensuring that moving to the new version of the library will make your life better). All those are definitely better ways to tackle the problem, but sometimes, work must be done that is tedious and time consuming, and that is harder to make happen.
</p>
<p>
<strong>Open source:</strong> The process for open sourcing an internal library or tool in the developer infrastructure space is very smooth. The team I work in has open sourced the <a href="https://github.com/facebookexperimental/starlark-rust">Starlark programming language</a> (<a href="https://developers.facebook.com/blog/post/2021/04/08/rust-starlark-library/">taking over maintenance from Google</a>), the <a href="https://github.com/facebookincubator/gazebo">Gazebo Rust utility library</a> and a <a href="https://github.com/facebookincubator/gazebo_lint">Rust linter</a>, plus we have a few more projects in the pipeline. As I write code in the internal Meta monorepo, it gets sync’d to GitHub a few minutes later. It’s also easy to contribute to open source projects, e.g. Meta engineers have contributed to my projects such as <a href="https://neilmitchell.blogspot.com/2019/05/hoogle-xss-vulnerability.html">Hoogle</a> (before I even considered joining Meta).
</p>
<p>
<strong>Hiring:</strong> Meta hires a lot of engineers (e.g. <a href="https://www.cnbc.com/2020/01/21/facebook-is-creating-1000-new-jobs-in-the-uk.html">1,000 additional people in London</a>). That means that interviews are more like a production line, with a desire to have a repeatable process, where candidates are assigned teams after the interviews, rather than interviewing with a team. There are upsides and downsides to that—if I interview a strong candidate, it’s really sad to know that I probably won’t get to work closely with them. It also means that the interview process is determined centrally, so I can’t <a href="https://neilmitchell.blogspot.com/2020/07/how-i-interview.html">follow my preferences</a>. But it does mean that if a friend is looking for a job there’s often something available for them (you can find <a href="https://drive.google.com/file/d/1l2AhKj3C8yGmw_Rvo1CP8w2TgQKCCwY2/view">details on compilers and programming here</a> and a <a href="https://www.metacareers.com/jobs">full list of jobs here</a>), and a repeatable process is good for fairness.
</p>
<p>
Overall I’m certainly very happy to be working on build systems. The build system is the thing that stands between a user and trying out their changes, so anything I can do to make that process better benefits all developers. I’m very excited to share what I’ve been working on more widely in the near future!
</p>
<p>
(Disclosure: This blog post had to go through Meta internal review, because it’s an employee talking about Meta, but other than typos, came out unchanged.)
</p>

<h2>Huge Project Build Systems (2021-09-16)</h2>
<p><em>Summary: Shake won't scale to millions of files; this post describes what would be required to make it do so.</em></p>
<p>While Shake has compiled projects with hundreds of thousands of files, it's never scaled out to millions of files, and it would be unlikely to work well at that size. The most popular build systems that operate at that scale are <a href="https://buck.build/">Buck</a> (from Facebook) and <a href="https://bazel.build/">Bazel</a> (from Google). In this post I go through the changes that would need to be made to make Shake scale.</p>
<p>The first issue is covered in my <a href="https://neilmitchell.blogspot.com/2021/09/reflecting-on-shake-build-system.html">previous post</a>, that Shake doesn't know if you change the build rules themselves. As you scale up, it becomes much more important that if you change the rules, everything is properly tracked. As the number of people involved in a project increases, the rate at which the build system changes will also increase. Both Buck and Bazel solve this problem using a deterministic Python-based configuration language called <a href="https://developers.facebook.com/blog/post/2021/04/08/rust-starlark-library/">Starlark</a>. If Shake stopped being a Haskell DSL, I'd argue that it stops being Shake and becomes something different, so it's unclear what could be done there.</p>
<p>The next issue is that every time Shake is invoked, it checks the modification time of every file, and then walks the entire dependency graph. That works fine at 10K files, but as you move to 1M files, it takes too long. The solution is two-fold, first be informed which files have changed using notification APIs (e.g. the <a href="https://facebook.github.io/watchman/">Watchman tool</a>), and then use reverse dependencies to only explore the portion of the graph that has changed. Happily, Pepe already has a patch <a href="https://github.com/ndmitchell/shake/pull/802">adding reverse dependencies to Shake</a>, so that isn't too infeasible.</p>
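<p>As a rough sketch of the reverse-dependency idea (illustrative types, not the actual patch): once a file watcher reports which files changed, a traversal of the reverse edges visits only the affected portion of the graph, so the cost is proportional to the size of the change rather than the size of the repo.</p>

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Map each file to the things that depend on it (the reverse edges).
type RevDeps = Map.Map FilePath [FilePath]

-- Starting from the files a watcher reported as changed, collect
-- everything transitively reachable via reverse dependencies.
dirtySet :: RevDeps -> [FilePath] -> Set.Set FilePath
dirtySet rev = go Set.empty
  where
    go seen [] = seen
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (Map.findWithDefault [] x rev ++ xs)
```

<p>Only the keys in the resulting set need re-checking; everything else is known clean without a single <code>stat</code> call.</p>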
<p>The final issue is that Shake was designed as a single-machine build system, not for sharing results between users. When I first wrote Shake, I didn't have access to servers, and AWS was brand new. Now, over a decade later, servers are easy to obtain and large scale build systems need to share results, so that if one user builds a file, no one else needs to. Within the realm of multi-user build systems, there are two basic operations - sharing results and offloading commands.</p>
<p>Shake, with its new <a href="https://shakebuild.com/cloud">cloud features</a>, is able to share results between users using a shared drive. It works, and big companies are using it for real, but I'd consider it fairly experimental. For execution, Shake is unable to run actions remotely, so can't make use of something like Bazel's <a href="https://bazel.build/remote-execution-services.html">remote execution API</a>. Since dependencies are specified at the rule level, and remote execution operates at the command level, there is a bit of a mismatch, and it's unclear what that might look like in Shake.</p>
<p>While Shake won't work at <em>huge</em> scales, it is still quite an effective build system at quite large scales. But, given the limitations, I imagine it will never get to the scale of Buck/Bazel. At the same time, Buck/Bazel lack dynamic dependencies, which makes them unable to express <a href="https://github.com/ghc-proposals/ghc-proposals/pull/245#issuecomment-890962688">rules such as Haskell</a> effectively.</p>
<p>Happily, I am involved with a new build system, the <a href="https://developers.facebook.com/blog/post/2021/07/01/future-of-buck/">next generation of Buck</a>. I joined Facebook two years ago, and since that time have been focused on this project. It's written in Rust, configured with Starlark (I've spent a lot of time working on an open-source <a href="https://github.com/facebookexperimental/starlark-rust">Starlark interpreter</a> in Rust), and should work at huge scales. It's not yet open source, but it will be - we are targeting early next year.</p>
<p>I think Shake is still a build system with a lot to offer, and continue to maintain and improve it. For people who want to scale beyond the range of Shake, I'd definitely recommend using the next generation of Buck, once it is available.</p>
<h2>Small Project Build Systems (2021-09-15)</h2>
<p><em>Summary: Forward build systems might work better for small projects.</em></p>
<p><a href="https://neilmitchell.blogspot.com/2021/09/reflecting-on-shake-build-system.html">Yesterday's post</a> talked about how Shake is a good medium sized build system - but what about smaller projects? Is the Shake model right for them? Shake can be considered a <em>backwards build system</em>. Each rule says how to produce a file, given some input files (which are dependencies) and an action. Almost all build systems (e.g. Make, Buck, Bazel, CMake, SCons, Ninja) fit this model, which is analysed in the <a href="https://ndmitchell.com/#shake_21_apr_2020">Build Systems a la Carte paper</a>. While this model works, it has two disadvantages:</p>
<ul>
<li>You have to explicitly list dependencies, or infer them from include files etc. That means either dependencies are insufficient (you probably forgot some), or they are excessive (you added some you don't need). Usually both.</li>
<li>You have to think backwards. When you ask someone how to build an executable from a C file, no one talks about linking first, but to program a build system you have to.</li>
</ul>
<p>The alternative to a backwards build system is a <em>forwards build system</em>, of which <a href="https://github.com/kgaughan/memoize.py">Memoize</a> was the first. You just write out the commands in order, and dependency tracing figures out if they have changed. To compile a C program it can be as simple as:</p>
<pre><code>gcc -c util.c
gcc -c main.c
gcc -o main main.o util.o
</code></pre>
<p>That build script is incredibly simple - so simple it could also be treated as a shell script.</p>
<p>A few years ago I wrote such a system, called <a href="https://github.com/ndmitchell/rattle">Rattle</a>, and wrote a paper about it at <a href="https://ndmitchell.com/#rattle_18_nov_2020">OOPSLA 2020</a> with my co-authors Sarah Spall and Sam Tobin-Hochstadt. Sarah gave a talk about Rattle <a href="https://www.youtube.com/watch?v=WRLfQo-IJTg">at OOPSLA</a>, and I gave a talk <a href="https://ndmitchell.com/#rattle_25_jun_2021">at Build Meetup 2021</a>. We were able to compile projects like NodeJS faster than the NodeJS build system (which uses Make), showing the idea might be feasible.</p>
<p>If forward build systems are so great, why do I think they are most suitable for small projects? There are four reasons, the first three of which have mitigations, but the final one sets a limit on the size at which forward build systems are suitable.</p>
<ol>
<li>Forward build systems rely on tracing which files are dependencies of a command. Doing that quickly in a cross-platform manner is a nightmare. There are tricks like hooking system calls etc, but it presents a significant engineering hurdle, especially on macOS, which makes this task harder with every release.</li>
<li>Forward build systems are immature. The earliest examples no longer work. Rattle is a relatively practical research system - it could evolve into a production system - but it's not there yet. And compared to the alternatives, Rattle is probably one of the closest to production, in large part because it builds off a lot of common infrastructure from Shake which is much more mature.</li>
<li>Forward build systems lack parallelism, since if you want to express parallelism, you need to think about dependencies once more, and it's easy to go wrong. Rattle mostly solves the parallelism problem by automatically inferring when it is safe to parallelise, which is how we were able to remain competitive with Make.</li>
</ol>
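<p>The parallelism inference can be modelled with a simple hazard check (a simplified model, not Rattle's actual code): two traced commands are safe to run in parallel when neither writes a file that the other reads or writes.</p>

```haskell
import qualified Data.Set as Set

-- The files a traced command was observed to read and write.
data Trace = Trace { trReads :: Set.Set FilePath, trWrites :: Set.Set FilePath }

-- Safe to run in parallel: neither command writes anything the other touches.
independent :: Trace -> Trace -> Bool
independent a b =
  Set.null (trWrites a `Set.intersection` (trReads b `Set.union` trWrites b)) &&
  Set.null (trWrites b `Set.intersection` trReads a)
```

<p>In the C example above, the two compile steps are independent of each other, but the link step reads both object files, so it must wait for them.</p>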
<p>And finally, the biggest issue is that forward build systems are not compositional, while backward build systems are. If you want to write a 1 million build rule system, in a backwards system, each rule looks like any other. Whereas in a forward build system, assuming you need to give an order, writing down that order in a compositional way is hard - in fact, whenever I've tried it, you start expressing the dependencies between entries and end up with a backwards build system.</p>
<p>Happily, people are continuing to research forward build systems. Rattle adds parallelism, <a href="https://blogs.ncl.ac.uk/andreymokhov/stroll/">Stroll</a> removes the requirement for an order, <a href="https://sites.science.oregonstate.edu/~roundyd/fac/">Fac</a> allows some dependencies and infers the remaining ones, and <a href="https://arxiv.org/pdf/2108.12469.pdf">LaForge</a> finds greater incrementality. Perhaps all those ideas can be combined, along with a lot of engineering, to produce a practical forward build system.</p>
<p>Rattle has shown a well engineered forward build system would be feasible for small projects. It's unclear how much larger the concept might be able to scale, probably never to millions of files, but for small projects it might provide a significantly lower effort path to writing build systems.</p>
<h2>Reflecting on the Shake Build System (2021-09-14)</h2>
<p><em>Summary: As a medium-sized build system, Shake has some good bits and some bad bits.</em></p>
<p>I first developed the <a href="https://shakebuild.com/">Shake build system</a> at Standard Chartered in 2008, <a href="https://shakebuild.com/faq#whats-the-history-of-shake">rewriting an open source version</a> in my spare time in 2011. I wrote a paper on Shake for <a href="https://ndmitchell.com/#shake_10_sep_2012">ICFP 2012</a> and then clarified some of the details in a <a href="https://ndmitchell.com/#shake_21_apr_2020">JFP 2020 paper</a>. Looking back, over a decade later, this post discusses what went well and what could be improved.</p>
<p>The first thing to note is that Shake is a medium sized build system. If you have either 1 source file or 1 million source files, Shake probably isn't a good fit. In this post I'm going to go through how Shake does as a medium-sized build system, and two other posts reflect on what I think a <a href="https://neilmitchell.blogspot.com/2021/09/small-project-build-systems.html">small build system</a> or <a href="https://neilmitchell.blogspot.com/2021/09/huge-project-build-systems.html">huge build system</a> might look like.</p>
<p>The most important thing Shake got right was adding monadic/dynamic dependencies. Most build systems start with a static graph, and then, realising that can't express the real world, start hacking in an unprincipled manner. The resulting system becomes a bunch of special cases. Shake embraced dynamic dependencies. That makes some things harder (no static cycle detection, less obvious parallelism, must store dependency edges), but all those make Shake itself harder to write, while dynamic dependencies make Shake easier to use. I hope that eventually <em>all</em> build systems gain dynamic dependencies.</p>
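<p>The essence of a monadic dependency, in the style of the Build Systems a la Carte paper (a sketch of the model, not Shake's internals), is that a task receives a fetch callback and can choose its later dependencies based on the values of earlier ones:</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- A build task that asks for its dependencies via a callback, so the
-- set of dependencies can depend on values fetched along the way.
newtype Task k v = Task { runTask :: forall m. Monad m => (k -> m v) -> m v }

-- A dynamic dependency: which file is needed next depends on a value
-- fetched first, which a static (applicative) graph cannot express.
example :: Task String String
example = Task $ \fetch -> do
  opt <- fetch "is_optimised"
  if opt == "yes" then fetch "foo.optimised" else fetch "foo.unoptimised"
```

<p>A static graph has to list both <code>foo.optimised</code> and <code>foo.unoptimised</code> as dependencies up front; the monadic version only ever fetches the one it needs.</p>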
<p>In addition to dynamic dependencies, Shake has early cut-off, meaning files that rebuild but don't change don't invalidate rules that depend upon them. This feature is increasingly becoming standard in build systems, which is great to see.</p>
<p>Shake is written as a Haskell DSL, which means users of Shake are writing a Haskell program that happens to heavily leverage the Shake library. That choice was a double-edged sword. There are some significant advantages:</p>
<ul>
<li>I didn't need to invent a special purpose language. That means I get to reuse existing tooling, existing learning materials, and existing libraries.</li>
<li>Since Shake is just a library, it <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html">can be documented</a> with the normal tools like <a href="https://www.haskell.org/haddock/">Haddock</a>.</li>
<li>Users can extend Shake using Haskell, and publish libraries building on top of Shake.</li>
<li>The modelling of monadic dependencies in Haskell is pretty elegant, given a dedicated syntax for expressing monadic computations (the <code>do</code> keyword).</li>
<li>Projects like <a href="https://github.com/haskell/haskell-language-server">Haskell Language Server</a> can build on top of Shake in fairly fundamental ways. See our recent <a href="https://ndmitchell.com/#hls_04_sep_2020">IFL 2020 paper</a> for the benefits that brought.</li>
</ul>
<p>But there are also some downsides:</p>
<ul>
<li>Most significantly, the audience of Shake is somewhat limited by the fact that Shake users probably have to learn some Haskell. While the <a href="https://shakebuild.com/manual">user manual</a> aims to teach enough Haskell to write Shake without really knowing Haskell, it's still a barrier.</li>
<li>Haskell has some significant weaknesses, e.g. it has two competing package managers, countless distribution mechanisms, and none of these things are consistent for long periods of time. Haskell has a poor on-ramp, and thus so does Shake.</li>
</ul>
<p>The choice of an embedded DSL for Shake also leads to the issue that Shake doesn't know when a rule has changed, since a rule is opaque Haskell code. As a consequence, if you modify a command line in a <code>.hs</code> file, Shake is unaware and won't rebuild the necessary files. There are a bunch of techniques for dealing with this limitation (see the Shake functions <code>shakeVersion</code>, <code>versioned</code>), but none are pleasant, and it remains an easy mistake to make. A potential way out is to build a system which reads configuration files written in a language other than Haskell and interprets them, which I <a href="https://ndmitchell.com/#shake_09_oct_2015">gave a talk about</a>, and I've seen deployed in practice. But it's something where each user ends up rolling their own.</p>
<p>Another limitation is that Shake is (deliberately) quite low-level. It gives you a way to depend on a file, and a way to run a command line. It doesn't give you a way to express a C++ library. The hope from the beginning was that Shake would be language neutral, and that libraries would arise that built on top of Shake providing access to standard libraries. If you were writing a Python/C++/Ruby build script, you'd simply import those libraries, mix them together, and have a working build system. Some libraries have gone in that direction: <a href="https://hackage.haskell.org/package/shake-language-c"><code>shake-language-c</code></a> and <a href="https://github.com/jfeltz/shake-cpp"><code>shake-cpp</code></a> provide C++ rules, and <a href="https://hackage.haskell.org/package/avr-shake"><code>avr-shake</code></a> lets you work with <a href="https://www.obdev.at/products/crosspack/index.html">AVR Crosspack</a>. Unfortunately, there aren't enough libraries to just plug together a build system. I think a fundamental problem is that it's not immediately obvious <em>how</em> such libraries would compose, and without that composability, it's hard to build the libraries that would rely on composability.</p>
<p>Finally, the most surprising aspect about developing Shake is that a large part of the effort has gone into writing an ergonomic and cross-platform process executor. The result is found at <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake-Command.html"><code>Development.Shake.Command</code></a>, and can be used outside Shake, letting users write:</p>
<pre><code class="language-haskell">cmd "gcc -c" [src]
</code></pre>
<p>This example invokes <code>gcc</code>, ensuring that <code>src</code> is properly escaped if it has spaces or other special characters. A significant amount of the engineering work in Shake has gone into that facility, when it's totally orthogonal to the build system itself.</p>
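<p>As a flavour of what "properly escaped" involves, here is a minimal POSIX-shell quoting function (an illustrative sketch only; the real <code>Development.Shake.Command</code> also has to deal with Windows quoting rules, environment variables, pipes and more):</p>

```haskell
-- Quote one argument for a POSIX shell: pass simple names through
-- untouched, otherwise single-quote, escaping embedded single quotes.
quoteArg :: String -> String
quoteArg s
  | not (null s) && all safe s = s
  | otherwise = "'" ++ concatMap esc s ++ "'"
  where
    safe c = c `elem` ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ "._-/"
    esc '\'' = "'\\''"  -- close the quote, emit an escaped quote, reopen
    esc c = [c]
```

<p>Getting this right on every platform, for every odd filename, is exactly the sort of detail that quietly consumes engineering time.</p>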
<p>In the next two parts of this series, I'll go through why I think Shake isn't a great build system for tiny projects (and what might be), followed by why Shake isn't great for absolutely huge projects (and what would need to be fixed).</p>
<h2>Recording video (2021-01-17)</h2>
<p><em>Summary: Use OBS, Camo and Audacity.</em></p>
<p>I recently needed to record a presentation which had slides and my face combined, using a Mac. Based on suggestions from friends and searching the web, I came up with a recipe that worked reasonably well. I'm writing this down to both share that recipe, and so I can reuse the recipe next time.</p>
<p><strong>Slide design:</strong> I used a slide template which had a vertical rectangle hole at the bottom left so I could overlay a picture of my video. It took a while to find a slide design that looked plausible, and make sure callouts/quotes etc didn't overlap into this area.</p>
<p><strong>Camera:</strong> The best camera you have is probably the one on your phone. To hook up my iPhone to my Mac I used a <a href="https://www.apple.com/uk/shop/product/MX0K2ZM/A/usb-c-to-lightning-cable-1m">£20 lightning to USB-C cable</a> (next day shipping from Apple) along with the software <a href="https://reincubate.com/camo/">Camo</a>. I found Camo delightfully easy to use. I paid £5 per month to disable the logo and because I wanted to try out the portrait mode to blur my background - but that mode kept randomly blurring and unblurring things in the background, so I didn't use it. Camo is useful, but I record videos infrequently, and £5/month is way too steep. I'm not a fan of software subscriptions, so I'll remember to cancel Camo. Because it is subscription based, and subscribing/cancelling is a hassle, I'll probably just suck up the logo next time.</p>
<p><strong>Composition:</strong> To put it all together I used <a href="https://obsproject.com/">OBS Studio</a>. The lack of an undo feature is a bit annoying (click carefully), but otherwise everything was pretty smooth. I put my slide deck (in Keynote) on one monitor, and then had OBS grab the slide contents from it. I <em>didn't</em> use presentation mode in Keynote as that takes over all the screen, so I just used the slide editing view, with OBS cropping to the slide contents. One annoyance of slide editing view is that spelling mistakes (and variable names etc.) have red dotted underlines, so I had to go through every slide and make sure the spellings were ignored. Grabbing the video from Camo into OBS was very easy.</p>
<p><strong>Camera angle:</strong> To get the best camera angle I used a <a href="https://www.amazon.co.uk/gp/product/B07VTKKCLZ/">lighting plus phone stand</a> (which contains an impressive array of stands, clips, extensions etc) I'd already bought to position the camera right in front of me. Unfortunately, putting the camera right in front of me made it hard to see the screen, which is what I use to present from. It was awkward, and I had to make a real effort to ensure I kept looking into the camera - using my reflection on the back of the shiny iPhone to make sure I kept in the right position. Even then, watching the video after, you can see my eyes dart to the screen to read the next slide. There must be something better out there - or maybe it's only a problem if you're thinking about it and most people won't notice.</p>
<p><strong>Recording:</strong> For actual recording there are two approaches - record perfectly in one take (which may take many tries, or accepting a lower quality) or repeatedly record each section and edit it together after. I decided to go for a single take, which meant that if a few slides through I stumbled then I restarted. Looking at my output directory, I see 15 real takes, with a combined total of about an hour runtime, for a 20 minute talk. I did two complete run throughs, one before I noticed that spelling mistakes were underlined in dotted red.</p>
<p><strong>Conversion to MP4:</strong> OBS records files as <code>.mkv</code>, so I used <a href="https://www.videolan.org/vlc/index.en-GB.html">VLC</a> to preview them. When I was happy with the result, I converted the file to <code>.mp4</code> using the OBS feature "Remux recordings".</p>
<p><strong>Audio post processing:</strong> After listening to the audio, there was a clear background hum, I suspect from the fan of the laptop. I removed that using <a href="https://www.audacityteam.org/">Audacity</a>. Getting Audacity to open a <code>.mp4</code> file was a bit of an uphill struggle, following <a href="https://manual.audacityteam.org/man/installing_ffmpeg_for_mac.html">this guide</a>. I then cleaned up the audio using <a href="https://www.techsmith.com/blog/not-late-reduce-audio-noise-recordings-free/">this guide</a>, saved it as <code>.wav</code>, and reintegrated it with the video using <a href="https://ffmpeg.org/">ffmpeg</a> and <a href="https://superuser.com/a/277667/240196">this guide</a>. I was amazed and impressed how well Audacity was able to clean up the audio with no manual adjustment.</p>
<p><strong>Sharing:</strong> I shared the resulting video via <a href="https://www.dropbox.com/">DropBox</a>. However, when sharing via DropBox I noticed that the audio quality was significantly degraded in the DropBox preview on the iOS app. Be sure to download the file to assess whether the audio quality is adequate (it was fine when downloaded).</p>
<h2>Data types for build system dependencies (2020-11-15)</h2>
<p><em>Summary: Monadic and early cut-off? Use a sequence of sets.</em></p>
<p>In the <a href="https://ndmitchell.com/#shake_21_apr_2020">Build Systems a la Carte paper</a> we talk about the expressive power of various types of build systems. We deliberately simplify away parallelism and implementation concerns, but those details matter. In this post I'm going to discuss some of those details, specifically the representation of dependencies.</p>
<h2>Applicative build systems</h2>
<p>In an applicative build system like Make, all dependencies for a target are known before you start executing the associated action. That means the dependencies have no ordering, so are best represented as a set. However, because they can be calculated from the target, they don't usually need to be stored separately. The dependencies can also be evaluated in parallel. To build a target you evaluate the dependencies to values, then evaluate the action.</p>
<p>Early cut-off is when an action is skipped because none of its dependencies have changed <em>value</em>, even if some dependencies might have required recomputing. This optimisation can be incredibly important for build systems with generated code - potentially <a href="https://ndmitchell.com/#shake_10_sep_2012">seconds vs hours of build time</a>. To obtain early cut-off in applicative systems, after evaluating the dependencies you compare them to the previous results, and only run the action if there were changes.</p>
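<p>To make that concrete, here is a minimal sketch of the applicative check, where <code>Key</code> and <code>Hash</code> are hypothetical stand-ins for whatever a real build system records:</p>
<pre><code class="language-haskell">import Data.List (sort)

type Key = String
type Hash = Int

-- Dependencies form a set: order is irrelevant, so compare them sorted.
-- The action is skipped only when no dependency's value hash changed.
skipAction :: [(Key, Hash)] -> [(Key, Hash)] -> Bool
skipAction old new = sort old == sort new
</code></pre>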
<h2>Monadic build systems</h2>
<p>In monadic <a href="https://shakebuild.com/">build systems like Shake</a>, the representation of dependencies is more complex. If you have an alternative mechanism of detecting whether a rule is dirty (e.g. reverse dependencies) you don't need to record the dependencies at all. If the key is dirty, you start executing the action, and that will request the dependencies it needs. The action can then suspend, calculate the dependencies, and continue.</p>
<p>If you want early cut-off in a monadic build system, you need to rerun the dependencies in advance, and if they all have the same result, skip rerunning the action. Importantly, you probably want to rerun the dependencies in the <em>same order</em> that the action originally requested them -- otherwise you might pay a severe and unnecessary time penalty. As an example, let's consider an action:</p>
<pre><code class="language-haskell">opt <- need "is_optimised"
object <- if opt then need "foo.optimised" else need "foo.unoptimised"
link object
</code></pre>
<p>This rule is monadic, as whether you need the optimised or unoptimised dependency depends on the result of calculating some <code>is_optimised</code> property. If on the first run <code>is_optimised</code> is <code>True</code>, then we build <code>foo.optimised</code>. On the second run, if <code>is_optimised</code> is <code>False</code>, it is important we <em>don't</em> build <code>foo.optimised</code> as that might take a seriously long time and be entirely redundant. Therefore, it's important when checking for early cut-off we build in the order that the previous action requested the dependencies, and stop on the first difference we encounter.</p>
<p>(If you have unlimited resources, e.g. <a href="https://docs.bazel.build/versions/master/remote-execution.html">remote execution</a>, it might be profitable to evaluate everything in parallel - but we're assuming that isn't the case here.)</p>
<p>Provided a rule performs identically between runs (i.e. is deterministic and hasn't been changed), everything that we request to check for early cut-off will still be needed for real, and we won't have wasted any work. For all these reasons, it is important to store dependencies as a sequence (e.g. a list/vector).</p>
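<p>A rough sketch of that check, where <code>Key</code>, <code>Hash</code> and <code>fetch</code> are hypothetical stand-ins rather than Shake's real API:</p>
<pre><code class="language-haskell">type Key = String
type Hash = Int

-- Check recorded dependencies in the order they were originally requested,
-- stopping at the first one whose value has changed
checkInOrder :: Monad m => (Key -> m Hash) -> [(Key, Hash)] -> m Bool
checkInOrder _ [] = pure True
checkInOrder fetch ((k, h) : rest) = do
    h' <- fetch k  -- bring k up to date and hash its value
    if h' == h then checkInOrder fetch rest else pure False
</code></pre>
<p>Because the check stops at the first difference, <code>fetch</code> is never called on dependencies after the one that changed - which is exactly how we avoid building <code>foo.optimised</code> unnecessarily.</p>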
<h2>Monadic build systems plus parallelism</h2>
<p>Applicative build systems naturally request all their dependencies in parallel, but monadic build systems are naturally one dependency at a time. To regain parallelism, in build systems like Shake the primitive dependency requesting mechanism takes a <em>set</em> of dependencies that are computed in parallel. While requesting dependencies individually or in bulk gives the same result, requesting in bulk gives significantly more parallelism. (In Shake we use lists to track correspondence between requests and results, but it's morally a set.)</p>
<p>As we saw previously, it is still important that for early cut-off you reproduce the dependencies much like they were in the action. That means you request dependencies in the order they were requested, and when they were requested in bulk, they are also checked in bulk. Now we have a sequence of sets to represent dependencies, where the elements of the sets can be checked in parallel, but the sequence must be checked in order.</p>
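<p>Concretely, the dependency record becomes a sequence of sets. A minimal sketch, again with hypothetical <code>Key</code>/<code>Hash</code> types:</p>
<pre><code class="language-haskell">type Key = String
type Hash = Int

-- Outer list: checked in order. Inner lists: each can be checked in parallel.
type Deps = [[(Key, Hash)]]

-- True when no recorded dependency has changed; thanks to laziness,
-- later groups are never inspected once a changed group is found
clean :: (Key -> Hash) -> Deps -> Bool
clean current = all (all (\(k, h) -> current k == h))
</code></pre>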
<h2>Monadic build systems plus explicit parallelism</h2>
<p>What if we add an explicit parallelism operator to a monadic build system, something like <code>parallel :: [Action a] -> Action [a]</code>, to run arbitrary actions in parallel (which is what Shake provides)? Now, instead of a sequence of sets, we have a <em>tree</em> of parallelism. As before it's important when replaying that the dependencies are requested in order, but also that as much is requested in parallel as possible.</p>
<h2>What Shake does</h2>
<p>Shake is a monadic build system with early cut-off, parallelism and explicit parallelism. When building up dependencies it uses a tree representation. The full data type is:</p>
<pre><code class="language-haskell">data Traces
= None
| One Trace
| Sequence Traces Traces
| Parallel [Traces]
</code></pre>
<p>Sequenced dependencies are represented with <code>Sequence</code> and the traces captured by parallelism use <code>Parallel</code>. Importantly, constructing <code>Traces</code> values is nicely <em>O(1)</em> in all cases. (Shake v0.19.1 used a different representation and repeatedly normalised it, which could have awful time complexity - potentially <em>O(2^n)</em> in pathological cases.)</p>
<p>While these traces store complete information, actually evaluating that trace when checking for rebuilds would be complicated. Instead, we flatten that representation to <code>[[Trace]]</code> for writing to the Shake database. The outer list is a sequence, the inner list is morally a set. We have the invariant that no <code>Trace</code> value will occur multiple times, since if you depend on something once, and then again, the second dependency was irrelevant. To flatten <code>Parallel</code> computations we take the first required dependency in each parallel action, merge them together, and then repeat for the subsequent actions. If you run code like:</p>
<pre><code class="language-haskell">parallel [
need ["a"] >> parallel [need ["b"], need ["c"]],
need ["d"]
]
</code></pre>
<p>It will get flattened to appear as though you wrote <code>need ["a","d"] >> need ["b","c"]</code>. When checking, it will delay the evaluation of <code>b</code> and <code>c</code> until after <code>d</code> completes, even though that is unnecessary. But simplifying traces at the cost of marginally less rebuild parallelism for those who use explicit parallelism (which is not many) seems like the right trade-off for Shake.</p>
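<p>That flattening is short to sketch (this is not Shake's actual implementation, just code consistent with the description above), using <code>transpose</code> to merge the n-th step of each parallel branch into one set:</p>
<pre><code class="language-haskell">import Data.List (transpose)

type Trace = String  -- stand-in for the real trace record

data Traces
    = None
    | One Trace
    | Sequence Traces Traces
    | Parallel [Traces]

flatten :: Traces -> [[Trace]]
flatten None = []
flatten (One t) = [[t]]
flatten (Sequence a b) = flatten a ++ flatten b
flatten (Parallel ts) = map concat (transpose (map flatten ts))
</code></pre>
<p>On the trace of the example above, <code>flatten</code> produces <code>[["a","d"],["b","c"]]</code>.</p>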
<h2>Conclusion</h2>
<p>Applicative build systems should use sets for their dependencies. Monadic build systems should use sets, but if they support early cut-off, should use sequences of sets.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-39867716672547371692020-11-09T11:43:00.000+00:002020-11-09T11:43:52.461+00:00Turing Incomplete Languages<p><em>Summary: Some languages ban recursion to ensure programs "terminate". That's technically true, but usually irrelevant.</em></p>
<p>In my career there have been three instances where I've worked on a programming language that went through the evolution:</p>
<ol>
<li>Ban recursion and unbounded loops. Proclaim the language is "Turing incomplete" and that all programs terminate.</li>
<li>Declare that Turing incomplete programs are simpler. Have non-technical people conflate "terminates quickly" with "terminates eventually".</li>
<li>Realise lacking recursion makes things incredibly clunky to express, turning simple problems into brain teasers.</li>
<li>Add recursion.</li>
<li>Realise that everything is better.</li>
</ol>
<p>Before I left university, this process would have sounded ridiculous. In fact, even after these steps happened <em>twice</em> I was convinced it was the kind of thing that would never happen again. Now I've got three instances, it seems worth writing a blog post so for case number four I have something to refer to.</p>
<h2>A language without recursion or unbounded loops</h2>
<p>First, let's consider a small, simple statement-orientated first-order programming language. How might we write a non-terminating program? There are two easy ways. First, write a loop - <code>while (true) {}</code>. Second, write recursion, <code>void f() { f() }</code>. We can ban both of those, leaving only bounded iteration of the form <code>for x in xs { .. }</code> or similar. Now the language is Turing incomplete and all programs terminate.</p>
<p>The lack of recursion makes programs harder to write, but we can always <a href="https://learn1.open.ac.uk/mod/oublog/viewpost.php?post=162710">use an explicit stack</a> with unbounded loops.</p>
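<p>As an illustration (in Haskell, though the pattern is more usually applied in imperative languages, and <code>until</code> is standing in for a <code>while</code> loop), here is a tree sum driven by an explicit stack, with no user-written recursion:</p>
<pre><code class="language-haskell">data Tree = Leaf Int | Node Tree Tree

-- Keep a stack of subtrees still to visit, popping one per loop iteration
sumTree :: Tree -> Int
sumTree t = snd (until (null . fst) step ([t], 0))
    where
        step (Leaf n : rest, acc) = (rest, acc + n)
        step (Node l r : rest, acc) = (l : r : rest, acc)
        step ([], acc) = ([], acc)  -- unreachable: the loop stops on an empty stack
</code></pre>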
<p>The lack of unbounded loops isn't a problem provided we have an upper bound on how many steps our program might take. For example, we know <a href="https://en.wikipedia.org/wiki/Quicksort">QuickSort</a> has worst-case complexity <em>O(n^2)</em>, so if we can write <code>for x in range(0, n^2) { .. }</code> then we'll have enough steps in our program such that we never reach the bound.</p>
<p>But what if our programming language doesn't even provide a <code>range</code> function? We can synthesise it by realising that in a linear amount of code we can produce exponentially large values. As an example:</p>
<pre><code class="language-haskell">double xs = xs ++ xs -- Double the length of a list
repeated x = double (double (double (double (double (double (double (double (double (double [x])))))))))
</code></pre>
<p>The function <code>repeated 1</code> makes 10 calls to double, and creates a list of length 2^10 (1024). A mere 263 more calls to <code>double</code> and we'll have a list long enough to contain each atom in the universe. With some tweaks we can cause doubling to stop at a given bound, and generate numbers in sequence, giving us <code>range</code> to any bound we pick.</p>
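<p>One way to realise those tweaks (a sketch only - note that the bound-cutting <code>filter</code> is itself just a bounded loop over the list) is to append a shifted copy at each doubling, so the numbers come out in sequence:</p>
<pre><code class="language-haskell">grow :: [Int] -> [Int]
grow xs = xs ++ map (+ length xs) xs  -- [0..k-1] becomes [0..2*k-1]

-- Ten applications of grow to [0] give [0..1023]; filtering cuts at the bound
range :: Int -> [Int]  -- valid for bounds up to 2^10
range n = filter (< n) (grow (grow (grow (grow (grow (grow (grow (grow (grow (grow [0]))))))))))
</code></pre>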
<p>We now have a menu of three techniques that lets us write almost any program we want:</p>
<ol>
<li>We can encode recursion using an explicit stack.</li>
<li>We can change unbounded loops into loops with a conservative upper bound.</li>
<li>We can generate structures of exponential size with a linear amount of code.</li>
</ol>
<h2>The consequences</h2>
<p>Firstly, we still don't have a Turing complete language. The code will terminate. But there is no guarantee on how long it will take to terminate. Programs that take a million years to finish technically terminate, but probably can't be run on an actual computer. For most of the domains I've seen Turing incompleteness raised, a runtime of seconds would be desirable. Turing incompleteness doesn't help at all.</p>
<p>Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read. While there are three general purpose techniques to encode the logic, there are usually other considerations that cause each instance to be solved differently. I've written tree traversals, sorts and parsers in such restricted languages - the result is always a lot of comments and at least one level of unnecessary indirection.</p>
<p>Finally, code written in such a complex style often performs significantly worse. Consider QuickSort - the standard implementation takes <em>O(n^2)</em> time worst case, but <em>O(n log n)</em> time average case, and <em>O(log n)</em> space (for the stack). If you take the approach of building an <em>O(n^2)</em> list before you start to encode a <code>while</code> loop, you end up with <em>O(n^2)</em> space and time. Moreover, while in normal QuickSort the time complexity is counting the number of cheap comparisons, in an encoded version the time complexity relates to allocations, which can be much more expensive as a constant factor.</p>
<h2>The solution</h2>
<p>Most languages with the standard complement of <code>if</code>/<code>for</code> etc. that are Turing incomplete do not gain any benefit from this restriction. One exception is in domains where you are proving properties or doing analysis, as in these two examples:</p>
<ol>
<li>Dependently typed languages such as <a href="https://www.idris-lang.org/">Idris</a>, which typically have much more <a href="http://docs.idris-lang.org/en/latest/tutorial/theorems.html#totality-checking">sophisticated termination checkers</a> than just banning recursion and unbounded loops.</li>
<li>Resource bounded languages such as <a href="https://www.macs.hw.ac.uk/~greg/hume/">Hume</a>, which allow better analysis and implementation techniques by restricting how expressive the language is.</li>
</ol>
<p>Such languages tend to be a rarity in industry. In all the Turing incomplete programming languages I've experienced, recursion was later added, programs were simplified, and programming in the language became easier.</p>
<p>While most languages I've worked on made this evolution in private, one language, <a href="https://daml.com">DAML</a> from <a href="https://www.digitalasset.com/">Digital Asset</a>, did so in public. <a href="https://blog.digitalasset.com/blog/introducing-the-digital-asset-modeling-language-a-powerful-alternative-to-smart-contracts-for-financial-institutions">In 2016 they wrote</a>:</p>
<blockquote>
<p>DAML was intentionally designed not to be Turing-complete. While Turing-complete languages can model any business domain, what they gain in flexibility they lose in analysability.</p>
</blockquote>
<p>Whereas <a href="https://docs.daml.com/1.6.0/daml/intro/9_Functional101.html#recursion">in 2020 their user manual says</a>:</p>
<blockquote>
<p>If there is no explicit iterator, you can use recursion. Let’s try to write a function that reverses a list, for example.</p>
</blockquote>
<p>Note that while I used to work at Digital Asset, these posts both predate and postdate my time there.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com19tag:blogger.com,1999:blog-7094652.post-91441107429643589912020-09-22T10:16:00.000+01:002020-09-22T10:16:30.760+01:00Don't use Ghcide anymore (directly)<p><em>Summary: I recommend people use the Haskell Language Server IDE.</em></p>
<p>Just over a year ago, I recommended people looking for a Haskell IDE experience to <a href="https://ndmitchell.com/#ghcide_07_sep_2019">give Ghcide a try</a>. A few months later the Haskell IDE Engine and Ghcide teams <a href="https://neilmitchell.blogspot.com/2020/01/one-haskell-ide-to-rule-them-all.html">agreed to work together</a> on <a href="https://github.com/haskell/haskell-language-server">Haskell Language Server</a> - using <a href="https://github.com/haskell/ghcide">Ghcide as a library</a> as the core, with the plugins/installer experience from the <a href="https://github.com/haskell/haskell-ide-engine">Haskell IDE Engine</a> (by that stage we were already both using the same <a href="https://github.com/mpickering/hie-bios">Haskell setup</a> and <a href="https://github.com/alanz/haskell-lsp">LSP</a> libraries). At that time <a href="https://github.com/alanz">Alan Zimmerman</a> said to me:</p>
<blockquote>
<p>"We will have succeeded in joining forces when you (Neil) start recommending people use Haskell Language Server."</p>
</blockquote>
<p>I'm delighted to say that time has come. For the last few months I've been both using and recommending <a href="https://github.com/haskell/haskell-language-server">Haskell Language Server</a> for all Haskell IDE users. Moreover, for VS Code users, I recommend simply installing the <a href="https://marketplace.visualstudio.com/items?itemName=haskell.haskell">Haskell extension</a> which downloads the right version automatically. The experience of Haskell Language Server is better than either the Haskell IDE Engine or Ghcide individually, and is improving rapidly. The teams have merged seamlessly, and can now be regarded as a single team, producing one IDE experience.</p>
<p>There's still <a href="https://github.com/haskell/haskell-language-server/issues">lots of work to be done</a>. And for those people developing the IDE, Ghcide remains an important part of the puzzle - but it's now a developer-orientated piece rather than a user-orientated piece. Users should follow the <a href="https://github.com/haskell/haskell-language-server#readme">README at Haskell Language Server</a> and report bugs <a href="https://github.com/haskell/haskell-language-server/issues">against Haskell Language Server</a>.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com4tag:blogger.com,1999:blog-7094652.post-54465594325107875262020-08-31T11:29:00.001+01:002020-08-31T13:26:23.547+01:00Interviewing while biased<p>Interviewing usually involves some level of subjectivity. I once struggled to decide about a candidate, and after some period of reflection, the only cause I can see is that I was biased against the candidate. That wasn't a happy realisation, but even so, it's one I think worth sharing.</p>
<p>Over the years, I've interviewed hundreds of candidates for software engineering jobs (I reckon somewhere in the 500-1000 mark). I've interviewed for many companies, for teams I was managing, for teams I worked in, and for other teams at the same company. In most places, I've been free to set the majority of the interview. I have <a href="https://neilmitchell.blogspot.com/2020/07/how-i-interview.html">a standard pattern</a>, with a standard technical question, to which I have heard a lot of answers. The quality of the answers falls into one of three categories:</p>
<ul>
<li>About 40% give excellent, quick, effortless answers. These candidates pass the technical portion.</li>
<li>About 50% are confused and make nearly no progress even with lots of hints. These candidates fail.</li>
<li>About 10% struggle a bit but get to the answer.</li>
</ul>
<p>Candidates in the final bucket are by far the hardest to make a decision on. Not answering a question effortlessly doesn't mean you aren't a good candidate - it might mean it's not something you are used to, that you had interview nerves, or any of a million other factors that go into someone's performance. It makes the process far more subjective.</p>
<p>Many years ago, I interviewed one candidate over the phone. It was their first interview with the company, so I had to decide whether we should take the step of transporting them to the office for an in-person interview, which has some level of cost associated with it. Arranging an in-person interview would also mean holding a job open for them, which would mean pausing further recruitment. The candidate had a fairly strong accent, but a perfect grasp of English. Their performance fell squarely into the final bucket.</p>
<p>For all candidates, I make a decision, and write down a paragraph or so explaining how they performed. My initial decision was to not go any further in interviewing the candidate. But after writing down the paragraph, I found it hard to justify my decision. I'd written other paragraphs that weren't too dissimilar, but had a decision to continue onwards. I wondered about changing my decision, but felt rather hesitant - I had a sneaking suspicion that this candidate "just wouldn't work out". Had I spotted something subtle I had forgotten to write down? Had their answers about their motivation given me a subconscious red-flag? I didn't know, but for the first time I can remember, decided to wait on sending my internal interview report overnight.</p>
<p>One day later, I still had a feeling of unease. But still didn't have anything to pin it on. In the absence of a reason to reject them, I decided the only fair thing to do was get them onsite for further interviews. Their onsite interviews went fine, I went on to hire them, they worked for me for over a year, and were a model employee. If I saw red-flags, they were false-flags, but more likely, I saw nothing.</p>
<p>However, I still wonder what caused me to decide "no" initially. Unfortunately, the only thing I can hypothesise is that their accent was the cause. I had previously worked alongside someone with a similar accent, who turned out to be thoroughly incompetent. I seem to have projected some aspects of that behaviour onto an entirely unrelated candidate. That's a pretty depressing realisation to make.</p>
<p>To try and reduce the chance of this situation repeating, I now write down the interview description first, and then the decision last. I also remember this story, and how my biases nearly caused me to screw up someone's career.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com2tag:blogger.com,1999:blog-7094652.post-36725964742758843412020-07-27T14:56:00.003+01:002020-07-27T14:57:38.478+01:00Which packages does Hoogle search?<p><em>Summary: Hoogle searches packages on Stackage.</em></p><p>Haskell (as of 27 July 2020) has 14,490 packages in the <a href="https://hackage.haskell.org/">Hackage package repository</a>. <a href="https://hoogle.haskell.org/">Hoogle</a> (the Haskell API search engine) searches 2,463 packages. This post explains which packages are searched, why some packages are excluded, and thus, how you can ensure your package is searched.</p><p>The first filter is that Hoogle only searches packages on <a href="https://www.stackage.org/">Stackage</a>. Hoogle indexes any package which is either in the latest Stackage nightly or Stackage LTS, but always indexes the latest version that is on Hackage. If you want a Hoogle search that perfectly matches a given Stackage release, I recommend using the Stackage Hoogle search available from <a href="https://www.stackage.org/nightly">any snapshot page</a>. There are two reasons for restricting to only packages on Stackage:</p><ul><li>I want Hoogle results to be useful. The fact that the package currently builds with a recent GHC used by Stackage is a positive sign that the package is maintained and might actually work for a user who wants to use it. Most of the packages on Hackage probably don't build with the latest GHC, and thus aren't useful search results.</li>
<li>Indexing time and memory usage is proportional to the number of packages, and somewhat the size of those packages. By dropping over 10 thousand packages we can index more frequently and on more constrained virtual machines. With the recent release of <a href="https://hackage.haskell.org/package/hoogle">Hoogle 5.0.18</a> the technical limitations on size were removed to enable indexing all of Hackage - but there is still no intention to do so.</li>
</ul><p>There are 2,426 packages in Stackage Nightly, and 2,508 in Stackage LTS, with most overlapping. There are 2,580 distinct packages between these two sources, the <a href="https://www.haskell.org/platform/">Haskell Platform</a> and a few custom packages Hoogle knows about (e.g. GHC).</p><p>Of the 2,580 eligible packages, 77 are executables only, so don't have any libraries to search, leaving 2,503 packages.</p><p>Of the remaining 2,503 packages, 40 are missing documentation on Hackage, taking us down to 2,463. As for why a package might not have documentation:</p><ul><li>Some are missing documentation because they are very GHC internal and are mentioned but not on Hackage, e.g. ghc-heap.</li>
<li>Some are Windows only and won't generate documentation on the Linux Hackage servers, e.g. <a href="https://hackage.haskell.org/package/Win32-notify">Win32-notify</a>.</li>
<li>Some have dependencies not installed on the Hackage servers, e.g. <a href="https://hackage.haskell.org/package/rocksdb-query">rocksdb-query</a>.</li>
<li>Some have documentation that appears to have been generated without generating a corresponding Hoogle data file, e.g. <a href="https://hackage.haskell.org/package/array-memoize">array-memoize</a>.</li>
<li>Some are just missing docs entirely on Hackage for no good reason I can see, e.g. <a href="https://hackage.haskell.org/package/bytestring-builder">bytestring-builder</a>.</li>
</ul><p>The Hoogle database is generated and deployed once per day, automatically. Occasionally a test failure or dependency outage will cause generation to fail, but I get alerted, and usually it doesn't get stale by more than a few days. If you <a href="https://github.com/commercialhaskell/stackage#add-your-package">add your package to Stackage</a> and it doesn't show up on Hoogle within a few days, <a href="https://github.com/ndmitchell/hoogle/issues">raise an issue</a>.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-79302577297262211492020-07-15T19:17:00.000+01:002020-07-15T19:17:32.569+01:00Managing Haskell Extensions<p><em>Summary: You can divide extensions into yes, no and maybe, and then use HLint to enforce that.</em></p><p>I've worked in multiple moderately sized multinational teams of Haskell programmers. One debate that almost always comes up is which extensions to enable. It's important to have some consistency, so that everyone is using similar dialects of Haskell and can share/review code easily. The way I've solved this debate in the past is by, as a team, dividing the extensions into three categories:</p><ul><li><strong>Always-on</strong>. For example, <code>ScopedTypeVariables</code> is just how Haskell should work and should be enabled everywhere. We turn these extensions on globally <a href="https://github.com/digital-asset/ghcide/blob/cbafcf29f4157e86e0522d87bf99cb2aeff1d853/ghcide.cabal#L186-L199">in Cabal with <code>default-extensions</code></a>, and then write an <a href="https://github.com/digital-asset/ghcide/blob/cbafcf29f4157e86e0522d87bf99cb2aeff1d853/.hlint.yaml#L54-L73">HLint rule</a> to ban turning the extension on manually. In one quick stroke, a large amount of <code>{-# LANGUAGE #-}</code> boilerplate at the top of each file disappears.</li>
<li><strong>Always-off</strong>, not enabled and mostly banned. For example, <code>TransformListComp</code> steals the <code>group</code> keyword and never got much traction. These extensions can be similarly banned by HLint, or you can <a href="https://github.com/digital-asset/ghcide/blob/cbafcf29f4157e86e0522d87bf99cb2aeff1d853/.hlint.yaml#L51-L52">default unmentioned extensions to being disabled</a>. If you really need to turn on one of these extensions, you need to both turn it on in the file, <em>and</em> add an HLint exception. Such an edit can trigger wider code review, and serves as a good signal that something needs looking at carefully.</li>
<li><strong>Sometimes-on</strong>, written at the top of the file as <code>{-# LANGUAGE #-}</code> pragmas when needed. These are extensions which show up sometimes, but not that often. A typical file might have zero to three of them. The reasons for making an extension sometimes-on fall into three categories:<br />
<ul><li>The extension has a harmful compile-time impact, e.g. <code>CPP</code> or <code>TemplateHaskell</code>. It's better to only turn these extensions on if they are needed, but they are needed fairly often.</li>
<li>The extension can break some code, e.g. <code>OverloadedStrings</code>, so enabling it everywhere would cause compile failures. Generally, we work to minimize such cases, aiming to fix all code to compile with most extensions turned on.</li>
<li>The extension is used rarely within the code base and is a signal to the reader that something unusual is going on. Depending on the code base that might be things like <code>RankNTypes</code> or <code>GADTs</code>. But for certain code bases, those extensions will be very common, so it very much varies by code base.</li>
</ul></li>
</ul><p>The features that are often most debated are the syntax features - e.g. <code>BlockArguments</code> or <code>LambdaCase</code>. Most projects should either use these extensions commonly (always-on), or never (banned). They provide some syntactic convenience, but if used rarely, tend to mostly confuse things.</p><p>Using this approach, every large team I've worked on has had one initial debate to classify extensions, then every few months someone will suggest moving an extension from one pile to another. However, it's pretty much entirely silenced the issue from normal discussion thereafter, leaving us to focus on actual coding.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-37762855249485281632020-07-07T22:21:00.002+01:002020-07-07T22:21:54.144+01:00How I Interview<p><em>Summary: In previous companies I had a lot of freedom to design an interview. This article describes what I came up with.</em></p><p>Over the years, I've interviewed hundreds of candidates for software engineering jobs (at least 500, probably quite a bit more). I've interviewed for many companies, for teams I was setting up, for teams I was managing, for teams I worked in, and for different teams at the same company. In most places, I've been free to set the majority of the interview. This post describes why and how I designed my interview process. I'm making this post now because where I currently work has a pre-existing interview process, so I won't be following the process below anymore.</p><p>I have always run my interviews as a complete assessment of a candidate, aiming to form a complete answer. Sometimes I did that as a phone screen, and sometimes as part of a set of interviews, but I never relied on other people to cover different aspects of a candidate. 
(Well, I did once, and it went badly...)</p><p>When interviewing, there are three questions I want to answer for myself, in order of importance.</p><h2>Will they be happy here?</h2><p>If the candidate joined, would they be happy? If people aren't happy, it won't be a pleasant experience, and likely, they won't be very successful. Whether they are happy is the most important criterion because an employee who can't do the job but is happy can be trained or can use their skills for other purposes. But an employee who is unhappy will just drag the morale of the whole team down.</p><p>To figure out whether a candidate would be happy, I explain the job (including any office hours/environment/location) and discuss it in contrast to their previous experience. The best person to judge if they would be happy is the candidate themselves - so I ask that question. The tricky part is that it's an interview setting, so they have prepared saying "Yes, that sounds good" to every question. I try and alleviate that by building a rapport with the candidate first, being honest about my experiences, and trying to discuss what they like in the abstract first. If I'm not convinced they are being truthful or properly thinking it through, I ask deeper questions, for example how they like to split their day etc.</p><p>A great sign is when a candidate, during the interview, concludes <em>for themselves</em> that this job just isn't what they were looking for. I've had that happen 5 times during the actual interview, and 2 times as an email afterwards. It isn't awkward, and has saved some candidates an inappropriate job (at least 2 would have likely been offered a job otherwise).</p><p>While I'm trying to find out if the candidate will be happy, at the same time, I'm also attempting to persuade the candidate that they want to join. It's a hard balance and being open and honest is the only way I have managed it. 
Assuming I am happy where I work, I can use my enthusiasm to convince the candidate it's a great place, but also give them a sense of what I do.</p><h2>Can they do the job?</h2><p>There are two ways I used to figure out if someone can do the job. Firstly, I discuss their background, coding preferences etc. Do the things they've done in the past match the kind of things required in the job. Have they got experience with the non-technical portions of the job, or domain expertise. Most of these aspects are on their CV, so it involves talking about their CV, past projects, what worked well etc.</p><p>Secondly, I give them a short technical problem. My standard problem can be solved in under a minute in a single line of code by the best candidates. The problem is not complex, and has no trick-question or clever-approach element. The result can then be used as a springboard to talk about algorithmic efficiency, runtime implementation, parallelism, testing, verification etc. However, my experience is that candidates who struggle at the initial problem go on to struggle with any of the extensions, and candidates that do well at the initial question continue to do well on the extensions. The correlation has been so complete that over time I have started to use the extensions more for candidates who did adequately but not great on the initial problem.</p><p>My approach of an incredibly simple problem does not seem to be standard or adopted elsewhere. One reason might be that if it was used at scale, the ability to cheat would be immense (I actually have 2 backup questions for people I've interviewed previously).</p><p>Given such a simple question, there have been times when 5 candidates in a row ace the question, and I wonder if the question is just too simple. 
But then the next 5 candidates usually all struggle terribly, and I decide it still has value.</p><h2>Will I be happy with them doing the job?</h2><p>The final thing I wonder is whether I would be happy with them being a part of the team/company. The usual answer is yes. However, if the candidate displays nasty characteristics (belittling, angry, racist, sexist, lying) then it's a no. This question definitely isn't code for "culture fit" or "would I go for a beer with them", but is about specific negative traits. Generally I answer this question based on whether I see these characteristics reflected in the interactions I have with the candidate, not specific questions. I've never actually had a candidate who was successful at the above questions, and yet failed at this question. I think approximately 5-10 candidates have failed on this question.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-59807977363118506292020-07-05T10:49:00.000+01:002020-07-05T11:03:35.765+01:00Automatic UI's for Command Lines with cmdargs<p><em>Summary: Run <code>cmdargs-browser hlint</code> and you can fill out arguments easily.</em></p><p>The Haskell <a href="https://hackage.haskell.org/package/cmdargs">command line parsing library <code>cmdargs</code></a> contains a <a href="http://hackage.haskell.org/package/cmdargs/docs/System-Console-CmdArgs-Explicit.html#t:Mode">data type that represents a command line</a>. I always thought it would be a neat trick to transform that into a web page, to make it easier to explore command line options interactively - similar to how the custom-written <a href="http://www.martin-achern.de/wgetgui/">wget::gui</a> wraps <a href="https://www.gnu.org/software/wget/"><code>wget</code></a>.</p><p>I wrote a demo to do just that, named <a href="http://hackage.haskell.org/package/cmdargs-browser"><code>cmdargs-browser</code></a>. Given any program that uses <code>cmdargs</code> (e.g. 
<a href="https://github.com/ndmitchell/hlint"><code>hlint</code></a>), you can install <code>cmdargs-browser</code> (with <code>cabal install cmdargs-browser</code>) and run:</p><pre><code>cmdargs-browser hlint
</code></pre><p>And it will pop up:</p><p><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_LtD6aZXRCPNqGDKzoH_FDT_Ju8Wjr4zx0KE0g0D9FKDufqkP6bkAhlxFuhbaIuCnjQOdz7bUIB4jwlFSMGKU4JCmS7lNb71u2cOVMg46-s1ncJ_xiQJ82lTkqYEjjUlAESnx/s1600/cmdargs-browser.png" data-original-width="776" data-original-height="686" /></p><p>As we can see, the HLint modes are listed on the left (you can use <code>lint</code>, <code>grep</code> or <code>test</code>), the possible options on the right (e.g. normal arguments and <code>--color</code>) and the command line it produces at the bottom. As you change mode or add/remove flags, the command line updates. If you hit <code>OK</code> it then runs the program with the command line. The help is included next to the argument, and if you make a mistake (e.g. write <code>foo</code> for the <code>--color</code> flag) it tells you immediately. It could be more polished (e.g. browse buttons for file selections, better styling) but the basic concept works well.</p><h2>Technical implementation</h2><p>I wanted every <code>cmdargs</code>-using program to support this automatic UI, but also didn't want to increase the dependency footprint or compile-time overhead for <code>cmdargs</code>. I didn't want to tie <code>cmdargs</code> to this particular approach to a UI - I wanted a flexible mechanism that anyone could use for other purposes.</p><p>To that end, I built out a <a href="https://hackage.haskell.org/package/cmdargs-0.10.20/docs/System-Console-CmdArgs-Helper.html"><code>Helper</code> module</a> that is included in <code>cmdargs</code>. That API provides the full power and capabilities on which <code>cmdargs-browser</code> is written. 
The <code>Helper</code> module is only 350 lines.</p><p>If you run <code>cmdargs</code> with either <code>$CMDARGS_HELPER</code> or <code>$CMDARGS_HELPER_HLINT</code> set (in the case of HLint) then <code>cmdargs</code> will run the command line you specify, passing the explicit <code>Mode</code> data type over stdin. That <code>Mode</code> data type includes functions, and using a simplistic communication channel over stdin/stdout, the helper process can invoke those functions. As an example, when <code>cmdargs-browser</code> wants to validate the <code>--color</code> flag, it does so by calling a function in <code>Mode</code> that secretly talks back to <code>hlint</code> to validate it.</p><p>At the end, the helper program can choose to either give an error message (to stop the program, e.g. if you press Cancel), or give some command lines to use to run the program.</p><h2>Future plans</h2><p>This demo was a cool project, which may turn out to be useful for some, but I have no intention to develop it further. I think something along these lines should be universally available for all command line tools, and built into all command line parsing libraries.</p><h2>Historical context</h2><p>All the code that makes this approach work was written over seven years ago. Specifically, it was my hacking project in the hospital while <a href="https://ndmitchell.com/elements/henry-photo-big.jpg">waiting for my son to be born</a>. Having a little baby is a hectic time of life, so I never got round to telling anyone about its existence.</p><p>This weekend I resurrected the code and published an updated version to Hackage, deliberately making as few changes as possible. The three necessary changes were:</p><ol><li>jQuery deprecated the <code>live</code> function, replacing it with <code>on</code>, meaning the code didn't work.</li>
<li>I had originally put an upper bound of <code>0.4</code> for the <code>transformers</code> library. Deleting the upper bound made it work.</li>
<li>Hackage now requires that all your uploaded <code>.cabal</code> files declare that they require a version of 1.10 or above of Cabal itself, even if they don't.</li>
</ol><p>Overall, recovering a project that is over 7 years old took surprisingly little effort.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com3tag:blogger.com,1999:blog-7094652.post-26680812570158860392020-07-01T09:50:00.000+01:002020-07-01T09:50:36.523+01:00A Rust self-ownership lifetime trick (that doesn't work)<p><em>Summary: I came up with a clever trick to encode lifetimes of allocated values in Rust. It doesn't work.</em></p><p>Let's imagine we are using Rust to implement some kind of container that can allocate values, and a special value can be associated with the container. It's a bug if the allocated value gets freed while it is the special value of a container. We might hope to use lifetimes to encode that relationship:</p><pre><code class="language-rust">struct Value<'v> {...}
struct Container {...}
impl Container {
fn alloc<'v>(&'v self) -> Value<'v> {...}
fn set_special<'v>(&'v self, x: Value<'v>) {...}
}
</code></pre><p>Here we have a <code>Container</code> (which has no lifetime arguments), and a <code>Value<'v></code> (where <code>'v</code> ties it to the right container). Within our container we can implement <code>alloc</code> and <code>set_special</code>. In both cases, we take <code>&'v self</code> and then work with a <code>Value<'v></code>, which ensures that the lifetime of the <code>Container</code> and <code>Value</code> match. (We ignore details of how to implement these functions - it's possible but requires <code>unsafe</code>).</p><p>Unfortunately, the following code compiles:</p><pre><code class="language-rust">fn set_cheat<'v1, 'v2>(to: &'v1 Container, x: Value<'v2>) {
to.set_special(x);
}
</code></pre><p>The Rust compiler has taken advantage of the fact that <code>Container</code> can be reborrowed, and that <a href="https://doc.rust-lang.org/nomicon/subtyping.html"><code>Value</code> is variant</a>, and rewritten the code to:</p><pre><code class="language-rust">fn set_cheat<'v1, 'v2>(to: &'v1 Container, x: Value<'v2>) {
'v3: {
let x : Value<'v3> = x; // Value is variant, 'v2 : 'v3
let to : &'v3 Container = &*to;
to.set_special(x);
}
}
</code></pre><p>The code with lifetime annotations doesn't actually compile; it's just what the compiler did under the hood. But we can stop <code>Value</code> being variant by making it contain <code>PhantomData<Cell<&'v ()>></code>, since lifetimes under <code>Cell</code> are invariant. Now the above code no longer compiles. Unfortunately, there is a closely related variant which does compile:</p><pre><code class="language-rust">fn set_cheat_alloc<'v1, 'v2>(to: &'v1 Container, from: &'v2 Container) {
let x = from.alloc();
to.set_special(x);
}
</code></pre><p>While <code>Value</code> isn't variant, <code>&Container</code> is, so the compiler has rewritten this code as:</p><pre><code class="language-rust">fn set_cheat_alloc<'v1, 'v2>(to: &'v1 Container, from: &'v2 Container) {
'v3: {
let from : &'v3 Container = &*from;
let x : Value<'v3> = from.alloc();
let to : &'v3 Container = &*to;
to.set_special(x);
}
}
</code></pre><p>Since lifetimes on <code>&</code> are always variant, I don't think there is a trick to make this work safely. Much of the information in this post was gleaned from <a href="https://stackoverflow.com/a/62661100/160673">this StackOverflow question</a>.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-88190818336519864992020-06-22T13:15:00.001+01:002020-06-22T21:22:02.361+01:00The HLint Match Engine<p><em>Summary: HLint has a match engine which powers most of the rules.</em></p>
<p>The <a href="https://github.com/ndmitchell/hlint">Haskell linter HLint</a> has two forms of lint - some are built in, written in Haskell code over the GHC AST (e.g. <a href="https://github.com/ndmitchell/hlint/blob/fe9c3ad66968840fcd61a9b967ffd9a89ff2970d/src/Hint/Extensions.hs">unused extension detection</a>), but 700+ hints are written using a matching engine. As an example, we can replace <code>map f (map g xs)</code> with <code>map (f . g) xs</code>. Doing so might be more efficient, but importantly for HLint, it's often clearer. That rule <a href="https://github.com/ndmitchell/hlint/blob/fe9c3ad66968840fcd61a9b967ffd9a89ff2970d/data/hlint.yaml#L129">is defined in HLint</a> as:</p>
<pre><code class="language-yaml">- hint: {lhs: map f (map g x), rhs: map (f . g) x}
</code></pre>
<p>All single-letter variables are wildcard matches, so the above rule will match:</p>
<pre><code class="language-haskell">map isDigit (map toUpper "test")
</code></pre>
<p>And suggest:</p>
<pre><code class="language-haskell">map (isDigit . toUpper) "test"
</code></pre>
<p>However, Haskell programmers are uniquely creative in specifying functions - with a huge variety of <code>$</code> and <code>.</code> operators, infix operators etc. The <a href="https://github.com/ndmitchell/hlint/blob/fe9c3ad66968840fcd61a9b967ffd9a89ff2970d/src/GHC/Util/Unify.hs">HLint matching engine</a> in HLint v3.1.4 would match this rule to all of the following (I'm using <code>sort</code> as a convenient function; replacing it with <code>foo</code> below would not change any matches):</p>
<ul>
<li><code>map f . map g</code></li>
<li><code>sort . map f . map g . sort</code></li>
<li><code>concatMap (map f . map g)</code></li>
<li><code>map f (map (g xs) xs)</code></li>
<li><code>f `map` (g `map` xs)</code></li>
<li><code>map f $ map g xs</code></li>
<li><code>map f (map g $ xs)</code></li>
<li><code>map f (map (\x -> g x) xs)</code></li>
<li><code>Data.List.map f (Prelude.map g xs)</code></li>
<li><code>map f ((sort . map g) xs)</code></li>
</ul>
<p>That's a large variety of ways to write a nested <code>map</code>. In this post I'll explain <em>how</em> HLint matches everything above, and the bug, fixed in HLint v3.1.5, that used to cause it to match even the final line (which <em>isn't</em> a legitimate match).</p>
<p><strong>Eta-contraction</strong></p>
<p>Given a hint comprising <code>lhs</code> and <code>rhs</code>, the first thing HLint does is determine if it can eta-contract the hint, producing a version without the final argument. If it can do so for both sides, it generates a completely fresh hint. In the case of <code>map f (map g x)</code> it generates:</p>
<pre><code class="language-yaml">- hint: {lhs: map f . map g, rhs: map (f . g)}
</code></pre>
<p>For the examples above, the first three match with this eta-contracted version, and the rest match with the original form. Now that we've generated two hints, it's important that we don't perform sufficiently fuzzy matching that <em>both</em> match some expression, as that would generate twice as many warnings as appropriate.</p>
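<p>The transformation can be sketched over a toy expression type (a hypothetical simplification; the real code works over the GHC AST). A side of a hint whose final argument is a variable used nowhere else can drop that argument, introducing composition where needed:</p>
<pre><code class="language-haskell">-- Toy AST: a name or an application (hypothetical; far simpler than GHC's)
data Expr = Var String | App Expr Expr deriving (Eq, Show)

free :: Expr -> [String]
free (Var v) = [v]
free (App a b) = free a ++ free b

-- Drop a final argument that is an otherwise-unused variable:
--   a x      becomes  a
--   a (b x)  becomes  a . b
etaContract :: Expr -> Maybe Expr
etaContract (App a (Var x)) | x `notElem` free a = Just a
etaContract (App a (App b (Var x)))
  | x `notElem` (free a ++ free b) = Just (App (App (Var ".") a) b)
etaContract _ = Nothing
</code></pre>
<p>Applied to <code>map f (map g x)</code> this yields <code>map f . map g</code>, and applied to <code>map (f . g) x</code> it yields <code>map (f . g)</code>.</p>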
<p><strong>Root matching</strong></p>
<p>The next step is root matching, which happens only when trying to match at the root of the expression being matched. If we have <code>(foo . bar) x</code> then it would be reasonable for that to match <code>bar x</code>, despite the fact that <code>bar x</code> is not a subexpression. We overcome that by transforming the expression to <code>foo (bar x)</code>, unifying only on <code>bar x</code>, and recording that we need to add back <code>foo .</code> at the start of the replacement.</p>
<p><strong>Expression matching</strong></p>
<p>After splitting off any extra prefix, HLint tries to unify the single-letter variables with expressions, and build a substitution table with type <code>Maybe [(String, Expr)]</code>. The substitution is <code>Nothing</code> to denote the expressions are incompatible, or <code>Just</code> a mapping of variables to the expression they matched. If two expressions have the same structure, we descend into all child terms and match further. If they don't have the same structure, but are similar in a number of ways, we adjust the source expression and continue.</p>
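<p>The core of the unification can be sketched with a toy AST (a hypothetical simplification; HLint's real matcher works over the GHC AST and handles many more cases). Single-letter names in the pattern bind anything, other names must match exactly, and applications are matched structurally:</p>
<pre><code class="language-haskell">-- Toy AST: a name or an application (hypothetical; not HLint's real type)
data Expr = Var String | App Expr Expr deriving (Eq, Show)

-- Unify a hint pattern against a source expression
match :: Expr -> Expr -> Maybe [(String, Expr)]
match (Var v) e
  | length v == 1 = Just [(v, e)]    -- single letter: wildcard binding
match (Var v) (Var v')
  | v == v' = Just []                -- concrete names must coincide
match (App f x) (App g y) = (++) <$> match f g <*> match x y
match _ _ = Nothing                  -- different shapes: incompatible
</code></pre>
<p>Matching the pattern <code>map f (map g x)</code> against <code>map isDigit (map toUpper xs)</code> produces the substitution binding <code>f</code>, <code>g</code> and <code>x</code> to <code>isDigit</code>, <code>toUpper</code> and <code>xs</code>.</p>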
<p>Examples of adjustments include expanding out <code>$</code>, removing infix application such as <code>f `map` x</code> and ignoring redundant brackets. We translate <code>(f . g) x</code> to <code>f (g x)</code>, but <em>not</em> at the root - otherwise we might match both the eta-expanded and non-eta-expanded variants. We also re-associate <code>(.)</code> where needed, e.g. for expressions like <code>sort . map f . map g . sort</code> the bracketing means we have <code>sort . (map f . (map g . sort))</code>. We can see that <code>map f . map g</code> is not a subexpression of that expression, but given that <code>.</code> is associative, we can adjust the source.</p>
<p>When we get down to a terminal name like <code>map</code>, we use the scope information HLint knows to determine if the two <code>map</code>s are equivalent. I'm not going to talk about that too much, as it's <a href="https://github.com/ndmitchell/hlint/issues/1001">slated to be rewritten</a> in a future version of HLint, and is currently both slow and a bit approximate.</p>
<p><strong>Substitution validity</strong></p>
<p>Once we have a substitution, we see if there are any variables which map to multiple distinct expressions. If so, the substitution is invalid, and we don't match. However, in our example above, there are no duplicate variables so any matching substitution must be valid.</p>
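<p>The check itself is small: group the bindings by variable and require each group to agree on the expression. A sketch (expressions shown as strings for brevity; hypothetical, not HLint's real code):</p>
<pre><code class="language-haskell">import Data.Function (on)
import Data.List (groupBy, nub, sortOn)

-- Collapse a substitution to one binding per variable, failing if any
-- variable matched two distinct expressions
validSubst :: [(String, String)] -> Maybe [(String, String)]
validSubst sub
  | all ((== 1) . length . nub . map snd) groups = Just (map head groups)
  | otherwise = Nothing
  where groups = groupBy ((==) `on` fst) (sortOn fst sub)
</code></pre>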
<p><strong>Side conditions</strong></p>
<p>Next we check any side conditions - e.g. we could decide that the above hint only makes sense if <code>x</code> is atomic - i.e. does not need brackets in any circumstance. We could have expressed that with <code>side: isAtom x</code>, and any such conditions are checked in a fairly straightforward manner.</p>
<p><strong>Substitutions</strong></p>
<p>Finally, we substitute the variables into the provided replacement. When doing the replacement, we keep track of the free variables, and if the resulting expression has more free variables than it started with, we assume the hint doesn't apply cleanly. As an example, consider the hint <code>\x -> a <$> b x</code> to <code>fmap a . b</code>. It looks like a perfectly reasonable hint, but what if we apply it to the expression <code>\x -> f <$> g x x</code>? Now <code>b</code> matches <code>g x</code>, but we are throwing away the <code>\x</code> binding and <code>x</code> is now dangling, so we reject it.</p>
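<p>The dangling-variable check can be sketched by computing free variables over a toy AST with lambdas (a hypothetical simplification of the real GHC-AST code), rejecting any rewrite whose result mentions variables the original did not have free:</p>
<pre><code class="language-haskell">import Data.List (nub, (\\))

-- Toy AST with lambdas, so binding can be modelled (hypothetical)
data Expr = Var String | App Expr Expr | Lam String Expr deriving (Eq, Show)

free :: Expr -> [String]
free (Var x) = [x]
free (App a b) = nub (free a ++ free b)
free (Lam x e) = free e \\ [x]

-- True if the rewrite result has free variables the original lacked,
-- like the dangling x above
introducesFree :: Expr -> Expr -> Bool
introducesFree original result = not (null (free result \\ free original))
</code></pre>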
<p>When performing the substitution, we use knowledge of the AST we want, and the brackets required to parse that expression, to ensure we insert the right brackets, but not too many.</p>
<p><strong>Bug <a href="https://github.com/ndmitchell/hlint/issues/1055">#1055</a></strong></p>
<p>Hopefully all the above sounds quite reasonable. Unfortunately, at some point, the root-matching lost the check that it really was at the root, and started applying the translation to terms such as <code>sort .</code> in <code>map f ((sort . map g) xs)</code>. Having generated the <code>sort .</code>, it decided that since it wasn't at the root, there was nowhere for it to go, so promptly threw it away. Oops. HLint v3.1.5 fixes the bug in two distinct ways (for defence in depth):</p>
<ol>
<li>It checks the root boolean before doing the root matching rule.</li>
<li>If it would have to throw away any extra expression, it fails, as throwing away that expression is certain to lead to a correctness bug.</li>
</ol>
<p><strong>Conclusion</strong></p>
<p>The matching engine of HLint is relatively complex, but I always assumed it would one day be replaced with a finite-state-machine scanner that could match <em>n</em> hints against an expression in <em>O(size-of-expression)</em>, rather than the current <em>O(n * size-of-expression)</em>. However, it's never been the bottleneck, so I've stuck with the more direct version.</p>
<p>I'm glad HLint has a simple external lint format. It allows easy contributions and makes hint authoring accessible to everyone. For large projects it's easy to define your own hints to capture common coding patterns. When using languages whose linter does not have an external matching language (e.g. <a href="https://github.com/rust-lang/rust-clippy">Rust's Clippy</a>) I certainly miss the easy customization.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-14199635172509115242020-06-09T12:31:00.000+01:002020-06-09T13:57:30.954+01:00Hoogle Searching Overview<p><em>Summary: Hoogle 5 has three interesting parts, a pipeline, database and search algorithm.</em></p><p>The Haskell search engine <a href="https://hoogle.haskell.org">Hoogle</a> has gone through five major designs, the first four of which are described in <a href="https://ndmitchell.com/downloads/slides-hoogle_finding_functions_from_types-16_may_2011.pdf">these slides from TFP 2011</a>. <a href="https://neilmitchell.blogspot.com/2015/01/hoogle-5-is-coming.html">Hoogle version 5</a> was designed to be a complete rewrite which simplified the design and allowed it to scale to all of <a href="https://hackage.haskell.org">Hackage</a>. All versions of Hoogle have had some preprocessing step which consumes Haskell definitions, and writes out a data file. They then have the search phase which uses that data file to perform searches. In this post I'll go through three parts -- what the data file looks like, how we generate it, and how we search it. When we consider these three parts, the evolution of Hoogle can be seen as:</p><ul><li>Versions 1-3, produce fairly simple data files, then do an expensive search on top. Fails to scale to large sizes.</li>
<li>Version 4, produce a very elaborate data file, aiming to search quickly on top. Failed because producing the data file required a lot of babysitting and a long time, so it was updated very rarely (yearly). Also, searching a complex data file ends up with a lot of corner cases which have terrible complexity (e.g. <code>a -> a -> a -> a -> a</code> would kill the server).</li>
<li>Version 5, generate very simple data files, then do <em>O(n)</em> but small-constant multiplier searching on top. Update the files daily and automatically. Make search time very consistent.</li>
</ul><h2>Version 5 data file</h2><p>By version 5 I had realised that deserialising the data file was both time consuming and memory hungry. Therefore, in version 5, the data file consists of chunks of data that can be memory-mapped into <code>Vector</code> and <code>ByteString</code> chunks using a <code>ForeignPtr</code> underlying storage. The OS figures out which bits of the data file should be paged in, and there is very little overhead or complexity on the Haskell side. There is a small index structure at the start of the data file which says where these interesting data structures live, and gives them identity using types. For example, to store information about name search we have three definitions:</p><pre><code class="language-haskell">data NamesSize a where NamesSize :: NamesSize Int
data NamesItems a where NamesItems :: NamesItems (V.Vector TargetId)
data NamesText a where NamesText :: NamesText BS.ByteString
</code></pre><p>Namely, in the data file we have <code>NamesSize</code> which is an <code>Int</code>, <code>NamesItems</code> which is a <code>Vector TargetId</code>, and <code>NamesText</code> which is a <code>ByteString</code>. The <code>NamesSize</code> is the maximum number of results that can be returned from any non-empty search (used to reduce memory allocation for the result structure), the <code>NamesText</code> is a big string with <code>\0</code> separators between each entry, and the <code>NamesItems</code> are the identifiers of the result for each name, with as many entries as there are <code>\0</code> separators.</p><p>The current data file is 183Mb for all of Stackage, of which 78% of that is the information about items (documentation, enough information to render them, where the links go etc - we then GZip this information). There are 21 distinct storage types, most involved with type search.</p><h2>Generating the data file</h2><p>Generating the data file is done in four phases.</p><p>Phase 0 downloads the inputs, primarily a <code>.tar.gz</code> file containing all <code>.cabal</code> files, and another containing all the Haddock Hoogle outputs. These <code>.tar.gz</code> files are never unpacked, but streamed through and analysed <a href="https://neilmitchell.blogspot.com/2015/07/thoughts-on-conduits.html">using conduit</a>.</p><p>Phase 1 reads through all the <code>.cabal</code> files to get metadata about each package - the author, tags, whether it's in Stackage etc. It stores this information in a <code>Map</code>. This phase takes about 7s and uses 100Mb of memory.</p><p>Phase 2 reads through every definition in every Haddock Hoogle output (the <code>.txt</code> files <code>--hoogle</code> generates). It loads the entry, parses it, processes it, and writes most of the data to the data file, assigning it a <code>TargetId</code>. 
That <code>TargetId</code> is the position of the item in the data file, so it's unique, and can be used to grab the relevant item when we need to display it while searching. During this time we collect the unique deduplicated type signatures and names, along with the <code>TargetId</code> values. This phase takes about 1m45s and has about 900Mb of memory at the end. The most important part of phase 2 is <a href="https://neilmitchell.blogspot.com/2015/09/three-space-leaks.html">not to introduce a space leak</a>, since then memory soars to many Gb.</p><p>Phase 3 processes the name and type maps and writes out the information used for searching. This phase takes about 20s and consumes an additional 250Mb over the previous phase.</p><p>Since generating the data file takes only a few minutes, there is a nightly job that updates the data file at 8pm every night. The job takes about 15 minutes in total, because it checks out a new version of <a href="https://github.com/ndmitchell/hoogle">Hoogle from GitHub</a>, builds it, downloads all the data files, generates a data file, runs the tests, and then restarts the servers.</p><h2>Searching</h2><p>Hoogle version 5 works on the principle that it's OK to be <em>O(n)</em> if the constant is small. For textual search, we have a big flat <code>ByteString</code>, and give that to some C code that quickly looks for the substring we enter, favouring complete and case-matching matches. Such a loop is super simple, and at the size of data we are working with (about 10Mb), plenty fast enough.</p><p>Type search is inspired by the same principle. We deduplicate types, then for each type, we produce an 18 byte fingerprint. There are about 150K distinct type signatures in Stackage, so that results in about 2.5Mb of fingerprints. For every type search we scan all those fingerprints and figure out the top 100 matches, then do a more expensive search on the full type for those top 100, producing a ranking. 
For a long time (a few years) I hadn't even bothered doing the second phase of more precise matching, and it still gave reasonable results. (In fact, I <em>never</em> implemented the second phase, but happily <a href="https://github.com/matt-noonan">Matt Noonan</a> <a href="https://github.com/ndmitchell/hoogle/commits?author=matt-noonan">contributed it</a>.)</p><p>A type fingerprint is made up of three parts:</p><ul><li>1 byte being the arity of the function. <code>a -> b -> c</code> would have arity 3.</li>
<li>1 byte being the number of constructors/variables in the type signature. <code>Maybe a -> a</code> would have a value of 3.</li>
<li>The three rarest names in the function. E.g. <code>A -> B -> C -> D</code> would compare how frequent each of <code>A</code>, <code>B</code>, <code>C</code> and <code>D</code> were in the index of functions, and record the 3 rarest. Each name is given a 32 bit value (where 0 is the most common and 2^32-1 is the rarest).</li>
</ul><p>The idea of arity and number of constructors/variables is to try and get an approximate shape fit to the type being search for. The idea of the rarest names is an attempt to take advantage that if you are searching for <code>ShakeOptions -> [a] -> [a]</code> then you probably didn't write <code>ShakeOptions</code> by accident -- it provides a lot of signal. Therefore, filtering down to functions that mention <code>ShakeOptions</code> probably gives a good starting point.</p><p>Once we have the top 100 matches, we can then start considering whether type classes are satisfied, whether type aliases can be expanded, what the shape of the actual function is etc. By operating on a small and bounded number of types we can do much more expensive comparisons than if we had to apply them to every possible candidate.</p><h2>Conclusion</h2><p>Hoogle 5 is far from perfect, but the performance is good, the scale can keep up with the growth of Haskell packages, and the simplicity has kept maintenance low. The technique of operations which are <em>O(n)</em> but with a small constant is one I've applied in other projects since, and I think is an approach often overlooked.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com2tag:blogger.com,1999:blog-7094652.post-27473102822634539812020-06-07T12:03:00.000+01:002020-06-07T12:03:15.191+01:00Surprising IO: How I got a benchmark wrong<p><em>Summary: IO evaluation caught me off guard when trying to write some benchmarks.</em></p><p>I once needed to know a quick back-of-the-envelope timing of a pure operation, so hacked something up quickly rather than going via <a href="https://hackage.haskell.org/package/criterion"><code>criterion</code></a>. The code I wrote was:</p><pre><code class="language-haskell">main = do
(t, _) <- duration $ replicateM_ 100 $ action myInput
print $ t / 100
{-# NOINLINE action #-}
action x = do
evaluate $ myOperation x
return ()
</code></pre><p>Reading from top to bottom, it takes the time of running <code>action</code> 100 times and prints it out. I deliberately engineered the code so that GHC couldn't optimise it such that <code>myOperation</code> was run only once. As examples of the defensive steps I took:</p><ul><li>The <code>action</code> function is marked <code>NOINLINE</code>. If <code>action</code> was inlined then <code>myOperation x</code> could be floated up and only run once.</li>
<li>The <code>myInput</code> is given as an argument to <code>action</code>, ensuring it can't be applied to <code>myOperation</code> at compile time.</li>
<li>The <code>action</code> is in <code>IO</code> so it has to be rerun each time.</li>
</ul><p>Alas, GHC still had one trick up its sleeve, and it wasn't even an optimisation - merely the definition of evaluation. The <code>replicateM_</code> function takes <code>action myInput</code>, which is evaluated once to produce a value of type <code>IO ()</code>, and then runs that <code>IO ()</code> 100 times. Unfortunately, in my benchmark <code>myOperation x</code> is actually evaluated in the process of creating the <code>IO ()</code>, not when running the <code>IO ()</code>. The fix was simple:</p><pre><code class="language-haskell">action x = do
_ <- return ()
evaluate $ myOperation x
return ()
</code></pre><p>Which roughly desugars to:</p><pre><code class="language-haskell">return () >>= \_ -> evaluate (myOperation x)
</code></pre><p>Now the <code>IO</code> produced has a lambda inside it, and my benchmark runs 100 times. However, at <code>-O2</code> GHC used to manage to break this code once more, by lifting <code>myOperation x</code> out of the lambda, producing:</p><pre><code class="language-haskell">let y = myOperation x in return () >>= \_ -> evaluate y
</code></pre><p>Now <code>myOperation</code> runs just once again. I finally managed to defeat GHC by lifting the input into <code>IO</code>, giving:</p><pre><code>action x = do
evaluate . myOperation =<< x
return ()
</code></pre><p>Now the input <code>x</code> is itself in <code>IO</code>, so <code>myOperation</code> can't be hoisted.</p><p>I originally wrote this post a very long time ago, and back then GHC did lift <code>myOperation</code> out from below the lambda. But nowadays it doesn't seem to do so (quite possibly because doing so might cause a space leak). However, there's nothing that promises GHC won't learn this trick again in the future.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-66336347868584590702020-05-31T01:15:00.000+01:002020-05-31T01:15:56.744+01:00HLint --cross was accidentally quadratic<p><em>Summary: HLint <code>--cross</code> was accidentally quadratic in the number of files.</em></p>
<p>One of my favourite blogs is <a href="https://accidentallyquadratic.tumblr.com/">Accidentally Quadratic</a>, so when the <a href="https://github.com/ndmitchell/hlint">Haskell linter HLint</a> suffered such a fate, I felt duty-bound to write it up. Most HLint hints work module-at-a-time (or smaller scopes), but there is one hint that can process multiple modules simultaneously - the duplication hint. If you write a sufficiently large repeated fragment in two modules, and pass <code>--cross</code>, then this hint will detect the duplication. The actual application of hints in HLint is governed by:</p>
<pre><code class="language-haskell">applyHintsReal :: [Setting] -> Hint -> [ModuleEx] -> [Idea]
</code></pre>
<p>Given a list of settings, a list of hints (which gets merged to a single composite <code>Hint</code>) and a list of modules, produce a list of ideas to suggest. Usually this function is called in parallel with a single module at a time, but when <code>--cross</code> is passed, all the modules being analysed get given in one go.</p>
<p>In <a href="https://neilmitchell.blogspot.com/2020/05/hlint-30.html">HLint 3</a>, <code>applyHintsReal</code> became quadratic in the number of modules. When you have 1 module, 1^2 = 1, and everything works fine, but <code>--cross</code> suffers a lot. The bug was simple. Given a Haskell list comprehension:</p>
<pre><code class="language-haskell">[(a,b) | a <- xs, b <- xs]
</code></pre>
<p>When given the list <code>xs</code> of <code>[1,2]</code> you get back the pairs <code>[(1,1),(1,2),(2,1),(2,2)]</code> - the cross product, which is quadratic in the size of <code>xs</code>. The real HLint code didn't look much different:</p>
<pre><code class="language-haskell">[ generateHints m m'
| m <- ms
, ...
, (nm',m') <- mns'
, ...
]
where
    mns' = map (\x -> (scopeCreate (GHC.unLoc $ ghcModule x), x)) ms
</code></pre>
<p>We map over <code>ms</code> to create <code>mns'</code>, pairing each module with some extra information. In the list comprehension we loop over the modules <code>ms</code> to get <code>m</code>, then for each <code>m</code>, loop over <code>mns'</code> to get <code>m'</code>. That means you take the cross-product of the modules, which is quadratic.</p>
<p><strong>How did this bug come about?</strong> HLint used to work against <a href="https://hackage.haskell.org/package/haskell-src-exts"><code>haskell-src-exts</code> (HSE)</a>, but now works against the <a href="https://github.com/digital-asset/ghc-lib">GHC parser</a>. <a href="https://neilmitchell.blogspot.com/2019/06/hlints-path-to-ghc-parser.html">We migrated the hints one by one</a>, changing HLint to thread through both ASTs, and then each hint could pick which AST to use. The patch that <a href="https://github.com/ndmitchell/hlint/commit/0948430ef9b65097d3f1d05fdc66616e22e3e0c6">introduced this behaviour</a> left <code>ms</code> as the HSE AST, and made <code>mns'</code> the GHC AST. It should have zipped these two together, so for each module you have the HSE and GHC AST, but accidentally took the cross-product.</p>
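<p>The shape of the bug, and the intended fix, can be sketched with a tiny self-contained example (plain integers stand in for the modules, and <code>map negate</code> stands in for attaching the extra information - the names are invented):</p>
<pre><code class="language-haskell">-- Buggy shape: two generators over the same list give |ms|^2 pairs,
-- the accidental cross-product.
pairsCross :: [Int] -> [(Int, Int)]
pairsCross ms = [ (m, m') | m <- ms, m' <- map negate ms ]

-- Intended shape: zip the original list with the annotated one, giving
-- |ms| pairs - each module with its own annotation, at linear cost.
pairsZip :: [Int] -> [(Int, Int)]
pairsZip ms = [ (m, m') | (m, m') <- zip ms (map negate ms) ]
</code></pre>
<p>With a single module both shapes produce one pair, which is why the bug was invisible in the common case.</p>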
<p><strong>How did we spot it?</strong> <a href="https://k1024.org/">Iustin Pop</a> filed <a href="https://github.com/ndmitchell/hlint/issues/1018">a bug report</a> noting that each hint was repeated once per file being checked and performance had got significantly worse, hypothesising it was <em>O(n^2)</em>. Iustin was right!</p>
<p><strong>How did we fix it?</strong> By the time the bug was spotted, the HSE AST had been removed entirely, and both <code>m</code> and <code>m'</code> were the same type, so <a href="https://github.com/ndmitchell/hlint/commit/31665ab581eafd6792b1b229d75bea493e17780f">deleting one of the loops</a> was easy. The fix is out in HLint version 3.1.4.</p>
<p><strong>Should I be using <code>--cross</code>?</strong> If you haven't heard about <code>--cross</code> in HLint, I don't necessarily suggest you start experimenting with it. The duplicate detection hints are <a href="https://github.com/ndmitchell/hlint/issues/1009#issuecomment-630103050">pretty dubious</a> and I think most people would be better served by a real duplicate detection tool. I've had good experiences with <a href="https://www.harukizaemon.com/simian/">Simian</a> in the past.</p>
Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com1tag:blogger.com,1999:blog-7094652.post-78939983312937017822020-05-27T21:42:00.000+01:002020-05-27T21:42:11.830+01:00Fixing Space Leaks in Ghcide<p><em>Summary: A performance investigation uncovered a memory leak in unordered-containers and performance issues with Ghcide.</em></p><p>Over the bank holiday weekend, I decided to devote some time to a possible <a href="https://shakebuild.com/">Shake build system</a> performance issue in <a href="https://github.com/digital-asset/ghcide">Ghcide Haskell IDE</a>. As I started investigating (and mostly failed) I discovered a space leak which I eventually figured out, solved, and then (as a happy little accident) got a performance improvement anyway. This post is a tale of what I saw, how I tackled the problem, and how I went forward. As I'm writing the post, not all the threads have concluded. I wrote lots of code during the weekend, but most was only to experiment and has been thrown away - I've mostly left the code to the links. Hopefully the chaotic nature of development shines through.</p><p><strong>Shake thread-pool performance</strong></p><p>I started with <a href="https://github.com/ndmitchell/shake/pull/751">a Shake PR</a> claiming that simplifying the Shake thread pool could result in a performance improvement. Faster and simpler seems like a dream combination. Taking a closer look, simpler seemed like it was simpler because it supported fewer features (e.g. ability to kill all threads when one has an exception, some fairness/scheduling properties). But some of those features (e.g. better scheduling) were in the pursuit of speed, so if a simpler scheduler was 30% faster (the cost of losing randomised scheduling), that might not matter.</p><p>The first step was to <a href="https://github.com/ndmitchell/shake/pull/751#issuecomment-632634439">write a benchmark</a>. 
It's very hard to synthesise a benchmark that measures the right thing, but spawning 200K short tasks into the thread pool seemed a plausible start. As promised on the PR, the simpler version did indeed run faster. But interestingly, the simplifications weren't really responsible for the speed difference - switching from <code>forkIO</code> to <code>forkOn</code> explained nearly all the difference. I'm not that familiar with <code>forkOn</code>, so I decided to micro-benchmark it - how long does it take to spawn off 1M threads with the two methods. I found two surprising results:</p><ul><li>The performance of <code>forkOn</code> was quadratic! A <a href="https://gitlab.haskell.org/ghc/ghc/issues/18221">GHC bug</a> explains why - it doesn't look too hard to fix, but relying on <code>forkOn</code> is unusual, so it's unclear if the fix is worth it.</li>
<li>The performance of <code>forkIO</code> was highly inconsistent. Often it took in the region of 1 second. Sometimes it was massively faster, around 0.1s. A <a href="https://stackoverflow.com/questions/61971292/ghc-forkio-bimodal-performance">StackOverflow question</a> didn't shed much light on <em>why</em>, but did show that by using the <a href="https://hackage.haskell.org/package/pvar/docs/Data-Primitive-PVar.html#t:PVar"><code>PVar</code> concurrency primitive</a> it could be 10x faster. There is a <a href="https://gitlab.haskell.org/ghc/ghc/issues/18224">GHC bug tracking the issue</a>, and it seems as though the thread gets created, then immediately switches away. There is a suggestion from Simon Peyton Jones of a heuristic that might help, but the issue remains unsolved.</li>
</ul><p>My desire to switch the Shake thread-pool to a primitive that is both quadratic and explicitly discouraged is low. Trying to microbenchmark with primitives that have inconsistent performance is no fun. The hint towards <code>PVar</code> is super interesting, and I may follow up on it in future, but given the remarks in the GHC tickets I wonder if <code>PVar</code> is merely avoiding one small allocation, and avoiding an allocation avoids a context switch, so it's not a real signal.</p><p>At this point I decided to zoom out and try benchmarking all of Ghcide.</p><p><strong>Benchmarking Ghcide</strong></p><p>The thread about the Shake thread pool pointed at <a href="https://github.com/digital-asset/ghcide/issues/503">a benchmarking approach</a> of making hover requests. I concluded that making a hover request with no file changes would benchmark the part of Shake I thought the improved thread-pool was most likely to benefit. I used the Shake source code as a test bed, and opened a file with 100 transitive imports, then did a hover over the <code>listToMaybe</code> function. I know that will require Shake validating that everything is up to date, and then doing a little bit of hover computation.</p><p>I knew I was going to be running Ghcide a lot, and the Cabal/Stack <code>build</code> steps are frustratingly slow. In particular, every time around, Stack wanted to unregister the Ghcide package. Therefore, I wrote a simple <code>.bat</code> file that <a href="https://gist.github.com/ndmitchell/11467985dbf1855e62035fa97248a585#file-test-bat">compiled Ghcide and my benchmark</a> using <code>ghc --make</code>. So that I could experiment quickly with changes to Shake, I pulled in all of Shake as source, not as a separate library, with an include path. 
I have run that benchmark hundreds of times, so the fact that it is both simple (no arguments) and as fast as I could make it has easily paid off.</p><p>For the benchmark itself, I first went down the route of looking at the <a href="https://hackage.haskell.org/package/lsp-test/docs/Language-Haskell-LSP-Test-Replay.html#v:replaySession">replay functionality</a> in <a href="https://hackage.haskell.org/package/lsp-test">lsp-test</a>. Sadly, that code doesn't link to anything that explains how to <em>generate</em> traces. After asking on the <a href="https://webchat.freenode.net/?channels=haskell-ide-engine">haskell-ide-engine IRC</a> I got pointed at both the existing functionality of <a href="https://hackage.haskell.org/package/haskell-lsp-0.22.0.0/docs/Language-Haskell-LSP-Core.html#v:resCaptureFile"><code>resCaptureFile</code></a> and the <a href="https://github.com/alanz/haskell-lsp/pull/247/files">vastly improved version in a PR</a>, which doesn't fail if two messages race with each other. Configuring that and running it on my benchmark in the IDE told me that the number of messages involved was tiny - pretty much an initialisation and then a bunch of hovers. Coding those directly in <code>lsp-test</code> was trivial, and so <a href="https://gist.github.com/ndmitchell/11467985dbf1855e62035fa97248a585#file-benchmark-hs">I wrote a benchmark</a>. The essence was:</p><pre><code class="language-haskell">doc <- openDoc "src/Test.hs" "haskell"
(t, _) <- duration $ replicateM_ 100 $
    getHover doc $ Position 127 43
print t
</code></pre><p>Open a document. Send 100 hover requests. Print the time taken.</p><p><strong>Profiling Ghcide</strong></p><p>Now that I could run 100 hovers, I wanted to use the GHC profiling mechanisms. Importantly, the 100 hover requests dominate the loading by a huge margin, so profiles would focus on the right thing. I ran a profile, but it was empty. Turns out the way <code>lsp-test</code> invokes the binary it is testing means it kills it too aggressively to allow GHC to write out profiling information. I changed the benchmark to send a shutdown request at the end, then sleep, and changed Ghcide to abort on a shutdown, so it could write the profiling information.</p><p>Once I had the profiling information, I was thoroughly uninformed. 10% went in file modification checking, which <a href="https://github.com/digital-asset/ghcide/issues/583">could be eliminated</a>. 10% seemed to go to hash table manipulations, which seemed on the high side, but not too significant (turned out I was totally wrong, read to the end!). Maybe 40% went in the Shake monad, but profiling exaggerates those costs significantly, so it's unclear what the truth is. Nothing else stood out, but earlier testing when profiling <code>forkIO</code> operations had shown they weren't counted well, so that didn't mean much.</p><p><strong>Prodding Ghcide</strong></p><p>In the absence of profiling data, I started changing things and measuring the performance. I tried a bunch of things that made no difference, but some things did have an impact on the time to do 100 hovers:</p><ul><li>Running normally: 9.77s. The baseline.</li>
<li>Switching to <code>forkOn</code>: 10.65s. Suggestive that either Ghcide has changed, or the project is different, or platform differences mean that <code>forkOn</code> isn't as advantageous.</li>
<li>Using only one Shake thread: 13.65s. This change had been suggested in one ticket, but made my benchmark worse.</li>
<li>Avoid spawning threads for things I think will be cheap: 7.49s. A useful trick, and maybe one that will be of benefit in future, but for such a significant change a 25% performance reduction seemed poor.</li>
<li>Avoid doing any Shake invalidation: 0.31s. An absolute lower bound if Shake cheats and does nothing.</li>
</ul><p>With all that, I was a bit dejected - "performance investigation reveals nothing of note" was not a great conclusion from a day's work. I think that other changes to Ghcide to <a href="https://github.com/digital-asset/ghcide/pull/554">run Shake less</a> and <a href="https://github.com/wz1000/ghcide/tree/hiedb">cache data more</a> will probably make this benchmark even less important, so the conclusion worsens - performance investigation of nothing of note reveals nothing of note. How sad.</p><p>But in my benchmark I did notice something - a steadily increasing memory size in process explorer. Such issues are pretty serious in an interactive program, and <a href="https://github.com/digital-asset/ghcide/pull/557">we'd fixed several issues recently</a>, but clearly there were more. Time to change gears.</p><p><strong>Space leak detection</strong></p><p>Using the benchmark I observed a space leak. But the program is huge, and manual code inspection usually needs a 10 line code fragment to have a chance. So I started modifying the program to do less, and continued until the program did as little as it could, but still leaked space. After I fixed a space leak, I zoomed out and checked whether the space leak persisted, and then had another go.</p><p>The first investigation took me into the Shake Database module. I found that if I ran the Shake script to make everything up to date, but did no actions inside, then there was a space leak. Gradually commenting out lines (over the course of several hours) eventually took me to:</p><pre><code class="language-haskell">step <- pure $ case v of
    Just (_, Loaded r) -> incStep $ fromStepResult r
    _ -> Step 1
</code></pre><p>This code increments a step counter on each run. In normal Shake this counter is written to disk each time, which forces the value. In Ghcide we use Shake in memory, and nothing ever forced the counter. The change was simple - replace <code>pure</code> with <code>evaluate</code>. This fix has been <a href="https://github.com/ndmitchell/shake/commit/8da74bab4a2466b52f8ddc50b75a56139eecb273">applied to Shake HEAD</a>.</p><p><strong>Space leak detection 2</strong></p><p>The next space leak took me to the Shake database <code>reset</code> function, which moves all Shake keys from <code>Ready</code> to <code>Loaded</code> when a new run starts. I determined that if you didn't run this function, the leak went away. I found a few places I should have <a href="https://github.com/ndmitchell/shake/commit/04b0fb349a5e8ff84c073f9751bcef11b3928570">put strictness annotations</a>, and a function that <a href="https://github.com/ndmitchell/shake/commit/ddf5e2d2020decc44f08c2d5482b8941c5c6d816">mutated an array lazily</a>. I reran the code, but the problem persisted. I eventually realised that if you don't call <code>reset</code> then none of the user rules run either, which was really what was fixing the problem - but I committed the improvements I'd made even though they don't fix any space leaks.</p><p>By this point I was moderately convinced that Shake wasn't to blame, so turned my attention to the user rules in Ghcide. I stubbed them out, and the leak went away, so that looked plausible. There were 8 types of rules that did meaningful work during the hover operation (things like <code>GetModificationTime</code>, <code>DoesFileExist</code>, <code>FilesOfInterest</code>). 
I picked a few in turn, and found they all leaked memory, so picked the simple <code>DoesFileExist</code> and looked at what it did.</p><p>For running <code>DoesFileExist</code> I wrote a very quick "bailout" version of the rule, equivalent to the "doing nothing" case, then progressively enabled more bits of the rule before bailing out, to see what caused the leak. The bailout looked like:</p><pre><code class="language-haskell">Just v <- getValues state key file
let bailout = Just $ RunResult ChangedNothing old $ A v
</code></pre><p>I progressively enabled more and more of the rule, but even with the whole rule enabled, the leak didn't recur. At that point, I realised I'd introduced a syntax error and that all my measurements for the last hour had been using a stale binary. Oops. I spun up a copy of <a href="https://github.com/ndmitchell/ghcid">Ghcid</a>, so I could see syntax errors more easily, and repeated the measurements. Again, the leak didn't recur. Very frustrating.</p><p>At that point I had two pieces of code, one which leaked and one which didn't, and the <em>only</em> difference was the unused <code>bailout</code> value I'd been keeping at the top to make it easier to quickly give up half-way through the function. Strange though it seemed, the inescapable conclusion was that <code>getValues</code> must somehow be fixing the space leak.</p><p>If <code>getValues</code> fixes a leak, it is a likely guess that <code>setValues</code> is causing the leak. I modified <code>setValues</code> to also call <code>getValues</code> and the problem went away. But, after hours of staring, I couldn't figure out why. The code of <code>setValues</code> read:</p><pre><code class="language-haskell">setValues state key file val = modifyVar_ state $ \vals -> do
    evaluate $ HMap.insert (file, Key key) (fmap toDyn val) vals
</code></pre><p>Namely, modify a strict <code>HashMap</code> from <a href="https://hackage.haskell.org/package/unordered-containers"><code>unordered-containers</code></a>, forcing the result. After much trial and error I determined that a "fix" was to add:</p><pre><code class="language-haskell">case HMap.lookup k res of
    Nothing -> pure ()
    Just v -> void $ evaluate v
</code></pre><p>It's necessary to insert into the strict <code>HashMap</code>, then do a <code>lookup</code>, then evaluate the result that comes back, or there is a space leak. I duly <a href="https://github.com/digital-asset/ghcide/pull/586">raised a PR to Ghcide</a> with the unsatisfying comment:</p><blockquote><p>I'm completely lost, but I do have a fix.</p></blockquote><p>It's nice to fix bugs. It's better to have some clue why a fix works.</p><p><strong>Space leak in <code>HashMap</code></strong></p><p>My only conclusion was that <code>HashMap</code> must have a space leak. I took a brief look at the code, but it was 20+ lines and nothing stood out. I wrote a benchmark that inserted billions of values at 1000 random keys, but it didn't leak space. I puzzled it over in my brain, and then about a day later inspiration struck. One of the cases was to deal with collisions in the <code>HashMap</code>. Most <code>HashMap</code>s don't have any collisions, so a bug hiding there could survive a very long time. I wrote a benchmark with colliding keys, and lo and behold, it leaked space. Concretely, it leaked 1Gb/s, and brought my machine to its knees. The benchmark inserted three keys all with the same hash, then modified one key repeatedly. I posted the <a href="https://github.com/tibbe/unordered-containers/issues/254">bug to the <code>unordered-containers</code> library</a>.</p><p>I also looked at the code, figured out why the space leak was occurring, and came up with a potential fix. However, the fix requires duplicating some code, and it's likely the same bug exists in several other code paths too. The <code>Lazy</code> vs <code>Strict</code> approach of <code>HashMap</code> being dealt with as an outer layer doesn't quite work for the functions in question. I took a look at the PR queue for <code>unordered-containers</code> and saw 29 requests, with the recent few having no comments on them. 
That's a bad sign, and suggested that spending time preparing a PR might be in vain, so I didn't.</p><p>Aside: Maintainers get busy. That's no negative reflection on the people who have invested lots of time in this library, and I thank them for their effort! Given that <a href="https://packdeps.haskellers.com/reverse/unordered-containers">1,489 packages on Hackage</a> depend on it, I think it could benefit from additional bandwidth from someone.</p><p><strong>Hash collisions in Ghcide</strong></p><p>While hash collisions that lead to space leaks are bad, having hash collisions at all is also bad. I augmented the code in Ghcide to print out hash collisions, and saw collisions between <code>("Path.hs", Key GetModificationTime)</code> and <code>("Path.hs", Key DoesFileExist)</code>. Prodding a bit further I saw that the <code>Hashable</code> instance for <code>Key</code> only consulted its argument value, and given most key types are simple <code>data Foo = Foo</code> constructions, they all had the same hash. The solution was to mix in the type information stored by <code>Key</code>. I changed to the definition:</p><pre><code class="language-haskell">hashWithSalt salt (Key key) = hashWithSalt salt (typeOf key) `xor` hashWithSalt salt key
</code></pre><p>Unfortunately, that now gave hash collisions with different paths at the same key. I looked into the hashing for the path part (which is really an <code>lsp-haskell-types</code> <code>NormalizedFilePath</code>) and saw that it used an optimised hashing scheme, precomputing the hash, and returning it with <code>hash</code>. I also looked at the <code>hashable</code> library and realised the authors of <code>lsp-haskell-types</code> hadn't implemented <code>hashWithSalt</code>. If you don't do that, a generic instance is constructed which deeply walks the data structure, completely defeating the <code>hash</code> optimisation. A <a href="https://github.com/alanz/haskell-lsp/pull/248">quick PR fixes that</a>.</p><p>I also found that for tuples, the types are combined by using the <code>salt</code> argument. Therefore, to hash the pair of path information and <code>Key</code>, the <code>Key</code> <code>hashWithSalt</code> gets called with the <code>hash</code> of the path as its salt. However, looking at the definition above, you can imagine that both <code>hashWithSalt</code> of a type and <code>hashWithSalt</code> of a key expand to something like:</p><pre><code class="language-haskell">hashWithSalt salt (Key key) = salt `xor` hash (typeOf key) `xor` (salt `xor` 0)
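-- Writing s for salt and t for hash (typeOf key), and using the xor laws
-- x `xor` x == 0 and x `xor` 0 == x, the right-hand side simplifies:
--   s `xor` t `xor` (s `xor` 0)  ==  t
-- so the two salt contributions cancel and the salt has no effect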
</code></pre><p>Since <code>xor</code> is associative and commutative, those two <code>salt</code> values cancel out! While I wasn't seeing complete cancellation, I was seeing quite a degree of collision, so I changed to:</p><pre><code class="language-haskell">hashWithSalt salt (Key key) = hashWithSalt salt (typeOf key, key)
</code></pre><p>With that <a href="https://github.com/digital-asset/ghcide/pull/588">fix in Ghcide</a>, all collisions went away, and all space leaks left with them. I had taken this implementation of hash combining from Shake, and while it's not likely to be a problem in the setting where it's used, <a href="https://github.com/ndmitchell/shake/commit/297c60fa0c6b0d4e98f61b9cdb1359a409cda901">I've fixed it in Shake too</a>.</p><p><strong>Benchmarking Ghcide</strong></p><p>With the hash collisions reduced, and the number of traversals when computing a hash reduced, I wondered what the impact was on performance. A rerun of the original benchmark showed the time had reduced to 9.10s - a speed-up of about 5%. Not huge, but welcome.</p><p>Several days later we're left with fewer space leaks, more performance, and hopefully a better IDE experience for Haskell programmers. I failed in what I set out to do, but found some other bugs along the way, leading to 9 PRs/commits and 4 ongoing issues. I'd like to thank everyone in the Haskell IDE team for following along, making suggestions, confirming suspicions, and generally working as a great team. <a href="https://neilmitchell.blogspot.com/2020/01/one-haskell-ide-to-rule-them-all.html">Merging the Haskell IDE efforts</a> continues to go well, both in terms of code output, and team friendliness.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-51055917498863959862020-05-23T20:29:00.001+01:002020-05-23T20:29:36.971+01:00Shake 0.19 - changes to process execution<p><em>Summary: The new version of Shake has some tweaks to how stdin works with <code>cmd</code>.</em></p><p>I've just released <a href="https://shakebuild.com/">Shake 0.19</a>, see the <a href="https://github.com/ndmitchell/shake/blob/master/CHANGES.txt">full change log</a>. 
Most of the interesting changes in this release are around the <code>cmd</code>/<code>command</code> functions, which let you easily run command lines. As an example, Shake has always allowed:</p><pre><code class="language-haskell">cmd "gcc -c" [source] "-o" [output]
</code></pre><p>This snippet compiles a source file using <code>gcc</code>. The <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake-Command.html#v:cmd"><code>cmd</code> function</a> is variadic: it treats strings as space-separated arguments and lists as literal arguments. It's overloaded by return type, so it can work in the <code>IO</code> monad (entirely outside Shake) or the Shake <code>Action</code> monad (inside Shake). You can capture results and pass in options, e.g. to get the standard error and run in a different directory, you can do:</p><pre><code class="language-haskell">Stderr err <- cmd "gcc -c" [source] "-o" [output] (Cwd "src")
</code></pre><p>Shake is a dynamic build system with advanced dependency tracking features that lets you write your rules in Haskell. It just so happens that running commands is <em>very</em> common in build systems, so while not really part of a build system, it's a part of Shake that has had a lot of work done on it. Since <code>cmd</code> is both ergonomic and featureful, I've taken to using the <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake-Command.html">module <code>Development.Shake.Command</code></a> in non-Shake related projects.</p><p><strong>Recent <code>cmd</code> changes</strong></p><p>The first API breaking change only impacts users of the <a href="https://neilmitchell.blogspot.com/2020/05/file-tracing.html">file access tracing</a>. The resulting type is now polymorphic, and if you opt for the <code>FSATrace ByteString</code>, you'll get your results a few milliseconds faster. Even if you stick with <code>FSATrace FilePath</code>, you'll get your results faster than the previous version. Performance of tracing happened to matter for a project I've been working on :-).</p><p>The other changes in this release are to process groups and the standard input. In Shake 0.18.3, changes were made to switch to <a href="https://hackage.haskell.org/package/process/docs/System-Process.html#t:CreateProcess"><code>create_group=True</code></a> in the <a href="https://hackage.haskell.org/package/process">process library</a>, as that improves the ability to cancel actions and clean up sub-processes properly. Unfortunately, on Linux that caused <a href="https://github.com/ndmitchell/shake/issues/748">processes that read from standard input to hang</a>. The correlation between these events, and the exact circumstances that triggered it, took a long time to track down - thanks to <a href="https://gergo.erdi.hu/">Gergő Érdi</a> for some <a href="https://github.com/ndmitchell/shake/issues/748#issuecomment-596450021">excellent bisection work</a>. 
Most processes that are run in a build system <em>should not</em> access the standard input, and the only reports have come from <code>docker</code> (don't use <code>-i</code>) and <code>ffmpeg</code> (pass <code>-nostdin</code>), but hanging is a very impolite way to fail. In older versions of Shake we inherited the Shake stdin to the child (unless you specified the stdin explicitly <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#v:Stdin">with <code>Stdin</code></a>), but now we create a new pipe with no contents. There are now options <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#v:NoProcessGroup"><code>NoProcessGroup</code></a> and <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#v:InheritStdin"><code>InheritStdin</code></a> which let you change these settings independently. I suspect a handful of commands will need flag tweaks to stop reading the stdin, but they will probably fail saying the stdin is inaccessible, so debugging it should be relatively easy.</p><p>In another tale of <code>cmd</code> not working how you might hope, in Shake 0.15.2 we changed <code>cmd</code> to <a href="https://hackage.haskell.org/package/process-1.6.9.0/docs/System-Process.html#v:close_fds">close file handles</a> when spawning a process. Unfortunately, that step is <em>O(n)</em> in the number of potential handles on your system, where <em>n</em> is <code>RLIMIT_NOFILE</code> and can be quite big, so we switched back in 0.18.4. Since 0.18.4 you can pass <code>CloseFileHandles</code> if you definitely want handles to be closed. It's been argued that <a href="https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/"><code>fork</code> is a bad design</a>, and this performance vs safety trade-off seems another point in favour of that viewpoint.</p><p>The amount of work that has gone into processes, especially around timeout and cross-platform differences, has been huge. 
I see 264 commits to these files, but the debugging time associated with them has been many weeks!</p><p><strong>Other changes</strong></p><p>This release contains other little tweaks that might be useful:</p><ul><li>Time spent in the <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#v:batch"><code>batch</code> function</a> is better accounted for in profiles.</li>
<li>Finally deleted the stuff that has been deprecated since 2014, particularly the <code>*></code> operator. I think a six year deprecation cycle seems more than fair for a pre-1.0 library.</li>
<li>Optimised modification time on Linux.</li>
</ul>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-90024126236374192722020-05-20T15:20:00.000+01:002020-05-20T15:20:37.527+01:00GHC Unproposals<p><em>Summary: Four improvements to Haskell I'm not going to raise as GHC proposals.</em></p><p>Writing a <a href="https://github.com/ghc-proposals/ghc-proposals">GHC proposal</a> is a lot of hard work. It requires you to fully flesh out your ideas, and then defend them robustly. That process can take <a href="https://neilmitchell.blogspot.com/2018/12/ghc-from-bug-to-merge.html">many months</a>. Here are four short proposals that I won't be raising, but think would be of benefit (if you raise a proposal for one of them, I'll buy you a beer next time we are physically co-located).</p><p><strong>Use <code>:</code> for types</strong></p><p>In Haskell we use <code>:</code> for list-cons and <code>::</code> for types. That's the <a href="https://neilmitchell.blogspot.com/2018/11/counting-cost-of-colons-in-haskell.html">wrong way around</a> - the use of types is increasing, the use of lists is decreasing, and type theory has always used <code>:</code>. This switch has been <a href="https://github.com/ghc-proposals/ghc-proposals/pull/118">joke-proposed before</a>. We actually switched these operators <a href="https://medium.com/daml-driven/four-tweaks-to-improve-haskell-b1de9c87f816">in DAML</a>, and it worked very nicely. Having written code in both styles, I now write Haskell on paper with <code>:</code> for types instead of <code>::</code>. Nearly all other languages use <code>:</code> for types, <a href="https://docs.python.org/3/library/typing.html">even Python</a>. It's sad when Python takes the more academically pure approach than Haskell.</p><p><em>Is it practical</em>: Maybe. The compiler diff is quite small, so providing it as an option has very little technical cost. 
The problem is it bifurcates the language - example code will either work with <code>:</code> for types or <code>::</code> for types. It's hard to write documentation, text books etc. If available, I would switch my code.</p><p><strong>Make recursive <code>let</code> explicit</strong></p><p>Currently you can write <code>let x = x + 1</code> and it means loop forever at runtime because <code>x</code> is defined in terms of itself. You probably meant to refer to the enclosing <code>x</code>, but you don't get a type error, and often don't even get a good runtime error message, just a hang. In <code>do</code> bindings, to avoid the implicit reference to self, it's <a href="https://neilmitchell.blogspot.com/2020/03/the-pure-pattern.html">common to write</a> <code>x <- pure $ x + 1</code>. That can impose a runtime cost, and obscure the true intent.</p><p>In languages like OCaml there are two different forms of <code>let</code> - one which allows variables to be defined and used in a single <code>let</code> (spelled <code>let rec</code>) and one which doesn't (spelled <code>let</code>). Interestingly, this distinction is important in GHC Core, which has two different keywords, and a source <code>let</code> desugars differently based on whether it is recursive. I think Haskell should add <code>letrec</code> as a separate keyword and make normal <code>let</code> non-recursive. Most recursive bindings are done under a <code>where</code>, and these would continue to allow full recursion, so most code wouldn't need changing.</p><p><em>Is it practical</em>: The simplest version of this proposal would be to add <code>letrec</code> as a keyword equivalent to <code>let</code> and add a <a href="https://gitlab.haskell.org/ghc/ghc/issues/14527">warning on recursive <code>let</code></a>. 
Whether it's practical to go the full way and redefine the semantics of <code>let</code> to mean non-recursive binding depends on how strong the adoption of <code>letrec</code> was, but given that I suspect recursive <code>let</code> is less common, it seems like it could work. Making Haskell a superset of GHC Core is definitely an attractive route to pursue.</p><p><strong>Allow trailing variables in bind</strong></p><p>When writing Haskell code, I often have <code>do</code> blocks that I'm in the middle of fleshing out, e.g.:</p><pre><code>do fileName <- getLine
src <- readFile fileName
</code></pre><p>My next line will be to print the file or similar, but this entire <code>do</code> block, and every sub-part within it, is constantly a parse error until I put in that final line. When the IDE has a parse error, it can't really help me as much as I'd like. The reason for the error is that <code><-</code> can't be the final line of a <code>do</code> block. I think we should relax that restriction, probably under a language extension that only IDEs turn on. It's not necessarily clear <a href="https://twitter.com/ndm_haskell/status/1223215216739201030">what such a construct should mean</a>, but in many ways that isn't the important bit, merely that such a construct results in a valid Haskell program, and allows more interactive feedback.</p><p><em>Is it practical</em>: Yes, just add a language extension - since it doesn't actually enable any new power it's unlikely to cause problems. Fleshing out the semantics, and whether it applies to <code>let x = y</code> statements in a <code>do</code> block too, is left as an exercise for the submitter. An alternative would be to not change the language, but make GHC emit the error slightly later on, much like <code>-fdefer-type-errors</code>, which still works for IDEs (either way needs a GHC proposal).</p><p><strong>Add an exporting keyword</strong></p><p>Currently the top of every Haskell file duplicates all the identifiers that are exported - unless you just export everything (which you shouldn't). That approach duplicates logic, makes refactorings like renamings more effort, and makes it hard to immediately know if the function you are working on is exposed. It would be much nicer if you could just declare things that were exported inline, e.g. with a <code>pub</code> keyword - so <code>pub myfunc :: a -> a</code> both defines and exports <code>myfunc</code>. 
Rust has <a href="https://doc.rust-lang.org/reference/visibility-and-privacy.html">taken this approach</a> and it works out quite well, modulo <a href="https://twitter.com/ndm_haskell/status/1254015262451535874">some mistakes</a>. The current Haskell design has been found a bit wanting, with constructs like <code>pattern Foo</code> in the export list to differentiate when multiple names <code>Foo</code> might be in scope, when attaching the visibility to the identifier would be much easier.</p><p><em>Is it practical</em>: Perhaps, provided someone doesn't try to take the proposal too far. It would be super tempting to differentiate between exports out of the package, and exports that are only inside this package (what Rust clumsily calls <code>pub(crate)</code>). And there are other things in the module system that could be improved. And maybe we should export submodules. I suspect everyone will want to pile more things into this design, to the point it breaks, but a simple exporting keyword would probably be viable.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com2tag:blogger.com,1999:blog-7094652.post-88713405568813115832020-05-15T09:39:00.000+01:002020-05-15T11:42:55.254+01:00File Access Tracing<p><em>Summary: It is useful to trace files accessed by a command. Shake and FSATrace provide some tools to do that.</em></p><p>When writing a build system, it's useful to see which files a command accesses. In <a href="https://shakebuild.com/">the Shake build system</a>, we use that information for <a href="https://shakebuild.com/lint">linting</a>, an <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#v:AutoDeps">auto-deps feature</a> and a <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake-Forward.html">forward build mode</a>. What we'd like is a primitive which, when applied to a command execution:</p><ol><li>Reports which files are read/written.</li>
<li>Reports the start and end time for when the files were accessed.</li>
<li>Reports what file metadata is accessed, e.g. modification times and directory listing.</li>
<li>Lets us pause a file access (so the dependencies can be built) or deny a file access (so dependency violations can be rejected early).</li>
<li>Is computationally cheap.</li>
<li>Doesn't require us to write/maintain too much low-level code.</li>
<li>Works on all major OSs (Linux, Mac, Windows).</li>
<li>Doesn't require <code>sudo</code> or elevated privilege levels.</li>
</ol><p>While there are lots of approaches to tracing that get some of those features, it is currently impossible to get them all. Therefore, Shake has to make compromises. The first four bullet points are about features -- we give up on 2 (timestamps) and 4 (pause/deny); 1 (read/writes) is essential, and we make 3 (metadata) optional, using the imperfect information when it's available and tolerating its absence. The last four bullet points are about how it works -- we demand 7 (compatibility) and 8 (no sudo) because Shake must be easily available to its users. We strive for 5 (cheap) and 6 (easy), but are willing to compromise a bit on both.</p><p>Shake abstracts the result behind the <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake-Command.html#v:cmd"><code>cmd</code> function</a> with the <a href="https://hackage.haskell.org/package/shake/docs/Development-Shake.html#t:FSATrace"><code>FSATrace</code> return type</a>. As an example I ran in GHCi:</p><pre><code class="language-haskell">traced :: [FSATrace] <- cmd "gcc -c main.c"
print traced
</code></pre><p>Which compiles <code>main.c</code> with <code>gcc</code>, and on my machine prints 71 entries, including:</p><pre><code>[ FSARead "C:\\ghc\\ghc-8.6.3\\mingw\\bin\\gcc.exe"
, FSARead "C:\\Neil\\temp\\main.c"
, FSAWrite "C:\\Users\\ndmit_000\\AppData\\Local\\Temp\\ccAadCiR.s"
, FSARead "C:\\ghc\\ghc-8.6.3\\mingw\\bin\\as.exe"
, FSARead "C:\\Users\\ndmit_000\\AppData\\Local\\Temp\\ccAadCiR.s"
, FSAWrite "C:\\Neil\\temp\\main.o"
, ...
]
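-- Given such a trace, separating reads from writes is a couple of list
-- comprehensions over the constructors shown above (a sketch):
--   reads  = [x | FSARead  x <- traced]
--   writes = [x | FSAWrite x <- traced]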
</code></pre><p>Most of the remaining entries are DLLs that <code>gcc.exe</code> uses, typically from the Windows directory. I've reordered the list to show the flow more clearly. First the process reads <code>gcc.exe</code> (so it can execute it), which reads <code>main.c</code> and writes a temporary file <code>ccAadCiR.s</code>. It then reads <code>as.exe</code> (the assembler) so it can run it, which in turn reads <code>ccAadCiR.s</code> and writes <code>main.o</code>.</p><p>Under the hood, Shake currently uses <a href="https://github.com/jacereda/fsatrace">FSATrace</a>, but that is an implementation detail -- in particular the <a href="https://github.com/droundy/bigbro">BigBro</a> library might one day <a href="https://github.com/droundy/bigbro/issues/6">also be supported</a>. In order to understand the limitations of the above API, it's useful to understand the different approaches to file system tracing, and which ones FSATrace uses.</p><p><strong>Syscall tracing</strong> On Linux, <a href="https://www.linuxjournal.com/article/6100"><code>ptrace</code></a> allows tracing every system call made, examining the arguments, and thus recording the files accessed. Moreover, by tracing the <code>stat</code> system call even file queries can be recorded. The syscall tracing approach can be made complete, but because <em>every</em> syscall must be hooked, can end up imposing high overhead. This approach is used by BigBro as well as numerous other debugging and instrumentation tools.</p><p><strong>Library preload</strong> On both Linux and Mac most programs use a dynamically linked C library to make file accesses. By using <code>LD_PRELOAD</code> it is possible to inject a different library into the program memory which intercepts the relevant C library calls, recording which files are read and written. This approach is simpler than hooking syscalls, but only works if all syscall access is made through the C library. 
While normally true, that isn't the case for <a href="https://golang.org/">Go programs</a> (syscalls are invoked directly) or statically linked programs (the C library cannot be replaced).</p><p>While the technique works on a Mac, from Mac OS X 10.11 onwards system binaries can't be traced due to <a href="https://developer.apple.com/library/content/documentation/Security/Conceptual/System_Integrity_Protection_Guide/ConfiguringSystemIntegrityProtection/ConfiguringSystemIntegrityProtection.html">System Integrity Protection</a>. As an example, the C compiler is typically installed as a system binary. It is possible to disable System Integrity Protection (but not recommended by Apple); or to use non-system binaries (e.g. those supplied by <a href="https://nixos.org/nix/">Nix</a>); or to copy the system binary to a temporary directory (which works provided the binary does not afterwards invoke another system binary). The library preload mechanism is implemented by FSATrace and the copying system binaries trick on Mac is implemented in Shake.</p><p><strong>File system tracing</strong> An alternative approach is to implement a custom file system and have that report which files are accessed. One such implementation for Linux is <a href="https://github.com/jacereda/traced-fs">TracedFS</a>, which is unfortunately not yet complete. Such an approach can track all accesses, but may require administrator privileges to mount a file system.</p><p><strong>Custom Linux tracing</strong> On Linux, thanks to the open-source nature of the kernel, there are many custom file systems (e.g. <a href="https://github.com/libfuse/libfuse">FUSE</a>) and tracing mechanisms (e.g. <a href="http://www.brendangregg.com/ebpf.html">eBPF</a>), many of which can be used/configured/extended to perform some kind of system tracing. 
Unfortunately, most of these are restricted to Linux only.</p><p><strong>Custom Mac tracing</strong> <a href="https://github.com/Microsoft/BuildXL/">BuildXL</a> uses a <a href="https://github.com/Microsoft/BuildXL/blob/master/Documentation/Specs/Sandboxing.md#macos-sandboxing">Mac sandbox</a> based on <a href="https://flylib.com/books/en/3.126.1.140/1/">KAuth</a> combined with <a href="http://www.trustedbsd.org/mac.html">TrustedBSD Mandatory Access Control (MAC)</a> to both detect which files are accessed and also block access to specific files. The approach is based on internal Mac OS X details which have been reverse engineered, some of which are deprecated and scheduled for removal.</p><p><strong>Windows Kernel API hooking</strong> On Windows it is possible to hook the Kernel API, which can be used to detect when any files are accessed. Implementing such a hook <a href="https://github.com/jacereda/fsatrace/blob/master/src/win/patch.c">is difficult</a>, particularly around 32bit vs 64bit differences, as <a href="https://stackoverflow.com/questions/494284/createremotethread-32-64-and-or-64-32">custom assembly language trampolines must be used</a>. Furthermore, some antivirus products (incorrectly) detect such programs as viruses. Windows kernel hooking is available in both FSATrace and BigBro (sharing the same source code), although without support for 32bit processes that spawn 64bit processes.</p><p><strong>Current State</strong></p><p>Shake currently uses FSATrace, meaning it uses library preloading on Linux/Mac and kernel hooking on Windows. The biggest practical limitations vary by OS:</p><ul><li>On <strong>Linux</strong> it can't trace into Go programs (or other programs that use system calls directly) or statically linked binaries. Integrating BigBro as an alternative would address these issues.</li>
<li>On <strong>Mac</strong> it can't trace into system binaries called from other system binaries, most commonly the system C/C++ compiler. Using your own C/C++ installation, via <a href="https://brew.sh/">Homebrew</a> or Nix, is a workaround.</li>
<li>On <strong>Windows</strong> it can't trace 64bit programs spawned by 32bit programs. In most cases the 32bit binaries can easily be replaced by 64bit binaries. The only problem I've seen was caused by a five-year-old version of <code>sh</code> hiding out in my <code>C:\bin</code> directory, which was easily remedied with a newer version. The code to fix this issue <a href="https://github.com/rapid7/metasploit-payloads/blob/master/c/meterpreter/source/metsrv/base_inject.c">is available</a>, but scares me too much to try integrating.</li>
</ul><p>Overall, the tracing available in Shake has a simple API, is very useful for Shake, and has been repurposed in <a href="https://blogs.ncl.ac.uk/andreymokhov/stroll/">other build systems</a>. But I do dearly wish such functionality could be both powerful and standardised!</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com2tag:blogger.com,1999:blog-7094652.post-77511142213549428302020-05-03T15:07:00.000+01:002020-05-03T15:07:28.408+01:00HLint 3.0<p><em>Summary: HLint 3.0 uses the GHC parser.</em></p><p>In <a href="https://neilmitchell.blogspot.com/2019/06/hlints-path-to-ghc-parser.html">June 2019</a> I posted about our intention to move HLint to the GHC parser. Since then a small group of us have been hard at work making the conversion -- first by parsing with both GHC and haskell-src-exts, and finally, with the newly released <a href="https://hackage.haskell.org/package/hlint">HLint 3.0</a>, parsing <em>only</em> with <a href="https://www.haskell.org/ghc/">GHC</a>. As of now, if your code can be parsed with GHC, it can probably be parsed with HLint. As new GHC releases come out, with new features and new forms of syntax, HLint will follow along closely.</p><p>The <a href="https://github.com/ndmitchell/hlint/blob/master/CHANGES.txt">change list for this release</a> records 51 separate items, which is about as many as the last nine HLint releases combined. Of those changes, 11 are breaking changes (the ones marked with <code>*</code>). That count omits all the hint conversions, which (hopefully!) aren't user visible. The main API breaks are in the <code>Language.Haskell.HLint</code> API, which has switched from <a href="https://hackage.haskell.org/package/haskell-src-exts">haskell-src-exts types</a> to GHC ones. You can now take a GHC syntax tree and apply HLint to it, or (as before) give HLint the source and have it do the parsing for you. 
We also took the opportunity to simplify the API while we were at it -- but the underlying functionality remains much the same. We also deleted a small number of command line flags that were no longer useful, and were never used very much. If you have difficulty converting to the new API or relied on some removed functionality, <a href="https://github.com/ndmitchell/hlint/issues">raise a bug</a>.</p><p>What was especially nice about this conversion process, and the development of HLint in general, is that it is increasingly becoming a real team, where my role is more reviewer than coder. There have been <a href="https://github.com/ndmitchell/hlint/graphs/contributors?from=2019-06-10&to=2020-05-13&type=c">21 distinct contributors</a> since the start of the GHC conversion, but I'd like to particularly call out a few major pieces of work that have been completed:</p><ul><li><a href="https://github.com/shayne-fletcher">Shayne Fletcher</a> is responsible for the <a href="https://hackage.haskell.org/package/ghc-lib-parser"><code>ghc-lib-parser</code> library</a> that makes it feasible to use a single GHC API across multiple GHC versions. Without that, using the GHC library would be at least double the work (and just wouldn't be feasible). Shayne also did a lot of the conversion, mapping many of the rule types from haskell-src-exts to GHC. These APIs are surprisingly different given they have the same underlying representation.</li>
<li><a href="https://github.com/googleson78">Georgi Lyubenov</a> did most of the conversions that Shayne didn't do.</li>
<li><a href="https://github.com/josephcsible">Joseph C. Sible</a> has made sure that while efforts were focused on a complete rewrite of the code base, the underlying hints have continued to improve, removing incorrect hints and adding useful additional ones.</li>
<li><a href="https://github.com/zliu41">Ziyang Liu</a> has focused on the refactoring side of HLint. The <a href="https://mpickering.github.io/gsoc2015.html">initial refactoring work</a> was completed by Matthew Pickering as part of GSoC 2015. Since then it's had mild attention at best. Ziyang has stepped into the gap, importantly adding tests and CI, and improving the refactorings in lots of places. It now feels like a real part of HLint.</li>
</ul><p>We hope you enjoy HLint 3.0 and beyond!</p><p>PS. You may spot we're already on <a href="https://hackage.haskell.org/package/hlint">HLint 3.0.2 on Hackage</a> - thanks to <a href="https://github.com/RyanGlScott">Ryan Scott</a> for already finding a few bugs.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com0tag:blogger.com,1999:blog-7094652.post-22874280984144072122020-04-28T21:48:00.000+01:002020-04-28T21:49:38.642+01:00Writing a fast interpreter<p><em>Summary: Interpretation by closure is a lot faster than I expected.</em></p><p>Let's imagine we have an imperative language (expressions, assignments, loops etc.) and we want to write an interpreter for it. What styles of interpreter are there? And how fast do they perform? I was curious, so I wrote a demo with some benchmarks. The full code, in Rust, <a href="https://gist.github.com/ndmitchell/084bb1a8dd30188492fcd0b8b8c70c6e">is available here</a>.</p><p>First, let's get a taste of the mini-language we want to interpret:</p><pre><code class="language-rust">x = 100;
y = 12;
while x != 0 {
    y = y + 1 + y + 3;
    x = x - 1;
}
</code></pre><p>We can immediately translate <code>x</code> and <code>y</code> from being named variables to indices in an array, namely <code>0</code> and <code>1</code>. Once we've done that there are a few interpretation techniques that spring to mind (with their performance on my benchmark in brackets):</p><ol><li>Interpret the AST directly (2.1s).</li>
<li>Compile from the AST to closures (1.4s).</li>
<li>Compile from the AST to a stream of instructions (1.5s).</li>
<li>Encode those instructions as bytes (1.5s).</li>
<li>Compile to assembly or JIT - I didn't try this approach (it's a lot more work).</li>
</ol><p>All of these are vastly slower than my version written directly in Rust (which takes a mere 0.003s) -- but my benchmark didn't have any real operations in it, so this comparison will be the absolute worst case.</p><p>Let's go through the approaches.</p><p><strong>Style 1: AST evaluation</strong></p><p>One option is to directly interpret the AST. Given a vector named <code>slots</code> representing the variables by index, we need to change the <code>slots</code> as we go. A fragment of the interpreter might look like:</p><pre><code class="language-rust">fn f(x: &Expr, slots: &mut Vec<i64>) -> i64 {
    match x {
        Expr::Lit(i) => *i,
        Expr::Var(u) => slots[*u],
        Expr::Add(x, y) => f(x, slots) + f(y, slots),
        ...
    }
}
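
// The `Expr` type these fragments match on is not shown in the post; it
// presumably looks something like this sketch:
enum Expr {
    Lit(i64),                  // integer literal
    Var(usize),                // variable, resolved to a slot index
    Add(Box<Expr>, Box<Expr>), // addition of two subexpressions
}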
</code></pre><p>It's as simple as the options come. Given the expression and the slots we need to write to, we do whatever the instruction tells us. But that simplicity leads to low performance.</p><p><strong>Style 2: Conversion to closure</strong></p><p>Instead of traversing the AST at runtime, we can traverse it once, and produce a closure/function that performs the action when run (e.g. see <a href="https://blog.cloudflare.com/building-fast-interpreters-in-rust/">this blog post</a>). Given that we access the slots at runtime, we make them an argument to the closure. In Rust, the type of our closure is:</p><pre><code class="language-rust">type Compiled = Box<dyn Fn(&mut Vec<i64>) -> i64>;
</code></pre><p>Here we are defining a <code>Fn</code> (a closure -- function plus captured data) that goes from the slots to a result. Because these functions vary in how much data they capture, we have to wrap them in <code>Box</code>. With that type we can now define our compilation function:</p><pre><code class="language-rust">fn compile(x: &Expr) -> Compiled {
    match x {
        Expr::Lit(i) => {
            let i = *i;
            Box::new(move |_| i)
        }
        Expr::Var(u) => {
            let u = *u;
            Box::new(move |slots| slots[u])
        }
        Expr::Add(x, y) => {
            let x = compile(x);
            let y = compile(y);
            Box::new(move |slots| x(slots) + y(slots))
        }
        ...
    }
}
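
// The same technique boiled down to a self-contained fragment, using
// hypothetical smart constructors `lit` and `add` (the `Compiled` alias
// is repeated here so the fragment stands alone):
type Compiled = Box<dyn Fn(&mut Vec<i64>) -> i64>;

fn lit(i: i64) -> Compiled {
    Box::new(move |_| i)
}

fn add(x: Compiled, y: Compiled) -> Compiled {
    Box::new(move |slots| x(slots) + y(slots))
}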
</code></pre><p>Instead of taking the AST (compile-time information) and the slot data (runtime information) we use the compile-time information to produce a function that can then be applied to the run-time information. We trade matching on the AST for an indirect function call at runtime. Rust is able to turn tail calls on dynamic functions into jumps and the processor is able to accurately predict the jumps/calls, leading to reasonable performance.</p><p>One large advantage of the closure approach is that adding specialised variants, e.g. compiling a nested <code>Add</code> differently, can be done locally and with no additional runtime cost.</p><p><strong>Style 3: Fixed-size instructions</strong></p><p>Instead of interpreting an AST, or jumping via indirect functions, we can define a set of instructions and interpret an array of them using a stack of intermediate values. We are effectively virtualising a CPU, including a program counter. We can define a bytecode with instructions such as:</p><pre><code class="language-rust">enum Bytecode {
    Assign(u32), // assign the value at the top of the stack to a slot
    Var(u32),    // push the value in slot to the top of the stack
    Lit(i32),    // push a literal on the stack
    Add,         // Add the top two items on the stack
    Jump(u32),
    ...
}
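
// For example, the assignment `y = y + 1 + y + 3` from the mini-language
// above might compile to (a sketch, evaluating left to right):
//   Var(1), Lit(1), Add, Var(1), Add, Lit(3), Add, Assign(1)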
</code></pre><p>We now interpret these instructions:</p><pre><code class="language-rust">let mut pc = 0;
let mut slots = vec![0; 10];
let mut stack = Stack::new();
loop {
    match xs[pc] {
        Assign(x) => slots[x as usize] = stack.pop(),
        Var(x) => stack.push(slots[x as usize]),
        Lit(i) => stack.push(i as i64),
        Add => {
            let x = stack.pop();
            let y = stack.pop();
            stack.push(x + y)
        }
        Jump(pc2) => pc = pc2 as usize - 1,
        ...
    }
    pc = pc + 1;
}
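
// As written, the loop above runs forever -- the real code needs a way to
// stop. A cut-down, self-contained variant with a hypothetical Halt
// instruction added, small enough to test:
#[derive(Clone, Copy)]
enum Op { Lit(i64), Add, Assign(usize), Halt }

fn run(xs: &[Op], slots: &mut Vec<i64>) {
    let mut pc = 0;
    let mut stack: Vec<i64> = Vec::new();
    loop {
        match xs[pc] {
            Op::Lit(i) => stack.push(i),
            Op::Add => {
                let x = stack.pop().unwrap();
                let y = stack.pop().unwrap();
                stack.push(x + y)
            }
            Op::Assign(s) => slots[s] = stack.pop().unwrap(),
            Op::Halt => return,
        }
        pc += 1;
    }
}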
</code></pre><p>Most of these operations work against the stack. I found that if I used checked array accesses on the stack (the default in Rust) it went about the same speed as AST interpretation. Moving to unchecked access made it similar in performance to the closure version (but slightly worse).</p><p>The bytecode approach is much harder to implement, requiring a compiler to the bytecode. It's also much harder to add specialised variants for certain combinations of instructions. To get good performance via the branch predictor probably requires further tricks beyond what I've shown here (e.g. <a href="http://www.cs.toronto.edu/~matz/dissertation/matzDissertation-latex2html/node6.html">direct threading</a>).</p><p>There are advantages to a bytecode though -- it's easier to capture all the program state, which is useful for garbage collection and other operations.</p><p><strong>Style 4: Byte-encoded instructions</strong></p><p>Instead of having a Rust <code>enum</code> to represent the instructions, we can instead use bytes, so instead of:</p><pre><code class="language-rust">Lit(38)
Lit(4)
Add
Assign(0)
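
// Dispatching directly on the bytes (a runnable sketch, assuming the
// encoding described in the text: 0 = Lit, 1 = Add, 2 = Assign, with
// each operand packed into a single byte):
fn run_bytes(code: &[u8], slots: &mut Vec<i64>) {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match code[pc] {
            0 => {
                stack.push(code[pc + 1] as i64); // Lit
                pc += 2
            }
            1 => {
                let x = stack.pop().unwrap(); // Add
                let y = stack.pop().unwrap();
                stack.push(x + y);
                pc += 1
            }
            2 => {
                slots[code[pc + 1] as usize] = stack.pop().unwrap(); // Assign
                pc += 2
            }
            _ => panic!("unknown opcode"),
        }
    }
}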
</code></pre><p>We would have a series of bytes <code>[0,38,0,4,1,2,0]</code> (where <code>0</code> = <code>Lit</code>, <code>1</code> = <code>Add</code>, <code>2</code> = <code>Assign</code>). This approach gives a more compact bytecode, and might have an impact on the instruction cache, but in my benchmarks performed the same as style 3.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com5tag:blogger.com,1999:blog-7094652.post-32299935506115502492020-03-16T09:23:00.000+00:002020-03-16T20:17:29.882+00:00The <- pure pattern<p><em>Summary: Sometimes <code><- pure</code> makes a lot of sense, avoiding some common bugs.</em></p><p>In Haskell, in a monadic <code>do</code> block, you can use either <code><-</code> to bind monadic values, or <code>let</code> to bind pure values. You can also use <code>pure</code> or <code>return</code> to wrap a value with the monad, meaning the following are mostly equivalent:</p><pre><code>let x = myExpression
x <- pure myExpression
</code></pre><p>The one place they aren't fully equivalent is when <code>myExpression</code> contains <code>x</code> within it, for example:</p><pre><code>let x = x + 1
x <- pure (x + 1)
</code></pre><p>With the <code>let</code> formulation you get an infinite loop which never terminates, whereas with the <code><- pure</code> pattern you take the previously defined <code>x</code> and add <code>1</code> to it. To solve the infinite loop, the usual solution with <code>let</code> is to rename the variable on the left, e.g.:</p><pre><code>let x2 = x + 1
</code></pre><p>And now make sure you use <code>x2</code> everywhere from now on. However, <code>x</code> remains in scope, with a more convenient name, and the same type, but probably shouldn't be used. Given a sequence of such bindings, you often end up with:</p><pre><code>let x2 = x + 1
let x3 = x2 + 1
let x4 = x3 + 1
...
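-- for comparison, the <- pure formulation keeps a single name
-- throughout, leaving no index to get wrong:
--   x <- pure (x + 1)
--   x <- pure (x + 1)
--   x <- pure (x + 1)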
</code></pre><p>Given a large number of unchecked indices that must be strictly incrementing, bugs usually creep in, especially when refactoring. The unused variable warning will sometimes catch mistakes, but not if a variable is legitimately used twice and one of those instances is incorrect.</p><p>Given the potential errors, when a variable <code>x</code> is morally "changing" in a way that the old <code>x</code> is no longer useful, I find it much simpler to write:</p><pre><code>x <- pure myExpression
</code></pre><p>The compiler now statically ensures we haven't fallen into the traps of an infinite loop (which is obvious and frustrating to track down) or using the wrong data (which is much harder to track down, and often very subtly wrong).</p><p><strong>What I really want:</strong> What I actually think Haskell should have done is made <code>let</code> non-recursive, and had a special <code>letrec</code> keyword for recursive bindings (leaving <code>where</code> be recursive by default). This distinction is present in GHC Core, and would mean <code>let</code> was much safer.</p><p><strong>What HLint does:</strong> HLint is very aware of the <code><- pure</code> pattern, but also aware that a lot of beginners should be guided towards <code>let</code>. If any variable is defined more than once on the LHS of an <code><-</code> then it leaves the <code>do</code> alone, otherwise it will suggest <code>let</code> for those where it fits.</p><p><strong>Warnings:</strong> In the presence of <tt>mdo</tt> or <tt>do rec</tt> both formulations might end up being the same. If the left is a refutable pattern you change between error and fail, which might be quite different. Let bindings might be generalised. This pattern gives a warning about shadowed variables with <tt>-Wall</tt>.</p>Neil Mitchellhttp://www.blogger.com/profile/13084722756124486154noreply@blogger.com9