Tuesday, February 27, 2018

Switching to HTTPS

Summary: All my domains are now on HTTPS. In this post I describe how I did it.

It's quite clear everyone should be moving their domains to HTTPS, or face the consequences. I have recently converted three domains to HTTPS - two static sites and one Haskell server. Converting was mostly a case of finding the right how-to guide and following it, so in this post I'll link to the "right" guides.

Static Websites

I have static domains at shakebuild.com and ndmitchell.com, both of which follow a similar pattern, so I'll focus on the Shake website. The steps are:

  • Get a domain name: I bought a domain name from Soho UK, who later sold up and became Cloud Mega. I've been using them for websites for a very long time, and never had any complaints.
  • Write some content: My static websites are based of source material that is then converted via custom scripts to generate the final website. For Shake, the source is Markdown files and the converter is a Haskell program. In the case of Shake, I use the markdown package with custom tricks like hyperlinking all identifiers (see these code samples). After running the program on the Markdown files I have HTML/CSS that can be served directly.
  • Serve the content: I host and serve the content using GitHub Pages, which lets you either serve content off the branch gh-pages or a separate GitHub repo - I use the latter option. I then use the custom domain name feature to make requests to shakebuild.com serve from GitHub Pages over HTTP.
  • Serve with HTTPS: The previous steps get us an HTTP website, but last weekend I did the work to get to HTTPS. I followed these instructions, which use Cloudflare as an intermediary - serving over HTTPS and providing a cache. I have configured things to always redirect away from the www and always use HTTPS. The only minor hiccup was the HTTPS certification for Shake took about 3 days to initialise (it should take less than 24 hours, my other domain took 15 minutes) - but it went away on its own.
  • Collect email: The final step was to get email to the domains working - in general I'd prefer people email me directly at Gmail, but it's always good for email to work. I used these instructions, which use Mailgun to collect and forward emails. The only difficulty is that sending Gmail emails to yourself via a custom domain leaves the email in the Sent mail with no indication it was delivered - I had to test using a different email account.

With that, we have a static website served over HTTPS. It's quite remarkable that such a pipeline can be built using free services.

Dynamic Website

I maintain the hoogle.haskell.org server which provides a search engine for Haskell libraries. This website is dynamic, executing Haskell code to return suitable results for each search.

  • Write the program: I wrote Hoogle over the last 14 years, and when run as hoogle server it spawns a web server which can serve requests, using the Warp package to do the actual serving.
  • Configure the server: The hoogle.haskell.org server is kindly provided by the Haskell Infrastructure Committee, where I have a VM which runs Hoogle. My setup instructions for that server are in the Hoogle repo. Of note, I forward port 80 to 8080, allowing me to serve HTTP pages with a non-root program.
  • Serve static content over CDN: The static content of Hoogle (images, CSS files) could be served up by the normal server, but it's just one small server in one single location, so I make things go faster by sending most static requests to Raw GitHack, which itself is just a wrapper around Cloudflare.
  • Obtain certificates: To serve over HTTPS you need certificates that prove you own the domain. I got the certificates from Let's Encrypt, using the Certbot client. Since I run a custom server I opted for the Standalone challenge (which spawns a web server on your box), over HTTP, serving on port 8080 to account for the redirection I had put in place. Unfortunately, generating the certificates required taking Hoogle down briefly.
  • Serving over HTTPS: Fortunately a PR was submitted to Hoogle some time ago allowing users to pass a certificate at startup and serve Hoogle over HTTPS. I passed the certificates obtained in the previous step, and spawned Hoogle on 8443 (which 443 redirected too), giving me an HTTPS server.
  • Redirecting HTTP traffic: For the static websites redirecting HTTP traffic to HTTPS was as simple as checking a box on Cloudflare. For my own server I needed to run a server on port 8080 that did the redirect. I found the Haskell program rdr2tls which is small, simple, and works very well.
  • Renewing the certificate: The Let's Encrypt serve expires every 90 days, so will need renewing. I know the approximate steps, but currently am intending to manually renew the certificate.

Switching Hoogle to HTTPS was fairly painless.

Sunday, February 18, 2018

Atomic Expressions Generically

Summary: For certain hints HLint needs to determine if a Haskell expression is atomic. I wrote a generic method to generate expressions and test if they are atomic.

With HLint, if you write a statement such as:

main = print ("Hello")

You get the hint:

Sample.hs:1:14: Warning: Redundant bracket
Found:
  ("Hello")
Why not:
  "Hello"

One of ways HLint figures out if brackets are redundant is if the expression inside the brackets is "atomic" - if you never have to bracket it in any circumstances. As an example, a literal string is atomic, but an if expression is not. The isAtom function from haskell-src-exts-util has a list of the types of expression which are atomic, but the Exp type from haskell-src-exts has 55 distinct constructors, and I don't even know what many of them do. How can we check the isAtom function is correct?

One approach is to use human thought, and that's the approach used until now, with reasonable success. However, I've recently written a script which solves the problem more permanently, generating random expressions and checking that isAtom gives the right value. In this post I'm going to outline a few features of how that script works. There are basically three steps:

1) Generate a type-correct Exp

The first step is to generate a random Exp which follows the type definition. Fortunately the Data class in Haskell lets us generate values. We define:

mkValue :: forall a . Data a => Int -> IO a
mkValue depth
    | Just x <- cast "aA1:+" = randomElem x
    | Just x <- cast [-1 :: Int, 1] = randomElem x
    | Just x <- cast [-1 :: Integer, 1] = randomElem x
    | AlgRep cs <- dataTypeRep $ dataTypeOf (undefined :: a) =
        if depth <= 0 then throwIO LimitReached else fromConstrM (mkValue $ depth - 1) =<< randomElem cs

Here we are saying that given a depth, and a result type a, we generate a value of type a. Note that the a argument is the result, but we don't pass anything in of type a. The first three lines of the body follow the pattern:

    | Just x <- cast [list_of_element] = randomElem x

This tries to convert list_of_element to [a] by using runtime type information. If it succeeds, we pick a random element from the list. If it doesn't we continue onwards.

The final case uses dataTypeRep/dataTypeOf to get a list of the constructors of a. Note that we don't have a value of a, so we make one up using undefined :: a - but that's OK because dataTypeOf promises not to look at its argument. Given a list of constructors, we pick one at random, and then call fromConstrM - which says how to create a value of the right constructor, using some argument to fill in all the fields. We pass mkValue as that argument, which causes us to recursively build up random values.

One immediate problem is what if we are building a [Int] and the random generator often picks (:)? We'll take a very long time to finish. To solve this problem we keep a depth counter, decrement it in every recursive call, and when it runs out, throwIO an exception and give up.

2) Generate a parsing Exp

Now we've got a valid Exp value, but just because an Exp can be represented in the AST doesn't mean it corresponds to Haskell fragment. As an example, consider Var (UnQual (Ident "Test")). That's a valid value of type Exp, but if you pretty print it you get Test, and if you parse it back you'll get Con (UnQual (Ident "Test")) - variables must start with a leading lower-case letter.

To ignore invalid expressions we try pretty printing then parsing the expression, and ignore all expressions which don't roundtrip.

3) Determine if the Exp is atomic

Now we've got a valid Exp, which we know the user could have typed in as a source program, we need to figure out if isAtom is correct. To do that we see if given expression x whether self-application roundtrips, i.e. x x. As a positive example, foo (a variable) roundtrips as foo foo being foo applied to itself. However, if b then t else f when applied to itself gives if b then t else f if b then t else f, which parses back more like if b then t else f (if b then t else f), and is not atomic.

Putting it all together

Now we've got a random expression, and we know if the atomicity agrees with what we were expecting, we can report any differences. That approach has identified many additional patterns to match, but it's not perfect, in particular:

  • Most values either exceed the depth limit or fail to roundtrip. For 10,000 if expressions I typically get 1 or 2 which roundtrip properly. For non-if expressions it's usually 100 or so. The advantage of random testing is that throwing more time at a problem solves such issues without thinking too hard.
  • For some expressions, e.g. ParComp, I've never managed to get a valid value created. Perhaps haskell-src-exts can't parse it, or perhaps it requires constants I don't have in my hardcoded list - none of these were particularly common examples.
  • haskell-src-exts has a bug where -1 is pretty printed as (-1), which is then parsed as a paren and -1. That fails step 2, so we don't test with negative literals. As it happens, non-negative literals are atomic, but negative literals aren't, so we need to take care.
  • There are some patterns which appear to roundtrip successfully on their own, but not when surrounded by brackets, but secretly are just very weird. For example do rec\n [] parses successfully, but with source positions that are error values, and when applied to itself pretty prints incorrectly. There's at least one haskell-src-exts bug here.
  • The program appears to leak progressively more memory. I solved that by running slices of it at a time, and didn't look too hard. I've seen cases of blowup in Data constructors when recursing, so it could be that. but needs investigating.

As a result of all this work a future HLint will spot unnecessary brackets for 20 more types of expression, 8 more types of pattern and 7 more types of type.