Comments on Neil Mitchell's Blog (Haskell etc): Atomic Expressions Generically

@MitchellSalad, they are for the non-algebraic con...

2018-03-05T23:16:57.579+00:00

@MitchellSalad, they are for the non-algebraic constructors, so I have to have another way of choosing them. I just picked a few representative values that let me explore a reasonable design space and select from them.

What's the significance of the lists "aA1...

2018-03-05T17:55:18.591+00:00

What's the significance of the lists "aA1:+-", [-1::Int,1], and [-1::Integer,1]?

Warbo: Note that with the depth-based approach I h...

2018-02-27T17:26:52.867+00:00

Warbo: Note that with the depth-based approach I have opted for there STILL HAS a memory blow up, it's just I think it's not from that code, but data instances. I agree a fuel parameter can work quite nicely, and that with other types of random generation I'd do a better job - in this case a poor approach coupled with a relatively large amount of compute does the job nicely enough. The generic-random package might indeed have been less work, but it seems to want to know the weights for each argument - which would have required a per-type information.

Note that HLint never suggests avoiding brackets because of operators (apart from $, because if you didn't want to avoid the bracket, why write the $). In the examples I've caught it has more usually been because there are quite a lot of constructors and while I know the syntax is atomic, I don't know that from the haskell-src-exts constructor name.

The memory problems sound very similar to naively ...

2018-02-26T18:00:30.618+00:00

The memory problems sound very similar to naively generating recursive values in QuickCheck.

The obvious approach of picking a constructor then generating arguments, as you hinted at, quickly blows up. More specifically, if the expected number of recursive calls is >=1 then the expected depth is infinite. For example, we might have a binary tree with a nullary `Leaf` constructor and a binary `Node` constructor, which we pick between with a 50/50 chance. The expected number of recursive calls is then 0.5*0 + 0.5*2 = 1. Hence every time we generate a tree we should expect, on average, to generate one sub-tree; which will need one sub-tree; and so on.

Unfortunately, that's what most generic/derived implementations of Arbitrary will do :(

The sort of depth-based approach you've opted for does solve the memory blow up, but I've found it to be too biased towards small values. Since depth information only flows from parent to child, we can end up with lots of large siblings at the same level of the tree, and since the number of terms at each level grows exponentially with depth, we're forced to pick a very small depth to avoid the case where this exponential number of siblings are all large.

My preferred method is to pass in a "fuel" parameter, which gets decremented by each additional constructor and divided up (randomly) between sub-terms. This way, we only get large sub-terms if their siblings are small, so we can end up generating much deeper structures with confidence (since they'll necessarily be narrow).

Perhaps this isn't so much of an issue with checking atomicity, since that presumably only depends on 2 or 3 levels of nesting at most. Still, it's such a common issue that I like to mention it when random testing is brought up :)

Also, the generic-random package looks like a nice way to derive Arbitrary generators which don't suffer these blow ups. I found it via https://byorgey.wordpress.com/2016/09/20/the-generic-random-library-part-1-simple-generic-arbitrary-instances/

> As a result of all this work a future HLint will spot unnecessary brackets for 20 more types of expression, 8 more types of pattern and 7 more types of type.

Wearing my cynical hat, I would point out that the very fact that it required an automated random test suite to discover that these are "unnecessary" is proof enough that they *are* necessary ;)

Presumably HLint will have a switch to ignore some/all such checks? I routinely run HLint in my projects' test suites (because it's excellent, thank you), but I'm also of the "smug lisp weenie" camp which thinks that operator precedence is nothing other than a mental burden, a language bug and solvable by the liberal addition of extra brackets :)