Wednesday, April 17, 2013

Buffer smashing in NSIS

Summary: I've identified two fixed-size buffer errors in the NSIS installer generator, one of which is a buffer overflow leading to a segfault. My nsis library works round them to some extent.

My nsis Haskell library provides a layer over the NSIS installer generator. NSIS scripts can be viewed as assembly code for a virtual machine with 16 general purpose registers, the ability to define new registers, and a stack - where all locations store strings and instructions are register-to-register. My nsis library abstracts over these details to provide something with more traditional flow control (e.g. while, if), compound expressions and type safety, using NSIS as an assembler.

A high-level language tends to tickle areas of an assembler that a human would not. In particular, NSIS has two fixed-size buffer errors that my nsis library has triggered.

Bug 1: String literals >= 4096 bytes cause segfaults

If you write a string literal in the NSIS source which is 4096 characters or longer, the NSIS generator segfaults. As an example (ignoring lots of NSIS boilerplate):

Var foo
StrCpy $foo "XXX...XXX"

If the string has 4095 X characters it works. As soon as you have 4096 X characters or more the NSIS generator segfaults. My guess is that the NSIS lexer has a 4096 character buffer that is overflowed.

As a workaround, you can do:

Var foo
Var foo1
Var foo2
StrCpy $foo1 "XXX...XXX"
StrCpy $foo2 "XXX...XXX"
StrCpy $foo $foo1$foo2

As long as the literals assigned to $foo1 and $foo2 are each shorter than 4096 characters, you can combine them to produce $foo without error (as far as I can tell).
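The splitting can also be done mechanically. Below is a minimal sketch in plain Haskell that generates the NSIS lines for the workaround above. The names chunksOf, maxLiteral and splitLiteral are hypothetical and are not part of the nsis library, and the sketch assumes the text contains no characters that need NSIS escaping. It only sidesteps the literal-length problem and says nothing about other limits NSIS may apply to the combined string.

```haskell
module Main where

-- Split a list into pieces of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Longest literal that compiled without a segfault in my tests.
maxLiteral :: Int
maxLiteral = 4095

-- Hypothetical helper: emit NSIS source that assigns a long string to $name
-- via numbered temporaries ($name1, $name2, ...), one per chunk.
-- Assumption: the text needs no NSIS escaping (no quotes, no '$').
splitLiteral :: String -> String -> [String]
splitLiteral name text =
     ("Var " ++ name) : map ("Var " ++) temps
  ++ zipWith (\t c -> "StrCpy $" ++ t ++ " \"" ++ c ++ "\"") temps pieces
  ++ ["StrCpy $" ++ name ++ " \"" ++ concatMap ('$' :) temps ++ "\""]
  where
    pieces = chunksOf maxLiteral text
    temps  = [name ++ show i | i <- [1 .. length pieces]]

main :: IO ()
main = mapM_ putStrLn (splitLiteral "foo" (replicate 5000 'X'))
```

For a 5000 character string this produces three Var declarations, two StrCpy lines with literals of 4095 and 905 characters, and a final StrCpy that joins the temporaries.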

Bug 2: FileWrite truncates its output at 1023 bytes

When writing a file, all FileWrite calls are truncated to 1023 bytes. As an example:

FileOpen $h "output.txt" w
FileWrite $h "XXX...XXX"

If there are more than 1023 X characters, only the first 1023 will be written. My guess is that FileWrite has a 1024 character buffer for output.

As a workaround, you can write the file in smaller chunks, using multiple FileWrite instructions.
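As a rough illustration of the chunking, here is a small sketch in plain Haskell that turns one long string into a sequence of FileWrite instructions. The names pieces, maxWrite and fileWriteChunks are hypothetical (they are not in the nsis library), and the sketch assumes the handle is already open and that the text needs no NSIS escaping.

```haskell
module Main where

import Data.List (unfoldr)

-- Largest FileWrite payload that was written in full in my tests.
maxWrite :: Int
maxWrite = 1023

-- Split a string into consecutive pieces of at most n characters.
-- n must be positive, otherwise this would never terminate.
pieces :: Int -> String -> [String]
pieces n = unfoldr step
  where
    step [] = Nothing
    step s  = Just (splitAt n s)

-- Hypothetical helper: one FileWrite instruction per piece.
-- Assumption: $handle is already open and the text needs no NSIS escaping.
fileWriteChunks :: String -> String -> [String]
fileWriteChunks handle text =
  [ "FileWrite $" ++ handle ++ " \"" ++ p ++ "\"" | p <- pieces maxWrite text ]

main :: IO ()
main = mapM_ putStrLn (fileWriteChunks "h" (replicate 2500 'X'))
```

A 2500 character string becomes three FileWrite lines carrying 1023, 1023 and 454 characters, so no single call exceeds the limit observed above.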

Workarounds in nsis-0.2.2

Manipulating long strings in NSIS is not that common. The example that caused me to look at buffer sizes was writing out a configuration file line-by-line, for example:

writeFileLines "$INSTDIR/config.ini"
    ["[config1]"
    ,"InstallDir=$INSTDIR"
    ,...
    ]

In nsis-0.2.1 writeFileLines was defined as:

writeFileLines a b = writeFile' a $ strUnlines b

This function merges all lines together then writes them to the file in one go, which truncates if the whole output is longer than 1023 characters. In addition, the nsis optimiser will often perform strUnlines at compile time, so the NSIS assembler gets a single literal, potentially exceeding 4095 characters. To avoid both these problems, in nsis-0.2.2 I have defined:

writeFileLines a b = withFile' ModeWrite a $ \hdl ->
    forM_ b $ \s -> fileWrite hdl $ s & "\r\n"

This revised definition writes the lines one by one. If any line is longer than 1023 characters it will still be truncated, but that is less likely than before. I will be reporting these issues to the NSIS team, so hopefully they can be fixed at source.
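To see why the line-by-line version loses less data, here is a small model in plain Haskell. It does not use the nsis library: writeModel is a hypothetical stand-in for the observed FileWrite behaviour (keep at most 1023 characters per call), and I am assuming for the sake of the model that the old definition joined lines with "\r\n".

```haskell
module Main where

-- Model of the observed behaviour: each write keeps at most 1023 characters.
writeModel :: String -> String
writeModel = take 1023

-- Old approach: join every line, then perform a single write.
-- Assumption: lines are terminated with "\r\n", as in the new definition.
oldOutput :: [String] -> String
oldOutput ls = writeModel (concatMap (++ "\r\n") ls)

-- New approach: one write per line, each truncated independently.
newOutput :: [String] -> String
newOutput = concatMap (\l -> writeModel (l ++ "\r\n"))

main :: IO ()
main = do
  let sample = replicate 100 (replicate 20 'x')  -- 100 lines of 20 characters
  print (length (oldOutput sample), length (newOutput sample))
  -- prints (1023,2200)
```

In this model the single write loses more than half of the 2200 characters, while the per-line writes lose nothing because every line plus its terminator is well under the limit.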

It would be possible to apply both workarounds automatically in the nsis library, splitting long literals and chunking long writes, but I have not yet done so. Using long strings in an installer is rare, so hopefully the problem will not impact anyone. In addition, I only found the string literal bug while investigating the file truncation bug, so I wonder what other buffer bugs might lurk in NSIS.

2 comments:

Mikhail Glushenkov said...

For the Haskell Platform installer I use a special build of NSIS with large strings enabled (8192 instead of 1024).

http://nsis.sourceforge.net/Special_Builds

Short strings can produce some quite nasty bugs:

http://nsis.sourceforge.net/Environmental_Variables:_append,_prepend,_and_remove_entries#Warning

Neil Mitchell said...

Thanks for the information. I never realised that fixed-size strings were intentional (and I would never have guessed a current/modern application would use them!). I guess the file write is deliberate, but the segfault in lexing is a genuine bug.