(Non-)Functional Ramblings - A Tale of Two Handles and a Segfault

Posted on May 3, 2026

Adventures in tackling race conditions and flaky tests in HLS.

I’ve recently been trying to contribute more to the Haskell ecosystem, my focus currently being on Haskell Language Server. I still remember the good (ha!) ol’ days when my main method of navigating Haskell code was through sheer memory, grep and hoogle, repeatedly running cabal build to continue on to the next compile error. In comparison, HLS is like a nice refreshing cold glass of water in a desert.

HLS’s test suite used to be, unfortunately, rather flaky. It’s now a bit better. I’ll expand on two sources of flakiness I took down as part of this issue. The errors I’d see while making PRs looked like the following. I’ve omitted the concrete tests that failed, because the failures showed up sporadically all over the suite.

> Failure 1
  Exception: Language server unexpectedly terminated
> Failure 2
  hPutBuf: resource vanished (Broken pipe)
> Failure 3
  Segmentation fault

Wait, how do I open this door?

The first error comes from lsp-test, the framework used to write tests in HLS. It provides a minimal editor-like interface that can be used to send sequences of LSP messages to an underlying language server implementation. In HLS, an instance of the language server is spawned on a thread and communication with lsp-test occurs through pipes.

runSessionWithTestConfig ... =
  ...
  ((inR, inW), (outR, outW)) <- (,) <$> createPipe <*> createPipe
  server <- async $ defaultMain arguments { argsHandleIn = pure inR , argsHandleOut = pure outW }
  result <- runSessionWithHandles inW outR ...
  hClose inW
  timeout 3 (wait server) >>= ... cancel server

The first two failures both actually have the same origin. They occur when lsp-test tries to read an LSP message from the language server, but discovers the pipe’s been closed (or symmetrically, the write-end discovers the read-end’s been closed). I spent quite a bit of time looking for something that doesn’t exist, an explicit call to close the write-end of the handle. Internally, HLS spawns ghcide on a thread, which correctly doesn’t close the handle it’s been passed.

The fun bit that didn’t occur to me is what’s written as a footnote in the documentation of Handle: in GHC, a handle with no references is closed when it is GC’d. (It’s so normal to see withFile* functions or explicit hClose that it’s easy to forget what happens if you don’t use those functions.) This opens up the possibility of a race condition during LSP shutdowns. Consider the following timeline of events. lsp-test initiates the shutdown sequence, ghcide confirms and sends the notification that it’s exiting. ghcide then exits, and both its thread and the handles given to it are subsequently GC’d. lsp-test hasn’t stopped reading from its handle yet though, so it reads an unexpected EOF and crashes.
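The implicit close can be sketched in isolation. The following is a rough, self-contained approximation (not the reproducer from the appendix): it creates a pipe, stops referencing the write end, forces a major GC, and then looks at what a reader sees. The name `readAfterDroppingWriter` and the delay values are made up for illustration, and whether the finalizer has run by the time of the read depends on the GHC version, optimisation level and timing, so the result is not guaranteed.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import System.IO (hFlush, hGetLine, hPutStrLn)
import System.Mem (performMajorGC)
import System.Process (createPipe)
import System.Timeout (timeout)

-- Hypothetical helper: write one line, then let the write end become
-- unreachable and see what a second read on the read end reports.
--   Just (Left e) -> the read failed (EOF), i.e. the write end was closed
--                    for us by its finalizer
--   Nothing       -> the read timed out, i.e. the write end is still open
readAfterDroppingWriter :: IO (Maybe (Either IOException String))
readAfterDroppingWriter = do
  (readEnd, writeEnd) <- createPipe
  hPutStrLn writeEnd "hello"
  hFlush writeEnd
  -- 'writeEnd' is never mentioned again, so the GC is free to collect it.
  performMajorGC
  -- Finalizers run on their own thread; give them a moment (arbitrary delay).
  threadDelay 100000
  _ <- hGetLine readEnd -- consumes the "hello" line written above
  -- Bounded by a timeout so the sketch cannot hang if nothing was finalized.
  timeout 1000000 (try (hGetLine readEnd))

main :: IO ()
main = readAfterDroppingWriter >>= print
```

If the write end was finalized, the second read is expected to fail with an end-of-file error even though nobody called hClose on it, which is the same shape as the failure lsp-test ran into.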

The fix here is the infamous GHC touch# primitive, or its more modern equivalent, keepAlive#. They both tell GHC that a value must still be considered alive at that point in the code, so the garbage collector can’t reclaim it (and run its finalizer) any earlier.

runSessionWithTestConfig ... =
  ...
  pipes@((inR, inW), (outR, outW)) <- (,) <$> createPipe <*> createPipe
  keepAlive pipes $ do
    server <- async $ defaultMain arguments { argsHandleIn = pure inR , argsHandleOut = pure outW }
    result <- runSessionWithHandles inW outR ...
    hClose inW
    timeout 3 (wait server) >>= ... cancel server
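For readers who haven’t met keepAlive# before, a lifted wrapper around it can be sketched roughly as below. This is an assumption about what such a helper might look like on a GHC recent enough to export keepAlive# from GHC.Exts, not necessarily the definition HLS uses; `keepAliveIO` and `readWhileWriterKeptAlive` are hypothetical names, and the delays are arbitrary.

```haskell
{-# LANGUAGE MagicHash #-}

import Control.Concurrent (threadDelay)
import GHC.Exts (keepAlive#)
import GHC.IO (IO (..))
import System.IO (hGetLine)
import System.Mem (performMajorGC)
import System.Process (createPipe)
import System.Timeout (timeout)

-- Hypothetical lifted wrapper: 'x' is treated as reachable for as long as
-- the wrapped action is running.
keepAliveIO :: a -> IO b -> IO b
keepAliveIO x (IO action) = IO (\s -> keepAlive# x s action)

-- The write end is never used inside the action, but it is kept alive for
-- the whole action, so a GC in the middle must not finalize (close) it.
readWhileWriterKeptAlive :: IO (Maybe String)
readWhileWriterKeptAlive = do
  (readEnd, writeEnd) <- createPipe
  keepAliveIO writeEnd $ do
    performMajorGC
    threadDelay 100000
    -- Nothing has been written and the write end is still open, so this
    -- read should block until the timeout fires rather than hit EOF.
    timeout 200000 (hGetLine readEnd)

main :: IO ()
main = readWhileWriterKeptAlive >>= print
```

The point of wrapping the whole session, as in the fix above, is that the handles stay reachable until lsp-test has finished reading, regardless of when the server thread lets go of them.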

This was quite confusing, so I created a similar reproducing example in the appendix that results in the same implicit handle closes.

`gdb`? Funsie

Segfaults in Haskell are a pretty scary thing to see. The good thing is that I could reproduce these locally. A small bash script that repeatedly runs the failing test reproduces the segfault pretty quickly.

I only had to run 2 repetitions to get the segfault when writing this post.

ghcide
  constructor hover (#2904)
    Constructors.hs
      ...
      E: line 22: 145581 Segmentation fault         (core dumped) "$@"

You may need to compile with -g3 to get debug symbols. Pointing gdb at the core dump immediately gives a lead: a trace full of references to sqlite.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000062ea359 in findElementWithHash ()
[Current thread is 1 (LWP 207720)]
(gdb) bt
#0  0x00000000062ea359 in findElementWithHash ()
#1  0x000000000632ebe5 in sqlite3FindTable ()
#2  0x00000000063b0620 in sqlite3LocateTable ()
...
#11 0x000000000639932a in sqlite3_prepare_v2 ()

A first intuition then tells me that there’s probably a connection being used after it was freed, as I’d worked with the direct-sqlite library before and I remember needing to manually clean up connections. I was somewhat familiar with how HLS uses sqlite as I’d contributed performance improvements related to indexing.

A bit of searching in the codebase for initialization gives the following with-wrapper that handles cleanup of resources, including sqlite connections in hieDb, via runWithDb.

runWithWorkerThreads ... = evalContT $ do
  (WithHieDbShield hieDb, indexQueue) <- runWithDb ...
  restartQueue <- withWorkerQueueSimple ...
  loaderQueue <- withWorkerQueueSimple ...
  liftIO $ f hieDb (ThreadQueue indexQueue restartQueue loaderQueue)

It’s hard to spot the error without another bit of information. The continuation given to runWithWorkerThreads, the f, uses background processing to asynchronously handle LSP requests. Specifically, this function does not shut down the shake session, leaving its background work alive to continue using the sqlite connection in hieDb after the cleanup has already run.

runWithWorkerThreads ... = evalContT $ do
  (WithHieDbShield hieDb, indexQueue) <- runWithDb ...
  -- note that we're in ContT, shutdown happens bottom->up
  ContT $ \action -> action () `finally` shutdownSession
  restartQueue <- withWorkerQueueSimple ...
  loaderQueue <- withWorkerQueueSimple ...
  liftIO $ f hieDb (ThreadQueue indexQueue restartQueue loaderQueue)

This ensures the shake session is shut down before the sqlite connection is closed, resolving the segfault. While debugging race conditions is tricky, there’s something deeply satisfying about solving them.
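The "bottom->up" cleanup order in ContT is easy to get backwards, so here is a minimal sketch of it using only transformers and an IORef log. The `withResource` helper and the resource names "db" and "session" are stand-ins invented for this example; they are not the HLS functions.

```haskell
import Control.Exception (finally)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical with-style resource: log the acquire, run the rest of the
-- block as the continuation, and log the release when it finishes.
withResource :: IORef [String] -> String -> ContT r IO ()
withResource logRef name = ContT $ \action -> do
  modifyIORef' logRef (("acquire " ++ name) :)
  action () `finally` modifyIORef' logRef (("release " ++ name) :)

-- Returns the events in the order they happened.
runDemo :: IO [String]
runDemo = do
  logRef <- newIORef []
  evalContT $ do
    withResource logRef "db"      -- registered first, released last
    withResource logRef "session" -- registered later, released earlier
    liftIO $ modifyIORef' logRef ("body" :)
  reverse <$> readIORef logRef

main :: IO ()
main = runDemo >>= mapM_ putStrLn
-- prints:
--   acquire db
--   acquire session
--   body
--   release session
--   release db
```

Anything registered after the database in the ContT block is torn down before the database is, which is why placing the session shutdown right after runWithDb gives the desired ordering.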

Discussion links: Reddit
