I recently discovered a fantastic tool by the wonderful people at Well-Typed,
hs-bindgen. In a previous life, I
worked on downloading utilities in Haskell on top of something akin to the
existing curl library on Hackage. I
was never super happy with either that interface or how I used it.
With hs-bindgen, I have the wonderful opportunity to scratch that itch and
re-explore this space a bit.

I feel obligated to mention that I wrote the script that did the
codegen in Python using AI. While I did have to go over bugs and rough
edges afterward, it got quite far and most of the details were in the right
spots. This was a great learning experience in getting to know how to do
this AI-coding thing.

While hs-bindgen is still in active development, it can already do quite
a lot. I usually use nix when working on personal projects so the first step
is setting stuff up. The manual on using hs-bindgen is, at the time of
writing, still quite minimal, but after looking around, I also found a nix
tutorial on how to use
it. So I pop their overlay into my flake file and set out.

libcurl-bindings = pkgs.haskell.lib.compose.generateBindings
  ./libcurl-bindings/generate-bindings
  (haskellPackages.callCabal2nix "libcurl-bindings" ./libcurl-bindings
    { });

The next step is to define a small wrapper around hs-bindgen-cli that calls it
with the necessary arguments for generating bindings. My first attempt was
just copying one of their example wrappers and swapping the library out for
libcurl.

I am in absolutely no way an expert in C or low-level languages. I’ve
written some hobby Rust, and I have some knowledge of RTSs, that’s it.
My entire career has been in high-level languages. So this blog post
especially should be taken with a huge heaping teaspoon of salt.

hs-bindgen-cli preprocess \
"${module_flags[@]}" \
"${libclang_flags[@]}" \
"${parse_flags[@]}" \
"${select_flags[@]}" \
"${debug_flags[@]}" \
"--module" "Generated.Curl.Curl" \
"curl/curl.h"

Remember when I said hs-bindgen is in active development? This results in both
a panic and a warning.

This post is already out of date with the hs-bindgen repository.
The panic mentioned here is already fixed, and the case that triggered it is
now supported. Warnings occur when hs-bindgen explicitly doesn’t support
something and refuses to emit a binding. Variadic functions are one example of
something explicitly not supported.

[Warning] [HsBindgen] [select-parse-delayed] Could not select declaration:
Parse failure:
'curl_share_setopt' at "/nix/store/mbc9lm52d48cswa4ihwcs8wz5abii3xb-curl-8.17.0-dev/include/curl/curl.h:3086:24":
No bindings generated: Unsupported variadic (varargs) function
PANIC!: the impossible happened
Please report this as a bug at https://github.com/well-typed/hs-bindgen/issues/
unexpected type void in context Top

The panic’s been fixed in the meantime, but it comes from libcurl’s definition
of its opaque handles, referenced in functions as void*. There are a few such
definitions.

typedef void CURL; // Used in curl_easy_* functions
typedef void CURLSH; // Used in curl_share_* functions
typedef void CURLM; // Used in curl_multi_* functions

In the absence of the fix, I could hack around this by omitting these
definitions from the header file and swapping each name with void. Nix makes
it a bit too easy to do this. Not saying this is a nice solution, but hey, I got
it to avoid the panic.

patchedCurl = pkgs.curlFull.overrideAttrs (old:
  assert old.version == "8.17.0"; {
    postInstall = (old.postInstall or "") + ''
      find $dev/include/curl -name "*.h" -type f -exec sed -i \
        -e '/typedef void CURL;/d' \
        -e '/typedef void CURLSH;/d' \
        -e '/typedef void CURLM;/d' \
        -e 's/\bCURL\b/void/g' \
        -e 's/\bCURLSH\b/void/g' \
        -e 's/\bCURLM\b/void/g' \
        {} +
    '';
  });
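
To see what those sed rules do to a header, here is a small Python sketch that applies the same delete-then-substitute steps to a few lines in the style of curl.h. The function name `patch_header` and the sample text are mine, purely for illustration; the real patching happens in the Nix derivation above. The point worth noticing is that the word-boundary match leaves names such as CURL_EXTERN and CURLcode alone.

```python
import re

# The three opaque-handle typedef names targeted by the sed script.
HANDLE_NAMES = ["CURL", "CURLSH", "CURLM"]


def patch_header(text):
    """Apply the same delete-then-substitute rules as the sed invocation."""
    patched_lines = []
    for line in text.splitlines():
        # Equivalent of: -e '/typedef void NAME;/d'
        if any(f"typedef void {name};" in line for name in HANDLE_NAMES):
            continue
        # Equivalent of: -e 's/\bNAME\b/void/g'
        for name in HANDLE_NAMES:
            line = re.sub(rf"\b{name}\b", "void", line)
        patched_lines.append(line)
    return "\n".join(patched_lines)


# Hypothetical excerpt, shaped like declarations found in curl.h.
SAMPLE_HEADER = """\
typedef void CURL;
typedef void CURLSH;
CURL_EXTERN CURL *curl_easy_init(void);
CURL_EXTERN CURLcode curl_easy_perform(CURL *curl);
CURL_EXTERN CURLSH *curl_share_init(void);
"""

print(patch_header(SAMPLE_HEADER))
```

Both typedef lines disappear, and every standalone CURL or CURLSH becomes void, which is exactly the shape the generated Haskell bindings end up seeing.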

Now onto the varargs warning. This one’s a bit more annoying to “fix”. It’s
generally impossible to know what the acceptable inputs of these functions are,
so they’re skipped by hs-bindgen. libcurl makes quite a lot of use of these
variadic functions to create a thin interface that is flexible to future
extensions. They appear in functions like the following.

CURL_EXTERN CURLcode curl_easy_setopt(CURL *curl, CURLoption option, ...);
CURL_EXTERN CURLcode curl_easy_getinfo(CURL *curl, CURLINFO info, ...);
CURL_EXTERN CURLMcode curl_multi_setopt(CURLM *multi_handle, CURLMoption option, ...);

What all of these functions have in common is that the second argument
determines the type of the third, and there is always just a single argument in
the variadic section. The idea that immediately jumped to mind is that I can
write a type class that gives me the appropriate argument type, associating each
dispatch value with a corresponding singleton.

There’s also curl_share_setopt, but I haven’t looked at the share
interface yet. So I may actually be off the mark.

The set of types allowed by each function is finite. While not documented in
the online API docs from what I could see, looking at the header file gives a
straightforward association between the option values and the corresponding
types. For the curl_easy_setopt options, each type defines a base offset of
X * 10_000, and the option’s own number is added on top.

#define CURLOPTTYPE_LONG 0
#define CURLOPTTYPE_OBJECTPOINT 10000
#define CURLOPTTYPE_FUNCTIONPOINT 20000
#define CURLOPTTYPE_OFF_T 30000
#define CURLOPTTYPE_BLOB 40000

To sketch an example, CURLOPT_PORT requires a long integer value and is
option number three, so its value is 0 + 3 = 3. Compare this to the important
CURLOPT_WRITEFUNCTION, which wants a function pointer and is number eleven in
the list, so it becomes 20000 + 11 = 20011.

The next step is to actually find these dispatch values. While I could look at
the header files, and I probably should’ve to make this transformation more
stable, I chose to do this on the Haskell files instead. There, hs-bindgen
creates pattern synonyms for each value.

pattern CURLOPT_PORT :: CURLoption
pattern CURLOPT_PORT = CURLoption 3
-- And a quick AI-derived regex pattern in python that gives me the binding and
-- the value.
--
-- CURLOPT_PATTERN = re.compile(
-- r"^pattern\s+(CURLOPT_\w+)\s*::\s*CURLoption\s*\n"
-- + r"pattern\s+\1\s*=\s*CURLoption\s+(\d+)",
-- re.MULTILINE,
-- )
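
As a rough sketch of how that regex and the offset scheme fit together, the snippet below runs the pattern over a small sample of generated Haskell and uses integer division by 10000 to recover the argument kind. The helper name `extract_options`, the `OPTION_KINDS` table, and the sample text are my own illustrations rather than the actual codegen script.

```python
import re

# The regex from the post, unchanged.
CURLOPT_PATTERN = re.compile(
    r"^pattern\s+(CURLOPT_\w+)\s*::\s*CURLoption\s*\n"
    + r"pattern\s+\1\s*=\s*CURLoption\s+(\d+)",
    re.MULTILINE,
)

# Kind names keyed by (option value // 10000), following the
# CURLOPTTYPE_* base offsets listed earlier.
OPTION_KINDS = {
    0: "LONG",
    1: "OBJECTPOINT",
    2: "FUNCTIONPOINT",
    3: "OFF_T",
    4: "BLOB",
}


def extract_options(haskell_source):
    """Return (name, value, kind) for each CURLOPT_* pattern synonym found."""
    options = []
    for match in CURLOPT_PATTERN.finditer(haskell_source):
        name = match.group(1)
        value = int(match.group(2))
        kind_index = value // 10000
        options.append((name, value, OPTION_KINDS[kind_index]))
    return options


# Hypothetical excerpt of hs-bindgen output, in the shape shown above.
SAMPLE = """\
pattern CURLOPT_PORT :: CURLoption
pattern CURLOPT_PORT = CURLoption 3

pattern CURLOPT_WRITEFUNCTION :: CURLoption
pattern CURLOPT_WRITEFUNCTION = CURLoption 20011
"""

for name, value, kind in extract_options(SAMPLE):
    print(name, value, kind)
```

With the kind in hand, picking the right FFI variant for each option becomes a table lookup.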

I now need to define the FFI bindings for each type variant of our vararg
functions. I haven’t bothered to make this interface fully type-safe, so the FFI
functions accept void* anytime an object pointer or callback is needed.

-- | curl_easy_setopt with a long argument (unsafe)
foreign import ccall unsafe "curl_easy_setopt"
  curl_easy_setopt_long_c
    -- Remember the swap we did in the header file?
    -- This is where that shows up.
    :: Ptr Void -- ^ CURL handle
    -> CUInt -- ^ option
    -> CLong -- ^ value
    -> IO CUInt

-- | Type-safe wrapper for curl_easy_setopt with a long argument
curl_easy_setopt_long :: Ptr Void -> CURLoption -> CLong -> IO CURLcode
curl_easy_setopt_long handle (CURLoption opt) val =
  CURLcode <$> curl_easy_setopt_long_c handle opt val

With all of that in place I can start filling out this nice type class.

-- For `curl_easy_setopt`.
class CurlOption c where
  type CurlOptionArgument c :: Type
  curlOption :: c -> CURLoption
  curlSetOpt :: c -> Ptr Void -> CurlOptionArgument c -> IO CURLcode

-- example instance for curl_easy_setopt(curl, CURLOPT_PORT, port)
data CurloptPort = CurloptPort

instance CurlOption CurloptPort where
  type CurlOptionArgument CurloptPort = CLong
  curlOption _ = CURLOPT_PORT
  -- Use the typed wrapper; the raw FFI import takes and returns CUInt.
  curlSetOpt opt handle = curl_easy_setopt_long handle (curlOption opt)
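
Since these instances are generated, here is a minimal Python sketch of what emitting one might look like. Everything in it is an assumption for illustration: the `render_instance` helper, the naming rule that turns CURLOPT_PORT into CurloptPort, and especially the `curl_easy_setopt_ptr` wrapper name, which does not appear above. Only the LONG row mirrors code shown in this post.

```python
# Maps an option kind to (Haskell argument type, wrapper function name).
# The LONG entry follows the post; the pointer entries are hypothetical and
# only reflect the remark that pointers and callbacks are passed as void*.
KIND_TO_HASKELL = {
    "LONG": ("CLong", "curl_easy_setopt_long"),
    "OBJECTPOINT": ("Ptr Void", "curl_easy_setopt_ptr"),
    "FUNCTIONPOINT": ("Ptr Void", "curl_easy_setopt_ptr"),
}


def haskell_type_name(option_name):
    """Turn CURLOPT_PORT into CurloptPort (assumed naming rule)."""
    return "".join(part.capitalize() for part in option_name.split("_"))


def render_instance(option_name, kind):
    """Render a singleton data type and its CurlOption instance as text."""
    type_name = haskell_type_name(option_name)
    arg_type, wrapper = KIND_TO_HASKELL[kind]
    return "\n".join(
        [
            f"data {type_name} = {type_name}",
            "",
            f"instance CurlOption {type_name} where",
            f"  type CurlOptionArgument {type_name} = {arg_type}",
            f"  curlOption _ = {option_name}",
            f"  curlSetOpt opt handle = {wrapper} handle (curlOption opt)",
        ]
    )


print(render_instance("CURLOPT_PORT", "LONG"))
```

Generating text this way keeps the Haskell side boring and uniform, at the cost of trusting whatever table feeds the generator.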

After this I did some cleanup and split out the generated functions into
separate modules to mimic libcurl’s header file setup. For my purposes, I’m
looking at the easy and multi interfaces, so they get their own modules and
bindspecs, and hs-bindgen then takes care of the rest. I’ve put these bindings
online here. hs-bindgen is great!

Discussion links: Reddit