Exploring Data Race Freedom in Merlin with OxCaml

I’m going to take you behind the scenes of how data race freedom works using the experimental OxCaml branch of OCaml. I was recently part of a project testing out OxCaml’s data race freedom modes on the multicore merlin-domains branch of Merlin. To evaluate the modes’ potential for making a fully parallel Merlin data race free, we experimented liberally within the code and documented our experience along the way.

This post will showcase some of what we learned and should be interesting for anyone curious about programming using multiple domains, OxCaml, editors, or just OCaml in general!

What are Data Races? With Examples!

A data race is a specific kind of race condition. So let's first define race conditions.

A race condition occurs when the behavior of a software system depends on the relative timing of events, such as the execution of threads. For example, the following code snippet demonstrates a race condition:

let () =
  let d1 = Domain.spawn (fun () -> print_endline "Hello from domain 1") in
  let d2 = Domain.spawn (fun () -> print_endline "Hello from domain 2") in
  Domain.join d1;
  Domain.join d2

The output of this code is non-deterministic, as it depends on the scheduling of the domains. It could print "Hello from domain 1" followed by "Hello from domain 2", or vice versa. This however is not a bug: the code is correct, and the behaviour is expected.

Data races are race conditions that are way more sneaky: their behaviour does not only depend on some non-deterministic scheduling, but also on some out-of-control optimisations from the compiler and from the hardware. For example, consider the following code snippet:

let a = ref 0 and b = ref 0
 
let d1 () =
   a := 1; 
   !b

let d2 () =
   b := 1; 
   !a

let main () =
  let h1 = Domain.spawn d1 in
  let h2 = Domain.spawn d2 in
  Domain.join h1, Domain.join h2

Writing r1 for the value returned by d1 and r2 for the value returned by d2, the sequentially consistent results are:

r1 = 0, r2 = 1 (d1 runs first)
r1 = 1, r2 = 0 (d2 runs first)
r1 = 1, r2 = 1 (any other interleaving)

Because this is a data race, r1 = 0, r2 = 0 is also possible! In this case, this behaviour would be caused by a hardware optimisation.
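To look at the outcomes in practice, here is a minimal sketch of a harness for this snippet in mainline OCaml 5. The names run_once and tally are made up for this illustration, and r1 and r2 stand for the values returned by d1 and d2. Whether the (0, 0) outcome ever shows up depends on the hardware, the compiler, and how much the two domains actually overlap, so a short run may never observe it.

```ocaml
(* Run the two racing domains once, on fresh references. *)
let run_once () =
  let a = ref 0 and b = ref 0 in
  let h1 = Domain.spawn (fun () -> a := 1; !b) in
  let h2 = Domain.spawn (fun () -> b := 1; !a) in
  let r1 = Domain.join h1 in
  let r2 = Domain.join h2 in
  (r1, r2)

(* Count how often each (r1, r2) outcome appears over [n] runs. *)
let tally n =
  let counts = Hashtbl.create 4 in
  for _ = 1 to n do
    let outcome = run_once () in
    let seen = Option.value (Hashtbl.find_opt counts outcome) ~default:0 in
    Hashtbl.replace counts outcome (seen + 1)
  done;
  counts

let () =
  Hashtbl.iter
    (fun (r1, r2) k -> Printf.printf "r1 = %d, r2 = %d: %d times\n" r1 r2 k)
    (tally 200)
```

Spawning a domain is expensive compared to the two memory accesses, so the domains rarely overlap in this harness; tools built for this purpose are a better fit for serious testing.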

By definition, data races occur when:

1. Two or more domains run in parallel,
2. at least two of them access the same mutable value,
3. at least one of them writes to it,
4. without a synchronisation mechanism (like locks or atomic operations).

Non-atomic mutable values in OCaml are references, mutable fields of records, and arrays.

How Do We Avoid Data Races in OCaml?

Obviously, specific tools exist to detect data races. ThreadSanitizer, for example, detects data races at runtime.

In OCaml, there are also libraries like qcheck-lin and qcheck-stm that help by randomly generating tests and checking for linearisability. These types of libraries are very useful when combined with TSan, which requires a lot of tests to be effective.

None of these solutions is static, however: they catch data races at run time, and as data races are non-deterministic, they can still be missed. Another way of preventing data races is, well, simply not to share mutable state between domains. However, this would strongly restrict how we use multicore programming in OCaml, which would not be ideal.
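As an illustration of the "no shared mutable state" approach, here is a minimal sketch in mainline OCaml 5. The functions sum_range and parallel_sum are hypothetical names chosen for this example. Each domain only mutates a reference it created itself, and results travel back through Domain.join.

```ocaml
(* Sum the integers in [lo, hi) using a reference local to the caller. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + i
  done;
  !acc

(* Split [0, n) in two halves, sum each half in its own domain, and
   combine the two results once both domains have been joined. *)
let parallel_sum n =
  let mid = n / 2 in
  let left = Domain.spawn (fun () -> sum_range 0 mid) in
  let right = Domain.spawn (fun () -> sum_range mid n) in
  Domain.join left + Domain.join right

let () = Printf.printf "%d\n" (parallel_sum 1_000)
(* prints 499500 *)
```

This is race free because no mutable value is reachable from both domains, but it only works for problems that split cleanly into independent pieces.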

Statically Preventing Data Races in OxCaml

So, let's run with a very simple example:

let foo () =
    let r = ref 42 in
    let d1 = Domain.spawn (fun () -> r := 1) in
    let d2 = Domain.spawn (fun () -> !r) in

    Domain.join d1, Domain.join d2

In the above scenario, r is a mutable value that is shared between two domains, and at least one of them writes to it. This is a data race. Next, we are going to try to port this code to OxCaml, and see how the mode system prevents it from compiling. Finally, in the next section, we will see how to fix our code to make it compile in OxCaml.

Preventing Data Races With Modes

Two mode axes are needed to ensure data race freedom: portability and contention.

Portability

Portability states that a function (or a value carrying a function) can be used in a parallel context. For example, Multicore.spawn (which replaces Domain.spawn) requires a portable function as its first argument, since that function may run in parallel with the caller. The portability axis has three values: portable, shareable, and nonportable. By default, values are nonportable: a portable annotation is needed to make them portable.

Syntax for the portable mode:

let f : (int -> int) @ portable = fun x -> x + 1

let (f @ portable) x = x + 1

(* in the signature *)
module type A = sig
  val f : int -> int @@ portable
end

(* if all the functions of a module are portable *)
module type B = sig @@ portable
  val f : int -> int
  val g : bool -> bool
end

So to ensure data race freedom, all the functions that provide some form of parallelism need to take portable functions as arguments. Also, for backwards compatibility, Domain.spawn is still available in OxCaml, but it is not annotated with the right modes: it does not provide any DRF (data race freedom) guarantee, and you get a warning if you use it in OxCaml. In the following examples, we will first use Multicore.spawn and later Parallel.fork_join2 (which also restricts the portability of the functions it takes), two different ways to provide parallelism in OxCaml.

Here is the signature of Multicore.spawn:

val spawn :  ('a : value_or_null).
     ('a @ contended once portable unique -> unit) @ once portable unyielding
  -> 'a @ contended once portable unique
  -> 'a spawn_result @ contended once portable unique

But we can simplify it for our needs. We will use the following function:

val spawn: (unit -> unit) @ portable -> unit

Contention

The contention axis has 3 values: contended, shared, and uncontended. This axis is only pertinent for the non-atomic mutable values, i.e. the ones that can cause a data race. When a mutable value is contended, both write and read operations are forbidden on it. Using shared only prevents write operations, and uncontended does not prevent any operations.

The default value is uncontended: a value becomes contended when it is captured by a portable function. The idea is simple: if a function is portable it can be used in parallel with another one, so if it captures a mutable value, to prevent data races, this value should not be accessed – it becomes contended. We will talk about shared and shareable later.

Let's see how this works in the following example:

let foo () =
  let r = ref 42 in
  (* r is a mutable value and is uncontended by default. *)

  let (work @ portable) () = r := 1 in
  (* r is contended because it is used in a portable function. 
     The `write` operation requires an `uncontended` value: this does not compile *)
  
  spawn work

This example does not compile because r is contended and the write operation requires an uncontended value. We could try to keep r uncontended by removing the portable annotation on work, but then we would not be able to use it in spawn.

let foo () =
    let a = ref 42 in
    (* a is uncontended *)

    let work () = a := 1 in
    (* a is uncontended and `work` is at its default value: nonportable *)

    spawn work
    (* `spawn` requires a portable function, but `work` is nonportable: this does not compile. *)

All right, so you could say that in this example there are actually no data races, since there is just a single access to a. However, there is no way for the spawn function to know that: it only knows that work may run concurrently with another function, and so it rules out any risk of a data race. Let's not forget that this is a dummy example, in which there is no real reason for parallelism either. We could have written:

let foo () =
  let r = ref 42 in

  let (work @ portable) () = r := 1 in

  spawn work;
  spawn work

Here there is some parallelism, and still the error message will be the same.

And this is how modes prevent data races statically!

What About `shareable` and `shared`?

Note 1: This part is a bonus since, as explained above, portable/nonportable and contended/uncontended are sufficient to statically prevent data races. The shareable and shared modes are more fine-grained annotations that allow you to perform concurrent read operations on shared mutable values.

Note 2: The current published version of OxCaml does not completely support shareable and shared annotations. In particular, there is no Reference module with the right annotations for the read operation. You will have to write your own reference module to try the following code snippets! I will show you how to do this at the bottom of the article.

The idea is the following: two concurrent read operations on a mutable value are not a data race, so they should be permitted. This is the idea behind shared and shareable. A shared value can be read but not written, and a shareable function captures the values it closes over as shared.

let foo () =
  let open Ref in 
  (* As explained above, for now, you need a homemade Reference module to make this work. *)

  let r = ref 42 in
  (* r is uncontended *)

  let (work @ shareable) () = !r in
  (* r is shared because it is used in a shareable function. 
     The `read` operation requires a `shared` value: this does compile *)

  spawn work
  (* Does not compile because `spawn` requires a portable function, but `work` is shareable. *)

But here is the difficulty: Multicore.spawn takes a portable function as argument, not a shareable one. Why is that? This is because it spawns a new domain that is running a portable function, but then resumes executing the current domain. The current domain can run any non-portable function, including write and read on shared mutable values, which means data races!

An example:

let foo () =
  let r = ref 42 in

  let (work @ portable) () = r := 1 in

  spawn work;
  r := 2

Whether work performs a write operation on r or just a read operation, this is a data race: the constraint on work cannot be weakened.

We actually need another way to provide parallelism, one that gives us more control: Parallel.fork_join2. fork_join2 is blocking: it runs two shareable functions (possibly) in parallel and waits for both of them to finish. Here is its signature:

val fork_join2
  :  t @ local
  -> (t @ local -> 'a) @ forkable local once shareable
  -> (t @ local -> 'b) @ once shareable
  -> #('a * 'b)

We have well-contained parallelism here: only the two functions passed as arguments to fork_join2 can run in parallel, and they are both shareable, meaning they can read shared mutable values but not write to them, so no data race is possible! Let's try it:

let foo par =
  let open Ref in
  let r = ref 0 in

  let (read @ shareable) _par = !r in

  let #(_, _) = Parallel.fork_join2 par read read in
  ()

There are no data races here since the two concurrent accesses to r are read operations, so this should compile, and it does because read is a shareable function.

Conversely, the following code has a data race and does not compile:

let foo par =
  let open Ref in
  let r = ref 0 in

  let (read @ shareable) _par = !r in
  let write _par = r := 1 in
  (* If we tried to make `write` portable or shareable, we would get a mode
     error on `r`: it would become contended or shared respectively, but the
     (:=) operation requires it to be uncontended. *)

  let #(_, _) = Parallel.fork_join2 par read write in
  (* Does not compile: `write` is nonportable but `fork_join2` requires both functions to be shareable. *)
  ()

Sharing in OxCaml

So, how can we make our simple example compile in OxCaml? Remember, we want to perform a read and a write operation on a shared mutable value in parallel:

let foo () =
    let r = ref 42 in
    let d1 = Domain.spawn (fun () -> r := 1) in
    let d2 = Domain.spawn (fun () -> !r) in

    Domain.join d1, Domain.join d2

We have to prove to the compiler that there is no data race. As a reminder, the ingredients of a data race are the following:

1. Two or more domains run in parallel,
2. at least two of them access the same mutable value,
3. at least one of them writes to it,
4. without a synchronisation mechanism (like locks or atomic operations).

1, 2, and 3 are what we are trying to do, so we need to provide a synchronisation mechanism. For our example, atomic is the easiest path: we can just replace ref with Atomic and we are good to go!

let foo () =
  let r = Atomic.make 0 in

  let read () = Atomic.get r |> ignore in
  let write () = Atomic.set r 1 in

  spawn read;
  spawn write

This compiles!
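For comparison, here is a minimal sketch of the same idea in mainline OCaml 5, using only the standard Atomic and Domain modules. The function count_with_atomic is a hypothetical name for this illustration. Because every increment is an atomic read-modify-write, the final count does not depend on how the domains interleave.

```ocaml
(* Spawn [domains] workers that each increment a shared atomic counter
   [increments] times, then return the final value of the counter. *)
let count_with_atomic ~domains ~increments =
  let counter = Atomic.make 0 in
  let worker () =
    for _ = 1 to increments do
      Atomic.incr counter
    done
  in
  let handles = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join handles;
  Atomic.get counter

let () =
  Printf.printf "%d\n" (count_with_atomic ~domains:2 ~increments:10_000)
(* prints 20000 *)
```

With a plain int ref and `r := !r + 1` in place of Atomic.incr, the same program would contain a data race and could lose updates.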

What about locks? In OxCaml, locks are provided through the Capsule API. Explaining it fully is beyond the scope of this article, but the general idea is the following: mutable values are enclosed in a capsule. The capsule can be shared between domains without becoming contended (meaning it is still possible to perform operations on it). It looks like this may create data races, since it allows us to share mutable values between domains, but it is actually pretty hard to open the capsule to perform operations on its contents. This is how data race freedom is guaranteed.

For the operations we are trying to do, we actually need the most restrictive capsule opener: a mutex. The API for mutexes is not that different from the one in mainline OCaml. There is a big difference, however: the mutable value must be created inside the capsule, meaning it can never be accessed without acquiring the mutex first (via the with_lock function).

To illustrate:

let foo () =
  let open Await in
  let await = Await_blocking.await Terminator.never in
  let capsule_with_lock = Capsule.With_mutex.create (fun () -> ref 42) in
  (* The reference is created in a capsule that can only be accessed with the mutex. *)

  let read r = !r in
  let write r v = r := v in

  spawn (fun () ->
      let _ = Capsule.With_mutex.with_lock await capsule_with_lock ~f:(fun r -> read r) in
      (* `with_lock` is equivalent to `Mutex.protect` in OCaml.*)
      ());
  spawn (fun () ->
      Capsule.With_mutex.with_lock await capsule_with_lock ~f:(fun r -> write r 1))

Interfacing OCaml Code in OxCaml: Challenges and Solutions

Some projects mix OxCaml with plain OCaml: external libraries, vendored modules, or legacy code we cannot (or do not want to) rewrite. When that code holds a mutable state, we face the data race problem again, but with a twist: we cannot add mode annotations since the code is not ours to modify.

Let's go back to our running example, but this time assume that the read and write operations come from a vendored module we do not want to modify:

module Vendored : sig
  val read : unit -> int
  val write : int -> unit
end = struct
  let r = ref 42
  let read () = !r
  let write v = r := v
end

Replacing ref with Atomic (as we did earlier) is not an option here, since we are not allowed to touch the vendored code.

From the OxCaml side, all the functions exported by Vendored are nonportable (the default) and cannot be called from a portable context (e.g. inside a spawn or a fork_join2 callback). OxCaml provides an unsafe escape hatch, Obj.magic_portable, which would make them callable, but like Obj.magic it is ‘cheating’: it gives the compiler no way to prevent two domains from calling them concurrently, so the mutable state inside Vendored would still be racy.

What we actually need is a way to leverage OxCaml's mode system to enforce mutual exclusion on vendored function calls at compile time, without modifying the vendored code.

The trick is to encode the mutex discipline directly in the signature of the wrapper functions. We create a mutex, and require a token of type k Capsule.Access.t as an argument of every wrapper function:

module Lock = Capsule.Mutex.Create ()
(* [Lock] exposes two things: a fresh type [k], and a mutex of type
   [k Mutex.t]. Because [k] is fresh and tied to this mutex, a value
   of type [k Capsule.Access.t] can only be produced by acquiring it. *)
type k = Lock.k

module Wrapper : sig @@ portable
  val read : access:k Capsule.Access.t -> int
  val write : access:k Capsule.Access.t -> int -> unit
end = struct
  let read_ = Obj.magic_portable Vendored.read
  let read ~access:_ = read_ ()
  (* [access] is unused at runtime, but the caller must supply one,
     and the only way to obtain one is to hold [Lock.mutex]. *)

  let write_ = Obj.magic_portable Vendored.write
  let write ~access:_ v = write_ v
  (* Same [k] as [read]: both functions are guaranteed to run under
     the same mutex. *)
end

By taking ~access as an argument, the wrapper signature forces callers to hold Lock.mutex before calling any of its functions – the token is a compile-time proof of authorisation. Obj.magic_portable is still used internally to bridge the vendored code, but it is now safely contained: the wrapper is the trust boundary, and the compiler enforces the mutex discipline everywhere else.

On the caller side, concurrent access to the vendored state now looks like this:

let foo () =
  let open Await in
  let await = Await_blocking.await Terminator.never in

  let safe_write () =
    Mutex.with_key await Lock.mutex ~f:(fun key ->
        Capsule.Expert.Key.access key ~f:(fun access ->
            Wrapper.write ~access 1))
  in
  let safe_read () =
    Mutex.with_key await Lock.mutex ~f:(fun key ->
        Capsule.Expert.Key.access key ~f:(fun access ->
            let _ = Wrapper.read ~access in
            ()))
  in
  spawn safe_write;
  spawn safe_read

Both domains can call read and write, but only under the protection of the mutex: this code is data race free!

This pattern generalises to any non-OxCaml dependency with one important caveat: the guarantee is only as good as the wrapper. A function missing from the wrapper can still be called directly via Obj.magic_portable, bypassing the mutex. The wrapper must therefore be audited manually, ideally complemented by runtime tools like TSan.
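The token idea can be approximated in mainline OCaml 5 with an abstract type, which may help to see the shape of the pattern without the mode annotations. This is a sketch under assumptions, not the OxCaml API: the Lock module and its with_access function below are hypothetical, and Vendored is a stand-in for the vendored module. It is also strictly weaker than the OxCaml version, because nothing in mainline OCaml stops a caller from returning the token out of with_access and reusing it later without holding the lock.

```ocaml
(* Stand-in for the vendored module: unsynchronised global state. *)
module Vendored = struct
  let r = ref 42
  let read () = !r
  let write v = r := v
end

(* [access] is abstract, so the only way to obtain one is [with_access],
   which runs its callback while holding the mutex. *)
module Lock : sig
  type access
  val with_access : (access -> 'a) -> 'a
end = struct
  type access = unit
  let mutex = Mutex.create ()
  let with_access f = Mutex.protect mutex (fun () -> f ())
end

(* Every wrapper function demands a token, unused at runtime. *)
module Wrapper : sig
  val read : access:Lock.access -> int
  val write : access:Lock.access -> int -> unit
end = struct
  let read ~access:_ = Vendored.read ()
  let write ~access:_ v = Vendored.write v
end

let run () =
  let writer () = Lock.with_access (fun access -> Wrapper.write ~access 1) in
  let reader () =
    Lock.with_access (fun access -> ignore (Wrapper.read ~access))
  in
  let d1 = Domain.spawn writer in
  let d2 = Domain.spawn reader in
  Domain.join d1;
  Domain.join d2;
  Lock.with_access (fun access -> Wrapper.read ~access)
```

As in the OxCaml version, the guarantee only covers calls that go through Wrapper: code that calls Vendored directly bypasses the mutex.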

Final Thoughts on the Reference Module

A simpler version for int ref:

module Ref : sig @@ portable
  type t = { mutable content : int }

  val ref : int -> t
  val ( ! ) : t @ shared -> int
  val ( := ) : t -> int -> unit
end = struct
  type t = { mutable content : int }
  let ref x = { content = x }

  (* Reading only needs the reference to be shared. *)
  let ( ! ) r = r.content

  (* Writing needs the reference to be uncontended (the default). *)
  let ( := ) r v = r.content <- v
end

A more general version:

module Ref : sig @@ portable
  type ('a : value mod contended shareable) t = { mutable content : 'a }

  val ref : 'a -> 'a t
  val ( ! ) : 'a t @ shared -> 'a
  val ( := ) : 'a t -> 'a -> unit
end = struct
  type ('a : value mod contended shareable) t = { mutable content : 'a }
  let ref x = { content = x }

  let ( ! ) r = r.content

  let ( := ) r v = r.content <- v
end

The additional annotation on 'a defines the kind of 'a. It means that our reference cannot carry mutable values or nonportable functions. We need this restriction because the read function returns an 'a and, remember, we want to be able to call it from two different domains. Without the restriction, we could write the following:

type p = { mutable x : int }

let foo () =
  let open Ref in
  let r = ref { x = 0 } in

  spawn (fun () ->
      let t = !r in
      t.x <- 1);
  spawn (fun () ->
      let t = !r in
      t.x <- 2)

There are two concurrent write operations on the same mutable value t.x, with no synchronisation mechanism: this is a data race. The kind annotation statically prevents us from writing this code, as the kind of p is mutable_data: it is incompatible with our annotation, and therefore this code does not compile.

Stay in Touch

Hopefully this gives you a better understanding of what data race freedom is, why it matters, and how it works in OCaml and OxCaml! If you’re trying out OxCaml’s data race free modes and have thoughts and feedback, please share them with us on the OCaml Discuss forum.

Connect with Tarides on Bluesky, Mastodon, and LinkedIn or sign up to our mailing list to stay updated on our latest projects. We look forward to hearing from you!
