Avoiding Data Loss With Elixir Dets

Using Elixir we can build powerful systems that are highly concurrent, distributed, and fault-tolerant. One of the nice things about building on Erlang, which has been battle-tested for 25+ years, is that it includes tooling to help us cache items in memory: the native :ets module provides constant-time O(1) key-value access.

Erlang also ships a disk-persistent counterpart, :dets, which is used in :mnesia. It is similar to :ets, but we lose a bit of performance in exchange for persisting to disk.

To use :dets we can do something like this:

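defmodule MyCache do
  # `use Task` defines child_spec/1, so MyCache can be listed as a supervisor child.
  use Task

  @table_name :my_table

  def start_link(opts \\ []) do
    # :table_name is our own option, so remove it before passing opts to :dets.
    {table_name, dets_opts} = Keyword.pop(opts, :table_name, @table_name)

    Task.start_link(fn ->
      {:ok, _} = :dets.open_file(table_name, dets_opts)

      # Keep the owning process alive but idle so the table stays open.
      Process.hibernate(Function, :identity, [nil])
    end)
  end

  def get(table_name \\ @table_name, key) do
    case :dets.lookup(table_name, key) do
      [{^key, value}] -> {:ok, value}
      [] -> {:ok, nil}
    end
  end

  def put(table_name \\ @table_name, key, value) do
    :dets.insert(table_name, {key, value})
  end
end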

In the MyCache example above, we create a module that starts a Task process to own the :dets table and then hibernates that process so it stops consuming additional resources.

The start_link function opens a :dets table called :my_table (we use a default name but allow overrides for ease of testing). Hibernating is safe because all reads and writes to :dets happen outside of that process.

It is important that get/2 and put/3 run in the caller's process rather than inside the Task, since that keeps our own process from becoming the bottleneck for every read and write to :dets or :ets.
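As a rough illustration of calling :dets from many processes at once, here is a minimal sketch that talks to :dets directly instead of going through MyCache. The ConcurrentDemo module, the :concurrent_demo table name, and the temp-file path are made up for this example.

```elixir
defmodule ConcurrentDemo do
  # Hypothetical demo module; the table name and file path are assumptions.
  def run(table \\ :concurrent_demo) do
    path = Path.join(System.tmp_dir!(), "#{table}.dets")
    File.rm(path)

    {:ok, ^table} = :dets.open_file(table, file: String.to_charlist(path))

    # Each insert runs in its own process and calls :dets by table name,
    # without being routed through a process of our own.
    1..100
    |> Task.async_stream(fn i -> :dets.insert(table, {i, i * i}) end)
    |> Stream.run()

    result = :dets.lookup(table, 7)
    :ok = :dets.close(table)
    result
  end
end
```

Calling ConcurrentDemo.run() should return [{7, 49}], showing that the writes made from the worker processes are visible to the caller.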

Now we can use this cache very easily by adding the cache to our application.ex and then just using get/put:

application.ex

children = [MyCache]

Supervisor.start_link(children, opts)

In code

MyCache.put("key", 1234) # :ok
MyCache.get("key") # {:ok, 1234}

Super cool: we can shut down our application and restart it, and our data is persisted across restarts.
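To see the persistence without restarting a whole application, we can close and reopen a table by hand. This is only a sketch: the PersistDemo module, the :persist_demo table name, and the temp-file path are assumptions for the demo, and the close/reopen pair stands in for a graceful restart.

```elixir
defmodule PersistDemo do
  # Hypothetical demo module; the table name and default path are made up.
  def run(path \\ Path.join(System.tmp_dir!(), "persist_demo.dets")) do
    File.rm(path)
    file = String.to_charlist(path)

    {:ok, table} = :dets.open_file(:persist_demo, file: file)
    :ok = :dets.insert(table, {"key", 1234})
    # A clean close writes buffered data to disk.
    :ok = :dets.close(table)

    # Reopening the same file stands in for an application restart.
    {:ok, table} = :dets.open_file(:persist_demo, file: file)
    result = :dets.lookup(table, "key")
    :ok = :dets.close(table)
    result
  end
end
```

Running PersistDemo.run() should return [{"key", 1234}], read back from the file after the table was closed.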

Potential Data Loss

One thing we can try is quickly putting an item into the cache using put/2 and then pressing Ctrl-C twice right away. Most of the time we will notice that our data is not persisted. This is because :dets buffers writes and, by default, only auto-saves a table once it has gone 3 minutes (the default auto_save of 180000 ms) without being accessed. If the VM running :dets is terminated ungracefully before that happens, any data that has not been flushed to disk can be lost.

On the other hand, if we run MyCache.put(key, value) followed by System.stop() to shut the application down gracefully, our value is still saved. As long as we exit gracefully, we should not lose our data.

To shrink that window, we can use the auto_save option when opening the :dets table to tune how long it waits before saving to file. To pass this option to :dets we can change our start_link function like so:

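def start_link(opts \\ []) do
  # :table_name is our own option, so remove it before passing opts to :dets.
  {table_name, dets_opts} = Keyword.pop(opts, :table_name, @table_name)

  Task.start_link(fn ->
    {:ok, _} = :dets.open_file(
      table_name,
      # Flush after 1 second without access, unless the caller set :auto_save.
      Keyword.put_new(dets_opts, :auto_save, :timer.seconds(1))
    )

    Process.hibernate(Function, :identity, [nil])
  end)
end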

Here we set the auto_save option to 1 second, which means :dets flushes the table to disk once it has gone one second without being accessed. This reduces the chance of data loss if the VM is terminated ungracefully, but it does not prevent it completely, and it comes at a cost: flushing to disk more often consumes more system resources, so we should be careful about setting the value too low if we expect large data sizes.
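For the few writes that really must reach disk before we move on, another option is to flush explicitly with :dets.sync/1 right after the insert. The sketch below is an assumption about how that could look, not part of the original MyCache; SyncPutSketch and put_sync/3 are hypothetical names.

```elixir
defmodule SyncPutSketch do
  # Hypothetical helper, not part of the original MyCache module.
  # Inserts the pair, then asks :dets to write pending updates to disk
  # before returning. Returns :ok or an {:error, reason} tuple.
  def put_sync(table, key, value) do
    with :ok <- :dets.insert(table, {key, value}) do
      :dets.sync(table)
    end
  end
end
```

Syncing on every write trades throughput for durability, so it is probably best kept for a small number of important keys rather than used as the default put.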

TL;DR

Elixir has some pretty cool data storage mechanisms with :ets and :dets. While :dets takes a performance hit in comparison to :ets, it allows us to persist things to disk, which can be very useful in many situations.

While we can tune auto_save, it is worth thinking about what you choose to store in :dets and how available it needs to be. For larger systems that are multi-node, or where you need to guarantee the data makes it into storage, using something more consistent like Postgres and Ecto is often a good idea.
