Dipping My Toe into Elixir

Elixir is a functional programming language that promise both scalability and great fault-tolerance. It also provides great tools to support development. But how much does elixir live up to their promise?

In this post, we will create a theoretical chatbot using elixir. The chatbot will read user messages and will be triggered if the message contain url with tracking query in it. Then, the cleaned up url will be sent accordingly.

While I’m not going to write into too much detail, I’ll outline some important aspects that I’ve encountered while writing this project. Also for context, I’m not entirely unfamiliar with elixir as I’ve used it before at my job. But, it’s the first time I’ve written a complete elixir project from scratch. So, please takes this post with a grain of salt.

Preparation

There are two ways to install elixir on your system. Either you install elixir through your package manager (e.g. apt, dnf, etc) or use language version manager, in this case asdf.

# clone asdf core
$ git clone https://github.com/asdf-vm/asdf.git ~/.asdf

# add asdf to shell
$ echo '. "$HOME/.asdf/asdf.sh"' >> ~/.bashrc
# or zsh
$ echo '. "$HOME/.asdf/asdf.sh"' >> ~/.zshrc

Because elixir is built on top of erlang, we also need to install erlang. Fortunately asdf also support erlang.

# install erlang
$ asdf plugin add erlang https://github.com/asdf-vm/asdf-erlang.git
$ asdf install erlang latest

# install elixir
$ asdf plugin-add elixir https://github.com/asdf-vm/asdf-elixir.git
$ asdf install elixir latest

$ elixir --version

Code Editor

Personally, I use emacs. Doom emacs specifically. It already comes with support for elixir, with a few configs and tweaks we are ready to go.

;; uncomment this on your .doom.d/init.el config file
(elixir +lsp +tree-sitter)

Project Structure

$ mix new anti_tracking --umbrella

Mix is a build tool to help manage elixir project. It’s installed automatically when we install elixir on our machine.

A directory named anti_tracking will be created. Inside, we have a Mix project named AntiTracking.

The --umbrella parameter indicates that the generated project will be an umbrella project. Umbrella project is a term used in elixir for a monorepo project. Each applications in the umbrella project will share configurations and dependencies. Check the mix documentation for further details.

Storages

Storages application is responsible for handling data. Similarly, we’re going to use Mix to initiate the application.

$ cd anti_tracking/apps
# create the application
$ mix new storages --sup

The --sup parameter will create a supervised application. You can refer to the official documentation for details. The short explanation is that the project will be run in persistent mode and restarted if the child process crashed.

Back to the storages, let’s take a quick look at the application. The main purpose of this application is to serve as source of truth for the whole project. The data itself are web domains and its list of query parameters that known to be a tracker. Those data will be stored in both database (postgresQL) and simple in-memory cache. The application also comes with an interface to query and modify the data.

defmodule Storages.Cache do
  use GenServer
  ...

  @impl true
  def init(:ok) do
    {:ok, %{}}
  end

  @impl true
  def handle_call({:lookup, site_name}, _from, cache) do
    case Map.fetch(cache, site_name) do
      {:ok, parameters} -> {:reply, MapSet.to_list(parameters), cache}
      _ -> {:reply, [], cache}
    end
  end
  
  ...
end

While implementing the cache using genserver might create a bottleneck, the traffic and operation in this module is quite lightweight. In case that this approach is not performed enough, we can substitute this module with a proper cache (e.g. redis) to avoid bottleneck.

The database is an interface to postgresql database through Ecto library. Ecto is an ORM library for elixir that comes with handy tools (even migration included, so cool!) and extensive documentation. Adding the dependencies is quite straightforward.

# Mix.exs
defp deps do
  [
    {:ecto_sql, "~> 3.0"},
    {:postgrex, ">= 0.0.0"}
  ]
end

While I’m not really a fans of ORM, using Ecto is very simple and easy to understand. You can check their getting started documentation yourself.

According to Ecto convention, each table is defined in their own module. Also following that, we will put related functions (e.g. fetching ane updating data) on each table module accordingly.

defmodule Storages.Site do
  use Ecto.Schema
  require Ecto.Query
  import Ecto.Changeset
  import Ecto.Query

  schema "sites" do
    field(:name, :string)
    many_to_many(:parameters, Storages.Parameter, join_through: "site_parameters")
  end

  @spec get_or_insert(String.t()) :: {:error, %Ecto.Changeset{}} | {:ok, integer()}
  def get_or_insert(name) do
  ...
end

Last but not least, the main Storages module. This module is the interface that will serves logic to handle the data.

defmodule Storages do
  require Logger

  @spec insert(String.t(), [String.t()], {:update_cache, true}) ::
          {:server_error, String.t()} | {:error, String.t()} | :ok
  def insert(site_name, parameter_names, {:update_cache, true}) do
    case insert(site_name, parameter_names) do
      :ok ->
        Storages.Cache.update(site_name, parameter_names)
        :ok
  ...
end

Interaction to the Storages application, be it from another application or interactive shell (more on this later) will be started from this module.

Because we want the application to start (and supervise) both services during startup, they needed to be added to the application children list.

defmodule Storages.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      Storages.Repo,
      {Storages.Cache, name: Storages.Cache}
    ]

    opts = [strategy: :one_for_one, name: Storages.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

All done for the Storages application. Now, let’s test this using elixir interactive shell (iex).

$ iex -S mix

iex(1)> Storages.get("google.com")
["utm_params", "utm_sources", "utm_campaign"]

The iex command will execute elixir REPL in the current terminal. The -S mix argument will tell the iex to start the current mix project and attach the shell to the project. From the shell, we can execute modules and functions included in the project. Pretty useful for testing, even more the shell can reload code changes without needed to restart.

iex(1)> recompile()

In my opinion, this is one of the coolest feature of elixir. This make iterating very simple.

Web

Web application will serve as an interface to modify the bot data. With a simple authentication, the ‘bot admin’ can easily modifying the data without needed to update the database (or cache) directly. Just like with the Storages, we’re going to create a new supervised application.

$ mix new web --sup

The application itself is pretty simple, it will handle user input through web interface and pass that to our Storages app. For this application we’re utilizing three libraries:

# Mix.exs
  defp deps do
    [
      {:bandit, "~> 1.0"},
      {:poison, "~> 6.0"},
      {:storages, in_umbrella: true}
    ]
  end

Because the web will call storages application to fetch or modify data, it needs to be added in the dependency list. Notice that instead of version, we put in_umbrella: true. This will tell mix that this certain dependency is reside within the same project.

The bandit dependency is already include plug in it. While bandit is responsible for the heavylifting of the webserver, plug will handle and ‘glue’ our application. Let’s talk a bit about plug.

Plug, the heart of this application is so powerful yet so simple. So much that it deserved a writeup on its own. To put it simply, by utilizing Plug we can create a simple webserver and focus on the business logic (akin to Express of nodejs).

defmodule Web do
  use Plug.Router

  plug(:match)
  plug(Plug.Logger)
  plug(:dispatch)

  get "/ping" do
    send_resp(conn, 200, "pong")
  end

  forward("/tracker", to: Web.TrackerRouter)

  match _ do
    send_resp(conn, 404, "not found")
  end
end

You might notice that the module doesn’t follow the elixir standard of function definition. It’s because Plug heavily utilizing macro to make defining path clearer.

On the Web.TrackerRouter module, we handle user inputs. Including authenticating request, json serialize and deserializing, and function to Storages application invocation.

defmodule Web.TrackerRouter do
  require Logger

  use Plug.Router

  plug(:match)

  import Plug.BasicAuth

  plug(:basic_auth,
    username: Application.compile_env(:web, :username),
    password: Application.compile_env(:web, :password)
  )

  plug(Plug.Parsers, parsers: [:json], json_decoder: Poison)

  plug(:dispatch)

  get "/:site_name" do
    payload =
      %{parameters: Storages.get(site_name)}
      |> Poison.encode!()

    conn
    |> send_json_response(200, payload)
  end

  ...
end

Last, we need to define our child application. But unlike last time, we’re going to add bandit as the child instead of our module directly.

defmodule Web.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Bandit, plug: Web, scheme: :http, port: Application.fetch_env!(:web, :port)}
    ]

    opts = [strategy: :one_for_one, name: Web.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

This is the advantage of using plug (and bandit in this case). Most of the time, we can simply implement plug as we need and all other heavylifting of the web will be handled by the library, neat!

$ iex -S mix

[info] Running Web with Bandit 1.6.1 at 0.0.0.0:8080 (http)
Erlang/OTP 26 [erts-14.2.3] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns

Because both applications reside in a same umbrella project, we don’t need to start each application individually. And we can test that the webserver is working by invoking a simple curl.

$ curl localhost:8080/tracker/www.google.com \
   -H "Authorization: Basic aGVsbG86d29ybGQ="

{"parameters":["utm_params", "utm_sources", "utm_campaign"]}%

Bot

The bot application is the core of the project. This application will connect to a webserver and handle all received messages. Supposed the message contains known url with tracking parameters, the bot will then send the url with all tracking parameters removed.

We’re using Websockex, an elixir websocket client. The library is choosen due to their API that resembled elixir genserver. Thus, make the application more familiar and easy to work with.

defmodule Bot do
  use WebSockex
  require Logger

  def start_link(opts \\ []) do
    WebSockex.start_link(Keyword.get(opts, :url), __MODULE__, :state, opts)
  end

  @spec cast_message(String.t()) :: :ok
  def cast_message(message) do
    Logger.info("casting message: #{message}")
    WebSockex.cast(__MODULE__, message)
  end

  ...

  def handle_frame({:text, message}, state) do
    Task.Supervisor.start_child(Bot.TaskSupervisor, Bot.MessageHandler, :handle, [message])
    {:ok, state}
  end

  ...
end

The nature of a chat bot requires them to handle big stream of messages. Hence, we’re utilizing task to handle each message. Task is a convenient way to spawn a process in elixir (similar to goroutine of golang). By using task supervisor, we can even retry the task in the event of error or failure. Because each message handled by different process, the main process won’t be blocked and the bot can handle multiple messages ‘concurrently’.

Concurrently in a quote due to the nature of the websocket client library itself. While the message is processed concurrently, sending it back to the server can only be done using a single connection. Making the websocket connection being the bottleneck. And it might be a good thing, because most chat servers has a rate limit. Breaking that and our bot might be timed out or worse getting banned. To prevent that, we might also need to implement rate limiter for message sent. But, because we’re only implementing this for a theoretical chat server, current implementation is sufficient.

defmodule Bot.MessageHandler do
  @spec handle(String.t()) :: nil | :ok
  def handle(message) do
    message
    ...
    |> Enum.map(&cleanup_uri/1)
    ...
    |> send()
  end

  defp cleanup_uri(uri) do
    %URI{host: host, query: query} = uri
    query = URI.decode_query(query)

    cleaned_query =
      Storages.get(host, {:with_cache, true})
      |> Enum.reduce(query, fn item, acc ->
        Map.delete(acc, item)
      end)
    ...
  end
  
  ...
  
  defp send(uris) do
    uris
    |> Enum.reduce("Here's your cleaned up urls:", fn item, acc ->
      acc <> "\n - " <> URI.to_string(item)
    end)
    |> Bot.cast_message()
  end
end

The handler implementation itself is quite straightforward. We parse the message, and if it contains a known url with tracking parameters we send it back with the cleaned up url.

defmodule Bot.Application do
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Task.Supervisor, name: Bot.TaskSupervisor},
      {Bot, [async: true, name: Bot, url: Application.fetch_env!(:bot, :url)]}
    ]

    opts = [strategy: :one_for_one, name: Bot.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Same as before, we register our supervised module in our application module. Notice that we also register the task supervisor despite not having the module in the application. It’s due to the nature of the supervisor module, we can simply started it and give it a name. We can also add some options to handle error events, but let’s keep it simple for now.

Get It Running!

As we can see, our bot will react to message with known url and tracking parameter. It won’t reply to message without url or url without tracking parameter.

Deployment

We need to add some informations to our root mix.exs file before we can start the deployment.

defmodule AntiTracking.MixProject do
  use Mix.Project

  def project do
    [
      apps_path: "apps",
      version: "0.1.0",
      start_permanent: Mix.env() == :prod,
      deps: deps(),
      releases: [
        bot: [
          version: "0.0.1",
          include_executables_for: [:unix],
          applications: [
            web: :permanent, 
            storages: :permanent, 
            bot: :permanent
          ]
        ]
      ]
    ]
  end
  ... 
end

We need to define the applications included in the release. If desired we can define multiple releases with different applications (e.g. release only web application, etc).

You might notice that we set the start_permanent key with comparison to environment and atom :prod. This key will determine whether to crash the erlang vm when the application supervisor tree shutdown, we don’t want this to happen in development as the stacktraces are important to debugging. But, what is the environment being referred here? Let’s take a quick detour.

Environment

By default, our mix project is run on :dev environment. We can set this by passing MIX_ENV variable.

$ MIX_ENV=prod iex -S mix

Mix support three environments out of the box: :dev, :test, and :prod. Those environments should be self explanatory. Other than setting various compiler config (debug artifacts, etc), environment also used to set project configuration.

Configuration

Configuration is a module that used to define project configuration. Typically, it reside in config/config.exs file. But it’s configurable through application Mix.exs file in config_path key (for umbrella project, this is defined on each application mix file).

import Config

config :storages, ecto_repos: [Storages.Repo]
import_config "#{config_env()}.exs"

Other than config data for ecto, we can see that the config file will import another config file with current environment as the filename. This enable per environment configuration, pretty helpful for separating configuration for different scenario (e.g. development and testing). For production, it’s a good practice not to commit the configuration value. Instead, configuration values will be defined from environment variables. (In the real word though, usually the production config values are set using more sophisticated way like kubeseal. Which in turn will set the production machine environment variables).

import Config

config :storages, Storages.Repo,
  database: System.get_env("DB_NAME"),
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  hostname: System.get_env("DB_HOST")

config :web,
  port: System.get_env("WEB_PORT"),
  username: System.get_env("WEB_USERNAME"),
  password: System.get_env("WEB_PASSWORD")

config :bot,
  url: System.get_env("BOT_WS_URL")

There’s a “gotcha” for this approach. In elixir, by default all configuration value is determine while compiling. Which mean, if we set configuration values while both compiling and running, the one used will be the one set while compiling. For configuration that set during runtime, we need to set it on config/runtime.exs instead.

Now we have learn about environment and configuration. Let’s create our first production release for AntiTracking bot.

$ export DB_NAME=antitracking_db
...
$ MIX_ENV=prod mix compile
$ MIX_ENV=prod mix release

Remember, we need to set the environment variables before compiling. Otherwise, configuration values will be empty.

The compiled binary will then available on the _build/prod/rel/bot/bin/ directory with the release name as the binary name (in this case bot). The binary itself contains set of commands.

$ ./_build/prod/rel/bot/bin/bot 
Usage: bot COMMAND [ARGS]

The known commands are:

    start          Starts the system
    start_iex      Starts the system with IEx attached
    daemon         Starts the system as a daemon
    daemon_iex     Starts the system as a daemon with IEx attached
    eval "EXPR"    Executes the given expression on a new, non-booted system
    rpc "EXPR"     Executes the given expression remotely on the running system
    remote         Connects to the running system via a remote shell
    restart        Restarts the running system via a remote command
    stop           Stops the running system via a remote command
    pid            Prints the operating system PID of the running system via a remote command
    version        Prints the release name and version to be booted

Lesson Learned and Future Works

Coming from OO / procedural programming paradigm, learning elixir is quite challenging. I need to “rebuild” my thinking process just to solve a simple problem.

The book “Elixir in Action” by Sasa Juric is pretty good on introducing the language, environment, and guideline to writing elixir. Solving some leetcode / hackerrank problem also helps a lot to give “sense” of the language. The official documentation on hexdocs provides great step-by-step tutorial on creating an elixir project.

I realized that for a first project, this is far more perfect. There are several topics / directions that I want to explore for my next elixir project.

While working on this project is challenging, I’m having so much fun writing elixir. Exploring elixir gives me a fresh feeling compared to my day programming job. But, Elixir as praised as it is, still has its own flaws.

The idiom “there’s no silver bullet solution, just pick the right tool for the right job” stand true even for it. But I think, it’s okay to have your own favourite tool, in my case it might be elixir.