Homomorphic static analysis

It is somehow shared knowledge among Haskellers that applicatives allow performing static analysis, while monads do not. But what does it mean in practice? And what kind of static analysis is this referring to?

The best answer I could find online is this answer on StackOverflow. In this post I’d like to expand on it a bit and take the concept a bit further¹.

Contextual interface

Consider an example interface dependent on a generic context m, implemented as a record of functions:

data Interface m = Interface
  { foo :: FooInput -> m FooOutput
  , bar :: BarInput -> m BarOutput
  , baz :: BazInput -> m BazOutput
  }

This is an approach for defining interfaces in a pure language which has been discussed and proposed multiple times² and works pretty well. It defines a set of operations which work in the same context and which can be used to define more complex programs².

myProgram :: Interface m -> Input -> m Output
myProgram interface input = ...

myProgram will use the basic operations defined by Interface m, but how such operations can be combined is defined by m. If m admits an instance of Functor, we are able to execute one of its operations and then map the result with a pure function. In case m admits an instance of Applicative, we are able to independently execute operations from Interface m and then combine their results. If m admits an instance of Monad, we will be able to combine operations from Interface m sequentially, deciding whether to execute an operation based on the result of a previous one.

Interpreting the code without executing it

In our production code myProgram will be interpreted in some effectful context m where we will execute all the required effects. For example m could be a concrete monad, a stack of monad transformers or an Eff monad from your favourite effect system.

Still, we might want to collect some information about our code without performing any of the concrete effects; examples of this could be analysing some code metric, computing a list of dependencies or creating a graphical representation of our code flow.

One way to achieve this is by interpreting myProgram in another context m' which happen not to perform any of the effects, but it’s there just to collect information. Once we have this we can write another executable which interprets myProgram in m' and retrieves the information we are interested in without executing any effect. This could be done without running our application and that’s why it is called static analysis, even if it requires running some code.

Interpreting in a data-agnostic context

One way of defining a context m' like the one described in the last paragraph of the previous section is to make it data-agnostic. With this I mean a data structure like

newtype StaticInterpreter a = StaticInterpreter StaticDatum

The relevant thing of StaticInterpreter is that a is a phantom type, meaning that it appears in the type, but there is nothing of that type at the value level.

The benefit of this is that we can now combine values of StaticInterpreter even if they are tagged with different types. For example, suppose we have StaticDatum = Int, then we can combine StaticInterpreter i1 :: StaticInterpreter a1 and StaticInterpreter i2 :: StaticInterpreter a2 and obtain StaticInterpreter (i1 + i2) :: StaticInterpreter a.

Constraining context constrains static datum

Let’s now try to see what happens to StaticDatum as soon as we start imposing constraints on the m from myProgram.

If we need m to be a Functor, then also StaticInterpreter should admit a Functor instance

instance Functor StaticInterpreter where
  fmap :: (a -> b) -> StaticInterpreter a -> StaticInterpreter b
  fmap f (StaticInterpreter datum) = StaticInterpreter datum

To implement it, we have no constraint to impose on StaticDatum. This means that if we just need m to be a Functor, we can statically analyse myProgram and interpret it with any possible type.

Usually though our programs will require something more than Functor. Our next step is requiring m to have an Applicative instance. Let’s see what happens to StaticDatum as soon as we do that.

instance Applicative StaticInterpreter where
  pure :: a -> StaticInterpreter a
  pure a = StaticInterpreter _

  (<*>) :: StaticInterpreter (a -> b) -> StaticInterpreter a -> StaticInterpreter b
  (<*>) (StaticInterpreter sd1) (StaticInterpreter sd2) = StaticInterpreter _

We basically need a way to create a StaticDatum out of nothing, and a way to combine two StaticDatum values into a new StaticDatum value. In other terms, we need

_outOfNothing :: StatidDatum

_combineTwo :: StaticDatum -> StaticDatum -> StaticDatum

But this is exactly what a Monoid instance on StaticDatum gives us!

This means that we can use any type with a Monoid instance to statically analyse a program which is constrained by an Applicative constraint. For example StaticDatum could be a list of values and interpreting myProgram in StaticInterpreter would collect and concatenate all elements³.

Let’s now see what happens if we require StaticInterpreter to have a Monad instance.

instance Monad StaticInterpreter where
  (>>=) :: StaticInterpreter a -> (a -> StaticInterpreter b) -> StaticInterpreter b
  (>>=) (StaticInterpreter sd) f = _

If we want a non-trivial implementation of >>= (i.e. we don’t want to return StaticInterpreter mempty every time), we would need to use f to obtain a value of type StaticInterpreter b. But to call f we need a value of type a, which we can’t extract from StaticInterpreter a, because a there is actually a phantom type and there’s no value of type a in there. The fact that we can not implement >>= is the reason why we usually say that Monads do not allow static analysis⁴.

Let’s play the same game with categories

In the previous section we had a look at what kind of static analysis we can perform on a program written in a given context m. We concluded that if m is just required to have an Applicative instance, we can statically analyse our program with any monoid, while if we require an instance of Monad, then we basically lose our ability to introspect our code statically.

What could be interesting now (it is for me!) is trying to apply the same reasoning working with categories instead of functors/applicatives/monads. This would mean considering types with kind Type -> Type -> Type instead of Type -> Type. Examples of this could be simply (->) or Kleisli m⁵.

Let’s consider, as above, a data type which is defined with phantom types

newtype CatStaticInterpreter a b = CatStaticInterpreter StaticDatum

and see what happens when we try to implement instances of several type classes. Let’s start with the Profuctor typeclass.

instance Profunctor CatStaticInterpreter where
  dimap :: (a -> b) -> (c -> d) -> CatStaticInterpreter b c -> CatStaticInterpreter a d
  dimap f g (CatStaticInterpreter sd) = CatStaticInterpreter sd

As we did above with Functor, we just ignore the f and g functions, and we keep the same StaticDatum value.

Next we can investigate what happens with the Category typeclass

instance Category CatStaticInterpreter where
  id :: CatStaticInterpreter a a
  id = _

  (.) :: CatStaticInterpreter b c -> CatStaticInterpreter a b -> CatStaticInterpreter a c
  (.) (CatStaticInterpreter sd1) (CatStaticInterpreter sd2) = _

Similarly to what was happening with the Applicative typeclass in the previous section, we can notice that we can implement the Category instance for CatSTaticInterpreter if StaticDatum admits a Monoid instance. In that case we can define

  id = CatStaticInterpreter mempty

  (.) (CatStaticInterpreter sd1) (CatStaticInterpreter sd2) = CatStaticInterpreter (sd1 <> sd2)

Please notice here that, while Monad and Category are equally powerful from a computational point of view, the former does not admit any kind of static analysis, while the latter does.

An example with `Strong` and `Choice`

Thing start to get even more interesting as soon as we start other constraints like Strong or Choice. For each one of these typeclasses we need another binary operation on StaticDatum which would correspond to (***) and (+++) respectively.

For this use case, let’s define StaticDatum as

data StaticDatum
  = Unit Text
  | Compose StaticDatum StaticDatum
  | Alternative StaticDatum StaticDatum
  | Parallel StaticDatum StaticDatum

With this definition we can define instances for Category, Strong and Choice for CatStaticInterpreter.

instance Category CatStaticInterpreter where
  id = CatStaticInterpreter Unit
  (.) (CatStaticInterpreter sd1) (CatStaticInterpreter sd2) = CatStaticInterpreter (Compose sd1 sd2)

instance Strong CatStaticInterpreter where
  first' :: CatStaticInterpreter a b -> CatStaticInterpreter (a, c) (b, c)
  first' (CatStaticInterpreter sd) = CatStaticInterpreter (Parallel sd (Unit "id"))

  second' :: CatStaticInterpreter a b -> CatStaticInterpreter (c, a) (c, b)
  second' (CatStaticInterpreter sd) = CatStaticInterpreter (Parallel (Unit "id") sd)

instance Choice CatStaticInterpreter where
  left' :: CatStaticInterpreter a b -> CatStaticInterpreter (Either a c) (Either b c)
  left' (CatStaticInterpreter sd) = CatStaticInterpreter (Alternative sd (Unit "id"))

  right' :: CatStaticInterpreter a b -> CatStaticInterpreter (Either c a) (Either c b)
  right' (CatStaticInterpreter sd) = CatStaticInterpreter (Alternative (Unit "id") sd)

At this point let’s define an interface depending on a category p

data Interface p a b c d e f g h = Interface
  { foo1 :: p a b
  , foo2 :: p a c
  , foo3 :: p a (Either a a)
  , foo4 :: p a d
  , foo5 :: p a e
  , foo6 :: p (b, c) f
  , foo7 :: p (Either d e) g
  , foo8 :: p (f, g) h
  }

With it, we can define an example function foo consuming that interface

foo :: (Category p, Strong p, Choice p) => Interface p a b c d e f g h -> p a h
foo Interface {..} =
  foo8 . ((foo6 . (foo1 &&& foo2)) &&& (foo7 . (foo4 +++ foo5) . foo3))

Using our CatStaticInterpreter we can now define an implementation for Interface CatStaticInterpreter

implementation :: Interface CatStaticInterpreter a b c d e f g h
implementation =
  Interface
    { foo1 = CatStaticInterpreter (Unit "foo1")
    , foo2 = CatStaticInterpreter (Unit "foo2")
    , foo3 = CatStaticInterpreter (Unit "foo3")
    , foo4 = CatStaticInterpreter (Unit "foo4")
    , foo5 = CatStaticInterpreter (Unit "foo5")
    , foo6 = CatStaticInterpreter (Unit "foo6")
    , foo7 = CatStaticInterpreter (Unit "foo7")
    , foo8 = CatStaticInterpreter (Unit "foo8")
    }

Eventually we can statically analyse foo with CatStaticInterpreter just by evaluating foo implementation

>>> foo implementation
Compose
  (Unit "foo8")
  (Compose
    (Parallel
      (Compose
        (Unit "foo6")
        (Compose
          (Parallel (Unit "foo1") (Unit "id"))
          (Parallel (Unit "id") (Unit "foo2"))
        )
      )
      (Unit "id")
    )
    (Parallel
      (Unit "id")
      (Compose
        (Unit "foo7")
        (Compose
          (Compose
            (Alternative (Unit "foo4") (Unit "id"))
            (Alternative (Unit "id") (Unit "foo5"))
          )
          (Unit "foo3")
        )
      )
    )
  )

For example, with this tree-like representation, it’s not hard to traverse it and check whether a specific pattern is used (just a single function, or a more complex combination), or even optimise and restructure the tree. For example any (Compose (Parallel (Unit "foo1") (Unit "id")) (Parallel (Unit "id") (Unit "foo2"))) could probably be reinterpreted to (Parallel (Unit "foo1") (Unit "foo2")).

Another possibility is printing out a graphical representation of the resulting tree, which documents the flow of our program. For foo it could look like

Profunctor flow

Conclusion

To recap what we went through in this post, we saw that we can analyse statically any code working in an Applicative context with any monoid; on the other hand monads do not allow non-trivial static analysis.

Moving to a categorical realm, which is as powerful as the monadic one, we saw that we can actually perform static analysis, and actually adding more constraints allow us to have more powerful understanding of the structure of our code. In our example we saw what we can do using Strong and Choice. One can use the same trick with Closed, which deals with functions, CoStrong, which deals with recursion, or CoChoice.