Darcs internals, part 1: typesafe directed datastructures

I am editing this post from IRILL, where the darcs sprint is taking place

One of the things that I particularly like about the Darcs codebase is that you can see that the developers were not shy about using intermediate-slash-advanced Haskell/GHC features to help achieving type safety. You can see GADTs, associated types, phantom types, existential types actively used.

In this post I would like to discuss the representation of patches and the use of type witnesses in darcs.

A word about patches and contexts

This post is intended for people interested in patch theory and its implementation in darcs. A passing familiarity with patches, contexts, inverses, etc is useful, but not required, as we restate some basic definitions in this section.

A primitive patch is a basic unit in patch theory. It constitutes an atomic change in the repository. The definition of a primitive patch may vary, but usually the following changes are considered primitive patches:

  • Removing a file from the repository
  • Adding file to the repository
  • Changing a line n in a file in the repository
  • Identity patch, i.e. an empty/non-existing change that does not modify the state of the repository at all

Every primitive patch has a pre-context and a post-context. Roughly, you can think of a pre-context as the full state of the repository before the change was made, and of the post-context as the full state of the repository after the change was applied. We write (x)-A->(y) for a patch A with a pre-context x and a post-context y.

If a primitive patch A has a pre-context a, a post-context o, and a primitive patch B has a pre-context o, a post-context b, then we can combine two patches to obtain a sequential patch AB with the pre-context a and the post-context b.

Every primitive patch (x)-A->(y) has an inverse (y)-A^{-1}->(x), such that (x)-A->(y)-A^{-1}->(x) is equivalent to the identity patch (x)-1->(x).

In the next sections we will see how those things are implemented in darcs.

Primitive patches and witnesses

A primitive patch, which constitutes a single fine grained change, can be represented as a (G)ADT:

data Prim where
    Move :: FileName -> FileName -> Prim
    RmFile :: FileName -> Prim
    AddFile :: FileName -> Prim
    Hunk :: FileName -> Int -> ByteString -> ByteString -> Prim
    RmDir :: FileName -> Prim
    …

We can represent complex patches as sequences of primitive patches:

data Patch where
    NilP :: Patch
    PrimP :: Prim -> Patch
    SeqP :: Patch -> Patch -> Patch

This seems reasonable enough. But if we implement our patch theory this way we seem to be missing something — patches have (pre- and post-) contexts. Having contexs allows us to enforce patch composition on the level of type system. Consider the following definition, which uses phantom types as type witnesses for contexts.

data Prim x y where
    Move :: FileName -> FileName -> Prim x y
    RmFile :: FileName -> Prim x y
    AddFile :: FileName -> Prim x y
    Hunk :: FileName -> Int -> ByteString -> ByteString -> Prim x y
    RmDir :: FileName -> Prim x y
    …

data Patch x y where
    NilP :: Patch x x
    PrimP :: Prim x y -> Patch x y
    SeqP :: Patch x y -> Patch y z -> Patch x z

We call the types with witnesses representing pre- and post-contexts directed types. Intuitively, the directed type D x y has a “direction” from x to y, written as (x)->(y). The Prim datatype looks pretty much like the type actually used in Darcs. The Patch datatype, however, is completely artificial. We will see in the next sections how Darcs really models complex patches.

Directed programming

FL & RL

Two particularly useful directed types used in darcs are directed lists: forwards lists of the type FL a x y and reverse lists RL a x y. Forward lists “go” from the head to the tail; reverse lists “go” from the tail to the head. The lists are polymorphic over a just like regular lists.

data FL a x y where
    NilFL :: FL a x x    {- The empty list stays at (x) -}
    (:>:) :: a x y -> FL a y z -> FL a x z
infixr 5 :>:

For myself, I visualise forward lists like this:

(x) —a—> (y) —b—> (z) ——Nil——> (z)
a :: Patch x y
b :: Patch y z
(a :>: b :>: NilFL) :: FL Patch x z

The reverse lists are “going” from tail to head1

data RL a x y where
    NilRL :: RL a x x
    (:<:) :: RL a x y -> a y z -> RL a x z
infixl 5 :<:

(Mnemonically, the the head is always “greater” than the tail)

The reason I used the word “go” inside quotation marks so far is the following. Reverse lists and forward lists represent the same concept: a sequence of patches (or a sequence of directed things for that matter). They only differ in the associativity of the elements. Forward lists associate to the right, but reverse lists associate to the left.

p = Move "foo" "bar"

-- [[(x) --Nil--> (x) -p-> (y)] -id-> (y)] -p^{-1}-> (x)
example :: RL Patch x x
example = NilRL :<: p :<: Identity :<: inverse p

-- (x) -p-> [(y) -p^{-1}-> (x) --Nil--> (x)]
example2 :: FL Patch x x
example2 = p :>: inverse p :>: NilFL

The right-associated/reverse lists provide easy access to the last element of the sequence; the left-associated/forward lists provide easy access to the first element of the sequence. Therefore, if we view a repository as a directed sequence of patches, right-associated lists are useful for operations that work on the “latest” patches in the repository (such as record/unrecord), and left-associated lists are useful for commands that scan the repository from the beginning (like clone).

We can reassociate the lists easily, and verify that the two representations are isomoprhic:2

reverseRL :: RL a wX wY -> FL a wX wY
reverseRL = r NilFL
-- the type signature of r basically gives us an invariant
-- wZ is slowly "decreasing" reaching the point where
-- wZ = wX; at that point the first argument is of type FL a wX wY
  where r :: FL a wZ wY -> RL a wX wZ -> FL a wX wY
        r a NilRL = a
        r a (xs :<: x) = r (x :>: a) xs

For example,

-- [[(x) --Nil--> (x) -p-> (y1)] -q-> (y2)] -r-> (z)

Turns into

-- (x) -p-> [(y1) -q-> (y2) -r-> [(z) --Nil--> (z)]]

Exercise: write a function reverseFL :: FL a wX wY -> RL a wX wY

We can write a lot of directed analogues of familiar list functions. For example, here is a directed append:

infixl 5 +<+
(+<+) :: RL a wX wY -> RL a wY wZ -> RL a wX wZ
xs +<+ NilRL = xs
xs +<+ (ys :<: y) = (xs +<+ ys) :<: y

Exercise: write directed append for forward lists: (+>+) :: FL a wX wY -> FL a wY wZ -> FL a wX wZ

Type witnesses

So we wrote a bunch of standard list functions for directed lists; what about some of the other functions? Can we, for example, implement filter for directed lists. Can it look like filterFL :: (a wX wY -> Bool) -> FL a wX wY? Well, we can try writing

filterFL :: (a wX wY -> Bool) -> FL a wX wY
filterFL p NilFL = NilFL
filterFL p (x :>: xs) | p x = filterFL p xs
                      | otherwise = x :>: filterFL p xs

However, under closer scrutiny we realize that it does not typecheck! In the second clause of filterFL we have the following information:

x :: a x y, xs :: FL a y z
filterFL xs :: FL a y z

Thus, in the first case (in which p x holds) we try to return something of the type FL a wY wZ, when FL a wX wZ was expected. It is clear that generally we can do this only if x :: a wX wX, i.e. wY = wX. But a simple predicate of the type p :: a wX wZ -> Bool won’t tell us anything about that. We need an additional type witness in our system telling us that if p x holds, then x :: a wX wX. For that purpose we introdue the EqCheck datatype.

data EqCheck wX wY where
    IsEq :: EqCheck wX wX
    NotEq :: EqCheck wX wY

then the type of a predicate would be

type Pred a = forall wX wY. a wX wY -> EqCheck wX wY

If (p x) = IsEq, then the typechecker will know that x :: a wX wX. We can then finally write

filterFL :: Pred a -> FL a wX wY -> FL a wX wY
filterFL p NilFL = NilFL
filterFL p (x :>: xs) | IsEq <- p x = filterFL p xs
                      | otherwise   = x :>: filterFL p xs

EqCheck is used this way in the darcs source code to e.g., filter our internal patches. Sometimes darcs stores information — like suspended patches — in the so called internal patches. Every patch type implements the internal patch checker (code slightly adapted):

-- |Provides a hook for flagging whether a patch is "internal" to the repo
-- and therefore shouldn't be referred to externally, e.g. by inclusion in tags.
-- Note that despite the name, every patch type has to implement it, but for
-- normal (non-internal) types the default implementation is fine.
-- Currently only used for rebase internal patches.
class MaybeInternal p where
    -- | @maybe (const NotEq) (fmap isInternal patchInternalChecker) p@
    -- returns 'IsEq' if @p@ is internal, and 'NotEq' otherwise.
    -- The two-level structure is purely for efficiency: 'Nothing' and 'Just (InternalChecker (const NotEq))' are
    -- semantically identical, but 'Nothing' allows clients to avoid traversing an entire list.
    -- The patch type is passed as an 'FL' because that's how the internals of named patches are stored.
    patchInternalChecker :: Maybe (forall wX wY . FL p wX wY -> EqCheck wX wY)
    patchInternalChecker = Nothing

When the user runs darcs tag in the repository, darcs creates a dummy patch that explicitly depends on all the previous patches — apart from the internal ones of course. Thus, the tag command uses the following function (slightly adapted):

filterNonInternal :: MaybeInternal p => PatchSet p wX wY -> PatchSet p wX wY
filterNonInternal =
  case patchInternalChecker of
     Nothing -> id
     Just f -> \l -> PatchSet (filterRL (f . patchcontents . hopefully) (unPatchSet l))

where a PatchSet is the list of PatchInfoAnd patches — patches together with the meta-information.

It is worth noting that EqCheck x y is isomorphic Maybe (x :~: y), but the propositional equality datatype has only been added to base since 4.7.0.0. In the future versions darcs will probably switch to using Data.Type.Equality.

Conclusion

We’ve briefly touched upon patch representation in darcs and talked about directed types and directed programming.

A good if a bit outdated reference is Jason Dagit’s master thesis (specifically the bits from chapter 4). The wiki is currently lacking in material, but I hope to improve the situation eventually.

Next time we will probably discuss either directed pairs and their use in darcs, or sealed datatypes, or both.


  1. Essentially, a reverse list is a directed version of a snoc-list.

  2. Before darcs 2.10.1 the right-associated lists had a slightly different datatype, and for the old lists those functions would “reverse” the list. Hence the names of the functions.

Advertisements

3 thoughts on “Darcs internals, part 1: typesafe directed datastructures

  1. Pingback: Darcsden imporvements and Darcs sprint | (parentheses)

  2. Pingback: Darcsden improvements and Darcs sprint | (parentheses)

  3. Pingback: Links of the Week, W40 2015 | One More Game-Dev and Programming Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s