I am editing this post from IRILL, where the darcs sprint is taking place
One of the things that I particularly like about the Darcs codebase is that you can see that the developers were not shy about using intermediate-slash-advanced Haskell/GHC features to help achieving type safety. You can see GADTs, associated types, phantom types, existential types actively used.
In this post I would like to discuss the representation of patches and the use of type witnesses in darcs.
A word about patches and contexts
This post is intended for people interested in patch theory and its implementation in darcs. A passing familiarity with patches, contexts, inverses, etc is useful, but not required, as we restate some basic definitions in this section.
A primitive patch is a basic unit in patch theory. It constitutes an atomic change in the repository. The definition of a primitive patch may vary, but usually the following changes are considered primitive patches:
- Removing a file from the repository
- Adding file to the repository
- Changing a line
n
in a file in the repository - Identity patch, i.e. an empty/non-existing change that does not modify the state of the repository at all
Every primitive patch has a pre-context and a post-context. Roughly, you can think of a pre-context as the full state of the repository before the change was made, and of the post-context as the full state of the repository after the change was applied. We write (x)-A->(y)
for a patch A
with a pre-context x
and a post-context y
.
If a primitive patch A
has a pre-context a
, a post-context o
, and a primitive patch B
has a pre-context o
, a post-context b
, then we can combine two patches to obtain a sequential patch AB
with the pre-context a
and the post-context b
.
Every primitive patch (x)-A->(y)
has an inverse (y)-A^{-1}->(x)
, such that (x)-A->(y)-A^{-1}->(x)
is equivalent to the identity patch (x)-1->(x)
.
In the next sections we will see how those things are implemented in darcs.
Primitive patches and witnesses
A primitive patch, which constitutes a single fine grained change, can be represented as a (G)ADT:
data Prim where
Move :: FileName -> FileName -> Prim
RmFile :: FileName -> Prim
AddFile :: FileName -> Prim
Hunk :: FileName -> Int -> ByteString -> ByteString -> Prim
RmDir :: FileName -> Prim
…
We can represent complex patches as sequences of primitive patches:
data Patch where
NilP :: Patch
PrimP :: Prim -> Patch
SeqP :: Patch -> Patch -> Patch
This seems reasonable enough. But if we implement our patch theory this way we seem to be missing something — patches have (pre- and post-) contexts. Having contexs allows us to enforce patch composition on the level of type system. Consider the following definition, which uses phantom types as type witnesses for contexts.
data Prim x y where
Move :: FileName -> FileName -> Prim x y
RmFile :: FileName -> Prim x y
AddFile :: FileName -> Prim x y
Hunk :: FileName -> Int -> ByteString -> ByteString -> Prim x y
RmDir :: FileName -> Prim x y
…
data Patch x y where
NilP :: Patch x x
PrimP :: Prim x y -> Patch x y
SeqP :: Patch x y -> Patch y z -> Patch x z
We call the types with witnesses representing pre- and post-contexts directed types. Intuitively, the directed type D x y
has a “direction” from x
to y
, written as (x)->(y)
. The Prim
datatype looks pretty much like the type actually used in Darcs. The Patch
datatype, however, is completely artificial. We will see in the next sections how Darcs really models complex patches.
Directed programming
FL & RL
Two particularly useful directed types used in darcs are directed lists: forwards lists of the type FL a x y
and reverse lists RL a x y
. Forward lists “go” from the head to the tail; reverse lists “go” from the tail to the head. The lists are polymorphic over a
just like regular lists.
data FL a x y where
NilFL :: FL a x x {- The empty list stays at (x) -}
(:>:) :: a x y -> FL a y z -> FL a x z
infixr 5 :>:
For myself, I visualise forward lists like this:
(x) —a—> (y) —b—> (z) ——Nil——> (z)
a :: Patch x y
b :: Patch y z
(a :>: b :>: NilFL) :: FL Patch x z
The reverse lists are “going” from tail to head1
data RL a x y where
NilRL :: RL a x x
(:<:) :: RL a x y -> a y z -> RL a x z
infixl 5 :<:
(Mnemonically, the the head is always “greater” than the tail)
The reason I used the word “go” inside quotation marks so far is the following. Reverse lists and forward lists represent the same concept: a sequence of patches (or a sequence of directed things for that matter). They only differ in the associativity of the elements. Forward lists associate to the right, but reverse lists associate to the left.
p = Move "foo" "bar"
-- [[(x) --Nil--> (x) -p-> (y)] -id-> (y)] -p^{-1}-> (x)
example :: RL Patch x x
example = NilRL :<: p :<: Identity :<: inverse p
-- (x) -p-> [(y) -p^{-1}-> (x) --Nil--> (x)]
example2 :: FL Patch x x
example2 = p :>: inverse p :>: NilFL
The right-associated/reverse lists provide easy access to the last element of the sequence; the left-associated/forward lists provide easy access to the first element of the sequence. Therefore, if we view a repository as a directed sequence of patches, right-associated lists are useful for operations that work on the “latest” patches in the repository (such as record/unrecord), and left-associated lists are useful for commands that scan the repository from the beginning (like clone).
We can reassociate the lists easily, and verify that the two representations are isomoprhic:2
reverseRL :: RL a wX wY -> FL a wX wY
reverseRL = r NilFL
-- the type signature of r basically gives us an invariant
-- wZ is slowly "decreasing" reaching the point where
-- wZ = wX; at that point the first argument is of type FL a wX wY
where r :: FL a wZ wY -> RL a wX wZ -> FL a wX wY
r a NilRL = a
r a (xs :<: x) = r (x :>: a) xs
For example,
-- [[(x) --Nil--> (x) -p-> (y1)] -q-> (y2)] -r-> (z)
Turns into
-- (x) -p-> [(y1) -q-> (y2) -r-> [(z) --Nil--> (z)]]
Exercise: write a function reverseFL :: FL a wX wY -> RL a wX wY
We can write a lot of directed analogues of familiar list functions. For example, here is a directed append:
infixl 5 +<+
(+<+) :: RL a wX wY -> RL a wY wZ -> RL a wX wZ
xs +<+ NilRL = xs
xs +<+ (ys :<: y) = (xs +<+ ys) :<: y
Exercise: write directed append for forward lists: (+>+) :: FL a wX wY -> FL a wY wZ -> FL a wX wZ
Type witnesses
So we wrote a bunch of standard list functions for directed lists; what about some of the other functions? Can we, for example, implement filter
for directed lists. Can it look like filterFL :: (a wX wY -> Bool) -> FL a wX wY
? Well, we can try writing
filterFL :: (a wX wY -> Bool) -> FL a wX wY
filterFL p NilFL = NilFL
filterFL p (x :>: xs) | p x = filterFL p xs
| otherwise = x :>: filterFL p xs
However, under closer scrutiny we realize that it does not typecheck! In the second clause of filterFL we have the following information:
x :: a x y, xs :: FL a y z
filterFL xs :: FL a y z
Thus, in the first case (in which p x
holds) we try to return something of the type FL a wY wZ
, when FL a wX wZ
was expected. It is clear that generally we can do this only if x :: a wX wX
, i.e. wY = wX
. But a simple predicate of the type p :: a wX wZ -> Bool
won’t tell us anything about that. We need an additional type witness in our system telling us that if p x
holds, then x :: a wX wX
. For that purpose we introdue the EqCheck datatype.
data EqCheck wX wY where
IsEq :: EqCheck wX wX
NotEq :: EqCheck wX wY
then the type of a predicate would be
type Pred a = forall wX wY. a wX wY -> EqCheck wX wY
If (p x) = IsEq
, then the typechecker will know that x :: a wX wX
. We can then finally write
filterFL :: Pred a -> FL a wX wY -> FL a wX wY
filterFL p NilFL = NilFL
filterFL p (x :>: xs) | IsEq <- p x = filterFL p xs
| otherwise = x :>: filterFL p xs
EqCheck
is used this way in the darcs source code to e.g., filter our internal patches. Sometimes darcs stores information — like suspended patches — in the so called internal patches. Every patch type implements the internal patch checker (code slightly adapted):
-- |Provides a hook for flagging whether a patch is "internal" to the repo
-- and therefore shouldn't be referred to externally, e.g. by inclusion in tags.
-- Note that despite the name, every patch type has to implement it, but for
-- normal (non-internal) types the default implementation is fine.
-- Currently only used for rebase internal patches.
class MaybeInternal p where
-- | @maybe (const NotEq) (fmap isInternal patchInternalChecker) p@
-- returns 'IsEq' if @p@ is internal, and 'NotEq' otherwise.
-- The two-level structure is purely for efficiency: 'Nothing' and 'Just (InternalChecker (const NotEq))' are
-- semantically identical, but 'Nothing' allows clients to avoid traversing an entire list.
-- The patch type is passed as an 'FL' because that's how the internals of named patches are stored.
patchInternalChecker :: Maybe (forall wX wY . FL p wX wY -> EqCheck wX wY)
patchInternalChecker = Nothing
When the user runs darcs tag
in the repository, darcs creates a dummy patch that explicitly depends on all the previous patches — apart from the internal ones of course. Thus, the tag
command uses the following function (slightly adapted):
filterNonInternal :: MaybeInternal p => PatchSet p wX wY -> PatchSet p wX wY
filterNonInternal =
case patchInternalChecker of
Nothing -> id
Just f -> \l -> PatchSet (filterRL (f . patchcontents . hopefully) (unPatchSet l))
where a PatchSet
is the list of PatchInfoAnd
patches — patches together with the meta-information.
It is worth noting that EqCheck x y
is isomorphic Maybe (x :~: y)
, but the propositional equality datatype has only been added to base since 4.7.0.0. In the future versions darcs will probably switch to using Data.Type.Equality
.
Conclusion
We’ve briefly touched upon patch representation in darcs and talked about directed types and directed programming.
A good if a bit outdated reference is Jason Dagit’s master thesis (specifically the bits from chapter 4). The wiki is currently lacking in material, but I hope to improve the situation eventually.
Next time we will probably discuss either directed pairs and their use in darcs, or sealed datatypes, or both.
Pingback: Darcsden imporvements and Darcs sprint | (parentheses)
Pingback: Darcsden improvements and Darcs sprint | (parentheses)
Pingback: Links of the Week, W40 2015 | One More Game-Dev and Programming Blog