undefined
.
Eventually, the complete
version will be made available.
A Persistent Set Interface
Warning: we are going to use a few recent GHC extensions in this lecture. We'll explain these extensions as we use them below.
> {-# LANGUAGE AllowAmbiguousTypes #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE TypeApplications #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
>
> module Persistent where
> import Control.Monad
> import Test.QuickCheck hiding (elements)
> import Data.Maybe as Maybe
> import Data.List (sort,nub)
A persistent data structure is one where all of the operations are pure functions.
For example, let's look at an interface of a simple persistent set of ordered elements. We can tell that the implementation is persistent just by looking at the types of the operations. In particular, the empty operation is not a function, it is just a set --- there is only one empty set. If we were allowed to mutate it, it wouldn't be empty any more.
> class Set s where
> empty :: s a
> insert :: Ord a => a -> s a -> s a
> delete :: Ord a => a -> s a -> s a
> member :: Ord a => a -> s a -> Bool
> elements :: Ord a => s a -> [a]
We can tell that the set implementation is persistent just by looking at the types of these operations. For example, the insert
and delete
operations return a new set instead of modifying their argument.
Set Properties
What properties should instances of the Set
interface satisfy? Can we test those properties with QuickCheck?
When we define an abstract interface like Set
above, using a type class, we should also specify properties that all implementations should satisfy. What properties do you think a set should satisfy?
Below, we will define some of these properties as QuickCheck tests.
Here's an example property: The empty set has no elements.
Although this property is simple to state, it is difficult to encode via QuickCheck. The problem is that the type of this function is ambiguous --- the type variable s
only appears in the Set
constraint and no where else in the type. This is usually a bad situation, it means that type inference will be unable to determine the type s
. However, we will work around this situation with GHC extensions. In particular, the extension AllowAmbiguousTypes
instructs GHC to permit types like the one above. To use definitions with ambiguous types we also need to enable TypeApplications
. (Below, we will use a TypeApplication to tell GHC what s
should be.)
The other tricky part of this definition is that, due to the ambiguity, we need to add a type annotation to empty
. Otherwise, GHC doesn't know what empty set to use. However, we want our property to be generic, so we need a yet another extension, called ScopedTypeVariables
to bring the type variable s
into scope in the body of prop_empty
using the keyword forall
. That way we can mention this type variable in the annotation s Int
. Without this forall
the type variable s
would be unbound.
We can also define properties for set insertion. Here are two potential ones:
> -- Anything inserted can be found in the set
> prop_insert :: Set s => Int -> s Int -> Bool
> prop_insert x y = member x (insert x y)
> -- We don't lose any set elements after an insertion.
> prop_insert_inequal :: Set s => Int -> Int -> s Int -> Property
> prop_insert_inequal x y s =
> x /= y ==> member y s == member y (insert x s)
And here's one for elements
:
> -- No duplicates in the elements
> prop_elements :: Set s => s Int -> Bool
> prop_elements s = length (nub l) == length l where
> l = elements s
These are only a few of the properties that a Set
should satisfy. What else can you think of?
Define at least one of your own set properties below (you'll probably need to adjust the type annotation):
List implementation
One trivial implementation of sets uses lists. Any list can represent a set; though if we want a canonical representation of that set we will need to remove duplicates and sort the list.
We'll create a newtype for this version so we don't confuse these lists with regular lists. The GeneralizedNewtypeDeriving
extension allows us to reuse any class instance for lists as an instance for ListSet
. This is useful for Arbitrary
--- we want to generate ListSet
s in the same way as generating lists.
One reason that we define this newtype is to avoid existing list instances that we don't want. For example, we don't want to have the Eq
or Foldable
instances around because they would leak the fact that we aren't ordering the list or removing duplicates until we absolutely have to.
> instance Set ListSet where
> empty = ListSet []
> insert x (ListSet xs) = ListSet (x:xs)
> member x (ListSet xs) = x `elem` xs
> delete x (ListSet xs) = ListSet (filter (/= x) xs)
> elements (ListSet xs) = sort (nub xs)
Let's make sure our implementation satisfies properties of sets.
> main :: IO ()
> main = do
> quickCheck $ prop_empty @ListSet
> quickCheck $ prop_elements @ListSet
> quickCheck $ prop_insert @ListSet
> quickCheck $ prop_insert_inequal @ListSet
> quickCheck $ prop_myproperty @ListSet
Note that in each case we provide the list type constructor ListSet
as a type argument to the property. (Type arguments must begin with @
in Haskell. We can only supply them if the TypeApplications
extension is enabled.)
*Persistent> :set -XTypeApplications
*Persistent> quickCheck $ prop_empty @ListSet
Persistent vs. Ephemeral
- An ephemeral data structure is one for which only one version is available at a time: after an update operation, the structure as it existed before the update is lost.
For example, conventional arrays are ephemeral. After a location in an array is updated, its old contents are no longer available.
- A persistent structure is one where multiple version are simultaneously accessible: after an update, both old and new versions are available.
For example, a binary tree can be implemented persistently, so that after insertion, the old value of the tree is still available.
Persistent data structures can sometimes be more expensive than their ephemeral counterparts (in terms of constant factors and sometimes also asymptotic complexity), but that cost is often insignificant compared to their benefits:
- better integration with concurrent programming (naturally lock-free)
- simpler, more declarative implementations
- better semantics for equality, hashing, etc.
- access to all old versions (git for everything)
Next, we'll look at another persistent version of a common data structure: Red-Black trees. These lectures demonstrate that functional programming is adept at implementing sophisticated data structures. In particular, datatypes and pattern matching make the implementation of persistent tree-like data structures remarkably straightforward. These examples are drawn from Chris Okasaki's excellent book Purely Functional Data Structures.
However, we'll only scratch the surface. There are many industrial-strength persistent data structures out there.
- Finger trees/Ropes, see Data.Sequence
- Size balanced trees, see Data.Map
- Big-endian Patricia trees, see Data.IntMap
- Hash array mapped tries, used in the Clojure language
- and many more