Sorted Lists
Edit the file SortedList.hs for this problem.
> {-# OPTIONS -fwarn-tabs -fwarn-incomplete-patterns -fno-warn-type-defaults #-}
> {-# LANGUAGE ScopedTypeVariables #-}
In this small excursion, we're going to define an abstract type of sorted lists. A SortedList a
is just an ordinary list of elements of type a
, but ordered according to the ordering on a
. In order to prevent users from constructing values of this type which violate this invariant, we're defining the type and its operations in a separate module (this one) and only exposing to other importing modules those functions which respect the invariants of the type.
Abstract types in Haskell
The identifiers after the module name below define exactly what is exported by this module.
> module SortedList ( SortedList, -- the abstract type (and its instances)
> singleton, -- other functions
> toNormalList,
> minimum,
> numDistinct,
> count
> ) where
> import Test.HUnit
> import Prelude hiding ( minimum, maxmimum )
> import qualified Data.List as List
> import Data.Coerce
In this module, below we will define the SortedList
type (see below), but not export the data constructor (also called SortedList
) for this type.
What that means is that within this module, we can use the data constructor to make any SortedList
we want, even a bad one. But outside this module, we ensure that users can construct only sorted SortedList
s by only providing functions that construct sorted lists. For example, because the one-element list is always sorted, we can safely expose the singleton
function below.
It's also safe to extract the wrapped list from a SortedList
because this doesn't affect the original list.
These two functions alone are not very useful, though. In order to do interesting things with sorted lists, we need to be able to combine them with each other to build sorted lists larger than one element. It would also be useful to define how to build a sorted list with zero elements.
Monoids
> import Data.Monoid
There is a particular structure to what we've just described above, and it's a structure that's ubiquitous in Haskell programming: the Monoid
. The typeclass Monoid
may be the first example of a typeclass you've seen which does not easily align to overloading interfaces defined in your (previous) favorite programming language. We're going to use the Monoid
typeclass to define the remainder of our interface to sorted lists.
The Monoid
typeclass defines two methods, both of which we need to fill in when making a type an instance of it.
mempty :: Monoid a => a
mappend :: Monoid a => a -> a -> a
Furthermore, Data.Monoid
defines the infix operator (<>)
as a synonym of mappend
-- Haskell programmers almost always use this version in their code.
The intended meaning of these two methods is that mappend
combines two elements of the type in some way, and that mempty
is some element of the type such that when mappend
ed to anything, it does nothing. In other words, any type that has an associative binary operation (mappend
) with an identity element (mempty
) is a Monoid
. Now, that's all very abstract! Let's look at some instances.
instance Monoid [a] where
mempty = []
mappend = (++)
Lists are the canonical example of a Monoid
-- they can be combined together with (++)
, and the empty list, when combined with any other list via (++)
, gives that other list as a result. This instance of Monoid
is in the standard library -- so you don't actually need to use the specialized (++)
for lists; you can use the more general (<>)
almost whenever you please.
Furthermore, the test case below demonstrates that lists satisfy the required properties of monoids: the empty list is a left and right identity for append, and concatenation is an associative operation.
> testListMonoid :: Test
> testListMonoid =
> let t1 = [1,2] in
> let t2 = [3,4] in
> let t3 = [1,2,3,4] in
> TestList [ mempty <> t1 ~?= t1, -- left identity
> t1 <> mempty ~?= t1, -- right identity
> (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
> ]
What else? Another example that may jump to mind is numbers. Any integer can be added to any other, with zero being the identity element. So you might expect that the standard library would have an instance like this:
instance Monoid Integer where
mempty = 0
mappend = (+)
But it does not. After all, you could just as well realize that integers can be combined by multiplying them, with one being the identity element! In that case, we'd write an instance like this:
instance Monoid Integer where
mempty = 1
mappend = (*)
Who's to say which monoidal interpretation of integers is "more right"?
In cases like this, we usually use a newtype
to differentiate between which interpretation we want to use. That is, we can instead say:
newtype Sum a = Sum { getSum :: a }
newtype Product a = Product { getProduct :: a }
instance Num a => Monoid (Sum a) where
mempty = Sum 0
x `mappend` y = Sum $ getSum x + getSum y
instance Num a => Monoid (Product a) where
mempty = Product 1
x `mappend` y = Product $ getProduct x * getProduct y
Notice that in the above, these Monoid
instances require a Num
instance for the types they're wrapping. Num
, as you have seen in class, is a typeclass which provides overloading for numeric operations and literals -- so using it as a superclass of our Monoid
instance allows us to be generic over what kind of numbers we're manipulating, rather than fixing a particular type of number.
For example, we can calculate the sum and product of a list of integers by coercing the elements to type Sum Int
and Product Int
respectively.
> reduce :: Monoid b => [b] -> b
> reduce = foldr mappend mempty
> ten :: Int
> ten = getSum (reduce (map Sum [1,2,3,4]))
> twentyFour :: Int
> twentyFour = getProduct (reduce (map Product [1,2,3,4]))
> ---------------------------------------------------------
Sorted lists
Like the above, we need to define our abstract type as a wrapper around ordinary lists. For this, we use Haskell's newtype
keyword, which creates a new type much like data
, but with the guarantee that our access to the wrapped type will be with zero runtime overhead. The toNormalList
function is just the record selector from this type definition.
> newtype SortedList a = SortedList [a] deriving (Eq, Show)
> toNormalList :: SortedList a -> [a]
> toNormalList (SortedList as) = as
> singleton :: a -> SortedList a
> singleton a = SortedList [a]
Now, fill in the Monoid
instance for SortedList
s.
Hint: keep in mind the properties of sorted lists when writing this instance. This invariant lets you write faster code than you would otherwise be able to do.
> instance Ord a => Monoid (SortedList a) where
> mempty = undefined
> l1 `mappend` l2 = undefined
Make sure that your implementation only produces sorted lists, and also satisfies the properties of monoids!
> testSortedList :: Test
> testSortedList =
> let t1 = SortedList [2,4] in
> let t2 = SortedList [1,5] in
> let t3 = SortedList [2,3] in
> TestList [ t1 <> t3 ~?= SortedList [2,2,3,4], -- <> preserves sorting
> mempty <> t1 ~?= t1, -- left identity
> t1 <> mempty ~?= t1, -- right identity
> (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
> ]
> ---------------------------------------------------------
Some Other Operations
While merely the operations defined above are sufficient to define the analogues of most list functions for SortedList
s also, implementing a replica of the list library only in terms of the above abstraction would necessarily come at a performance cost; it would necessitate conversion to and from the SortedList
representation, which requires computational work.
On the other hand, if we were to implement these functions here, we could take advantage of the internal sorted-ness invariant of the list in order to make certain operations faster. Let's do that.
A first example: minimum
.
> minimum :: SortedList a -> Maybe a
> minimum = undefined
> testMinimum :: Test
> testMinimum =
> let t1 = SortedList [1,3,5] in
> let t2 = SortedList ([] :: [Int]) in
> let t3 = SortedList [1, error "kaboom!"] <> SortedList[2] in
> TestList [ minimum t1 ~?= Just 1 -- the minimum of a non-empty sorted list
> , minimum t2 ~?= Nothing -- the minimum of an empty sorted list
> , minimum t3 ~?= Just 1 -- minimum need not examine whole list
> ]
In the above test cases, you will get an error if your implementation does not take advantage of the sorted-ness invariant to avoid extra computation.
Another operation which can be made more efficient for SortedList
s is calculating the number of distinct values in the list.
> numDistinct :: Ord a => SortedList a -> Int
> numDistinct = undefined
> testNumDistinct :: Test
> testNumDistinct = TestList
> [numDistinct (SortedList [1::Int,1,3,3,5]) ~?= 3,
> numDistinct (SortedList ([]::[Int])) ~?= 0]
We can also count how many times every distinct value occurs in the list:
> count :: Eq a => SortedList a -> [(a, Integer)]
> count = undefined
Your implementation of count
should result in another genuine, legal SortedList
. Convince yourself that it does before moving on, keeping in mind the Ord
instances for tuples are left-to-right lexicographic orderings, dependent on the underlying Ord
instances of the tuple's elements.
> testCount :: Test
> testCount =
> let xs = SortedList "abbcccdddd" in
> count xs ~?= [('a', 1),('b',2),('c',3),('d',4)]
At this point, one important typeclass seems to have been left out in our interface to the SortedList
type: Functor
. It seems natural that we should be able to map a function over a SortedList
, just like we can over an ordinary list. This doesn't work, though. Why?
> whyNoFunctor :: String
> whyNoFunctor = undefined
At this point, we have finished defining the internal implementation of SortedList
s. Because all the operations we expose to the user of this module respect the sorted-ness property of SortedList
s, we know that any value of this type must be sorted. So, once we go back to the file Main.lhs
, we will be we are prevented from making "illegal" values of SortedList
s.