Sorted Lists
Edit the file SortedList.hs for this problem.
In this small excursion, we're going to define an abstract type of sorted lists. A SortedList a
is just an ordinary list of elements of type a
, but ordered according to the ordering on a
. In order to prevent users from constructing values of this type which violate this invariant, we're defining the type and its operations in a separate module (this one) and only exposing to other importing modules those functions which respect the invariants of the type.
Abstract types in Haskell
The identifiers after the module name below define exactly what is exported by this module.
> module SortedList ( SortedList, -- the abstract type (and its instances)
> singleton, -- other functions
> toNormalList,
> minimum,
> numDistinct,
> count
> ) where
> import Test.HUnit
> import Prelude hiding ( minimum, maxmimum )
> import qualified Data.List as List
> import Data.Coerce
In this module, below we will define the SortedList
type (see below), but not export the data constructor (also called SortedList
) for this type.
What that means is that within this module, we can use the data constructor to make any SortedList
we want, even a bad one. But outside this module, we ensure that users can construct only sorted SortedList
s by only providing functions that construct sorted lists. For example, because the one-element list is always sorted, we can safely expose the singleton
function below.
It's also safe to extract the wrapped list from a SortedList
because this doesn't affect the original list.
These two functions alone are not very useful, though. In order to do interesting things with sorted lists, we need to be able to combine them with each other to build sorted lists larger than one element. It would also be useful to define how to build a sorted list with zero elements.
Semigroups and Monoids
There is a particular structure to what we've just described above, and it's a ubiquitous in Haskell programming: the Monoid
. The Semigroup
and Monoid
typeclasses may be the first examples of something you've seen which does not easily align to overloading interfaces defined in your (previous) favorite programming language. We're going to use these typeclasses to define the remainder of our interface to sorted lists.
The Semigroup
type class defines what it means to "combine elements together". In particular, it includes the following operator, which needs to be defined in when making a type an instance of this class.
This operation can be any binary operation on the type, with one important caveat: it must be ASSOCIATIVE. In otherwords, it shouldn't matter in your code if you type a <> (b <> c)
or (a <> b) <> c
. Either expression should produce the same result.
You've probably seen some examples of types with associative operations before. For example, list concatenation is associative, so the standard library includes the following instance that says that when we are using lists as semigroups, we should interpret (<>)
to mean (++)
.
Because this instance is in the standard library, you don't actually need to use the specialized (++)
for lists; you can use the more general (<>)
whenever you please (as long as it is scope, and Haskell can tell you are using lists).
The Monoid
typeclass extends Semigroup
class with a designated element, called mempty
. A type must first be declared to be an instance of the Semigroup
class before it can be declared to be a Monoid
.
The intended meaning is that mempty
is some element of the type such that when <>
ed to anything, it does nothing. In other words, for any type that has an associative binary operation (<>
) with an identity element (mempty
) is a Monoid
.
Now, that's all very abstract! Let's look at some instances. We can extend our list Semigroup
by adding an identity element for list concatenation.
In fact, lists are the canonical example of a Monoid
-- they can be combined together with (++)
, and the empty list, when combined with any other list via (++)
, gives that other list as a result.
Furthermore, the test case below demonstrates that lists satisfy the required properties of monoids: the empty list is a left and right identity for append, and concatenation is an associative operation.
> testListMonoid :: Test
> testListMonoid =
> let t1 = [1,2] in
> let t2 = [3,4] in
> let t3 = [1,2,3,4] in
> TestList [ mempty <> t1 ~?= t1, -- left identity
> t1 <> mempty ~?= t1, -- right identity
> (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
> ]
What else? Another example that may jump to mind is numbers. Any integer can be added to any other, with zero being the identity element. So you might expect that the standard library would have an instance like this:
But it does not. After all, you could just as well realize that integers can be combined by multiplying them, with one being the identity element! In that case, we'd write an instance like this:
Who's to say which monoidal interpretation of integers is "more right"?
In cases like this, we usually use a newtype
to differentiate between which interpretation we want to use. That is, we can instead say:
newtype Sum a = Sum { getSum :: a }
newtype Product a = Product { getProduct :: a }
instance Num a => Semigroup (Sum a) where
x <> y = Sum $ getSum x + getSum y
instance Num a => Monoid (Sum a) where
mempty = Sum 0
instance Num a => Semigroup (Product a) where
x <> y = Product $ getProduct x * getProduct y
instance Num a => Monoid (Product a) where
mempty = Product 1
Notice that in the above, these instances require a Num
instance for the types they're wrapping. Num
, as you have seen in class, is a typeclass which provides overloading for numeric operations and literals -- so using it as a superclass of our instance allows us to be generic over what kind of numbers we're manipulating, rather than fixing a particular type of number.
For example, we can calculate the sum and product of a list of integers by coercing the elements to type Sum Int
and Product Int
respectively.
Sorted lists
Like the above, we need to define our abstract type as a wrapper around ordinary lists. For this, we use Haskell's newtype
keyword, which creates a new type much like data
, but with the guarantee that our access to the wrapped type will be with zero runtime overhead. The toNormalList
function is just the record selector from this type definition.
Now, fill in the Monoid
instance for SortedList
s.
Hint: keep in mind the properties of sorted lists when writing this instance. This invariant lets you write faster code than you would otherwise be able to do.
Make sure that your implementation only produces sorted lists, and also satisfies the properties of monoids!
> testSortedList :: Test
> testSortedList =
> let t1, t2, t3 :: SortedList Int
> t1 = SortedList [2,4]
> t2 = SortedList [1,5]
> t3 = SortedList [2,3] in
> TestList [ t1 <> t3 ~?= SortedList [2,2,3,4], -- <> preserves sorting
> mempty <> t1 ~?= t1, -- left identity
> t1 <> mempty ~?= t1, -- right identity
> (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
> ]
Some Other Operations
While merely the operations defined above are sufficient to define the analogues of most list functions for SortedList
s also, implementing a replica of the list library only in terms of the above abstraction would necessarily come at a performance cost; it would necessitate conversion to and from the SortedList
representation, which requires computational work.
On the other hand, if we were to implement these functions here, we could take advantage of the internal sorted-ness invariant of the list in order to make certain operations faster. Let's do that.
A first example: minimum
.
> testMinimum :: Test
> testMinimum =
> let t1 = SortedList [1,3,5] in
> let t2 = SortedList ([] :: [Int]) in
> let t3 = SortedList [1, error "kaboom!"] <> SortedList[2] in
> TestList [ minimum t1 ~?= Just 1 -- the minimum of a non-empty sorted list
> , minimum t2 ~?= Nothing -- the minimum of an empty sorted list
> , minimum t3 ~?= Just 1 -- minimum need not examine whole list
> ]
In the above test cases, you will get an error if your implementation does not take advantage of the sorted-ness invariant to avoid extra computation.
Another operation which can be made more efficient for SortedList
s is calculating the number of distinct values in the list.
> testNumDistinct :: Test
> testNumDistinct = TestList
> [numDistinct (SortedList [1::Int,1,3,3,5]) ~?= 3,
> numDistinct (SortedList ([]::[Int])) ~?= 0]
We can also count how many times every distinct value occurs in the list:
Your implementation of count
should result in another genuine, legal SortedList
. Convince yourself that it does before moving on, keeping in mind the Ord
instances for tuples are left-to-right lexicographic orderings, dependent on the underlying Ord
instances of the tuple's elements.
> testCount :: Test
> testCount =
> let xs = SortedList "abbcccdddd" in
> count xs ~?= [('a', 1),('b',2),('c',3),('d',4)]
At this point, one important typeclass seems to have been left out in our interface to the SortedList
type: Functor
. It seems natural that we should be able to map a function over a SortedList
, just like we can over an ordinary list. This doesn't work, though. Why?
At this point, we have finished defining the internal implementation of SortedList
s. Because all the operations we expose to the user of this module respect the sorted-ness property of SortedList
s, we know that any value of this type must be sorted. So, once we go back to the file Main.lhs
, we will be we are prevented from making "illegal" values of SortedList
s.