Sorted Lists

Edit the file SortedList.hs for this problem.

> {-# OPTIONS  -fwarn-tabs -fwarn-incomplete-patterns -fno-warn-type-defaults #-}
> {-# LANGUAGE ScopedTypeVariables #-}

In this small excursion, we're going to define an abstract type of sorted lists. A SortedList a is just an ordinary list of elements of type a, but ordered according to the ordering on a. In order to prevent users from constructing values of this type which violate this invariant, we're defining the type and its operations in a separate module (this one) and only exposing to other importing modules those functions which respect the invariants of the type.

Abstract types in Haskell

The identifiers after the module name below define exactly what is exported by this module.

> module SortedList ( SortedList,    -- the abstract type (and its instances)
>                     singleton,     -- other functions
>                     toNormalList,
>                     minimum,
>                     numDistinct,
>                     count
>                   ) where

> import Test.HUnit
> import Prelude hiding ( minimum, maxmimum )
> import qualified Data.List as List
> import Data.Coerce

In this module, below we will define the SortedList type (see below), but not export the data constructor (also called SortedList) for this type.

What that means is that within this module, we can use the data constructor to make any SortedList we want, even a bad one. But outside this module, we ensure that users can construct only sorted SortedLists by only providing functions that construct sorted lists. For example, because the one-element list is always sorted, we can safely expose the singleton function below.

It's also safe to extract the wrapped list from a SortedList because this doesn't affect the original list.

These two functions alone are not very useful, though. In order to do interesting things with sorted lists, we need to be able to combine them with each other to build sorted lists larger than one element. It would also be useful to define how to build a sorted list with zero elements.

Monoids

> import Data.Monoid

There is a particular structure to what we've just described above, and it's a structure that's ubiquitous in Haskell programming: the Monoid. The typeclass Monoid may be the first example of a typeclass you've seen which does not easily align to overloading interfaces defined in your (previous) favorite programming language. We're going to use the Monoid typeclass to define the remainder of our interface to sorted lists.

The Monoid typeclass defines two methods, both of which we need to fill in when making a type an instance of it.

mempty  :: Monoid a => a
mappend :: Monoid a => a -> a -> a

Furthermore, Data.Monoid defines the infix operator (<>) as a synonym of mappend -- Haskell programmers almost always use this version in their code.

The intended meaning of these two methods is that mappend combines two elements of the type in some way, and that mempty is some element of the type such that when mappended to anything, it does nothing. In other words, any type that has an associative binary operation (mappend) with an identity element (mempty) is a Monoid. Now, that's all very abstract! Let's look at some instances.

instance Monoid [a] where
  mempty  = []
  mappend = (++)

Lists are the canonical example of a Monoid -- they can be combined together with (++), and the empty list, when combined with any other list via (++), gives that other list as a result. This instance of Monoid is in the standard library -- so you don't actually need to use the specialized (++) for lists; you can use the more general (<>) almost whenever you please.

Furthermore, the test case below demonstrates that lists satisfy the required properties of monoids: the empty list is a left and right identity for append, and concatenation is an associative operation.

> testListMonoid :: Test
> testListMonoid =
>   let t1 = [1,2] in
>   let t2 = [3,4] in
>   let t3 = [1,2,3,4] in
>   TestList [ mempty <> t1     ~?= t1,              -- left identity
>              t1 <> mempty     ~?= t1,              -- right identity
>              (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
>             ]

What else? Another example that may jump to mind is numbers. Any integer can be added to any other, with zero being the identity element. So you might expect that the standard library would have an instance like this:

instance Monoid Integer where
  mempty  = 0
  mappend = (+)

But it does not. After all, you could just as well realize that integers can be combined by multiplying them, with one being the identity element! In that case, we'd write an instance like this:

instance Monoid Integer where
  mempty  = 1
  mappend = (*)

Who's to say which monoidal interpretation of integers is "more right"?

In cases like this, we usually use a newtype to differentiate between which interpretation we want to use. That is, we can instead say:

newtype Sum     a = Sum     { getSum     :: a }
newtype Product a = Product { getProduct :: a }

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  x `mappend` y = Sum $ getSum x + getSum y

instance Num a => Monoid (Product a) where
  mempty = Product 1
  x `mappend` y = Product $ getProduct x * getProduct y

Notice that in the above, these Monoid instances require a Num instance for the types they're wrapping. Num, as you have seen in class, is a typeclass which provides overloading for numeric operations and literals -- so using it as a superclass of our Monoid instance allows us to be generic over what kind of numbers we're manipulating, rather than fixing a particular type of number.

For example, we can calculate the sum and product of a list of integers by coercing the elements to type Sum Int and Product Int respectively.

> reduce :: Monoid b => [b] -> b
> reduce = foldr mappend mempty

> ten :: Int
> ten = getSum (reduce (map Sum [1,2,3,4]))

> twentyFour :: Int
> twentyFour = getProduct (reduce (map Product [1,2,3,4]))

> ---------------------------------------------------------

Sorted lists

Like the above, we need to define our abstract type as a wrapper around ordinary lists. For this, we use Haskell's newtype keyword, which creates a new type much like data, but with the guarantee that our access to the wrapped type will be with zero runtime overhead. The toNormalList function is just the record selector from this type definition.

> newtype SortedList a = SortedList [a] deriving (Eq, Show)

> toNormalList :: SortedList a -> [a]
> toNormalList (SortedList as) = as

> singleton :: a -> SortedList a
> singleton a = SortedList [a]

Now, fill in the Monoid instance for SortedLists.

Hint: keep in mind the properties of sorted lists when writing this instance. This invariant lets you write faster code than you would otherwise be able to do.

> instance Ord a => Monoid (SortedList a) where
>   mempty = undefined
>   l1 `mappend` l2 = undefined

Make sure that your implementation only produces sorted lists, and also satisfies the properties of monoids!

> testSortedList :: Test
> testSortedList =
>   let t1 = SortedList [2,4] in
>   let t2 = SortedList [1,5] in
>   let t3 = SortedList [2,3] in
>   TestList [ t1 <> t3 ~?= SortedList [2,2,3,4],    -- <> preserves sorting
>              mempty <> t1 ~?= t1,                  -- left identity
>              t1 <> mempty ~?= t1,                  -- right identity
>              (t1 <> t2) <> t3 ~?= t1 <> (t2 <> t3) -- associativity
>            ]

> ---------------------------------------------------------

Some Other Operations

While merely the operations defined above are sufficient to define the analogues of most list functions for SortedLists also, implementing a replica of the list library only in terms of the above abstraction would necessarily come at a performance cost; it would necessitate conversion to and from the SortedList representation, which requires computational work.

On the other hand, if we were to implement these functions here, we could take advantage of the internal sorted-ness invariant of the list in order to make certain operations faster. Let's do that.

A first example: minimum.

> minimum :: SortedList a -> Maybe a
> minimum = undefined

> testMinimum :: Test
> testMinimum =
>   let t1 = SortedList [1,3,5] in
>   let t2 = SortedList ([] :: [Int]) in
>   let t3 = SortedList [1, error "kaboom!"] <> SortedList[2] in
>   TestList [ minimum t1 ~?= Just 1   -- the minimum of a non-empty sorted list
>            , minimum t2 ~?= Nothing  -- the minimum of an empty sorted list
>            , minimum t3 ~?= Just 1   -- minimum need not examine whole list
>            ]

In the above test cases, you will get an error if your implementation does not take advantage of the sorted-ness invariant to avoid extra computation.

Another operation which can be made more efficient for SortedLists is calculating the number of distinct values in the list.

> numDistinct :: Ord a => SortedList a -> Int
> numDistinct = undefined

> testNumDistinct :: Test
> testNumDistinct = TestList
>  [numDistinct (SortedList [1::Int,1,3,3,5]) ~?= 3,
>   numDistinct (SortedList ([]::[Int])) ~?= 0]

We can also count how many times every distinct value occurs in the list:

> count :: Eq a => SortedList a -> [(a, Integer)]
> count = undefined

Your implementation of count should result in another genuine, legal SortedList. Convince yourself that it does before moving on, keeping in mind the Ord instances for tuples are left-to-right lexicographic orderings, dependent on the underlying Ord instances of the tuple's elements.

> testCount :: Test
> testCount =
>   let xs = SortedList "abbcccdddd" in
>   count xs ~?= [('a', 1),('b',2),('c',3),('d',4)]

At this point, one important typeclass seems to have been left out in our interface to the SortedList type: Functor. It seems natural that we should be able to map a function over a SortedList, just like we can over an ordinary list. This doesn't work, though. Why?

> whyNoFunctor :: String
> whyNoFunctor = undefined

At this point, we have finished defining the internal implementation of SortedLists. Because all the operations we expose to the user of this module respect the sorted-ness property of SortedLists, we know that any value of this type must be sorted. So, once we go back to the file Main.lhs, we will be we are prevented from making "illegal" values of SortedLists.

CIS 552: Advanced Programming

Fall 2017

Sorted Lists

Abstract types in Haskell

Monoids

Sorted lists

Some Other Operations