Note: this is the stubbed version of module Parsers. Try to figure out how to fill in all parts of this file marked undefined. CIS 5520 students should be able to access this code through github. Eventually, the completed version will be available.

Parsing with Applicative Functors

> module Parsers where

> import Prelude hiding (filter)
> import Data.Char ( ord, isDigit, isAlpha, isUpper )
> import Text.Read (readMaybe)
> import Control.Applicative
> import Control.Monad(guard)

What is a Parser?

A parser is a piece of software that takes a raw String (or sequence of bytes/characters) and returns some structured object -- for example, a list of options, an XML tree or JSON object, a program's Abstract Syntax Tree, and so on. Parsing is one of the most basic computational tasks.

For example, we use parsers in:

Shell Scripts (command-line options)
Web Browsers (duh!)
Games (level descriptors)
Routers (packets)
etc.

(Indeed I defy you to find any serious system that does not do some parsing somewhere!)

The simplest way to think of a parser is as a function -- i.e., its type should be roughly this:

type Parser = String -> StructuredObject

Composing Parsers

The usual way to build a parser is by specifying a grammar and using a parser generator (e.g., yacc, bison, antlr) to create the actual parsing function. Despite its advantages, one major limitation of the grammar-based approach is its lack of modularity. For example, suppose we have two kinds of primitive values, Thingy and Whatsit.

Thingy : ...rule...   { ...action... } ;

Whatsit : ...rule...  { ...action... } ;

If we want a parser for sequences of Thingy and Whatsit we have to painstakingly duplicate the rules:

Thingies : Thingy Thingies  { ... }
         | EmptyThingy      { ... } ;

Whatsits : Whatsit Whatsits { ... }
         | EmptyWhatsit     { ... } ;

That is, the languages in which parsers are usually described are lacking in features for modularity and reuse.

In this lecture, we will see how to compose mini-parsers for sub-values to get bigger parsers for complex values.

To do so, we will generalize the above parser type a little bit, by noting that a (sub-)parser need not (indeed, in general will not) consume all of its input, in which case we need to have the parser return the unconsumed part of its input:

type Parser = String -> (StructuredObject, String)

Of course, it would be silly to have different types for parsers for different kinds of structured objects, so we parameterize the Parser type over the type of structured object that it returns:

type Parser a = String -> (a, String)

One last generalization is to observe that not all strings are parseable. Therefore, we allow a parser to fail by wrapping the result in Maybe.

type Parser a = String -> Maybe (a, String)
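
To see why returning the unconsumed input matters, here is a minimal sketch that works with the plain function type directly, before any newtype wrapping. The names RawParser, firstChar, and twoChars are hypothetical and appear nowhere else in this module; they only illustrate how the leftover string is threaded from one step to the next.

```haskell
-- Hypothetical alias for the function type shown above.
type RawParser a = String -> Maybe (a, String)

-- Take the first character, if any, and hand back the rest of the input.
firstChar :: RawParser Char
firstChar (c : cs) = Just (c, cs)
firstChar []       = Nothing

-- Run firstChar twice by hand: the leftover input of the first run
-- becomes the input of the second. If either step fails, so does the whole.
twoChars :: RawParser (Char, Char)
twoChars s = case firstChar s of
  Nothing        -> Nothing
  Just (a, rest) -> case firstChar rest of
    Nothing         -> Nothing
    Just (b, rest') -> Just ((a, b), rest')
```

In GHCi, twoChars "hey!" evaluates to Just (('h','e'),"y!") and twoChars "h" to Nothing. The nested case expressions are exactly the plumbing that the combinators later in this module are meant to hide.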

As the last step, let's wrap this type definition up as a newtype and define a record accessor to let us conveniently extract the parser:

> newtype Parser a = P { doParse :: String -> Maybe (a, String) }

> -- >>> :t doParse
> -- doParse :: Parser a -> String -> Maybe (a, String)

This type definition will make sure that we keep parsers distinct from other values of this type and, more importantly, will allow us to make parsers an instance of one or more typeclasses, if this turns out to be convenient (spoiler alert, it will!).

Below, we will define a number of operators on the Parser type, which will allow us to build up descriptions of parsers compositionally. The actual parsing happens when we use a parser by applying it to an input string, using doParse.

Now, the parser type might remind you of something else... Remember this?

newtype State s a = S { runState :: s -> (a, s) }

Indeed, a Parser, like a state transformer, is a monad! There are good definitions of the return and (>>=) functions.

However, most of the time, we don't need the full monadic structure for parsing. Just deriving the applicative operators for this type will allow us to parse any context-free grammar. So in the material below, keep your eye out for applicative structure for this type.

Now all we have to do is build some parsers!

We'll start with some primitive definitions, and then generalize.

Parsing a Single Character

Here's a very simple character parser that returns the first Char from a (nonempty) string. Recall the parser type:

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

So we need a function that pattern matches its argument and pulls out the first character of the string, if there is one. A string has at most one first character, so in the successful case we return a single result: that character paired with the rest of the (unparsed) string.

> get :: Parser Char
> get = P $ \s -> case s of
>                   (c : cs) -> Just (c, cs) 
>                   []       -> Nothing

Try it out!

> -- >>> doParse get "hey!"

> -- >>> doParse get ""

See if you can modify the above to produce a parser that looks at the first char of a (nonempty) string and interprets it as an int in the range 0-9. (Hint: remember the readMaybe function.)

> oneDigit :: Parser Int
> oneDigit = undefined

> -- >>> doParse oneDigit "1"
> -- Just (1,"")

> -- >>> doParse oneDigit "12"
> -- Just (1,"2")

> -- >>> doParse oneDigit "hey!"
> -- Nothing

And here's a parser that looks at the first char of a string and interprets it as the unary negation operator, if it is '-', and an identity function if it is '+'.

> oneOp :: Parser (Int -> Int)
> oneOp = P $ \s -> case s of
>                     ('-' : cs) -> Just (negate, cs)
>                     ('+' : cs) -> Just (id, cs) 
>                     _          -> Nothing

Can we generalize this pattern? What if we pass in a function that specifies whether the character is of interest? The satisfy function constructs a parser that succeeds if the first character satisfies the predicate.

> satisfy :: (Char -> Bool) -> Parser Char
> satisfy f = undefined

> -- >>>  doParse (satisfy isAlpha) "a"
> -- Just ('a',"")

> -- >>> doParse (satisfy isUpper) "a"
> -- Nothing

> --    SPOILER SPACE
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |
> --     |

Here's how I implemented satisfy, taking advantage of the Maybe monad. (The do notation below is syntactic sugar for the Maybe monad's bind operation.)

> satisfy' :: (Char -> Bool) -> Parser Char
> satisfy' f = P $ \s -> do 
>                          (c , cs) <- doParse get s
>                          guard (f c)
>                          return (c , cs)

With this implementation, we can see that we can generalize again, so that it works for any parser, not just get...

> filter :: (a -> Bool) -> Parser a -> Parser a
> filter f p = P $ \s ->  do 
>                          (c , cs) <- doParse p s
>                          guard (f c)
>                          return (c , cs)
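
As a quick illustration of what filter buys us, the sketch below restates get and filter so that it can be loaded on its own, and then builds two parsers from them. The names digitOnly and upperVowel are hypothetical, chosen just for this example.

```haskell
import Prelude hiding (filter)
import Control.Monad (guard)
import Data.Char (isDigit, isUpper)

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

get :: Parser Char
get = P $ \s -> case s of
  (c : cs) -> Just (c, cs)
  []       -> Nothing

filter :: (a -> Bool) -> Parser a -> Parser a
filter f p = P $ \s -> do
  (c, cs) <- doParse p s
  guard (f c)            -- Nothing here makes the whole parse fail
  return (c, cs)

-- Hypothetical example: accept the first character only if it is a digit.
digitOnly :: Parser Char
digitOnly = filter isDigit get

-- Filters stack: the character must satisfy both predicates.
upperVowel :: Parser Char
upperVowel = filter (`elem` "AEIOU") (filter isUpper get)
```

Here doParse digitOnly "1a" gives Just ('1',"a"), while doParse digitOnly "a1" and doParse upperVowel "Bh" both give Nothing.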

Parsing nothing!

Now let's write a parser that only succeeds if we have reached the end of the input. If there are no characters in the input, then it returns a successful parse of a unit value and the remaining string (still nil). Otherwise, if there are any characters at all, this parser fails.

> eof :: Parser ()
> eof = P $ \s -> case s of
>                  []  -> Just ((), [])
>                  _:_ -> Nothing

Parser is a Functor

The name filter is directly inspired by the filter function for lists. And indeed, just like we can think of [a] as a way to get values of type a, we can likewise think of Parser a as a way to potentially get a value of type a.

So, are there other list-like operations that our parsers should support?

Of course! Like lists, the type constructor Parser is a functor.

> instance Functor Parser where
>     fmap :: (a -> b) -> Parser a -> Parser b
>     fmap = undefined

With get, satisfy, filter, and fmap, we now have a small library to build new (single character) parsers.

For example, we can write some simple parsers for particular sorts of characters. The following definitions parse alphabetic and numeric characters respectively.

> alphaChar, digitChar :: Parser Char
> alphaChar = satisfy isAlpha
> digitChar = satisfy isDigit

> -- >>> doParse alphaChar "123"
> -- Nothing

> -- >>> doParse digitChar "123"
> -- Just ('1',"23")

Similarly, finish this parser that should parse just one specific Char:

> char :: Char -> Parser Char
> char c = undefined

> -- >>> doParse (char 'a') "ab"
> -- Just ('a',"b")

> -- >>> doParse (char 'a') "ba"
> -- Nothing

And now let's use fmap to rewrite oneDigit:

> oneDigit' :: Parser Int
> oneDigit' = cvt <$> digitChar where    -- <$> is fmap!
>   cvt :: Char -> Int
>   cvt c = ord c - ord '0'

> -- >>> doParse oneDigit' "92"
> -- Just (9,"2")

> -- >>> doParse oneDigit' "cat"
> -- Nothing

Parser Composition

What if we want to parse more than one character from the input?

Using get we can write a composite parser that returns a pair of the first two Char values from the front of the input string. Again, we'll use do notation with the Maybe monad.

> twoChar0 :: Parser (Char, Char)
> twoChar0 = P $ \s -> do (c1, cs)  <- doParse get s
>                         (c2, cs') <- doParse get cs
>                         return ((c1,c2), cs')

> -- >>> doParse twoChar0 "ab"
> -- Just (('a','b'),"")

More generally, we can write a parser combinator that takes two parsers and returns a new parser that uses the first one and then the other and returns the pair of resulting values...

> pairP0 ::  Parser a -> Parser b -> Parser (a,b)
> pairP0 = undefined

and use that to rewrite twoChar0 more elegantly like this:

> twoChar1 :: Parser (Char, Char)
> twoChar1 = pairP0 get get

> -- >>> doParse twoChar1 "hey!"

> -- >>> doParse twoChar1 ""

> -- >>> doParse (pairP0 oneDigit get) "1a"

> -- >>> doParse (pairP0 oneDigit get) "a1"

Parser is an Applicative Functor

Suppose we want to parse two characters, where the first should be a sign (i.e. '+' or '-') and the second a digit?

We've already defined single character parsers that should help. We just need to put them together.

oneOp    :: Parser (Int -> Int)
oneDigit :: Parser Int

And we put them together in a way that looks a bit like fmap above. However, instead of passing in the function as a parameter, we get it via parsing.

> signedDigit0 :: Parser Int
> signedDigit0 = P $ \ s -> do (f, cs)  <- doParse oneOp s
>                              (x, cs') <- doParse oneDigit cs
>                              return (f x, cs')

> -- >>> doParse signedDigit0 "-1"
> -- Just (-1,"")

> -- >>> doParse signedDigit0 "+3"
> -- Just (3,"")

Can we generalize this pattern? What is the type when oneOp and oneDigit are arguments to the combinator?

> apP :: Parser (t -> a) -> Parser t -> Parser a
> apP p1 p2 = P $ \ s -> do (f, s') <- doParse p1 s
>                           (x,s'') <- doParse p2 s'
>                           return (f x, s'')

Does this type look familiar?

Whoa! That is the type of the (<*>) operator from the Applicative class. What does this combinator do? It grabs a function value out of the first parser (if one exists), then grabs the argument from the second parser (using the remaining part of the string), and then returns the application.

What about pure?

The definition of pure is very simple -- we can let the types guide us. This parser always succeeds and produces a specific value without consuming any of the input string.

> pureP :: a -> Parser a
> pureP x = P $ \s -> Just (x,s)

So we can put these two definitions together in our class instance.

> instance Applicative Parser where
>   pure :: a -> Parser a
>   pure   = pureP
>   (<*>) :: Parser (a -> b) -> Parser a -> Parser b
>   (<*>)  = apP

Let's go back and reimplement our examples with the applicative combinators:

> twoChar :: Parser (Char, Char)
> twoChar = pure (,) <*> get <*> get

> signedDigit :: Parser Int
> signedDigit = oneOp <*> oneDigit

> -- >>> doParse twoChar "hey!"

> -- >>> doParse twoChar ""

> -- >>> doParse signedDigit "-1"

> -- >>> doParse signedDigit "+3"

Now we're picking up speed. First, we can use our combinators to rewrite our more general pairing parser (pairP) like this:

> pairP :: Parser a -> Parser b -> Parser (a,b)
> pairP p1 p2 = pure (,) <*> p1 <*> p2

Or, more idiomatically, we can replace pure f <*> with f <$>. (The hlint tool will suggest this rewrite to you.)

> pairP' :: Parser a -> Parser b -> Parser (a,b)
> pairP' p1 p2 = (,) <$> p1 <*> p2

We can even dip into the Control.Applicative library and write pairP even more succinctly using this liftA2 combinator:

liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 f p1 p2 = pure f <*> p1 <*> p2

> pairP'' :: Parser a -> Parser b -> Parser (a,b)
> pairP'' = liftA2 (,)

And, Control.Applicative gives us even more options for constructing parsers. For example, it also includes a definition of liftA3.

> tripleP :: Parser a -> Parser b -> Parser c -> Parser (a,b,c)
> tripleP = liftA3 (,,)

The *> and <* operators are also defined in Control.Applicative. The first is the Applicative analogue of the (>>) operator for Monads.

-- sequence actions, discarding the value of the first action
(*>) :: Applicative f => f a -> f b -> f b

The second is the dual to the first---it keeps the first result but discards the second.

-- sequence actions, discarding the value of the second action
(<*) :: Applicative f => f a -> f b -> f a

Here's an example of a parser that uses both operators. When we parse something surrounded by parentheses, we don't want to keep either the opening or closing characters.

> -- | Parse something surrounded by parentheses
> parenP :: Parser a -> Parser a
> parenP p = char '(' *> p <* char ')'

> -- >>> doParse (parenP get) "(1)"
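
The difference between the two operators is easiest to see with get alone. The following is a minimal, self-contained sketch: it restates the Parser type, get, eof, and the Applicative instance from above, and borrows fmap from that instance via liftA, which leaves the fmap exercise open. The names keepSecond, keepFirst, and onlyChar are hypothetical.

```haskell
import Control.Applicative (liftA)

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap = liftA           -- liftA f p = pure f <*> p

instance Applicative Parser where
  pure x    = P $ \s -> Just (x, s)
  p1 <*> p2 = P $ \s -> do
    (f, s')  <- doParse p1 s
    (x, s'') <- doParse p2 s'
    return (f x, s'')

get :: Parser Char
get = P $ \s -> case s of
  (c : cs) -> Just (c, cs)
  []       -> Nothing

eof :: Parser ()
eof = P $ \s -> case s of
  []    -> Just ((), [])
  _ : _ -> Nothing

-- Both parsers consume two characters; they differ in which one they keep.
keepSecond, keepFirst :: Parser Char
keepSecond = get *> get
keepFirst  = get <* get

-- Succeeds only on an input of exactly one character.
onlyChar :: Parser Char
onlyChar = get <* eof
```

On the input "ab", keepSecond yields Just ('b',"") and keepFirst yields Just ('a',""): both run the same two parsers in the same order, and only the retained value changes. Likewise onlyChar accepts "a" but rejects "ab", because eof still runs even though its unit result is thrown away.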

Monadic Parsing

Although we aren't going to emphasize it in this module, the Parser type is also a Monad. Just like State and list, we can make Parser an instance of the Monad type class. To make sure that you get practice with the applicative operators, such as <*>, we won't do that here. However, for practice, see if you can figure out an appropriate definition of (>>=).

> bindP :: Parser a -> (a -> Parser b) -> Parser b
> bindP = undefined

> twoChar' :: Parser (Char, Char)
> twoChar' = bindP get $ \c1 -> 
>            bindP get $ \c2 ->
>            pure (c1,c2)

> -- >>> doParse twoChar' "hey!"
> -- Just (('h','e'),"y!")

Recursive Parsing

However, to parse more interesting things, we need to add some kind of recursion to our combinators. For example, it's all very well to parse individual characters (as in char above), but it would be a lot more fun if we could recognize whole Strings.

Let's try to write it!

> string :: String -> Parser String
> string ""     = pure ""
> string (x:xs) = (:) <$> char x <*> string xs

Much better!

> -- >>> doParse (string "mic") "mickeyMouse"

> -- >>> doParse (string "mic") "donald duck"

For fun, try to write string using foldr for the list recursion.

> string' :: String -> Parser String
> string' = foldr undefined undefined

Furthermore, we can use natural number recursion to write a parser that grabs n characters from the input:

> grabn :: Int -> Parser String
> grabn n = if n <= 0 then pure "" else (:) <$> get <*> grabn (n-1)

> -- >>> doParse (grabn 3) "mickeyMouse"
> -- Just ("mic","keyMouse")

> -- >>> doParse (grabn 3) "mi"
> -- Nothing

Choice

The Applicative operators give us sequential composition of parsers (i.e. run one parser then another). But what about parallel composition (i.e. run both parsers on the same input)?

Let's write a combinator that takes two sub-parsers and chooses between them.

> chooseFirstP :: Parser a -> Parser a -> Parser a
> p1 `chooseFirstP` p2 = P $ \s -> doParse p1 s `firstJust` doParse p2 s

How to write it? Well, we want to return a successful parse if either parser succeeds. The order of the subparsers matters here --- we want to try the second parser only if the first parser fails. So we need to be careful about how we compose the results together. Due to laziness, we will only try out the second parser in the case that the first parser fails.

> firstJust :: Maybe a -> Maybe a -> Maybe a
> firstJust (Just x) _ = Just x     
> firstJust Nothing  y = y

In the definition of chooseFirstP, note how we duplicate the input string s and give the same string to both parsers. This code naturally implements backtracking. If the first parser fails, we go back to the state of the input where it started and then continue with the second parser.

Example: We can use the above combinator to build a parser that returns either an alphabetic or a numeric character

> alphaNumChar :: Parser Char
> alphaNumChar = alphaChar `chooseFirstP` digitChar

> -- >>> doParse alphaNumChar "cat"

> -- >>> doParse alphaNumChar "2cat"
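
The single-character example does not really exercise backtracking, since a failing one-character parser has consumed nothing. Here is a small sketch where the first alternative consumes a character before failing. It restates the definitions it needs so that it can be loaded on its own (fmap is borrowed from the Applicative instance via liftA, and chooseFirstP is written with an inline case instead of firstJust); the names twoDigits, digitThenAlpha, and eitherPair are hypothetical.

```haskell
import Control.Applicative (liftA)
import Control.Monad (guard)
import Data.Char (isAlpha, isDigit)

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap = liftA

instance Applicative Parser where
  pure x    = P $ \s -> Just (x, s)
  p1 <*> p2 = P $ \s -> do
    (f, s')  <- doParse p1 s
    (x, s'') <- doParse p2 s'
    return (f x, s'')

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = P $ \s -> do
  (c, cs) <- case s of
    (x : xs) -> Just (x, xs)
    []       -> Nothing
  guard (f c)
  return (c, cs)

chooseFirstP :: Parser a -> Parser a -> Parser a
p1 `chooseFirstP` p2 = P $ \s -> case doParse p1 s of
  Just r  -> Just r
  Nothing -> doParse p2 s   -- restart from the original input s

twoDigits, digitThenAlpha, eitherPair :: Parser (Char, Char)
twoDigits      = (,) <$> satisfy isDigit <*> satisfy isDigit
digitThenAlpha = (,) <$> satisfy isDigit <*> satisfy isAlpha
eitherPair     = twoDigits `chooseFirstP` digitThenAlpha
```

On the input "1a", twoDigits reads the '1' and then fails on 'a'. Because chooseFirstP hands the original string to the second parser, digitThenAlpha sees "1a" again and the result is Just (('1','a'),"").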

Parsing multiple inputs

Let's write a combinator that takes a parser p that returns an a and constructs a parser that recognizes a sequence of strings (each recognized by p) and returns a list of a values. That is, it keeps grabbing a values as long as it can and returns them in a list of type [a].

We can do this by writing a parser that either parses one thing and then calls itself recursively (if possible) or succeeds without consuming any input. In either case, the result is a list.

> manyP :: Parser a -> Parser [a]
> manyP p = ((:) <$> p <*> manyP p) `chooseFirstP` pure []

> -- >>> doParse (manyP oneDigit) "12345a"

> -- >>> doParse (manyP alphaChar) "12345a"

Look out! What happens if we swap the order of the arguments to chooseFirstP?

> manyP' :: Parser a -> Parser [a]
> manyP' p = pure [] `chooseFirstP` ((:) <$> p <*> manyP p)

We don't want to do this --- the pure [] parser always succeeds, so the result will always be [].

> -- >>> doParse (manyP' oneDigit) "12345a"

Alternative

We can use choice and failure together to make the Parser type an instance of the Alternative type class from Control.Applicative.

The Alternative type class has two methods:

class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

where empty is an applicative computation with zero results, and (<|>) is a "choice" operator that combines two computations. The Alternative type class laws require the choice operator to be associative but it need not be commutative (and it isn't here).

The empty computation should be an identity for the choice operator. In other words we should have

empty <|> a   === a

and

a  <|> empty  === a

For parsers, this means that we need to have a failure parser that never parses anything (i.e. one that always returns Nothing):

> failP :: Parser a
> failP = P $ const Nothing

Putting these two definitions together gives us the Alternative instance.

> instance Alternative Parser where
>   empty = failP            -- always fail
>   (<|>) = chooseFirstP     -- try the left parser, if that fails then try the right
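
We cannot test the identity laws on every input, but we can at least spot-check them. The sketch below is self-contained (it restates the instances, with fmap borrowed via liftA) and compares parsers on a few sample strings. The helper agreeOn, the samples list, and the two law-named values are hypothetical names used only here.

```haskell
import Control.Applicative

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap = liftA

instance Applicative Parser where
  pure x    = P $ \s -> Just (x, s)
  p1 <*> p2 = P $ \s -> do
    (f, s')  <- doParse p1 s
    (x, s'') <- doParse p2 s'
    return (f x, s'')

instance Alternative Parser where
  empty     = P $ const Nothing
  p1 <|> p2 = P $ \s -> case doParse p1 s of
    Just r  -> Just r
    Nothing -> doParse p2 s

get :: Parser Char
get = P $ \s -> case s of
  (c : cs) -> Just (c, cs)
  []       -> Nothing

-- Do two parsers give the same answer on every sample input?
agreeOn :: Eq a => Parser a -> Parser a -> [String] -> Bool
agreeOn p q = all (\s -> doParse p s == doParse q s)

samples :: [String]
samples = ["", "a", "ab", "123"]

leftIdentity, rightIdentity :: Bool
leftIdentity  = agreeOn (empty <|> get) get samples
rightIdentity = agreeOn (get <|> empty) get samples
```

Both checks evaluate to True. The same setup also shows that choice is not commutative: doParse (pure 'x' <|> get) "ab" is Just ('x',"ab"), whereas doParse (get <|> pure 'x') "ab" is Just ('a',"b").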

The Alternative type class automatically gives definitions for functions many and some, defined in terms of (<|>).

The many operation corresponds to running the applicative computation zero or more times, whereas some runs the computation one or more times. Both return their results in a list.

many :: Alternative f => f a -> f [a]
many v = some v <|> pure []

some :: Alternative f => f a -> f [a]   --- result list is guaranteed to be nonempty
some v = (:) <$> v <*> many v

For parsing, the many combinator returns a single, maximal sequence produced by iterating the given parser zero or more times.

> -- >>> doParse (many digitChar) "12345a"

> -- >>> doParse (many digitChar) ""

> -- >>> doParse (some digitChar) "12345a"

> -- >>> doParse (some digitChar) ""

This sequence is maximal because the definition of many tries some v first and only falls back to pure [] (the empty list) when some v fails. If the definition had been the other way around, then the result would always be the empty list (because pure [] always succeeds).
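
To make the point about ordering concrete, here is a self-contained sketch that compares the library's many and some with a deliberately flipped variant. It restates the instances from above (fmap again borrowed via liftA) and a satisfy in the style of satisfy'; the name emptyFirst is hypothetical and exists only to show what goes wrong.

```haskell
import Control.Applicative
import Control.Monad (guard)
import Data.Char (isDigit)

newtype Parser a = P { doParse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap = liftA

instance Applicative Parser where
  pure x    = P $ \s -> Just (x, s)
  p1 <*> p2 = P $ \s -> do
    (f, s')  <- doParse p1 s
    (x, s'') <- doParse p2 s'
    return (f x, s'')

instance Alternative Parser where
  empty     = P $ const Nothing
  p1 <|> p2 = P $ \s -> case doParse p1 s of
    Just r  -> Just r
    Nothing -> doParse p2 s

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = P $ \s -> do
  (c, cs) <- case s of
    (x : xs) -> Just (x, xs)
    []       -> Nothing
  guard (f c)
  return (c, cs)

digitChar :: Parser Char
digitChar = satisfy isDigit

-- Hypothetical flipped variant: the always-succeeding branch comes first,
-- so the recursive branch is never tried.
emptyFirst :: Parser a -> Parser [a]
emptyFirst v = pure [] <|> ((:) <$> v <*> emptyFirst v)
```

With these definitions, doParse (many digitChar) "12345a" is Just ("12345","a") and doParse (many digitChar) "" is Just ("",""), while doParse (some digitChar) "" is Nothing. In contrast, doParse (emptyFirst digitChar) "12345a" is Just ("","12345a"): nothing is consumed.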

Let's use some to write a parser that will return an entire natural number (not just a single digit.)

> oneNat :: Parser Int
> oneNat = fmap read (some digitChar)   -- know that read will succeed because input is all digits

> -- >>> doParse oneNat "12345a"

> -- >>> doParse oneNat ""

Challenge (will not be on the quiz): use the Alternative operators to implement a parser that parses zero or more occurrences of p, separated by sep.

> sepBy :: Parser a -> Parser b -> Parser [a]
> sepBy p sep = undefined

> -- >>> doParse (sepBy oneNat (char ',')) "1,12,0,3"
> -- Just ([1,12,0,3],"")

> -- >>> doParse (sepBy oneNat (char ',')) "1"
> -- Just ([1],"")

> -- >>> doParse (sepBy oneNat (char ',')) "1,12,0,"
> -- Just ([1,12,0],",")

> -- >>> doParse (sepBy oneNat (char '8')) "888"
> -- Just ([888],"")

> -- >>> doParse (sepBy (char '8') (char '8')) "888"
> -- Just ("88","")

> -- >>> doParse (sepBy oneNat (char ',')) ""
> -- Just ([],"")

Parsing Arithmetic Expressions

Now let's use the above to build a small calculator that parses and evaluates arithmetic expressions. In essence, an expression is either a binary operator applied to two sub-expressions or else an integer.

First, we parse arithmetic operations as follows:

> intOp :: Parser (Int -> Int -> Int)
> intOp = plus <|> minus <|> times <|> divide
>   where plus   = char '+' *> pure (+)
>         minus  = char '-' *> pure (-)
>         times  = char '*' *> pure (*) 
>         divide = char '/' *> pure div

Note how this parser returns a binary function if it succeeds. Then we parse simple expressions by parsing a natural number followed by an operator and another calculation, or by parsing a single natural number alone.

> infixAp :: Applicative f => f a -> f (a -> b -> c) -> f b -> f c
> infixAp = liftA3 (\i1 o i2 -> i1 `o` i2)

> calc1 ::  Parser Int
> calc1 = infixAp oneNat intOp calc1 <|> oneNat
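
Because infixAp is polymorphic in the applicative functor, we can check what it does in isolation, without any parser at all. The sketch below tries it in the Maybe applicative; the names sumOk, diffOk, and noOperator are hypothetical.

```haskell
import Control.Applicative (liftA3)

infixAp :: Applicative f => f a -> f (a -> b -> c) -> f b -> f c
infixAp = liftA3 (\i1 o i2 -> i1 `o` i2)

-- All three pieces are present, so the operator is applied.
sumOk :: Maybe Int
sumOk = infixAp (Just 1) (Just (+)) (Just 2)

-- The left operand really does end up on the left of the operator.
diffOk :: Maybe Int
diffOk = infixAp (Just 10) (Just (-)) (Just 4)

-- A missing operator makes the whole computation fail.
noOperator :: Maybe Int
noOperator = infixAp (Just 1) Nothing (Just 2)
```

Here sumOk is Just 3, diffOk is Just 6, and noOperator is Nothing. For parsers the effects run in argument order, so the left operand is parsed first, then the operator, then the right operand.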

This works pretty well...

> -- >>> doParse calc1 "1+2+33"

> -- >>> doParse calc1 "11+22-33"

But things get a bit strange with minus:

> -- >>> doParse calc1 "11+22-33+45"

Huh? Well, if you look back at the code, you'll realize the above was parsed as

11 + (22 - (33 + 45))

because in each binary expression we require the left operand to be an integer. In other words, we are assuming that each operator is right associative, hence the above result. Making this parser left associative is harder than it looks: we can't just swap oneNat and the recursive call, as below.

> calcBad ::  Parser Int
> calcBad = infixAp calcBad intOp oneNat <|> oneNat

If you try this parser out, you'll see that it hangs on all inputs.

Furthermore, things also get a bit strange with multiplication:

> -- >>> doParse calc1 "10*2+100"

This string is parsed as:

10 * (2 + 100)

But the rules of precedence state that multiplication should bind tighter than addition. Our calc1 doesn't do anything different between multiplication and addition operators. So we have two problems to solve: precedence and associativity.

Precedence

We can introduce precedence into our parsing by stratifying the parser into different levels. Here, let's split our binary operations into addition-like and multiplication-like ones.

> addOp :: Parser (Int -> Int -> Int)
> addOp = char '+' *> pure (+) <|> char '-' *> pure (-)

> mulOp :: Parser (Int -> Int -> Int)
> mulOp = char '*' *> pure (*) <|> char '/' *> pure div

Now, we can stratify our language into mutually recursive sub-languages, where each top-level expression is parsed first as an addition expression (addE) starting with a multiplication expression (mulE). Multiplication expressions must then start with a basic factor: either a natural number or an arbitrary expression inside parentheses.

> calc2 :: Parser Int
> calc2 = addE

> addE :: Parser Int
> addE = infixAp mulE addOp addE <|> mulE

> mulE :: Parser Int
> mulE = infixAp factorE mulOp mulE <|> factorE

> factorE :: Parser Int
> factorE = oneNat <|> parenP calc2

Now our parser is still right associative, but multiplication binds tighter than addition.

> -- >>> doParse calc2 "1+10*2+100"

> -- >>> doParse calc2 "1+10*(2+100)"

Do you understand why the first parse returned 121?

Parsing Pattern: Associativity via Chaining

But we're still not done: we need to fix the associativity problem.

> -- >>> doParse calc2 "10-1-1"

Ugh! I hope you understand why: it's because the above was parsed as 10 - (1 - 1) (right associative) and not (10 - 1) - 1 (left associative). You might be tempted to fix that simply by flipping the order in infixAp, thus

addE = infixAp addE addOp mulE <|> mulE

but this would be disastrous. Can you see why? The parser for addE directly (recursively) calls itself without consuming any input! Thus, it goes off the deep end and never comes back.

Let's take a closer look at what is going on with our current definitions. In essence, an addE is of the form:

mulE + ( mulE + ( mulE + ... mulE ))

That is, we keep chaining together mulE values and adding them for as long as we can. Similarly a mulE is of the form

factorE * ( factorE * ( factorE * ... factorE ))

where we keep chaining factorE values and multiplying them for as long as we can.

Instead, we want to parse the input as starting with a multiplication expression followed by any number of addition operators and multiplication expressions. We can temporarily store the operators and expressions in a list of pairs. Then, we'll foldl over this list, using each operator to combine the current result with the next number.

> type IntOp = Int -> Int -> Int

> addE1 :: Parser Int
> addE1 = process <$> first <*> rest where

>            -- start with a multiplication expression
>            first :: Parser Int
>            first = mulE1

>            -- parse any number of `addOp`s followed
>            -- by a multiplication expression
>            -- return the result in a list of tuples
>            rest :: Parser [(IntOp, Int)]
>            rest = many ((,) <$> addOp <*> mulE1)

>            -- process the list of tuples with a left fold
>            process :: Int -> [(IntOp, Int)] -> Int
>            process = foldl comb

>            -- combine each operator and argument with
>            -- the current value of the parser
>            comb :: Int -> (IntOp, Int) -> Int
>            comb x (op,y) = x `op` y

> mulE1 :: Parser Int
> mulE1 = foldl comb <$> factorE1 <*> rest where
>            comb x (op,y) = x `op` y
>            rest = many ((,) <$> mulOp <*> factorE1)

> factorE1 :: Parser Int
> factorE1 = oneNat <|> parenP addE1

The above is indeed left associative:

> -- >>> doParse addE1 "10-1-1"
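
The left fold at the heart of addE1 can also be tried on its own, with the list of pairs written out by hand instead of parsed. This sketch contrasts it with the right-nested reading that calc2 produces; the names leftAssoc, rightAssoc, and sample are hypothetical.

```haskell
type IntOp = Int -> Int -> Int

-- Left fold, as in process above: ((10 - 1) - 1)
leftAssoc :: Int -> [(IntOp, Int)] -> Int
leftAssoc = foldl (\x (op, y) -> x `op` y)

-- Right-nested reading, for contrast: 10 - (1 - 1)
rightAssoc :: Int -> [(IntOp, Int)] -> Int
rightAssoc x []             = x
rightAssoc x ((op, y) : ps) = x `op` rightAssoc y ps

-- What rest would return for the input "-1-1".
sample :: [(IntOp, Int)]
sample = [((-), 1), ((-), 1)]
```

Here leftAssoc 10 sample is 8, while rightAssoc 10 sample is 10.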

Also, it is very easy to spot and bottle the chaining computation pattern: the only differences are the base parser (mulE1 vs factorE1) and the binary operation (addOp vs mulOp). We simply make those parameters to our chain-left combinator:

> -- chainl1 :: Parser Int -> Parser IntOp -> Parser Int
> p `chainl1` pop = foldl comb <$> p <*> rest where
>            comb x (op,y) = x `op` y
>            rest = many ((,) <$> pop <*> p)

after which we can rewrite the grammar in three lines:

> addE2, mulE2, factorE2 :: Parser Int
> addE2    = mulE2    `chainl1` addOp
> mulE2    = factorE2 `chainl1` mulOp
> factorE2 = parenP addE2 <|> oneNat

> -- >>> doParse addE2 "10-1-1"

> -- >>> doParse addE2 "10*2+1"

> -- >>> doParse addE2 "10+2*1"

Of course, we can generalize chainl1 even further so that it is not specialized to parsing Int expressions. Try to update the type above so that it is more polymorphic.

This concludes our exploration of applicative parsing, but what we've covered is really just the tip of an iceberg. Though parsing is a very old problem, studied since the dawn of computing, algebraic structures in Haskell bring a fresh perspective that has now been transferred from Haskell to many other languages.

CIS 5520: Advanced Programming

Fall 2024