Monday, 29 September 2014

Parsers in Haskell 1

Prelude

I found quite a few gems when trying to roll my own parser which I'll talk about over a series of posts. I think parsers illustrate in italic cursive a few of Haskell's strengths: the elegance and power of Haskell EDSLs, the guidance types provide in programming and code reuse.

Define the interface

First lets define what we want to build. A parser is a program that analyzes a string of tokens based on a formal grammar. Thus we want to expose a grammar to the user and something that consumes sentences in the grammar and spits out an answer.

We'll use regular expressions for that grammar, our interface will thus comprise:
concatenate :: Parser Token Result -> Parser Token Result -> Parser Token Result
alternate :: Parser Token Result -> Parser Token Result -> Parser Token Result
star :: Parser Token Result -> Parser Token Result
Concatenate should join two regular expressions end to end, such that it doesn't succeed until both succeed. While alternate is 'opposite' in some sense; it succeeds when either of its arguments succeed.

Let's formalize a bit

These look a fair bit like logical and and or, lets see if they really fit together by applying their laws on our parser.

p `concatenate` (p' `concatenate` p'') = (p `concatenate` p') `concatenate` p'' -- Associativity
p `alternate` (p' `alternate` p'') = (p `alternate` p') `alternate` p'' -- Associativity
p `alternate` p' = p' `alternate` p -- Commutativity

-- Distributivity
p `concatenate` (p' `alternate` p'') = (p `concatenate` p' `alternate` p `concatenate` p'')
p `alternate` (p' `concatenate` p'') = (p `alternate` p') `concatenate` (p `alternate` p'')
The laws quite snugly capture the essence of these operations.


Generalizing

I think parsers are exactly conjuncts and disjuncts. The above laws form the description for the MonadPlus interface with mplus as alternate and (>>=) as concatenate. So in order to build our parser we need not worry about which Monad to use as long as they are legitimate instances of the MonadPlus interface. We can thus build our parser with a high level of abstraction, in addition we have a huge library to power our parser and can switch out Monads as we see fit. 


What about the others?

In addition we will need a way to construct and deconstruct a parser:
match :: Token -> Result -> Parser Token Result
step :: (Token, Parser Token Result) -> (Result, Parser Token Result)
The match combinator should, if it matches a token output the result and step should extract a result and give the new parser. The parser thus seems extremely similar to a stream with step as a form of pattern matching on it and match as a constructor for the stream. In addition it also transforms an input stream into an output one.

Finally we need to build the star combinator that will match 0, 1 or many concatenations of a particular parser. This combinator can be constructed out of what we have so far.

Conclusion

So having defined the interface, applied some generalizations and made some observations the next goal is to build the actual parser interface.

Rumination

I think Haskell has an array of very composable building blocks. They are extremely powerful and general. In point of fact, the MonadPlus interface unifies the concept of conjunctions and disjunctions which crop up all over the place. 


No comments:

Post a Comment