Tuesday 30 September 2014

Parsers in Haskell 2

Previously we defined our interface, so lets set about implementing it. We will choose to develop parsers for context independent grammars, that is grammars that do not depend on the surrounding tokens to define the productions.

Combining Parsers:


Defining concatenate, is simple,

conc :: (Semigroup b, Monad m) => Parser m a b -> Parser m a b -> Parser m a b
conc r r' = do
b <- r
b' <- r'
return (b <> b')

We extract the results, and combine them with some monodial sum (we do not need the identity element, which is what the semigroup class captures). However as can be seen, the structure of the parsers are completely independent of each other. They compute their results independently and then combine them together, which is my intuition for applicatives so lets replace that with the cleaner definition:

conc :: (Semigroup b, Monad m) =>Parser m a b -> Parser m a b -> Parser m a b
conc r r' = (<>) <$> r <*> r'

Alternate is trivial, with it's implementation already provided.

alter :: Monad m => Parser m a b -> Parser m a b -> Parser m a b
alter = mplus


Constructing parsers:

The match function requires us to receive input from the user. However since we are developing a non-deterministic parser we will have to jump about in the stream which is why generation of input must be a seperate concern from the querying of input.

State of the stream


Querying of the input must return the value we are at in the streams history, and input is generated when have no more history left to consume.  Thus our input will consist of a stream generator and our position in the stream. Zippers are excellent at keeping track of positions in the stream. Together these constitute the current state of the input.
The english naturally describes the type for the parser

type Zip' m a = m (Seq a, a, Seq a, Producer a m ())
type Parser m a b = StateT (Zip' m a) (LogicT m) b

The state transformer is already an instance of monad logic and handles the notion of backtracking on the state as well, +1 for code reuse. The types beautifully describe exactly what our parser does. It allows us to build back tracking computations with the ability to keep track of a position in a stream. We have absolutely no need for implementation to tell us that with the types acting as the abstraction.
Before we can begin using our state, lets build some utilities to manage it:

forward :: Monad m => Zip' m a -> Zip' m a
forward m = do
(s, a, s', p) <- m
case viewl s' of
EmptyL -> next p >>= \x -> case x of
Left _ -> error "Stream dried up"
Right (a', p'') -> return (a <| s, a', s', p'')
(a' :< s'') -> return (s |> a, a', s'', p)

previous :: Monad m => Zip' m a -> Zip' m a
previous m = do
(s, a, s', p) <- m
case viewl s of
EmptyL -> error "Reached limit of history"
(a' :< s'') -> return (s'', a', a <| s', p)

view :: Monad m => Zip' m a -> m a
view m = m >>= \(_, a, _, _) -> return a

match

The match function is then simply (where mzero represents failure)

match :: (Monad m, Eq a) => a -> b -> Parser m a b
match a b = do
s <- get
inp <- (lift . lift) (view s)
if inp == a then put (forward s) >> return b else mzero

(*) Operator:


However regular expressions derive their power from the * operator. * matches, 0 or more occurences of a regex,
The sequence of values that partially define * are
e, 
e `alternate` (e `concatenate` e), 
e `alternate` (e `concatenate` e) `alternate` (e `concatenate` e `concatenate` e)
e `alternate` (e `concatenate` e) `alternate` (e `concatenate` e `concatenate` e) ...
We can use our distributivity law to rewrite the terms of the sequence as:
e, 
e `alternate` (e `concatenate` e),
e `alternate` (e `concatenate` (e `alternate` (e `concatenate` e))), 
e `alternate` (e `concatenate` (e `alternate` (e `concatenate`(e `alternate` (e `concatenate` e)))

Thus s(n+1) = e `alternate` (e `concatenate` s(n)). We want * e to be the smallest fixed point of this sequence, thus  * e = fix $ (e `alternate`) . (e `concatenate`) ~ * e = e `alternate` (e `concatenate` * e)

Interface:

match :: (Monad m, Eq a) => a -> b -> Reg m a b
conc :: (Semigroup b, Monad m) => Reg m a b -> Reg m a b -> Reg m a b
alter :: Monad m => Reg m a b -> Reg m a b -> Reg m a b
* :: (Semigroup b, Monad m) => Reg m a b -> Reg m a b

And that's the interface we present to a parser writer. In the next post I'll use these to build a simple lexer. 

No comments:

Post a Comment