Thursday, 6 August 2015

rPrINwtF


I've been playing around with type classes of late. My type checker and me have had some epic fights. We're not speaking now, it's late and I don't want to get into another shouting match.
Seriously, type classes and type inference don't mesh but they unlock some cool stuff. 

I'm going to wrangle a mangled attempt at printf. Hopefully we come out with something like: printf "Msg1: {}; Msg2: {}" "Hello" "World" == "Msg1: Hello Msg2: World"
We need to parse the format string and be able to move from hole to hole. For that we create a zipper class that will allow us to move from hole to hole in the string as well insert into a hole.

class Zip a where
  type Comp a
  next :: a -> a
  insert :: Comp a -> a -> a

instance Zip ([a], a, [a]) where
  type Comp ([a], a, [a]) = a
  next (l, a, []) = (l, a, [])
  next (l, a, r:rs) = (a:l, r, rs)
  insert x (l, a, r) = (l, x, a:r)
-- We insert before the cursor

instance Zip a => Zip (r -> a) where
  type Comp (r -> a) = Comp a
  -- We push the next function inside the arrow
  next f r = next (f r)
  -- We push the insert function inside the arrow
  insert lx ra r = insert lx (ra r)

We need a parser that will parse the string into a zipper where every time we go 'next' we move to the next hole. For every hole we add an extra empty string for the hole. This helps with holes present at the end of the string.

parser :: String -> ([String], String, [String])
parser s = case l of
      [] -> ([], "", [])
      (x:x':xs) -> ([x], x', xs)
      [x] -> ([x], "", [])
    where
      x = try (manyTill anyChar (string "{" *> manyTill anyChar (char '}'))) <|> many1 anyChar
      l = getRes (many x) [] s >>= (:[""])

As well as its inverse

s_ :: ([String], String, [String]) -> String
s_ (l, x, r) = concat $ reverse l ++ x:r
class Printf a where
  p_ :: String -> a
Base Case 1:
instance Printf ([String], String, [String]) where
    p_ = parser
Inductive Case:
instance (x ~ Comp a, Zip a, Printf a) => Printf (x -> a) where
  -- The first next is to move past the inserted value, the second is to move past the empty string denoting the hole, the final next is to move past the string.
  p_ s x = next . next . next . insert x . p_ $ s

That allows us to do something like this: s_ (p_ "{}, {}, ||| {}" "Hello" "World" "!!!")
p_ now takes an arbitrary number of parameters and attempts to insert them into the holes in the string to be formatted then s_ converts it into the formatted string. It's a bit of a muddled up version of rPrINwtf, but at least I spelt it right.

Tuesday, 4 August 2015

KMP 2: Haskell

In a previous post I wrote about the KMP algorithm and how it supposedly worked for Haskell. Further investigation reveals that the complexity is O(n^2). Through the next two posts I'll iteratively transform this O(n^2) version into the O(n) KMP algorithm by annotating the table we developed with various bits of information; I'll use this information to optimize the traversal of the table.

All the tender juicy bits reside in the table: the finite automata driving the cursor over the substring to be matched, so lets create that first.

Lets create a simple machine that is constructed from the needle and consumes the haystack; it progresses if the character matches and gives up when the first failed match.

Thus the state of the machine at any point of time could have one of three values: Finished with Failure, Finished Successfully, In Progress But Undecided. We can use a Maybe Any, for a 3 valued type (Maybe's instances will come in handy; we use Any for its Monoid instance)

type KMP a = Mealy a (Maybe Any)

simpleMachine :: Eq a => [a] -> KMP a
simpleMachine [] = pure (Just (Any True))
simpleMachine (a:as) = Mealy f
    where
      f a' | a == a' = (Just (Any False), simpleMachine as)
           | otherwise = (Nothing, pure Nothing)

KMP says, upon failure, we jump to the point with the next longest match. 
The machine above will start matching the haystack starting from the beginning of the haystack.
If we want to push back the point at which we start matching we need to delay the machine.

delay :: KMP a -> KMP a
delay k = Mealy $ \a -> (Just (Any False), k)

delay simpleMachine would start matching from the second character of the haystack.
delay (delay simpleMachine) would start matching from the third character of the haystack.

The list of machines starting from every possible point in any haystack.
ds :: [KMP a] 
ds = g : map delay ds

We need to run the machines in parallel and the state of the machine at any point should be the leftmost one (in the above list) that has not yet failed (because the leftmost one will have run the longest corresponding to the longest match).
The applicative instance of the Mealy machine allows us to run the machines in parallel. We gather up the internal states using an or.
table = foldl1 (liftA2 (<>)) ds

The problem with the machine above is, when the needle is not present in the haystack,  it does not terminate because we keep looking to the next simple machine in a list of infinite machines to check if it has had any success. So we need to limit the number of machines we run in parallel: i.e. the size of the haystack.

table sizeOfHaystack = foldl1 ((liftA2 . liftA2) (||)) (take sizeOfHaystack ds)

Thus the machine
kmpMachine :: forall a. Eq a => Int -> [a] -> KMP a
kmpMachine sizeOfHaystack needle = foldl1 (liftA2 (<>)) (take sizeOfHaystack ds)
    where
      ds = g : map delay ds
      g :: KMP a
      g = simpleMachine needle
can be driven with
driveMachine :: [a] -> Bool -> KMP a -> Bool
driveMachine [] b k = b
driveMachine (a:as) b k = uncurry (driveMachine as) . h . next k $ a
    where 
      h (x, y) = (getAny . maybe (Any False) id $ x, y)

solve :: Eq a => [a] -> [a] -> Bool
solve needle haystack = driveMachine haystack False (kmpMachine (length haystack) needle)

This implementation is the naive O(n^2) version: whenever we switch to a new machine we have to drive it all the way from the start.

In a subsequent post, we'll see how to transform this naive O(n^2) version into the O(n) version.
The way we optimize the algorithm is to optimize its pivotal point: the merging of two machines.