Friday, 12 December 2014

Recursion Schemes: Excursion 2

Unfolding:

We're required to implement embed in order to use Unfoldable.
embed puts our base functor into the recursive type.


For a list for example we have:

embed :: Base [a] [a] -> [a]
embed (Cons a l) = a:l
embed Nil = []

And this allows us to infinitely unfold our base functor into the recursive type with an anamorphism.

ana :: Unfoldable t => (a -> Base t a) -> a -> t
ana f = m where m = embed . fmap m . f

The anamorphism is the inverse of the catamorphism. In a catamorphism, we project onto the base functor, then fold the values encased in the functor and then fold the outer. In an anamorphism we unfold once, then use the values inside the functor as seeds for the next unfolds, and then inject it into a recursive type.

The generalized anamorphism is a bit more fun:
gana :: (Unfoldable t, Monad m) => (forall b. m (Base t b) -> Base t (m b))
             -> (a -> Base t (m a)) -> a -> t
Constructing a value for this type follows similarly from the anamorphism except the monad now gives us a bit more trouble. We want something of the form Base t(Base t(Base t ....

We proceed using the same structure as before, first unfurling one layer to get Base t (m a).
Now we need to recursively unfold the inner layers. So we're going to need a function f that does that.
f will eventually need to inject the base functor into the recursive type, so we're going to need a value
Base t t. What the monad does is provide us with an additional little algebra over the 'a'. So our function that collapses the base functor must evaluate the monad and then fold that into the recursive type. So f must have type : Base t (m a) -> Base t t. In order to evaluate the monad we have been provided with a distributive law, that pushes the monad inside the functor.

m : Base t (m a) -> Base t t
m x = fmap (embed . m . fmap join . f . fmap g) x

We first unfurl the inner 'a' one more time to obtain a Base t (m (Base t (m a)). Now we distribute the monad across our base functor to obtain Base t(Base t( m (m a)). We can now evaluate the monad using join to get Base t(Base t(m a)). Now the inner functor is in the exact form we need to unfurl it further using m. Finally we can embed this fixed form of our base functor into our recursive type, and we are done.

gunfold' :: (Unfoldable t, Monad m, Functor m) =>
          (forall b. m (Base t b) -> Base t (m b))
          -> (a -> Base t (m a)) -> a -> t
gunfold' f g a = embed $ m (g a)
    where
      m x = fmap (embed . m . fmap join . f . fmap g) x


Recursion Schemes Excursions

I thought I'd document my excursion into recursion schemes.
For ever and always, recursive patterns have and will pop up, but I had to squash the combinators to fit. I thought I'd finally cut a swathe through the higher dimensional morph-ine like haze recursion schemes induces.

So, to be terse:
Recursion schemes maps our recursive type into its base functor using a type family 'Base'.
In order to use harness the power of its combinators we need to create instances of the classes Foldable and Unfoldable.
An instance of Foldable implies that we can 'project' our recursive type onto the base functor.
An instance of Unfoldable implies we can embed our base functor into the recursive type.
Instances of both mean that the recursive type and the base functor are isomorphic. The base-functor should always be <= the recursive type.

So onto the first of them:
A catamorphism is a fold, it builds up a tower of applications of f and then collapses them from the inside - a right fold. The projection allows us to talk in terms of the base functor which we know how to reach inside of and twiddle around in. The 'fmap m' folds the inner part and provides the folded version to our algebra which simply performs a single step of the fold.

cata :: Foldable t => (Base t a -> a) -> t -> a
cata f = m where m = f . fmap m . project


Next post, para-morphisms.

Wednesday, 10 December 2014

TopSort


Our story today begins with the rather unmotivated type

eval :: Functor f => f (f a -> a) -> f a

Lets construct a value of the type,
eval f = xs where xs = fmap ($ xs) f

Plugging the list functor in, we have:

eval :: [[a] -> a] -> [a]

It becomes a little clearer from looking at the function that what we're doing is creating a list where each value in the list is a function of the list itself. The base case would be the constant function.
Creating a value of the argument to eval we have something like:

arg = [\list -> list !! 1, const 3, \list -> list !! 0 + list !! 1 + list !! 5const 2const 5const 4]

If we eval arg , it provides arg as the argument to each index in arg . If the graph isn't acyclic we're going to <<loop>>; this is very reminiscent of a spreadsheet.

Thus eval provides a method of traversal of a directed acyclic graph when the functor is a list. Among the many algorithms related to graph traversal I'm going to pick topological sort.
So each element in the list has a list of dependencies.

type AdjList = [[[Int]] -> [Int]]

topSort :: AdjList -> [Int]
topSort a = map fst res
    where 
     indices = [0..]
     res = sortBy comparator $ zip indices (eval a)
      comparator x y = compare (length (snd x)) (length (snd y))

We evaluate the dependencies and tag them with their indices. Then we sort them by the number of dependencies. The algorithm works because for any two nodes u and v, if there is a directed path from u to v, the number of dependencies associated with u is strictly greater than that of v.

What remains is to construct sample adjacency lists for which we will need to construct a list of dependencies:

ref = flip (!!)

deps l = foldl f (const []) l
   where
      f g e = (++) <$> g <*> fmap (e:) (ref e)

adj1 = [deps [1, 2, 3], const [], deps [1, 3], const []]

Full code at https://github.com/joshcc3/HaskellExperiments/blob/master/interestingTypes/Loeb.hs




Monday, 10 November 2014

Iteration: functional style

for(int i = 0; i < a.length; i++){
a[i] = f(a[i], constant_info); // where f is a pure function
}
for(int i = 0; i < a.length; i++) {
accum = g(a[i], accum); // where g is a pure function
}
while(x >= 0 && s < a.length) {
ans *= h(a[x], accum);
x = a[x];
}

In the first example, we have a function purely in terms of the i'th element of a. It cannot affect anything else about a and it doesn't have any other information about a.

In the second example, we ease up and give ourselves access to an accumulated value that could possibly summarize the previous part of the array.

In the final one we let all hell loose and allow the value to determine where we go next. We're no longer accumulating in a fixed direction from left to right. We're allowing the value at our location to determine where we jump to next.

The above are three general ways of iteration. Now imagine if we had specific loop constructs that tackled each of the above cases. Aside from the clutter, this would actually make stuff more clear. Using these constructs would guarantee that any effect on our array was localized, or that an update to our accumulator is restricted to the correct values. I mean if all we wanted to do was simply modify the individual characters of a string (say by applying a shift cipher) having access to the entire array seems a bit overkill right. It's easy to imagine messing up and referencing some other part of the array in the update. So we could implement something like,

forIndividual (new Function <Dest_Type, Source_Type> ( Source_Type valueAtInd) {
public Dest_Type apply(Source_Type valueAtInd){
return some_jiggery_pokery(valueAtInd);
}
});
forAccumulate (new Function <Dest_Type, Tuple<Dest_Type, Source_Type>> (){
public Dest_Type apply(Source_Type valueAtInd, Dest_Type accum){
return accumulate_stuff(valueAtInd, accum);
}
});
forJumpAround (new Function <Tuple <Source_Type, Dest_Type>, Source_Type> (){
public Tuple<Dest_Type, Source_Type> apply(Source_Type stream){
return new Tuple(process_element_in_stream(stream), next_in_stream(stream));
}
});

That looks particularly ugly. It's because the idea of an update is not first class in Java, reifying it results in the above. However the idea behind these elaborate schemes is to protect the programmer from himself. 
Restrict the user by giving him the minimal amount of information necessary to perform the task
Separate the update from the manner of iteration.

Moving to a functional language where an update can be reified to a function, transcribing the above to Haskell we get:

map ~ forIndividual
map :: (a -> b) -> [a] -> [b]
foldl ~ forAccumulate
foldl :: (a -> b -> a) -> [a] -> b -> [b]
unfoldr ~ forJumpAround
unfoldr :: (a -> Maybe(b, a)) -> a -> [b]
We can abstract these even further, where we generalize the container we iterate over.

Friday, 7 November 2014

Why I like Haskell

I would say I'm comfortable with a few languages, Java, C++, C and Haskell. The reason I prefer Haskell to the others is for the following reasons.

To me, Haskell is a better software product. The interface offered by the language is much cleaner. For example, using some of the abstractions in C and C++ correctly requires an understanding of their implementation. For Haskell, the only properties we care about are those needed to define the abstraction (mostly)

The language constructs in imperative languages seem less powerful. For example control flow structures like if statements, loops and exceptions are hard-coded into the language, supported by syntax. In a functional language, we can create these control flow structures using functions and manipulate them as first class values.

The abstractions commonly used in functional programming are very powerful. Although they aren't a property of it, the Functor, Monoid, Applicative and Monad abstractions are extremely powerful and general. They provide a framework to manipulate data very rigorously. Using this framework ensures correctness for achieving the particular task. 

Speaking of abstractions, types are of course absolutely awesome. They provide the best abstraction yet. Some types perfectly describe their implementation counterpart and much more succinctly. In fact parametricity will guarantee certain theorems hold which is very nice to have and nigh impossible in real-world Java and C++ systems.


Lets say we wanted to implement a function that removes duplicates.

removeDups :: Ord a => [a] -> [a]
removeDups = snd . foldl step (S.empty, [])
where
step (t, acc) a = if S.member a t
then (t, acc)
else (S.insert a t, acc ++ [a])


public static List<A> removeDups(List<A> inp){
HashSet<A> s = new HashSet<A>();
List<A> ans = new List<A>();
for(A a : inp){
if(!s.contains(a)){
s.append(a);
s.put(a);
}
}
}

The difference over here is that using the model for iteration through the 'iterate' function we separate out the logic for a single step allowing us to observe and preserve our invariants with greater clarity. The invariant here is that after processing the first i characters, the set contains entries of all distinct characters upto that point and the list is a subsequence of the original having the first occurrence of all distinct characters upto the ith index.  We don't have to worry about the edge cases of the string. The logic for collapsing a structure by accumulating the values through a left-right traversal has already been implemented.


In the real world though interfacing with the low level components is essential for performant code, which breaks the nice abstractions. Thats not really reflected over here though.







Friday, 10 October 2014

Backtrack to simpler times

Prelude:

I recently re-encountered a problem, thinking about it in Haskell was just so much more succinct and beautiful that I had to code it up.

The Problem:

Given a code on a keypad, find all possible codes that correspond to transpositions of the original code where a transposition is a translation of all keys in a particular direction. The * and # are invalid characters.


We begin with a function that checks the translation of a single character. We assume the existence of a map and its inverse between locations and keys. Thus translation of a single character is valid if and only if both the original character is in the map and the final location is in the inverse map. We'll wrap our answer in a context of failure. We can have two ways to fail, either when trying to perform the initial lookup, or the second. Thus our value should be wrapped in two contexts of failures Maybe (Maybe Char). However we don't really care about distinguishing between the two and so we'll flatten them out into a Maybe Char. This treatment yields the following code.

check :: Char -> (Int, Int) -> Maybe Char
check c off = (+) <$> lookup c keyToLoc <*> return off >>= flip lookup locToKey


Next we have to reflect this behavior of a single character to a set of characters. We want to check that each of the individual characters succeeds, if one fails all should fail. Sounds similar to a conjunction right? This is (>>=) for MonadPlus  However since we know the structure of what we're binding in advance we can just use the applicative interface (<*>) to glue our results together. Over here, we're going to work on the results of check. However we're going to work on them while they're embedded within two contexts - a reader and maybe. Thats why we have to lift our the (:) and the [] two levels up.

inp' :: String -> (((->) (Int, Int)) (Maybe [Char]))
inp' = foldr ((liftA2 . liftA2) (:)) ((pure . pure) []) . fmap check

Now we need to generate the list of possible offsets. To create a single offset we use (,). Let xList represents all possible alternatives along the x component and yList represents them along the y component. (<*>) teaches our function (,) to operate on lists of alternatives. That is it reflects the behavior of (,) to a place where the operands are a list of possibilities. 

list :: Int -> [(Int, Int)]
list maxOff = (,) <$> [-maxOff .. maxOff] <*> [-maxOff .. maxOff]

Finally we need to combine the above. Mapping them over each other gives us a context of non-determinism around a context of failure. The characteristic we want to observe of this structure is the number of successes. So we obtain the characteristic of this structure by folding it using an accumulator.

sol :: String -> Int -> Int
sol inp = foldl (\a -> (a +) . success) 0 . fmap (inp' inp) . list