Monday, 10 November 2014

Iteration: functional style

for(int i = 0; i < a.length; i++){
a[i] = f(a[i], constant_info); // where f is a pure function
for(int i = 0; i < a.length; i++) {
accum = g(a[i], accum); // where g is a pure function
while(x >= 0 && s < a.length) {
ans *= h(a[x], accum);
x = a[x];

In the first example, we have a function purely in terms of the i'th element of a. It cannot affect anything else about a and it doesn't have any other information about a.

In the second example, we ease up and give ourselves access to an accumulated value that could possibly summarize the previous part of the array.

In the final one we let all hell loose and allow the value to determine where we go next. We're no longer accumulating in a fixed direction from left to right. We're allowing the value at our location to determine where we jump to next.

The above are three general ways of iteration. Now imagine if we had specific loop constructs that tackled each of the above cases. Aside from the clutter, this would actually make stuff more clear. Using these constructs would guarantee that any effect on our array was localized, or that an update to our accumulator is restricted to the correct values. I mean if all we wanted to do was simply modify the individual characters of a string (say by applying a shift cipher) having access to the entire array seems a bit overkill right. It's easy to imagine messing up and referencing some other part of the array in the update. So we could implement something like,

forIndividual (new Function <Dest_Type, Source_Type> ( Source_Type valueAtInd) {
public Dest_Type apply(Source_Type valueAtInd){
return some_jiggery_pokery(valueAtInd);
forAccumulate (new Function <Dest_Type, Tuple<Dest_Type, Source_Type>> (){
public Dest_Type apply(Source_Type valueAtInd, Dest_Type accum){
return accumulate_stuff(valueAtInd, accum);
forJumpAround (new Function <Tuple <Source_Type, Dest_Type>, Source_Type> (){
public Tuple<Dest_Type, Source_Type> apply(Source_Type stream){
return new Tuple(process_element_in_stream(stream), next_in_stream(stream));

That looks particularly ugly. It's because the idea of an update is not first class in Java, reifying it results in the above. However the idea behind these elaborate schemes is to protect the programmer from himself. 
Restrict the user by giving him the minimal amount of information necessary to perform the task
Separate the update from the manner of iteration.

Moving to a functional language where an update can be reified to a function, transcribing the above to Haskell we get:

map ~ forIndividual
map :: (a -> b) -> [a] -> [b]
foldl ~ forAccumulate
foldl :: (a -> b -> a) -> [a] -> b -> [b]
unfoldr ~ forJumpAround
unfoldr :: (a -> Maybe(b, a)) -> a -> [b]
We can abstract these even further, where we generalize the container we iterate over.

Friday, 7 November 2014

Why I like Haskell

I would say I'm comfortable with a few languages, Java, C++, C and Haskell. The reason I prefer Haskell to the others is for the following reasons.

To me, Haskell is a better software product. The interface offered by the language is much cleaner. For example, using some of the abstractions in C and C++ correctly requires an understanding of their implementation. For Haskell, the only properties we care about are those needed to define the abstraction (mostly)

The language constructs in imperative languages seem less powerful. For example control flow structures like if statements, loops and exceptions are hard-coded into the language, supported by syntax. In a functional language, we can create these control flow structures using functions and manipulate them as first class values.

The abstractions commonly used in functional programming are very powerful. Although they aren't a property of it, the Functor, Monoid, Applicative and Monad abstractions are extremely powerful and general. They provide a framework to manipulate data very rigorously. Using this framework ensures correctness for achieving the particular task. 

Speaking of abstractions, types are of course absolutely awesome. They provide the best abstraction yet. Some types perfectly describe their implementation counterpart and much more succinctly. In fact parametricity will guarantee certain theorems hold which is very nice to have and nigh impossible in real-world Java and C++ systems.

Lets say we wanted to implement a function that removes duplicates.

removeDups :: Ord a => [a] -> [a]
removeDups = snd . foldl step (S.empty, [])
step (t, acc) a = if S.member a t
then (t, acc)
else (S.insert a t, acc ++ [a])

public static List<A> removeDups(List<A> inp){
HashSet<A> s = new HashSet<A>();
List<A> ans = new List<A>();
for(A a : inp){

The difference over here is that using the model for iteration through the 'iterate' function we separate out the logic for a single step allowing us to observe and preserve our invariants with greater clarity. The invariant here is that after processing the first i characters, the set contains entries of all distinct characters upto that point and the list is a subsequence of the original having the first occurrence of all distinct characters upto the ith index.  We don't have to worry about the edge cases of the string. The logic for collapsing a structure by accumulating the values through a left-right traversal has already been implemented.

In the real world though interfacing with the low level components is essential for performant code, which breaks the nice abstractions. Thats not really reflected over here though.