Distributing Computation in Haskell

Created by Ian Connolly / Slides available on GitHub

Hi, I'm Ian

I've been working on libraries for cloud computing in Haskell

Specifically I've been working on getting Cloud Haskell working on AWS

As well as trying to show why Haskell, public clouds, and Cloud Haskell are a great fit.

The Plan

  • Intro to Haskell
  • Intro to Cloud Haskell
  • Overview of my FYP
  • Demo
  • Some motivation & reflection

Haskell

The Basics

Haskell is a:

  • Purely functional
  • Lazy
  • Strongly and statically typed

... programming language.

The Basics pt. 2

Purely functional means that functions have no side-effects in Haskell*

Lazy semantics mean expressions are evaluated only when, and only as far as, their results are needed

Strongly and statically typed means a program's types are checked at compile time, with no implicit coercion
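
A minimal sketch of the three properties, using only the Prelude. The names double, evens, firstFive and average are illustrative and not from the talk.

```haskell
-- Pure: the result depends only on the argument; no hidden state is touched.
double :: Int -> Int
double x = x * 2

-- Lazy: this list is conceptually infinite, but elements are only
-- computed when something demands them.
evens :: [Int]
evens = map double [0 ..]

-- Only the first five elements of 'evens' are ever evaluated.
firstFive :: [Int]
firstFive = take 5 evens

-- Statically typed with no implicit coercion: 'length' returns an Int,
-- so dividing a Double by it needs an explicit 'fromIntegral'.
average :: [Double] -> Double
average xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  print firstFive            -- prints [0,2,4,6,8]
  print (average [1, 2, 3])  -- prints 2.0
```

Dropping the fromIntegral in average is rejected by the compiler rather than silently converted at runtime.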

Quicksort


quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = quicksort lesser ++ [p] ++ quicksort greater
  where
    lesser  = filter (< p) xs   -- elements smaller than the pivot
    greater = filter (>= p) xs  -- elements at least as large as the pivot

IO

IO is fundamentally impure, how does it work in Haskell?

We represent impure actions as IO actions

We compose IO actions using the IO monad

Haskell runs the IO action called 'main' with special semantics
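
To make "actions as values" concrete, here is a small sketch using only the Prelude. The names greet and greetings are made up for illustration.

```haskell
-- An IO action is an ordinary value: building it does not run it.
greet :: String -> IO ()
greet name = putStrLn ("Hello, " ++ name)

-- A list of actions; nothing has been printed at this point.
greetings :: [IO ()]
greetings = map greet ["Ada", "Alonzo"]

-- 'main' is the action the runtime actually executes.
-- 'sequence_' composes the list into one action that runs each in order.
main :: IO ()
main = do
  putStrLn ("Built " ++ show (length greetings) ++ " actions")
  sequence_ greetings
```

Counting the actions with length never prints a greeting; output only happens once the composed action is reached from main.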

An example


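import System.IO (hFlush, stdout)

main :: IO ()
main = do
    putStrLn "I get printed to stdout"
    putStr "I'm a prompt: "
    hFlush stdout  -- stdout is usually line-buffered, so flush before blocking on input
    input <- getLine
    putStr "\n"
    putStrLn input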

Other Haskell Features

  • Painless, lightweight pure parallelism
  • Lightweight concurrent threads (with forkIO)
  • Rich type system at the limit of decidability
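
A rough sketch of the forkIO bullet, using only modules from base. The helper names sumInThread and concurrentSum are invented for this example; the pure-parallelism bullet is not covered here because its usual combinators live in a separate package.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Sum a chunk of work in its own lightweight thread and hand the
-- result back through an MVar.
sumInThread :: [Int] -> IO (MVar Int)
sumInThread xs = do
  result <- newEmptyMVar
  -- ($!) forces the sum inside the forked thread, not in the reader.
  _ <- forkIO (putMVar result $! sum xs)
  return result

-- Run both halves concurrently, then combine the results.
concurrentSum :: [Int] -> [Int] -> IO Int
concurrentSum as bs = do
  ra <- sumInThread as
  rb <- sumInThread bs
  a <- takeMVar ra  -- blocks until the first thread has finished
  b <- takeMVar rb
  return (a + b)

main :: IO ()
main = concurrentSum [1 .. 50] [51 .. 100] >>= print  -- prints 5050
```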

Advantages

  • Idiomatic Haskell is easy to understand
  • Concise
  • The type system is your best friend
  • Composition, not hierarchy
  • Speed

Why Haskell in the Cloud?

  • Haskell gives a unique combination of safety and abstraction
  • Functional languages encourage the 'right' architecture for distribution (e.g. MapReduce)
  • Haskell's constraints lead to novel programs

Cloud Haskell

Cloud Haskell is a domain-specific language for developing programs for a distributed computing environment, shallowly embedded in Haskell, providing a message-passing communication model.

Features

  • Erlang-inspired message passing
  • Serialising (some) function closures across networks
  • Compatible with existing shared memory concurrency

In short, trying to leverage the best of Haskell and Erlang
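
Why only "some" closures? A simplified model, not Cloud Haskell's actual implementation: a function can cross the network if both sides already know it by name, so only a label and a serialised argument need to travel. Every name below (Closure, remoteTable, mkClosure, encode, decode, runClosure) is an assumption made for this sketch.

```haskell
-- A "closure" that can cross the network: a label naming a function
-- both sides already know, plus its serialised argument.
data Closure = Closure
  { closureLabel :: String
  , closureEnv   :: String
  } deriving (Show, Read, Eq)

-- Table of functions both nodes agree on, keyed by label.
remoteTable :: [(String, Int -> Int)]
remoteTable =
  [ ("double", (* 2))
  , ("square", \n -> n * n)
  ]

-- Sender side: package a known function's label with its argument.
mkClosure :: String -> Int -> Closure
mkClosure label arg = Closure label (show arg)

-- Wire format: here simply the derived Show/Read text.
encode :: Closure -> String
encode = show

decode :: String -> Maybe Closure
decode s = case reads s of
  [(c, "")] -> Just c
  _         -> Nothing

-- Receiver side: look the label up and apply it to the decoded argument.
runClosure :: Closure -> Maybe Int
runClosure (Closure label env) = do
  f <- lookup label remoteTable
  case reads env of
    [(n, "")] -> Just (f n)
    _         -> Nothing

main :: IO ()
main = do
  let wire = encode (mkClosure "square" 12)
  print (decode wire >>= runClosure)  -- prints Just 144
```

An arbitrary lambda has no label in the table, which is why it cannot be shipped in this model.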

Let's break it down

Cloud Haskell programs are made up of:

  • Processes
  • Channels & Messages
  • Process monitoring, management, and init abstractions
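
The processes-and-channels shape can be sketched inside a single program with plain threads and Chan from base. This is an analogy only: it does not use the Cloud Haskell API, and echoProcess and runEcho are names invented here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM, forever)

-- An "echo" process: wait for a message on its inbox, reply on the outbox.
echoProcess :: Chan String -> Chan String -> IO ()
echoProcess inbox outbox = forever $ do
  msg <- readChan inbox
  writeChan outbox ("echo: " ++ msg)

-- Spawn the process, send it each message in turn, and collect the replies.
runEcho :: [String] -> IO [String]
runEcho msgs = do
  inbox  <- newChan
  outbox <- newChan
  _ <- forkIO (echoProcess inbox outbox)
  forM msgs $ \m -> do
    writeChan inbox m
    readChan outbox  -- blocks until the echo process has replied

main :: IO ()
main = runEcho ["hello", "world"] >>= mapM_ putStrLn
```

The two sides share nothing except the channels, which is the property that lets the same design be spread across machines.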

Some points

  • Processes can run anywhere (including forkIO!)
  • The programmer builds their program out of processes communicating via channels.
  • Only in the main IO code do they deal with the service provider.

Echo in Cloud Haskell


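import Control.Monad (forever, unless)
import Control.Monad.IO.Class (liftIO)
import System.IO (hFlush, stdout)
-- Process and expect come from Control.Distributed.Process. Backend, LocalProcess,
-- remoteSend, localSend and localExpect are assumed to come from the backend library.

-- Runs on the cloud VM: send every string it receives straight back.
echoRemote :: () -> Backend -> Process ()
echoRemote () _backend = forever $ do
  str <- expect
  remoteSend (str :: String)

-- Runs on the local machine: prompt, send the line, print the reply.
-- An empty line ends the loop.
echoLocal :: LocalProcess ()
echoLocal = do
  str <- liftIO $ putStr "# " >> hFlush stdout >> getLine
  unless (null str) $ do
    localSend str
    liftIO $ putStr "Echo: " >> hFlush stdout
    echo <- localExpect
    liftIO $ putStrLn echo
    echoLocal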

Competitors

Erlang/OTP is the most similar technology

Sizeable, mature industry deployments.

Has domain-specific features baked in (binary pattern-matching!)

Google's Go has a pragmatic take on channels and concurrency.

My Project(s)

  • aws-service-api
  • distributed-process-aws

aws-service-api

A Haskell library for managing the creation of 'Cloud Services'

Create a VM, find available cloud services, parse AWS output to be usable by Cloud Haskell.

Majority of the work was here, interfacing with AWS

distributed-process-aws

A Haskell library, built on top of aws-service-api

Allows AWS to be a Cloud Haskell Service Provider

Majority of this was an adaptation of the existing distributed-process-azure library to work on AWS

Adaptive Primitives

Two additions to the aws-service-api library over its Azure cousin are createVM and shutdownVM

These automatically add and remove VMs from a Cloud Service

Bubbled up to distributed-process-aws they allow us to expose the following high-level API:


scaleUp :: CloudService -> Int -> CloudService
scaleDown :: CloudService -> Int -> CloudService
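
A toy model of how a program might drive these two functions. The real CloudService type and the real scaleUp and scaleDown talk to AWS; the versions below are hypothetical stand-ins with the same signatures, and rebalance and jobsPerVM are an assumed policy rather than part of the library.

```haskell
-- Hypothetical stand-in for the library's CloudService type:
-- here it only tracks how many VMs are running.
newtype CloudService = CloudService { vmCount :: Int }
  deriving (Show, Eq)

-- Toy versions with the same shape as the signatures above.
scaleUp :: CloudService -> Int -> CloudService
scaleUp (CloudService n) k = CloudService (n + k)

scaleDown :: CloudService -> Int -> CloudService
scaleDown (CloudService n) k = CloudService (max 0 (n - k))

-- Assumed policy: one VM per 'jobsPerVM' queued jobs, and never fewer than one VM.
jobsPerVM :: Int
jobsPerVM = 10

rebalance :: Int -> CloudService -> CloudService
rebalance queued svc
  | wanted > have = scaleUp svc (wanted - have)
  | wanted < have = scaleDown svc (have - wanted)
  | otherwise     = svc
  where
    have   = vmCount svc
    -- Round up so a partly filled batch still gets a VM.
    wanted = max 1 ((queued + jobsPerVM - 1) `div` jobsPerVM)

main :: IO ()
main = do
  let svc = CloudService 2
  print (vmCount (rebalance 45 svc))  -- prints 5
  print (vmCount (rebalance 5 svc))   -- prints 1
```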

What's The Benefit?

  • AWS is the largest public cloud in the world
  • Previously Cloud Haskell didn't take advantage of the cloud's ability to scale
  • Now the system itself can decide how many machines it runs on
  • Opens Cloud Haskell up to a number of adaptive applications

Challenges

  • Azure counterparts are 2 years old and unmaintained
  • Problems with Haskell's ecosystem
  • API Transparency

Demo

The Demos

Two demos, Fibonacci and Echo

Echo is an expansion of the code you saw earlier

Fibonacci demonstrates the serialization of functions

Future Work

Better AWS primitives

Specifically, find a way to expose instance types to higher level programs

Questions?