UberTechBlog: Parallelism in Clojure

Moore's law, the observation that the number of transistors on an integrated circuit doubles approximately every two years, long meant that processing power relative to cost or size grew at a similar pace. As single-core speeds have levelled off, those extra transistors increasingly arrive as additional cores, which has led to the need for parallelism in programming languages. Clojure is one such language, and it provides a variety of functions for multi-threaded code. In this article we’ll discuss futures, atoms and refs, which are all important building blocks of parallel programming.

Futures
Futures are supported out of the box in Clojure. A future runs an expensive operation, such as the memory-intensive one below, on another thread, which is taken from a thread pool. Dereferencing a future yields the result immediately if it is already available; otherwise it blocks until the future is done.

Let’s start with an example of a memory-intensive operation. Here, we will write ranges of numerical values to separate files, with the write for each file done in a separate future.

user=> (def a
 #_=>      (doall
 #_=>         (map
 #_=>            (fn [n] (future (spit (str (gensym "test") ".txt")
 #_=>                                        (apply str (doall (map inc (range n)))))))
 #_=>            [10000000 1000000 10000000])))
 #'user/a

In the above code, doall forces the lazy sequence returned by “map” to be fully realized, so all the futures are created right away. Because “map” walks a vector of length 3, three futures are started, and each one runs a spit operation that writes a range of numbers, whose size is taken from the vector “[10000000 1000000 10000000]”, to a file named test[unique number].txt (gensym appends a unique number, not a random one). The code returns a realized sequence of futures, which is stored in “a”.
Now let’s check whether the futures stored in “a” have completed by using the following code:
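To see why doall matters here, the following minimal sketch compares a lazy sequence of futures before and after it is realized. The names started, lazy-futures and eager-futures are made up for illustration, and the futures return their input instead of writing files.

```clojure
;; Hypothetical counter that records how many futures have actually started.
(def started (atom 0))

;; map is lazy: defining the sequence does not create any future yet.
(def lazy-futures
  (map (fn [n] (future (swap! started inc) n)) [1 2 3]))

(def realized-before (realized? lazy-futures)) ; false
(def started-before @started)                  ; 0

;; doall walks the whole sequence, which creates (and starts) every future.
(def eager-futures (doall lazy-futures))

;; Dereferencing blocks until each future has finished.
(def results (mapv deref eager-futures))       ; [1 2 3]
(def started-after @started)                   ; 3
```

Without doall, the futures would only be started when something first consumes the sequence, which may be much later than intended.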

user=> (map future-done? a)
 (false true false)
 user=> (map future-done? a)
 (true true false)

future-done? returns true if the given future is done. On my system the future with a range of “10000000” takes more time than the one with a range of “1000000”.
So the first call to “(map future-done? a)” returns (false true false), meaning that the second future is done but the first and third futures are still running.
After a while, “(map future-done? a)” returns (true true false), meaning that the first and second futures are done.
The order in which the futures complete may vary on your system, but eventually all of them will be done. This means that you can offload heavy operations to different threads and continue with other tasks.
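Besides polling with future-done?, the result of a future can be collected with deref. The sketch below assumes a made-up slow-square function that stands in for a heavy operation; deref with a timeout returns a fallback value instead of blocking indefinitely.

```clojure
;; Hypothetical stand-in for a slow task: sleeps, then returns a value.
(defn slow-square [n]
  (Thread/sleep 500)
  (* n n))

;; mapv is eager, so all three futures start immediately.
(def squares (mapv #(future (slow-square %)) [1 2 3]))

;; deref with a timeout (in milliseconds) returns the fallback value
;; if the future has not finished in time. Here that is very likely
;; :not-ready, since the task sleeps far longer than the timeout.
(def early (deref (first squares) 1 :not-ready))

;; Plain deref (the same as @) blocks until each future completes.
(def results (mapv deref squares)) ; [1 4 9]
```

In a standalone script, calling (shutdown-agents) at the end lets the JVM exit promptly, because the threads used by futures otherwise stay alive for a short while.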

Atoms
In the simplest terms, atoms are uncoordinated and synchronous. We use an atom when a single identity needs to be updated independently of any other. Synchronous means that an update is applied on the calling thread and has completed by the time the call returns. Atoms ensure that in a multi-threaded environment the value of an atom is either updated entirely or not at all.

We’ll start with the most basic example:

user=> (def a (atom {:firstname "" :lastname ""}))
 #'user/a
 user=> @a
 {:firstname "", :lastname ""}

Here we’ve defined an atom that holds a map. To read the value stored in an atom we use deref or its short form “@”. Dereferencing returns the value stored in “a”; in this case we get the map supplied during initialization.
To change the value of an atom we can use either swap! or reset!.

user=> (swap! a #(assoc % :firstname %2 :lastname %3) "Sidharth" "Khattri")
 {:firstname "Sidharth", :lastname "Khattri"}

swap! takes the atom as its first argument and the function to apply to the atom's current value as its second argument; any further arguments are passed on to that function. In the above case we’ve defined an anonymous function that associates new values with the keys of the previously defined map and returns the new map. Internally, swap! reads the current value, applies the function to it, and then uses compare-and-set! to install the result, which guards against race conditions. If two threads try to update the atom at the same time and the value has changed since a thread read it, the losing thread's compare-and-set! fails and swap! retries with the fresh value until it succeeds. Because of these retries the function may be called more than once, so it should be free of side effects.
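The retry behaviour can be observed with a small experiment. In this sketch, counter, attempts and counted-inc are hypothetical names; the update function deliberately records each call in a second atom, purely to make retries visible (real update functions should avoid side effects).

```clojure
(def counter (atom 0))
(def attempts (atom 0)) ; how many times the update function was called

;; Hypothetical update function: counts its own invocations, then increments.
(defn counted-inc [n]
  (swap! attempts inc)
  (inc n))

;; Eight futures each perform 1000 swaps on the same atom.
(def workers
  (mapv (fn [_] (future (dotimes [_ 1000] (swap! counter counted-inc))))
        (range 8)))

;; Wait for every worker to finish.
(run! deref workers)

(def final-count @counter)     ; 8000, no update is lost
(def total-attempts @attempts) ; at least 8000; higher when swaps were retried
```

The final count is always 8000, while the number of attempts depends on how much contention occurred on a given run.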

user=> (reset! a 1)
 1

reset! sets the value of atom to a new value without regard for the previously stored value in the atom.
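For contrast, compare-and-set! only changes the value when the atom currently holds the expected one, whereas reset! always overwrites it. A minimal sketch, using a hypothetical status atom that holds keywords:

```clojure
(def status (atom :pending))

;; Succeeds: the atom currently holds :pending.
(def first-try (compare-and-set! status :pending :running))  ; true

;; Fails: the atom now holds :running, not :pending, so nothing changes.
(def second-try (compare-and-set! status :pending :done))    ; false

(def after-cas @status)                                      ; :running

;; reset! ignores the current value and returns the new one.
(def after-reset (reset! status :done))                      ; :done
```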

Refs
Unlike atoms, refs are coordinated; like atoms, they are synchronous. They are used when multiple identities have to be changed together, and they rely on Clojure's Software Transactional Memory (STM) system. Any change to the value of a ref has to be made inside a transaction, i.e. within sync or dosync.

Refs are defined in the same way atoms are defined:

user=> (def task-to-be-done (ref #{1,2,3,4,5}))
 #'user/task-to-be-done
 user=> (def task-done (ref #{}))
 #'user/task-done

Here we have defined two refs that have to be updated together, task-to-be-done and task-done, each holding a hash set.

user=> (defn update-values [value]
 #_=>      (dosync
 #_=>        (alter task-to-be-done disj value)
 #_=>        (alter task-done conj value)))
 #'user/update-values

Then we define a function, update-values, that alters the values of both refs together inside a dosync transactional block. “alter” changes the in-transaction value of a ref by applying a function to it. “disj” returns a new set that does not contain the key(s) passed as arguments. Similarly, “conj” returns a new set that contains the key(s) passed as arguments.

user=> (update-values 1)
 #{1}
 user=> @task-to-be-done
 #{2 3 4 5}
 user=> @task-done
 #{1}

When a value is passed to the “update-values” function, both refs are updated together, and the call returns the value of the last expression in the transaction, here the new value of task-done. To view the values stored in refs we use the same deref or “@” as with atoms. Dereferencing task-to-be-done and task-done returns the values stored in the refs.

Thus refs are used for managing multiple data structures in transactional blocks, atomically.
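The same pattern holds up when several threads run transactions at once. The following sketch uses hypothetical names (todo, done, move-task) that mirror the refs above and moves three tasks from separate futures; if transactions conflict, the STM retries them, so no task is lost or duplicated.

```clojure
(def todo (ref #{1 2 3 4 5}))
(def done (ref #{}))

;; Moves one task from todo to done in a single transaction.
(defn move-task [task]
  (dosync
    (alter todo disj task)
    (alter done conj task)))

;; Run three moves concurrently and wait for all of them.
(def movers (mapv #(future (move-task %)) [1 2 3]))
(run! deref movers)

;; Reading both refs inside one transaction gives a consistent snapshot.
(def snapshot (dosync [@todo @done])) ; [#{4 5} #{1 2 3}]
```

At every point in time, each task is in exactly one of the two sets, which is the guarantee that separate atoms could not provide.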

This article was first published on the Knoldus blog.

Tuesday, May 1, 2018
