31
May 10

Clojure Protocols & Datatypes — A sneak peek

Clojure 1.2 introduces two very remarkable features – Protocols and Datatypes. Clojure is defined in terms of abstractions and various implementations of those abstractions. For example, vectors, maps, lists, sets in Clojure implement the sequence abstraction which lets us treat any of those data structures as sequences.
Until recently it was not feasible to define and implement such core abstractions in Clojure itself; one had to drop down to Java (or C#) for those tasks, but not anymore :)

Clojure 1.2 now has excellent facilities for defining and implementing similar abstractions in a highly dynamic manner while maintaining fantastic performance characteristics.
In this post, I will give you a brief overview of these new features and will show you how they are useful.

Protocols

Protocols in Clojure are similar to Java Interfaces, though not quite. Basically a protocol is a contract, a set of functionalities without any implementation. Let’s consider a simple protocol –

(defprotocol Fly
  "A simple protocol for flying"
  (fly [this] "Method to fly"))

So here we have a trivial protocol Fly which declares a single method, fly, taking one argument, ‘this’ (the instance of whichever type implements the protocol). For every method defined via a protocol, the first argument is always the implementing instance itself. The name ‘this’ is just a convention; it could be ‘self’ or anything else.

When we declared the Fly protocol, two new vars were created. One is ‘Fly’, the protocol itself, and the other is ‘fly’ which is a polymorphic function that will get called when we execute it on an implementation of Fly.

Right now, if you try to execute the method ‘fly’ on any object, you will get an exception because no types are implementing that protocol yet, which brings us to the next topic, DataTypes.
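Here is a minimal sketch of that failure, assuming a fresh REPL where the Fly protocol has just been defined and nothing implements it yet. The value 42 and the :no-implementation keyword are only illustrative choices.

```clojure
(defprotocol Fly
  "A simple protocol for flying"
  (fly [this] "Method to fly"))

;; No type implements Fly yet, so this check is false
(satisfies? Fly 42)

;; Calling the protocol function on an unsupported value throws an
;; IllegalArgumentException ("No implementation of method: :fly ...")
(def result
  (try
    (fly 42)
    (catch IllegalArgumentException e
      :no-implementation)))
```

satisfies? is handy when you want to check for an implementation up front instead of catching the exception.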

DataTypes

Traditionally in Clojure whenever we wanted to have some kind of record or a property-only Class, we used maps or struct-maps. Those serve the purpose perfectly well in most cases but the problem was that those maps didn’t have any type information attached to them. As a result, we had to put some extra keys in maps to help us determine the type of a record before we could dispatch methods. There were some obvious performance limitations too; being vanilla maps, they were never as fast as Plain Old Java Objects (POJOs). Enter deftype and its cousin defrecord.

In Clojure 1.2 we can define our own types using defrecord like this -

(defrecord Bird [nom species])

Boom! We have a custom type, Bird, with two fields, nom and species. We can now instantiate a Bird like this -

(def crow (Bird. "Crow" "Corvus corax"))

We can access the fields of the Bird instance by treating it like a normal map -

user› (:nom crow)
"Crow"
user› (:species crow)
"Corvus corax"

We can also add/remove/modify keys in a record like we would do with a normal map.

(def sparrow (assoc crow :nom "Sparrow" :species "Passer domesticus"))

This will create a new immutable instance of Bird with different data. Note that since Clojure records are persistent and immutable, the original crow instance is not affected.
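A small sketch of the map-like behaviour described above, repeating the Bird definition so it runs on its own. The :colour key and the names tagged and nameless are made up for illustration.

```clojure
(defrecord Bird [nom species])

(def crow (Bird. "Crow" "Corvus corax"))
(def sparrow (assoc crow :nom "Sparrow" :species "Passer domesticus"))

;; assoc returned a new Bird; the original is unchanged
(:nom crow)               ; "Crow"
(:nom sparrow)            ; "Sparrow"
(instance? Bird sparrow)  ; true

;; Records accept extra keys beyond the declared fields
(def tagged (assoc crow :colour "black"))
(:colour tagged)          ; "black"
(instance? Bird tagged)   ; true

;; Removing a declared field gives back a plain map, not a Bird
(def nameless (dissoc crow :nom))
(instance? Bird nameless) ; false
```

The last case is the one to watch out for: a record only stays a record as long as all of its declared fields are present.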

Now to make the Bird fly. We already have a protocol called Fly. We need to implement the protocol so that our birds can actually fly. One way to do that is to put the protocol implementation inline with the record definition itself -

(defrecord Bird [nom species]
  Fly
  (fly [this] (str (:nom this) " flies...")))

So easy, right? If we now create another instance of Bird, it will actually be able to fly -

user› (def kiwi (Bird. "Kiwi" "Apteryx australis"))
#'user/kiwi
user› (fly kiwi)
"Kiwi flies..."

Great! But what happens to crow and sparrow? We created those instances when the Bird record didn’t have any implementation of the Fly protocol, so calling fly on them still fails. You might face similar issues when you don’t have control over the code which defines the record/class. You will need to extend those types dynamically with an implementation of a protocol. Enter extend-type. extend-type (and its cousin extend-protocol) allows us to implement protocols on pre-existing types. Consider the following example -

(defprotocol Walk
  "A simple protocol to make birds walk"
  (walk [this] "Birds want to walk too!"))
 
(extend-type Bird
  Walk
  (walk [this] (str (:nom this) " walks too...")))

We just added an implementation of the Walk protocol to the existing type Bird. Every instance of that type, whether created before or after the extend-type call, can now both walk and fly.
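As a rough, self-contained sketch: extend-type registers the implementation against the type rather than against individual objects, so an instance created earlier picks it up as well. The second half shows the cousin, extend-protocol, which takes one protocol and several types. The Keyword and nil implementations are purely illustrative.

```clojure
(defprotocol Walk
  "A simple protocol to make birds walk"
  (walk [this] "Birds want to walk too!"))

(defrecord Bird [nom species])

;; Created BEFORE Bird is extended with Walk
(def kiwi (Bird. "Kiwi" "Apteryx australis"))

;; extend-type: one type, one or more protocols
(extend-type Bird
  Walk
  (walk [this] (str (:nom this) " walks too...")))

(walk kiwi)            ; "Kiwi walks too..."
(satisfies? Walk kiwi) ; true

;; extend-protocol: one protocol, several types at once
(extend-protocol Walk
  clojure.lang.Keyword
  (walk [this] (str (name this) " walks as a keyword"))
  nil
  (walk [_] "nobody walks"))

(walk :emu) ; "emu walks as a keyword"
(walk nil)  ; "nobody walks"
```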

user› (def hummingbird (Bird. "Hummingbird" "Selasphorus rufus"))
user› (fly hummingbird)
"Hummingbird flies..."
user› (walk hummingbird)
"Hummingbird walks too..."

Cool, right? At times you might require an anonymous object which implements some protocol or interface. Such objects are useful when you just need something that implements a given protocol but you don’t care about its type. Clojure 1.2 gives you reify. reify allows us to create one-off anonymous objects which implement one or more protocols.

user› (fly (reify Fly (fly [_] "Swine flu...")))
"Swine flu..."

Woah! Clojure can make Pigs fly :) Jokes apart, what we just did was very interesting. We just created an anonymous type which implements the Fly protocol and called the fly method on it; and it flu[sic] :)

We could implement multiple protocols in the same reify statement too, like this -

(def pig (reify
                Fly (fly [_] "Swine flu...")
                Walk (walk [_] "Pig-man walking...")))
 
user› (fly pig)
"Swine flu..."
user› (walk pig)
"Pig-man walking..."

Beautiful. reify is quite similar to proxy and it is now recommended to use reify instead of proxy wherever possible because reify is much faster than proxy.
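reify is not limited to protocols; it can implement plain Java interfaces in the same form. A minimal sketch, assuming the Fly protocol from earlier (repeated here so the snippet runs on its own); the names ran? and flying-task are hypothetical.

```clojure
(defprotocol Fly
  "A simple protocol for flying"
  (fly [this] "Method to fly"))

;; Records whether the Runnable part has been executed
(def ran? (atom false))

;; One anonymous object implementing a protocol AND a Java interface
(def flying-task
  (reify
    Fly
    (fly [_] "Task flies...")
    java.lang.Runnable
    (run [_] (reset! ran? true))))

(fly flying-task)                ; "Task flies..."
(.run flying-task)               ; runs the Runnable implementation
@ran?                            ; true
(instance? Runnable flying-task) ; true
```

Because the object really is a Runnable, it can be handed to any Java API that expects one, such as a Thread or an executor.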

Before I finish off, let me explain the differences between defrecord and deftype. defrecord creates a new type and implements a few core Clojure interfaces like that of the persistent map, hashcode, keyword accessors, etc. If you are using deftype, Clojure will not implicitly implement any interface not provided by the user. In short, if you are using deftype, you will have to implement your own accessors, hashcode, etc. In most cases defrecord should suffice, but in other cases like when you need mutable fields, use deftype.
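The difference is easy to see at the REPL. The following is only a sketch; the names BirdRecord, BirdType, Tally and MutableTally are invented for the comparison. The metadata map form for the mutable field is used because it should also be accepted by older Clojure versions.

```clojure
(defrecord BirdRecord [nom species])
(deftype BirdType [nom species])

(def r (BirdRecord. "Crow" "Corvus corax"))
(def t (BirdType. "Crow" "Corvus corax"))

(:nom r) ; "Crow", records support keyword lookup
(.nom t) ; "Crow", deftype fields are read via Java interop
(:nom t) ; nil, no map behaviour unless you implement it yourself

(= r (BirdRecord. "Crow" "Corvus corax")) ; true, value equality
(= t (BirdType. "Crow" "Corvus corax"))   ; false, identity only

;; Mutable fields are only available with deftype
(defprotocol Tally
  (bump! [this] "Increment the tally and return it")
  (total [this] "Current value of the tally"))

(deftype MutableTally [^{:unsynchronized-mutable true} n]
  Tally
  (bump! [this] (set! n (inc n)) this)
  (total [_] n))

(def tally (MutableTally. 0))
(bump! (bump! tally))
(total tally) ; 2
```

Mutable fields are private to the type, which is why the tally is read through the total method and not through interop.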

There is some in-depth explanation of Protocols and Datatypes on the Clojure website which you should consult if you need more information.

Bonus Material

Making Java Strings fly and walk :)

user› (extend-type java.lang.String
                   Fly (fly [this] "See me fly?")
                   Walk (walk [this] "Yes, that's me walking!"))
 
nil
user› (walk "foo")
"Yes, that's me walking!"
user› (fly "bar")
"See me fly?"

PS – I wrote this today because I was sitting at home, sick. There are possibly some mistakes in this post; in which case, please let me know.


27
May 10

Clojure Course in Pune

There is a possibility of me conducting a Clojure course/workshop in Pune. In the course I will cover Clojure from ground up, teaching how to build real-world applications in Clojure.

If you are (or your friend is) interested, please take part in this short survey which will help me understand what is needed.


19
Feb 10

Slides from my Clojure talk at GNUnify 2010

I gave an intro talk about Clojure at GNUnify 2010, Pune today. It was supposed to be a very basic talk on Clojure aimed at Java programmers. Here are the slides -


6
Oct 09

Downloading a bunch of files in parallel using Clojure Agents

I suddenly needed to download around 3000 files from the Internet. I had the urls in a sequence and I was thinking about a nice way to download the files in parallel.

The idea of using Clojure Agents came naturally to my mind and I was thinking about writing an Agent based HTTP client in Clojure. I asked around on the Clojure IRC channel and the very helpful Stuart Sierra pointed me towards clojure.contrib.http.agent

Indeed, c.c.http.agent seemed to be exactly what I had in my mind :)

The API seemed to be straightforward enough and I got cracking immediately. I came up with something like this –

;;; downloader.clj -- Parallel Downloader -*- Clojure -*-
;;; Time-stamp: "2009-10-06 13:38:57 ghoseb"
;;; Author: Baishampayan Ghose 
 
(ns downloader
  (:require [clojure.contrib.http.agent :as h]
            [clojure.contrib.duck-streams :as d]))
 
;; A vector of vectors containing the file name and the URL
(def url-data [["file1" "http://some.domain/file1.xml"]
               ["file2" "http://some.domain/file2.xml"]
               ; Many many more :)
               ])
 
(defn download
  "Download the data in the given URL using HTTP Agents
   Args:
     file-name - The file name to save the data in
     url - The URL to fetch
  "
  [file-name url]
  (h/http-agent url
                :handler (fn [agnt]
                           (let [fname file-name]  ; File name in a closure
                             (with-open [w (d/writer fname)]
                               (d/copy (h/stream agnt) w))))))
 
(defn download-all
  "Download all the URLs
   Args:
     url-data - A vector of vectors containing the file name and the url
  "
  [url-data]
  (doseq [[file-name url] url-data]
    (download file-name url)))
 
(download-all url-data)

This looked fine and worked with a small set of URLs. But when I ran it on the full-blown set, the server bailed out because of too many concurrent requests. The reason is that http.agent uses send-off to dispatch actions to the agents, and send-off runs them on a thread pool that can grow without bound, so every request started more or less at once.

Surely I needed to somehow make sure that only a limited number of files are downloaded in parallel and start downloading more when those are done.

To achieve that, I did this –

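;; 15 being the max parallel downloads. partition-all (in clojure.core
;; since 1.2) keeps the final, shorter batch; plain partition drops it.
(def partitioned-data (partition-all 15 url-data))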
 
(defn download-all2
  "Download all the files, step by step
   Args:
     p-url-data - Partitioned url data
  "
  [p-url-data]
  (doseq [url-data p-url-data]
    (let [agnts (map #(download (first %) (second %)) url-data)]
      (apply await agnts)))) ; Wait till the agents finish
 
(download-all2 partitioned-data)

What did I just do? I simply partitioned the data set by the number of parallel downloads I wanted to do, and then modified the download-all function to take the partitioned data, dispatch agents on one partition and wait for them to finish, and then move on to the next partition.
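The same batching idea can be sketched without the contrib libraries, using futures from clojure.core. This is an illustration under stated assumptions, not the code I actually ran: fake-download is a hypothetical stand-in that sleeps instead of hitting the network, and download-in-batches is an invented name.

```clojure
;; Hypothetical stand-in for a real download; returns the file name
(defn fake-download [[file-name url]]
  (Thread/sleep 10)
  file-name)

(defn download-in-batches
  "Process url-data in batches of batch-size, waiting for each batch
   to finish before starting the next. Returns a vector of results."
  [batch-size url-data]
  (reduce (fn [done batch]
            ;; doall forces the lazy map so every future in the batch
            ;; is started before we block on any of them
            (let [futures (doall (map #(future (fake-download %)) batch))]
              (into done (map deref futures))))
          []
          (partition-all batch-size url-data)))

(def sample-data
  (vec (for [i (range 7)]
         [(str "file" i) (str "http://some.domain/file" i ".xml")])))

(def results (download-in-batches 3 sample-data))

(count results)                        ; 7
(count (partition 3 sample-data))      ; 2, the seventh item is dropped
(count (partition-all 3 sample-data))  ; 3, the last batch has one item
```

When run as a script rather than at the REPL, calling (shutdown-agents) at the very end lets the JVM exit promptly, since futures run on the same thread pool as send-off.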

Simple, yet very beautiful.


17
Oct 08

Follow your twitter followers

Now you can follow back your followers on Twitter, thanks to
https://followtwits.appspot.com/

Posted by email from Baishampayan’s Posterous


10
Oct 08

Stupidity Assurance

via http://dilbert.com/strips/comic/2008-10-10/



9
Oct 08

The oTeam Lunch Outing

Photos taken at the oTeam Lunch Outing.

See and download the full gallery on posterous



9
Oct 08

Pirate of the Pirate Bay

Ready to rip DVDs and seed torrents…



29
Apr 08

DO NOT WANT!

Well the reason why I had to resurrect my old and unmaintained blog is that apparently, I have been nominated for the “Great Indian Developer Awards” in the “Top Committer” category.

I had no idea about this until now as I have received no communication from the organisers regarding the awards and I have no idea how I got nominated.

And I think this is completely Bullshit.

I don’t see how I could have been nominated, given that I haven’t been active in the Free Software community for the last year or so.

I think there are many other people who deserve this much more than I do. Hell I don’t think I would even feature in the list of top 100 Indian Free Software developers.

To name a few, I would rather nominate the following (in no particular order):

So, the organisers, please don’t humiliate me like this and kindly take down my name from your God damned website.

I am just a Free Software enthusiast … I don’t need no award.

And in any case, I think this whole event is bullshit, just look at the other nominations and you will know. But that’s another story …


17
May 07

A reward from Sir Donald

Dear Mr Ghose,

Many thanks for your extremely helpful note. [...]

I owe you the customary "reward check", because these corrections
affect pages of The Art of Computer Programming. To what snail-mail
address should the check be sent?

Cordially,
Don Knuth

w00t!

Update: I received the cheque today! Check my flickr page for a scan of the cheque.