Slides from my Clojure talk at GNUnify 2010
I gave an intro talk about Clojure at GNUnify 2010 in Pune today. It was supposed to be a very basic talk on Clojure aimed at Java programmers. Here are the slides -
I suddenly needed to download around 3000 files from the Internet. I had the URLs in a sequence and was thinking about a nice way to download the files in parallel.
The idea of using Clojure Agents came naturally to me, and I was thinking about writing an Agent-based HTTP client in Clojure. I asked around on the Clojure IRC channel and the very helpful Stuart Sierra pointed me towards clojure.contrib.http.agent.
Indeed, c.c.http.agent seemed to be exactly what I had in mind.
The API seemed straightforward enough and I got cracking immediately. I came up with something like this –
;;; downloader.clj -- Parallel Downloader -*- Clojure -*-
;;; Time-stamp: "2009-10-06 13:38:57 ghoseb"
;;; Author: Baishampayan Ghose

(ns downloader
  (:require [clojure.contrib.http.agent :as h]
            [clojure.contrib.duck-streams :as d]))

;; A vector of vectors containing the file name and the URL
(def url-data [["file1" "http://some.domain/file1.xml"]
               ["file2" "http://some.domain/file2.xml"]
               ;; Many many more :)
               ])

(defn download
  "Download the data in the given URL using HTTP Agents
  Args:
    file-name - The file name to save the data in
    url - The URL to fetch"
  [file-name url]
  (h/http-agent url
                :handler (fn [agnt]
                           (let [fname file-name] ; File name in a closure
                             (with-open [w (d/writer fname)]
                               (d/copy (h/stream agnt) w))))))

(defn download-all
  "Download all the URLs
  Args:
    url-data - A vector of vectors containing the file name and the url"
  [url-data]
  (doseq [[file-name url] url-data]
    (download file-name url)))

(download-all url-data)
This looked fine and worked with a small set of URLs. But when I ran it on the full set, the server bailed out because of too many concurrent requests. The reason is that http.agent uses send-off to dispatch actions to the agents, and send-off actions run on a thread pool that grows without bound, so every download can end up in flight at the same time.
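To see why this overwhelms the server, here is a minimal sketch of the one-agent-per-request pattern using only clojure.core, assuming http.agent behaves roughly like this internally. slow-fetch, fetch-with-agent and fetch-all are hypothetical names, and slow-fetch merely sleeps instead of making an HTTP request.

```clojure
;; Sketch of one agent per request, dispatched with send-off (clojure.core only).
;; slow-fetch is a hypothetical stand-in for a blocking HTTP GET.
(defn slow-fetch
  "Pretend to download url: block briefly, then return a string."
  [url]
  (Thread/sleep 50)
  (str "contents of " url))

(defn fetch-with-agent
  "Create a fresh agent for url and dispatch the blocking fetch with send-off.
  Returns the agent at once; the fetch runs on a send-off pool thread."
  [url]
  (let [a (agent nil)]
    (send-off a (fn [_] (slow-fetch url)))
    a))

(defn fetch-all
  "Start one agent per URL, all immediately, then block until all are done.
  With 3000 URLs this means up to 3000 fetches (and threads) at once."
  [urls]
  (let [agents (mapv fetch-with-agent urls)] ; mapv is eager: every send-off fires here
    (apply await agents)                     ; wait for every dispatched action
    (mapv deref agents)))

(fetch-all ["http://some.domain/file1.xml" "http://some.domain/file2.xml"])
;; => ["contents of http://some.domain/file1.xml"
;;     "contents of http://some.domain/file2.xml"]
```

Standalone scripts that use send-off should call (shutdown-agents) when they are finished; otherwise the JVM may linger for about a minute while the idle pool threads time out.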
Clearly I needed to make sure that only a limited number of files were downloaded in parallel, and to start downloading more only when those were done.
To achieve that, I did this –
;; 15 being the max parallel downloads. partition-all keeps a final group
;; smaller than 15; plain partition would silently drop it.
(def partitioned-data (partition-all 15 url-data))

(defn download-all2
  "Download all the files, step by step
  Args:
    p-url-data - Partitioned url data"
  [p-url-data]
  (doseq [url-data p-url-data]
    (let [agnts (doall (map #(download (first %) (second %)) url-data))]
      (apply await agnts)))) ; Wait till the agents finish

(download-all2 partitioned-data)
What did I just do? I simply partitioned the data set into groups as large as the number of parallel downloads I wanted, then modified the download-all function to take the partitioned data, dispatch agents for one partition, wait for them to finish, and then move on to the next partition.
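The same batching idea can be sketched with clojure.core alone, independent of http.agent. run-in-batches is a hypothetical helper and f stands in for the download step. Catching exceptions inside the action is an addition in this sketch: an agent whose action throws enters a failed state and stops running queued actions, which can leave await blocked.

```clojure
;; Core-only sketch of "dispatch one batch of agents, await it, move on".
;; run-in-batches is a hypothetical name; f stands in for a blocking download.
(defn run-in-batches
  "Apply f to every item of coll, batch-size items at a time, each item on
  its own send-off agent. Returns a vector of results in input order; if f
  throws for an item, the exception object is returned in its place."
  [batch-size f coll]
  (reduce
   (fn [results batch]
     (let [agents (mapv (fn [item]
                          (let [a (agent nil)]
                            ;; Catch inside the action so the agent never
                            ;; enters the failed state and stalls await.
                            (send-off a (fn [_]
                                          (try (f item)
                                               (catch Exception e e))))
                            a))
                        batch)]
       (apply await agents)                 ; block until this batch is done
       (into results (map deref agents))))
   []
   ;; partition-all keeps a final, shorter batch; partition would drop it
   (partition-all batch-size coll)))
```

Note that f has to block until its work is done. A function that merely starts asynchronous work, as the post's download does by returning an http-agent, would return immediately and defeat the batching.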
Simple, yet very beautiful.
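One trade-off of batching is that each batch waits for its slowest download before the next batch starts. As an alternative (not what the post does), a fixed-size thread pool from java.util.concurrent keeps up to n downloads running and starts the next one as soon as any finishes. The sketch below also replaces the old monolithic clojure.contrib, which is no longer maintained, with clojure.java.io. download!, run-with-pool and download-all-pooled are hypothetical names.

```clojure
;; Sketch for current Clojure: clojure.java.io for each download plus a
;; fixed-size thread pool (an alternative to batching, not the post's method).
(ns downloader-sketch
  (:require [clojure.java.io :as io])
  (:import (java.util.concurrent Callable ExecutorService Executors Future)))

(defn download!
  "Copy the bytes at url into the local file file-name; blocks until done.
  Returns file-name."
  [file-name url]
  (with-open [in  (io/input-stream url)
              out (io/output-stream file-name)]
    (io/copy in out))
  file-name)

(defn run-with-pool
  "Apply f to every item of coll on a fixed pool of n threads: at most n calls
  run at once, and a queued call starts as soon as a thread frees up.
  Returns results in input order; an exception from f is rethrown (wrapped
  in java.util.concurrent.ExecutionException) when results are collected."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (try
      (let [futures (mapv (fn [item]
                            (.submit pool (reify Callable
                                            (call [_] (f item)))))
                          coll)]
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))   ; let queued work finish, accept no new tasks

(defn download-all-pooled
  "Download every [file-name url] pair in url-data, at most n at a time."
  [n url-data]
  (run-with-pool n (fn [[file-name url]] (download! file-name url)) url-data))
```

With the url-data vector from the post, (download-all-pooled 15 url-data) would keep at most 15 requests open at any moment. Copying raw bytes with io/copy also avoids decoding and re-encoding the files as text.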
Setting up Emacs & Clojure with Emacs Starter Kit
Clojure is a very modern functional programming language that runs on the JVM. It’s a cleanly designed Lisp dialect with all the features a practical programming language needs. You can learn more about Clojure on the official website, the wikibook, a comprehensive article by R. Mark Volkmann, or Stuart Halloway's awesome book, Programming Clojure.
Now to set up Emacs as a Clojure IDE. If you are new to Emacs, don’t worry, as we will take the easiest way to set it all up. If you are an Emacs veteran, set aside your own dotemacs and all the customisations you have made over the years, and follow along.
Installing Emacs is easy. On an Ubuntu system, do this -
$ sudo apt-get install emacs-snapshot-gtk
That will install Emacs 23 for you. You will also need a few other things -
$ sudo apt-get install sun-java6-jdk ant git-core
You will need the JDK and Ant to build and run Clojure (it runs on the JVM, remember?) and Git to fetch Clojure and other related libraries.
Now to set up Emacs, we will use a brilliant set of Emacs Lisp libraries aptly named Emacs Starter Kit (ESK). To get it, do this -
$ git clone git://github.com/technomancy/emacs-starter-kit.git ~/.emacs.d
That will fetch ESK and put it in your ~/.emacs.d (existing Emacs users, please move your ~/.emacs and ~/.emacs.d to some other location before doing this; you can get your old settings back later).
ESK provides a bunch of sane defaults for pretty much everything in Emacs. You can safely use it and get a very usable Emacs setup almost immediately.
You can now launch Emacs by going to Applications > Accessories and selecting Emacs Snapshot Gtk. Once Emacs has started up, install the Emacs Clojure libraries by typing M-x package-list-packages. That will start up a buffer with a bunch of Emacs packages. Go to the line that says clojure-mode in it and press i. Then press x to install the package. Once that’s done you are ready to install Clojure. Now type M-x clojure-install and it will prompt you for a directory to install Clojure. Just press Enter for now and let it proceed. You can always change these things later.
That will fetch the Clojure runtime and also a few other useful libraries like clojure-contrib and will install them inside ~/src.
Once that is done, you can type M-x slime to start Clojure inside Emacs. Play around with it and learn the various key-bindings for Slime and for Paredit, a fantastic mode for editing Lisp code structurally. Paredit is so good that it makes all those parentheses vanish into thin air, though it takes a bit of getting used to; practice helps. One tip: type C-h m in Emacs to see the documentation of all the modes that are currently active. This gives a good overview of the key-bindings and the functions bound to those keys.
To develop slightly more complex Clojure applications, you need to use various dependencies alongside your own code. Since Java has its own classpath quirks, this can become a slightly complicated task. But there is a cool function in Emacs Starter Kit called clojure-project which will make your life easy.
To use it, all you need to do is follow this simple directory structure, and the rest will be taken care of automagically. The directory structure should be like this -
myproject
|-- lib
|-- src
|-- target
| |-- classes
| `-- dependency
`-- test
Copy the Clojure & Clojure Contrib (or any other dependency) JAR files into lib (alternatively, you can use a build system like Maven to manage dependencies, in which case you should configure it to copy the dependencies into target/dependency). Keep all your code inside src and tests inside test. The src and test directories should contain namespaces like foo.bar & foo.bar.test respectively (you can later use clojure-test-mode to automatically pick up the tests and run them quickly). When you run M-x clojure-project, it will ask you for a directory name. Give it the name of the project directory and boom! Emacs will launch Slime with the right classpath settings.
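A minimal sketch of that naming convention; foo.bar, add and add-test are hypothetical. The two namespaces belong in separate files (the paths are given in the comments) and are shown together here only for brevity.

```clojure
;; File: src/foo/bar.clj
(ns foo.bar)

(defn add
  "Hypothetical example function: add two numbers."
  [a b]
  (+ a b))

;; File: test/foo/bar/test.clj
(ns foo.bar.test
  (:require [clojure.test :refer [deftest is]]
            [foo.bar :as bar]))

(deftest add-test
  (is (= 5 (bar/add 2 3))))
```

From the Slime REPL, (clojure.test/run-tests 'foo.bar.test) runs the tests and returns a summary map of passes, failures and errors.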
If you want to customise any Emacs setting, just create a file named after your system login name, <username>.el, inside ~/.emacs.d and it will be loaded automatically when you restart Emacs. If you use additional libraries, create a directory named after your username inside the same directory, and everything inside it will be in the Emacs load-path. Nifty, right?
So that’s all that’s required to set up Emacs for Clojure. Join the Clojure mailing list to communicate with the Clojure community, and check the Emacs Wiki if you have any questions about Emacs.
Above all, have fun. That’s what we are here for.
By the way, I am @ghoseb on Twitter and I think you should follow me.
UPDATE: Fixed a factual inconsistency regarding the paths used for clojure-project. Thanks to Phil (the creator of ESK & clojure-project) for pointing this out!