Removing the dependency from clojure hadoop to a specific hadoop version #9

Open: wants to merge 6 commits into base: master
126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
A library to assist in writing Hadoop MapReduce jobs in Clojure.

Originally written by Stuart Sierra ([http://stuartsierra.com/](http://stuartsierra.com)).

Extended by Roman Scherer, Christopher Miles, Ian Eslick, Dave Lambert, Alex Ott, and others.

Stable releases are available via [http://clojars.org](http://clojars.org).

##Resources

* [Stuart's presentation on clojure-hadoop](http://vimeo.com/7669741)
* [Introduction to clojure-hadoop](http://alexott.net/en/clojure/ClojureHadoop.html)
* [Hadoop](http://hadoop.apache.org/)
* [Clojure](http://clojure.org/)

##Using the Library

This library provides five layers of abstraction over the raw Hadoop API, listed here from the lowest level to the highest.

###Layer 1: clojure-hadoop.imports

Provides convenience functions for importing the many classes and interfaces in the Hadoop API.


###Layer 2: clojure-hadoop.gen

Provides gen-class macros to generate the multiple classes needed for a MapReduce job. See the example file "wordcount1.clj" for a demonstration of these macros.


###Layer 3: clojure-hadoop.wrap

Provides wrapper functions that automatically convert between Hadoop Text objects and Clojure data structures. See the example file "wordcount2.clj" for a demonstration of these wrappers.
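
To illustrate the kind of conversion this layer performs, the sketch below round-trips a Clojure data structure through its printed string form, which is the sort of content a Hadoop Text object could carry. It uses only core Clojure functions; `to-text-string` and `from-text-string` are hypothetical names for this example, not functions from clojure-hadoop.wrap.

```clojure
;; Hypothetical helpers for illustration; NOT part of clojure-hadoop.wrap.

(defn to-text-string
  "Serialize a Clojure value to a string that a Text object could hold."
  [value]
  (pr-str value))

(defn from-text-string
  "Parse a string produced by to-text-string back into a Clojure value.
  *read-eval* is bound to false so the input cannot trigger code evaluation."
  [s]
  (binding [*read-eval* false]
    (read-string s)))

;; Round trip: the value that comes back equals the value that went in.
(def word-count ["hadoop" 3])

(from-text-string (to-text-string word-count))
```

The actual wrappers also deal with Hadoop's Writable types; see "wordcount2.clj" for how they are used in a job.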


###Layer 4: clojure-hadoop.job

Provides a complete implementation of a Hadoop MapReduce job that can be dynamically configured to use any Clojure functions in the map and reduce phases. See the example file "wordcount3.clj" for a demonstration of this usage.


###Layer 5: clojure-hadoop.defjob

A convenient macro to configure MapReduce jobs with Clojure code. See the example files "wordcount4.clj" and "wordcount5.clj" for demonstrations of this macro.
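
As a rough sketch of the "any Clojure functions" that Layers 4 and 5 refer to, the word-count logic can be written as two plain functions. The calling convention shown here (a map function taking a key and a value, and a reduce function taking a key and a function that returns the values) is an assumption based on the style of the wordcount examples, not something this README specifies; the names `my-map` and `my-reduce` are illustrative. Check "wordcount3.clj" and "wordcount4.clj" for how such functions are actually wired into a job.

```clojure
(import '(java.util StringTokenizer))

;; Assumed calling convention (not stated in this README):
;;   map function:    (key, value)     -> seq of [key value] pairs
;;   reduce function: (key, values-fn) -> seq of [key value] pairs,
;;                    where values-fn returns the values when called.

(defn my-map
  "Emit a [word 1] pair for every whitespace-separated token in the line."
  [_key line]
  (map (fn [token] [token 1])
       (enumeration-seq (StringTokenizer. line))))

(defn my-reduce
  "Sum the counts collected for one word."
  [word values-fn]
  [[word (reduce + (values-fn))]])
```

Because both functions are ordinary Clojure, they can be exercised at the REPL without a Hadoop cluster, for example with `(my-map 0 "a b a")`.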

##Requiring

You can depend on this library through either Leiningen or Maven 2. Because the library does not pin a Hadoop version itself, you must also declare the desired version of hadoop-core.

Hadoop versions 0.20.2 and 1.0.3 are currently tested and working.

###Leiningen

```
[clojure-hadoop "1.4.1"]
[org.apache.hadoop/hadoop-core "1.0.3"]
```

###Maven2

```
<dependencies>
  ...

  <dependency>
    <groupId>clojure-hadoop</groupId>
    <artifactId>clojure-hadoop</artifactId>
    <version>1.4.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
  </dependency>

  ...
</dependencies>
...
<repositories>
  ...

  <repository>
    <id>clojars</id>
    <url>http://clojars.org/repo</url>
  </repository>

  ...
</repositories>
```


##Building from source

In the top-level directory of this project, run:

```
lein jar
```

This will compile and build the JAR file.


###Dependencies

* [Java 6 JDK](http://java.sun.com/)
* [Hadoop core](http://hadoop.apache.org/releases.html)
* [Leiningen](http://github.com/technomancy/leiningen)


##Running the Examples and Tests

With Hadoop 0.20.2:

```
lein with-profile 0.20.2 test
```

With Hadoop 1.0.3:

```
lein with-profile 1.0.3 test
```
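
The profile names passed to `with-profile` are defined in this project's project.clj. The fragment below is an excerpt of that file with the other keys omitted; each Hadoop profile contributes a different hadoop-core version to the classpath.

```clojure
;; Excerpt from project.clj (other keys omitted).
:min-lein-version "2.0.0"
:profiles {:0.20.2 {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2"]]}
           :1.0.3  {:dependencies [[org.apache.hadoop/hadoop-core "1.0.3"]]}
           :dev    {:dependencies [[swank-clojure "1.4.2"]]}}
```

Testing against another Hadoop version would presumably mean adding a profile of the same shape, though only the two versions above are reported as tested.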


##License

Copyright (c) Stuart Sierra, 2009. All rights reserved. The use and distribution terms for this software are covered by the Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) which can be found in the file LICENSE.html at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.
137 changes: 0 additions & 137 deletions README.txt

This file was deleted.

14 changes: 9 additions & 5 deletions project.clj
@@ -1,18 +1,21 @@
-(defproject clojure-hadoop "1.4.1"
+(defproject clojure-hadoop "1.4.2"
   :description "Library to aid writing Hadoop jobs in Clojure."
   :url "http://github.com/alexott/clojure-hadoop"
   :license {:name "Eclipse Public License 1.0"
             :url "http://opensource.org/licenses/eclipse-1.0.php"
             :distribution "repo"
             :comments "Same license as Clojure"}
   :dependencies [[org.clojure/clojure "1.3.0"]
-                 [org.apache.hadoop/hadoop-core "1.0.3"]
                  [log4j/log4j "1.2.16" :exclusions [javax.mail/mail
                                                     javax.jms/jms
                                                     com.sun.jdmk/jmxtools
-                                                    com.sun.jmx/jmxri]]
-                 ]
-  :dev-dependencies [[swank-clojure "1.4.2"]]
+                                                    com.sun.jmx/jmxri]]]
+
+  :min-lein-version "2.0.0"
+  :profiles {:0.20.2 {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2"]]}
+             :1.0.3 {:dependencies [[org.apache.hadoop/hadoop-core "1.0.3"]]}
+             :dev {:dependencies [[swank-clojure "1.4.2"]]}}
+
   :aot [clojure-hadoop.config
         clojure-hadoop.defjob
         clojure-hadoop.gen
@@ -24,3 +27,4 @@
         ;; TODO: Remove them? Only needed for the tests.
         clojure-hadoop.examples.wordcount1
         clojure-hadoop.examples.wordcount2])
+
2 changes: 1 addition & 1 deletion src/clojure_hadoop/imports.clj
@@ -54,7 +54,7 @@
   '(org.apache.hadoop.fs
     PathFilter PositionedReadable Seekable Syncable BlockLocation
     BufferedFSInputStream ChecksumFileSystem ContentSummary DF DU
-    FileStatus FileSystem FileSystem$Statistics FileUtil HardLink
+    FileStatus FileSystem FileSystem$Statistics FileUtil
     FilterFileSystem FSDataInputStream FSDataOutputStream FSInputChecker
     FSInputStream FSOutputSummer FsShell FsUrlStreamHandlerFactory HarFileSystem
     InMemoryFileSystem LocalDirAllocator LocalFileSystem Path RawLocalFileSystem
9 changes: 9 additions & 0 deletions test-resources/to_be_counted.txt
@@ -0,0 +1,9 @@
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque accumsan mattis cursus. Donec urna felis, pretium eu scelerisque non, egestas sed lectus. Sed interdum, augue vel pulvinar ornare, urna quam dapibus sapien, nec ornare elit turpis et neque. Suspendisse ipsum orci, mollis nec auctor et, auctor vitae justo. In hac habitasse platea dictumst. In convallis eros neque. Aliquam erat volutpat. Aenean orci nisi, lobortis sit amet dictum et, consectetur ut augue. Ut congue dui a nibh rutrum vel vestibulum nunc fringilla. Nullam laoreet orci vel nisl sodales mattis. Pellentesque neque ipsum, aliquam iaculis condimentum ullamcorper, varius sit amet sapien. Aenean at ultrices orci.

In ac accumsan est. Donec id tempus justo. Nulla est sapien, aliquam in congue vel, lobortis eu augue. Duis volutpat odio non odio varius commodo. Pellentesque in erat nec lectus ultrices feugiat quis a risus. Nullam dictum velit non diam aliquet tincidunt. Nunc accumsan velit sit amet urna pretium sodales. Mauris eget diam a lorem commodo interdum. Ut justo mi, blandit vel ultrices sit amet, fermentum non arcu. Cras sit amet ipsum ligula.

Nunc velit mauris, aliquam sit amet rhoncus at, eleifend non massa. Vivamus et lorem tortor. Curabitur sapien arcu, ultrices sit amet semper at, convallis vitae nisi. Phasellus pretium, sapien in venenatis ultricies, ligula magna vehicula lorem, id pulvinar nunc ante eu eros. Praesent quis arcu eu libero molestie porttitor. Pellentesque quis lectus at nisl ultrices porta non eu est. Suspendisse consequat vulputate mi, at hendrerit sem interdum a.

Nullam id mauris a orci iaculis imperdiet. Etiam sit amet auctor mi. Integer at orci dictum erat gravida consequat eget et elit. In ultricies, nulla vitae iaculis dignissim, velit ante congue neque, non hendrerit felis est non turpis. Nam placerat orci a dui fringilla vel dignissim erat ultrices. Mauris commodo iaculis semper. Praesent egestas, dui ut consectetur hendrerit, nibh lacus aliquet enim, eu adipiscing orci ipsum ut turpis. Donec iaculis libero quis nibh lacinia lobortis. Sed eu dui sapien, id vulputate velit. Etiam et est eget libero laoreet elementum. Sed lacinia sapien et lacus rhoncus cursus.

Donec ultrices enim mi. Nulla cursus venenatis enim, sit amet rutrum neque adipiscing et. Proin felis mi, malesuada eget malesuada a, tincidunt eget odio. Mauris metus magna, scelerisque ut auctor ac, elementum id mauris. Fusce sagittis dolor magna, vel tincidunt nunc. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Etiam quis lobortis libero. Vivamus non velit nec quam adipiscing aliquam a vitae nibh. Phasellus sem lorem, aliquam vitae pretium sed, porttitor sit amet sem. In pretium, lacus sagittis condimentum tincidunt, tortor odio hendrerit orci, eget consectetur augue dui ut libero. Nullam at sodales nulla. Morbi convallis vulputate magna, sed fringilla enim mollis at.
14 changes: 7 additions & 7 deletions test/clojure_hadoop/examples/wordcount1.clj
@@ -11,18 +11,18 @@
 ;; three functions mapper-map, reducer-reduce, and tool-run.
 ;;
 ;; To run this example, first compile it (see instructions in
-;; README.txt), then run this command (all one line):
+;; README.md), then run this command (all one line):
 ;;
 ;; java -cp examples.jar \
 ;; clojure_hadoop.examples.wordcount1 \
-;; README.txt out1
+;; README.md out1
 ;;
-;; This will count the instances of each word in README.txt and write
+;; This will count the instances of each word in README.md and write
 ;; the results to out1/part-00000


 (ns clojure-hadoop.examples.wordcount1
-  (:require [clojure-hadoop.gen :as gen]
+  (:require [clojure-hadoop.gen :as gen]
             [clojure-hadoop.imports :as imp])
   (:import (java.util StringTokenizer)
            (org.apache.hadoop.util Tool))
@@ -32,7 +32,7 @@
 (imp/import-fs)
 (imp/import-io)
 (imp/import-mapreduce)
-(imp/import-mapreduce-lib)
+(imp/import-mapreduce-lib)

 (gen/gen-job-classes) ;; generates Tool, Mapper, and Reducer classes
 (gen/gen-main-method) ;; generates Tool.main method
@@ -80,12 +80,12 @@
     (.setMapperClass (Class/forName "clojure_hadoop.examples.wordcount1_mapper"))
     (.setReducerClass (Class/forName "clojure_hadoop.examples.wordcount1_reducer"))
     (.setInputFormatClass TextInputFormat)
-    (.setOutputFormatClass TextOutputFormat)
+    (.setOutputFormatClass TextOutputFormat)
     (FileInputFormat/setInputPaths (first args))
     (FileOutputFormat/setOutputPath (Path. (second args)))
     (.waitForCompletion true))
   0)

 (deftest test-wordcount-1
   (.delete (FileSystem/get (Configuration.)) (Path. "tmp/out1") true)
-  (is (tool-run (clojure_hadoop.job.) ["README.txt" "tmp/out1"])))
+  (is (tool-run (clojure_hadoop.job.) ["test-resources/to_be_counted.txt" "tmp/out1"])))