Incanter

Overview and motivation

Incanter is a Clojure-based, R-like statistical computing and graphics environment for the JVM. At the core of Incanter are the Parallel Colt numerics library, a multithreaded version of Colt, and the JFreeChart charting library, as well as several other Java and Clojure libraries.

The motivation for creating Incanter is to provide a JVM-based statistical computing and graphics platform with R-like semantics and an interactive programming environment. Running on the JVM provides access to the large number of existing Java libraries for data access, data processing, and presentation. Clojure’s seamless integration with Java makes leveraging these libraries much simpler than is possible in R, and Incanter’s R-like semantics make statistical programming much simpler than is possible in pure Java.

Motivation for a Lisp-based R-like statistical environment can be found in the paper Back to the Future: Lisp as a Base for a Statistical Computing System by Ihaka and Lang (2008). Incanter is also inspired by the now dormant Lisp-Stat (see the special volume in the Journal of Statistical Software on Lisp-Stat: Past, Present, and Future from 2005).

Motivation for a JVM-based Lisp can be found at the Clojure website, and screencasts of several excellent Clojure talks by the language’s creator, Rich Hickey, can be found at clojure.blip.tv.

Getting started with Clojure

For a great introduction to programming in Clojure, read “Clojure – Functional Programming for the JVM” by R. Mark Volkmann. For an even more extensive introduction, get one of the books on Clojure: “Programming Clojure” by Stuart Halloway, “The Joy of Clojure” by Michael Fogus and Chris Houser, “Clojure in Action” by Amit Rathore, or “Practical Clojure” by Luke VanderHart and Stuart Sierra.

Other Clojure resources

Getting started with Incanter

Start by visiting the Incanter website for an overview, and check out the documentation page for a listing of HOW-TOs and examples. Then download either an Incanter executable or a pre-built version of the latest build of Incanter, which includes all the necessary dependencies, and unpack the file. If you would like to build it from source, read Building Incanter.

Start the Clojure REPL (aka the shell) by double-clicking on the downloaded executable or, if you downloaded the pre-built distribution, running one of the scripts in the Incanter directory: script/repl or script\repl.bat on Windows. NOTE: The lein repl task uses Clojure 1.1, and Incanter 1.2.x requires Clojure 1.2, so use the repl script instead of lein.

From the Clojure REPL, load the Incanter libraries:

user=> (use '(incanter core stats charts))

Try an example: sample 1,000 values from a standard-normal distribution and view a histogram:

user=> (view (histogram (sample-normal 1000)))

Try another simple example, a plot of the sine function over the range -10 to 10:

user=> (view (function-plot sin -10 10))

The online documentation for most Incanter functions contains usage examples. The documentation can be viewed using Clojure’s doc function. For example, to view the documentation and usage examples for the linear-model function, call (doc linear-model) from the Clojure shell. Use (find-doc "search term") to search the online documentation from the Clojure shell. The API documentation can also be found at http://liebke.github.com/incanter.

More Incanter examples

Documentation

The following documentation covers the Incanter and Clojure APIs and the APIs of the underlying Java libraries.

Incanter documentation

Related API documentation

Building Incanter

To build and test Incanter, you will need to have Leiningen and git installed:

1. Clone the repository with git: git clone git://github.com/liebke/incanter.git

2. Install Leiningen
a. Download the lein script: wget https://github.com/technomancy/leiningen/raw/stable/bin/lein
(use lein.bat on Windows)
b. Place it on your path and chmod it to be executable: chmod +x lein
c. Run: lein self-install

3. From the incanter directory, download the necessary dependencies: lein deps

4. Start a REPL: script/repl or script\repl.bat, or start a Swank server: script/swank or script\swank.bat

Other tasks:

  • If you want to run the tests for each of Incanter’s modules, use script/test
  • Each of Incanter’s modules is an independent Leiningen project. Just cd into modules/incanter-* and use Leiningen to build each one as a standalone library.
  • script/install uses Leiningen to build all the modules and install them in your local ~/.m2 repository.

Incanter dependencies

Problems Moving to Clojure 1.3

Numerics

  • Integer overflow in distributions_tests.clj, function
    (large-integer-tests). The test uses (reduce * (repeat 100 2))
    and (reduce * (repeat 100 3)). Workaround: change * to *' in the
    test. Open question: propagating number promotion looks to be a
    big pain. Should we add ticks to all the arithmetic operators, or
    should we force them to bigint by adding an “N” to the original
    literals?
  • Promotion problems:
    • Test failure in (dice-string) (stats_tests.clj:184), comparing a double (0.25) to a Ratio (1/4).
    • Test failure in (chebyshev) (stats_tests.clj:236), comparing an integer to a double.
    • Workaround: Change from equality operator (=) to equivalence operator (==). This should probably be done comprehensively.
  • Bit ops no longer support doubles. Appears to only affect
    incanter.core/get-dummies, which was passing a double but didn’t
    need to. Have made that explicitly an integer.
  • (matrix) promotes to double.
  • Equality tests are now problematic. Either use = and double or
    bigint literals, or use == and don’t force precision on literals.
  • However, matrix-to-list or lazy-seq-to-list compares don’t work with
    ==. Symmetry is broken, and Clojure lists don’t look like numeric
    lists, and Clojure vectors don’t look like algebraic vectors.
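
The numeric changes listed above can be reproduced in a plain Clojure 1.3 or later REPL without Incanter. The following is a minimal sketch; the helper thrown-class and all var names are hypothetical and exist only for this illustration.

```clojure
;; Plain Clojure 1.3+ only; no Incanter functions are used here.

;; Hypothetical helper: call f and return the class of the exception it
;; throws, or nil if it returns normally.
(defn thrown-class [f]
  (try
    (f)
    nil
    (catch Throwable t (class t))))

;; 1. `*` on longs throws on overflow instead of promoting.
(def overflow-error (thrown-class #(reduce * (repeat 100 2))))

;; 2. Two ways to get arbitrary precision: the promoting operator `*'`,
;;    or a BigInt literal (the "N" suffix), which carries through `*`.
(def promoted    (reduce *' (repeat 100 2)))
(def from-bigint (reduce * (repeat 100 2N)))

;; 3. `=` distinguishes numeric categories; `==` compares numeric value.
(def equal-result (= 0.25 1/4))   ; double vs. Ratio under =
(def equiv-result (== 0.25 1/4))  ; double vs. Ratio under ==

;; 4. `==` accepts only numbers, so it cannot compare collections.
(def coll-equiv-error (thrown-class #(== [1 2] '(1 2))))

;; 5. Bit operations reject doubles.
(def bit-op-error (thrown-class #(bit-and 1.0 1)))
```

Here overflow-error is ArithmeticException, promoted and from-bigint are equal BigInt values, equal-result is false, and equiv-result is true. The last two vars hold exception classes rather than nil, which is why a matrix-to-list comparison cannot simply switch from = to ==.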

Sequences

  • Matrix.java no longer accepted as seq. Previously, implementing ISeq
    was sufficient, now it appears the marker interface
    clojure.lang.Sequential is also needed. Workaround: added Sequential
    to Matrix’s implements clause. Open question: Should this be
    Sequential vs. Seqable?
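
As a rough illustration of the difference between the two interfaces in the open question, the sketch below uses two hypothetical reify objects as stand-ins for a Java class such as Matrix. It assumes Seqable rather than ISeq for brevity, so it is not a reproduction of the actual Matrix code. Seqable makes an object usable with seq, while Sequential is a method-less marker that sequential? (and functions built on it, such as flatten) checks for.

```clojure
;; Hypothetical stand-ins for a Java collection class; not Incanter code.

;; Can be turned into a seq, but carries no Sequential marker.
(def seqable-only
  (reify
    clojure.lang.Seqable
    (seq [_] (seq [1 2 3]))))

;; Same behavior, plus the marker interface (it declares no methods).
(def seqable-and-sequential
  (reify
    clojure.lang.Seqable
    (seq [_] (seq [1 2 3]))
    clojure.lang.Sequential))

;; sequential? is an instance check against clojure.lang.Sequential.
(def unmarked-check (sequential? seqable-only))
(def marked-check   (sequential? seqable-and-sequential))

;; flatten only descends into values for which sequential? is true.
(def flattened-unmarked (flatten [0 seqable-only]))
(def flattened-marked   (flatten [0 seqable-and-sequential]))
```

Only the marked object is treated as a nested sequence: flattened-marked contains 0 1 2 3, while flattened-unmarked still contains the opaque object as its second element.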

Dynamic vars

  • Compiler complains about *test-statistic-iterations* and
    *test-statistic-map* looking like dynamic vars, but not being
    declared as such. They aren’t rebound in incanter-core, but the
    docstring for test-statistic-distribution implies that an
    application can rebind them, so I’m adding ^:dynamic to the decls.
  • $data was not declared as dynamic, could not be rebound. Workaround:
    added ^{:dynamic true} to metadata. Open question: does rebinding
    this fit in the 1.3 model for vars? Check with Stu about threading.
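
A minimal sketch of the 1.3 rule follows, using hypothetical var names rather than Incanter’s own: only vars declared with ^:dynamic (shorthand for ^{:dynamic true}) can be rebound with binding, and the rebinding is visible only on the current thread for the duration of the binding form.

```clojure
;; Hypothetical names for illustration; not Incanter's actual vars.
(def ^:dynamic *iterations* 1000)   ; declared dynamic, can be rebound
(def fixed-iterations 1000)         ; ordinary var, cannot be rebound

(defn current-iterations [] *iterations*)

;; Inside binding, callees see the thread-local value.
(def rebound (binding [*iterations* 10] (current-iterations)))

;; Outside binding, the root value is visible again.
(def after-binding (current-iterations))

;; Trying to bind a non-dynamic var fails at runtime.
(def rebind-error
  (try
    (binding [fixed-iterations 10] fixed-iterations)
    nil
    (catch IllegalStateException e (.getMessage e))))
```

Here rebound is 10, after-binding is 1000, and rebind-error holds the message of the IllegalStateException raised for the non-dynamic var.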

Leiningen

  • Complains that class clojure.set not found (called from
    incanter.core as clojure.set/difference). This didn’t happen when
    building under cake, for some reason. Workaround: change call from
    clojure.set/difference to just difference, add :use in ns decl.
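
For reference, the workaround can be sketched at the REPL as below. A fully qualified call such as clojure.set/difference only resolves once something has loaded clojure.set, which may explain why the failure depended on the build tool, though that cause is an assumption here. The second form, a require with an alias, is an alternative and not what was done in Incanter.

```clojure
;; The workaround described above: refer `difference` into the namespace.
(use '[clojure.set :only [difference]])
(def via-use (difference #{1 2 3} #{2}))

;; An alternative (an assumption, not the change made in Incanter):
;; load the namespace explicitly and call through an alias.
(require '[clojure.set :as set])
(def via-alias (set/difference #{1 2 3} #{2}))
```

Both calls return the set #{1 3}. Inside a source file, the equivalent declarations would go in the ns form as (:use [clojure.set :only [difference]]) or (:require [clojure.set :as set]).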

Contrib migration

  • From clojure.contrib.core, defvar and defvar- didn’t make it into
    core.incubator. Scope: only used by distributions.clj. Workaround:
    changed defvar- calls to just def.
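
Note that a plain def creates a public var, whereas defvar- created a private one. If privacy and a docstring should be kept, metadata on def can express both. The sketch below uses a hypothetical name and value, not a definition from distributions.clj.

```clojure
;; Hypothetical name and value, for illustration only.
;; :private keeps the var out of other namespaces' refers, and :doc
;; carries the docstring that defvar- accepted as an argument.
(def ^{:private true
       :doc "Tolerance used when comparing floating-point results."}
  tolerance 1e-9)

;; Inspect the metadata attached to the var itself.
(def tolerance-meta (select-keys (meta #'tolerance) [:private :doc]))
```

The resulting tolerance-meta map contains :private true together with the docstring.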

Missing tests

These are things I caught by running examples by hand, but were not
picked up by any test cases.

  • $data not used in -core

Possible behavior changes

  • read-dataset with a URL that redirects gives a dataset with the
    redirect response as the rows, rather than following the
    redirect. Is this how it was under Clojure 1.2?
  • $where has some examples on the web that use strings as the column
    keys. I couldn’t get that to work, had to use keywords
    instead. Intended change or accidental?
