Kotlin bindings for NumPy

This project is a Kotlin library that provides a statically typed wrapper for NumPy.

Features

  • Statically typed multidimensional arrays.
  • Idiomatic API for users with NumPy experience.
  • Random, math, linear algebra, and other useful functions from NumPy.
  • Python allocates memory for arrays and frees memory when JVM GC collects unnecessary arrays.
  • Direct access to array data using DirectBuffer.
    • Better performance when working with array data compared to Python.

Requirements

To use the library in your project, you will need:

  • Java 8 or above
  • Python 3.5 or above
  • NumPy 1.7 or above
  • On macOS or Linux: GCC or Clang

Note: Make sure you use the correct Python environment. This is necessary to use the correct version of Python and NumPy.

To simplify installing Python and NumPy and setting up the environment, we recommend using Anaconda.

Installation

In your Gradle build script:

  1. Add the kotlin-datascience repository.
  2. Add the org.jetbrains:kotlin-numpy:0.1.5 implementation dependency.

Groovy build script (build.gradle):

repositories {
    maven { url "https://kotlin.bintray.com/kotlin-datascience" }
}

dependencies {
    implementation 'org.jetbrains:kotlin-numpy:0.1.5'
}

Kotlin build script (build.gradle.kts):

repositories {
    maven("https://dl.bintray.com/kotlin/kotlin-datascience")
}

dependencies {
    implementation("org.jetbrains:kotlin-numpy:0.1.5")
}

The first time a kotlin-numpy function is called, the library installs ktnumpy (the native library for kotlin-numpy) as a Python package. It uses the Python from the environment in which the program is running. You can also install ktnumpy yourself:

pip install ktnumpy==%kotlin-numpy.version%

You can also run a program by manually specifying the path to Python. To do this, use LibraryLoader.setPythonConfig before calling kotlin-numpy functions.

Usage

Kotlin bindings for NumPy offer an API very similar to the original NumPy API. Consider the following programs:

Python:

import numpy as np
a = np.arange(15).reshape(3, 5) # ndarray([[ 0,  1,  2,  3,  4],
                                #        [ 5,  6,  7,  8,  9],
                                #        [10, 11, 12, 13, 14]])
print(a.shape == (3, 5))        # True
print(a.ndim == 2)              # True
print(a.dtype.name)             # 'int64'

b = (np.arange(15) ** 2).reshape(3, 5)

c = a * b
print(c)
# [[   0    1    8   27   64]
#  [ 125  216  343  512  729]
#  [1000 1331 1728 2197 2744]]

d = c.transpose().dot(a)
print(d)
# [[10625 11750 12875 14000 15125]
#  [14390 15938 17486 19034 20582]
#  [18995 21074 23153 25232 27311]
#  [24530 27266 30002 32738 35474]
#  [31085 34622 38159 41696 45233]]

Kotlin:

import org.jetbrains.numkt.core.*
import org.jetbrains.numkt.math.*
import org.jetbrains.numkt.*

fun main() {
    val a = arange(15).reshape(3, 5) // KtNDArray<Int>([[ 0,  1,  2,  3,  4],
                                                     // [ 5,  6,  7,  8,  9],
                                                     // [10, 11, 12, 13, 14]]

    println(a.shape.contentEquals(intArrayOf(3, 5))) // true
    println(a.ndim == 2)                             // true
    println(a.dtype)                                 // class java.lang.Integer

    // create an array of ints, square each element, and reshape it to (3, 5)
    val b = (arange(15) `**` 2).reshape(3, 5)

    // c is the product of a and b, element-wise
    val c = a * b
    println(c)
    // Output:
    // [[   0    1    8   27   64]
    //  [ 125  216  343  512  729]
    //  [1000 1331 1728 2197 2744]]
    
    // d is the dot product of the transposed c and a
    val d = c.transpose().dot(a)
    println(d)
    // Output:
    // [[10625 11750 12875 14000 15125]
    //  [14390 15938 17486 19034 20582]
    //  [18995 21074 23153 25232 27311]
    //  [24530 27266 30002 32738 35474]
    //  [31085 34622 38159 41696 45233]]

}

Array creation

Simple ways to create arrays look like this:

    array(arrayOf(1, 2, 3)) // simple flat array: KtNDArray<Int>([1, 2, 3])

    array<Float>(listOf(listOf(15, 13), listOf(2, 31))) // KtNDArray<Float>([[15f, 13f],
                                                                          // [ 2f, 31f]])

    ones<Double>(3, 3, 3) // array of ones. Shape will be (3, 3, 3)

    linspace<Double>(1, 3, 10) // array of 10 evenly spaced numbers from 1 to 3

Basic operations

Arithmetic operations are supported:

    val a = array(arrayOf(20, 30, 40, 50)) // [20, 30, 40, 50]
    val b = arange(4) // [0, 1, 2, 3]
    val c = a - b // [20, 29, 38, 47]

    b `**` 2 // [0, 1, 4, 9]
    sin(a) * 10 // [ 9.12945251, -9.88031624, 7.4511316, -2.62374854]

Matrix operations:

    val matA = array<Long>(listOf(listOf(1, 1), listOf(0, 1))) // KtNDArray<Long>([[1, 1],
                                                                                // [0, 1]])

    val matB = array<Long>(listOf(listOf(2, 0), listOf(3, 4))) // KtNDArray<Long>([[2, 0],
                                                                                // [3, 4]])

    println(matA * matB)
    // elementwise product
    // [[2 0]
    //  [0 4]]

    println(matA `@` matB)
    // matrix product:
    // [[5 4]
    //  [3 4]]

    println(matA.dot(matB))
    // matrix product:
    // [[5 4]
    //  [3 4]]

Augmented assignment (or override assignment) operations modify an existing array instead of creating a new one.

Note: When using augmented assignments, don't forget to import them explicitly (or import the entire package org.jetbrains.numkt.math.*). Otherwise, Kotlin expands the operation. For example, the line a += b is expanded to a = a + b.

    val a = ones<Int>(2, 3)
    val b = Random.random(2, 3)
    a *= 3

    b += a

Indexing, slicing, and iterating

Arrays in Kotlin bindings for NumPy are accessed with conventional indexing. There is also an analogue of Python's slice. To skip an index in a slice, use None. For example, [:8:2] in Python is equivalent to [None..8..2] in Kotlin. Iteration is elementwise regardless of shape.

    val a = arange(10L) `**` 3 // KtNDArray<Long>([0, 1, 8, 27, 64, 125, 216, 343, 512, 729])

    println(a[2]) // 8

    println(a[2..5..1]) // equivalent to a[2:5] in Python. Output: KtNDArray<Long>([8, 27, 64])

    // equivalent to a[0:6:2] = -1000 in python; from start to position 6, set every 2nd element to -1000
    a[0..6..2] = -1000
    println(a) // [-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729]

    // reverse
    println(a[None..None..-1]) // [729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000]

    for (el in a.reshape(2, 5)) {
        print("${el.toDouble().pow(1.0 / 3.0)} ") // NaN 1.0 NaN 3.0 NaN 4.999999999999999 5.999999999999999 ...
    }

Remember that when indexing, you must pass as many indices as the KtNDArray has dimensions (the number of indices j1, j2, j3, ... must equal ndim). When slicing, this is not necessary.

    val x = array(arrayOf(1, 2, 3, 4, 5, 6)).apply { resize(2, 3, 1) }
    println(x.shape.joinToString())
//    2, 3, 1

    println(x[1..2])
//    [[[4]
//      [5]
//      [6]]]
      
    println(x[1, 2, 0])
//    6

There are three iterators:

The default iterator calls the standard Python iterator. It always returns a view. If the iteration is over a flat array, use the scalar property to obtain the element. If the array is not a scalar, null is returned. To check, use the isScalar and isNotScalar methods.

val a = linspace<Double>(0, 10).reshape(2, 5, 5)

// Get ten one-dimensional arrays
for (ax1 in a) {
    for (ax2 in ax1) {
        println(ax2)
    }
}

// Sum of all elements
var sum = 0.0
for (ax1 in a) {
    for (ax2 in ax1) {
        for (el in ax2) {
            sum += el.scalar!!
        }
    }
}

KtNDIter is a mapping of the C array iterator API, like np.nditer in Python. It visits items in the order in which they are stored in memory.

val a = arange(6).apply { resize(2, 3) }

// Square each element in the array
val iter = KtNDIter(a)
for (i in iter) {
    a[iter.multiIndex] = i * i
}

for (x in KtNDIter(a[None..None, 1..None..-1])) {
    print("$x ")
}

The flatIter method returns an iterator that works directly over the buffer. It is the fastest of the three iterators and can also iterate over a view.

val a = linspace<Double>(0, 10)

// Displays all items
for (el in a.flatIter()) {
    print("$el ")
}

Stacking

    val a = floor(10 * Random.random(2, 2))
    val b = floor(10 * Random.random(2, 2))

    // stack arrays row wise
    val v = vstack(a, b)
    println(v.shape.joinToString())
    // 4, 2

    // stack arrays column wise
    val h = hstack(a, b)
    println(h.shape.joinToString())
    // 2, 4

NumPy routine coverage

  • Array creation
    • Ones and zeros - ✅
    • From existing data - ✅
    • Creating record arrays - ⬜
    • Numeric character arrays - ⬜
    • Numerical ranges - ✅
    • Building matrices - ✅
    • The Matrix class - ✅
  • Array manipulation routines
    • Basic operations - ✅
    • Changing array shape - ✅
    • Transpose-like operations - ✅
    • Changing number of dimensions - ✅
    • Changing kind of array - ✅
    • Joining arrays - ✅
    • Splitting arrays - ✅
    • Tiling arrays - ✅
    • Adding and removing elements - ✅
    • Rearranging elements - ✅
  • Binary operations
    • Elementwise bit operations - ✅
    • Bit packing - ✅
    • Output formatting - ✅
  • String operations - ⬜
  • C-Types Foreign Function Interface (ctypeslib) - ⬜
  • Datetime Support Functions - ⬜
  • Data type routines - ⬜
  • Optionally Scipy-accelerated routines - ⬜
  • Mathematical functions with automatic domain (numpy.emath) - ⬜
  • Floating point error handling - ⬜
  • Discrete Fourier Transform (numpy.fft) - ⬜
  • Financial functions - ⬜
  • Functional programming - ⬜
  • NumPy-specific help functions - ⬜
  • Indexing routines
    • Generating index arrays - ⬜
    • Indexing-like operations - ✅
    • Inserting data into arrays - ⬜
    • Iterating over arrays - ⬜
  • Input and output
    • NumPy binary files (NPY, NPZ) - ⬜
    • Text files - ✅
    • Raw binary files - ✅
    • String formatting - ⬜
    • Memory mapping files - ⬜
    • Text formatting options - ⬜
    • Base-n representations - ⬜
    • Data sources - ⬜
    • Binary format description - ⬜
  • Linear algebra (numpy.linalg)
    • Matrix and vector products - ✅
    • Decompositions - ✅
    • Matrix eigenvalues - ✅
    • Norms and other numbers - ✅
    • Solving equations and inverting matrices - ✅ without tensors
  • Logic functions
    • Truth value testing - ✅
    • Array contents - ✅
    • Array type testing - ⬜
    • Logic operations - ✅
    • Comparison - ✅
  • Masked array operations - ⬜
  • Mathematical functions
    • Trigonometric functions - ✅
    • Hyperbolic functions - ✅
    • Rounding - ✅
    • Sums, products, differences - ✅
    • Exponents and logarithms - ✅
    • Other special functions - ✅
    • Floating point routines - ✅
    • Rational routines - ✅
    • Arithmetic operations - ✅
    • Handling complex numbers - ⬜
    • Miscellaneous - ✅
  • Matrix library (numpy.matlib) - ⬜
  • Miscellaneous routines - ⬜
  • Padding arrays - ⬜
  • Polynomials - ⬜
  • Random (numpy.random)
    • Simple random data - ✅
    • Permutations - ✅
    • Distributions - ✅
    • Random generator - ⬜
  • Set routines - ⬜
  • Sorting, searching and counting
    • Sorting - ✅
    • Searching - ✅
    • Counting - ✅
  • Statistics
    • Order statistics - ✅
    • Averages and variances - ✅
    • Correlating - ✅
    • Histograms - ⬜
  • Test support - ⬜
  • Window functions - ⬜

How it works

Foundation

Using the Java Native Interface (JNI) and Python C extensions, we attach the Python interpreter to the JVM process. A singleton, Interpreter, manages this. The Python interpreter is initialized on the first call of any function and remains in the process until the JVM exits. The Interpreter class contains external functions (such as callFunc) through which NumPy functions are called. Let's take a closer look at how NumPy functions are called.

The following arguments are required to call NumPy functions:

  • Array of call attributes - module names and function names in NumPy.
  • Array of arguments - function arguments, equivalent to *args in Python. Importantly, the arguments should be in the same order and in the same places as in Python. If you skip an argument, pass None instead.
  • Map of arguments - maps names of arguments to arguments, equivalent to **kwargs in Python.
  • Return type - the class of the value we expect to be returned. If the expected type is KtNDArray, this is not needed.

To call a NumPy function, the above arguments are passed to the associated external Kotlin function. The native code then performs the following steps:

  • Using the array of call attributes, we get a callable PyObject. This is the corresponding NumPy function.
  • The array of arguments is converted to a tuple of the corresponding Python objects.
  • If the map of arguments is not empty, it is converted into a dictionary of keyword arguments.
  • We call the NumPy function with these arguments: PyObject_Call(FunctionObject, TupleArgs, DictKwargs), which in Python is FunctionObject(*TupleArgs, **DictKwargs).
  • The result is converted into a Java object and returned.
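
As a rough illustration, the native steps above can be mirrored in pure Python. This is only a sketch under stated assumptions: call_func, its parameters, and the use of the standard os and json modules as stand-ins for NumPy are hypothetical choices made so the example runs without NumPy. The real work happens in C through the Python C API.

```python
import importlib
from functools import reduce


def call_func(name_path, args=(), kwargs=None, module="os"):
    """Resolve an attribute chain on a module and call the result.

    Hypothetical helper for illustration only; it is not part of kotlin-numpy.
    """
    root = importlib.import_module(module)
    # Step 1: walk the call attributes to obtain the callable object.
    func = reduce(getattr, name_path, root)
    # Steps 2-3: positional arguments become a tuple, keyword arguments a dict.
    positional = tuple(args)
    keywords = dict(kwargs or {})
    # Step 4: same effect as PyObject_Call(func, positional, keywords).
    return func(*positional, **keywords)


# os -> path -> basename stands in for a chain such as numpy -> ndarray -> diagonal.
print(call_func(["path", "basename"], args=["/tmp/data/array.npy"]))
# prints: array.npy

# Keyword arguments travel separately from positional ones, like **kwargs.
print(call_func(["dumps"], args=[{"b": 1, "a": 2}],
                kwargs={"sort_keys": True}, module="json"))
# prints: {"a": 2, "b": 1}
```

Keeping positional and keyword arguments in separate containers is what lets a single entry point serve every wrapped function, whatever its signature.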

Let's have a look at an example.

The diagonal method returns the specified diagonal as a KtNDArray with the same element type as the array it is called on.

fun <T : Any> KtNDArray<T>.diagonal(offset: Int = 0, axis1: Int = 0, axis2: Int = 1): KtNDArray<T> =
    callFunc(nameMethod = arrayOf("ndarray", "diagonal"), args = arrayOf(this, offset, axis1, axis2))

The first argument is an array of strings. The final attribute is resolved like this: numpy -> ndarray -> diagonal (the NumPy module is used by default). The next argument is an array of arguments: this (the KtNDArray from which the diagonal is taken), offset, axis1, axis2. In this case, kwargs are not used. We also expect an array to be returned, so we don’t pass a return type.
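
Passing this as the first positional argument matches how Python treats a method looked up on a type rather than on an instance. A small sketch, using the built-in list type as a stand-in for ndarray (an assumption, so that it runs without NumPy):

```python
values = [3, 1, 4, 1, 5]

# Bound call: the receiver is implicit.
bound = values.index(1, 2)

# Call through the type: the receiver is passed first, like `this` in args.
unbound = list.index(values, 1, 2)

print(bound, unbound)  # prints: 3 3
```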

Objects

Type matching

When processing objects in native code, there is a conversion from Java objects to Python objects, and when the result is returned, back from Python objects to Java objects. The table below shows the conversion of objects of different types.

Kotlin -> Python        | Python/NumPy -> Kotlin
------------------------+------------------------
None -> None            | None -> null
Char -> str             |
String -> str           | str -> String
Boolean -> bool         | bool -> Boolean
Byte -> int8            | int8 -> Byte
Short -> int16          | int16 -> Short
Int -> int32            | int32 -> Int
Long -> int64           | int64 -> Long
Float -> float32        | float32 -> Float
Double -> float64       | float64 -> Double
Array -> tuple          | tuple -> Array
List -> list            | list -> List
Map -> dict             | dict -> Map
KtNDArray -> ndarray    | ndarray -> KtNDArray
Slice -> slice          |
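
To make the Slice -> slice row concrete: on the Python side, a Kotlin slice such as [None..8..2] would correspond to a built-in slice object. The sketch below uses a plain list instead of an ndarray (an assumption that keeps it free of NumPy) and shows that a slice object behaves the same as the subscript syntax.

```python
data = list(range(10))

# slice(None, 8, 2) is the object form of the subscript [:8:2].
every_second = slice(None, 8, 2)
print(data[every_second])  # prints: [0, 2, 4, 6]
print(data[every_second] == data[:8:2])  # prints: True

# slice(None, None, -1) is the object form of [::-1].
backwards = slice(None, None, -1)
print(data[backwards])  # prints: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```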

What's inside KtNDArray

The main object type is KtNDArray. Like ndarray in NumPy, it is a homogeneous multidimensional array. KtNDArray holds a pointer to its corresponding ndarray. Using the pointer, we can perform operations on the array.

KtNDArray and ndarray operate on shared memory. Python allocates memory for the array, and through java.nio.DirectByteBuffer, we get access to this memory.

KtNDArray provides access to some ndarray attributes, such as shape, ndim, itemsize, size, strides, dtype. Additionally, KtNDArray has the field data of type ByteBuffer. This is the direct buffer.
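
The idea of two objects working on the same memory can be sketched with Python's buffer protocol alone. Here array.array stands in for the memory that Python allocates, and memoryview stands in for the direct buffer seen from the JVM side. Both are stand-ins chosen for illustration and are not what kotlin-numpy uses internally.

```python
from array import array

# The owner allocates the memory (the role Python plays for an ndarray).
owner = array("i", [0, 1, 2, 3])

# The view exposes the same memory without copying it
# (the role the direct buffer plays for KtNDArray).
view = memoryview(owner)

view[2] = 42           # a write through the view...
print(owner.tolist())  # ...is visible to the owner: [0, 1, 42, 3]

owner[0] = -1          # a write through the owner...
print(view[0])         # ...is visible through the view: -1
```

Because no data is copied in either direction, reads and writes cost the same as plain memory access, which is the point of exposing the array through a direct buffer.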

Type safety

Kotlin is a statically typed programming language. This makes it possible to catch errors at compile time.

Let's have a look at an example:

Python:

import numpy as np

# ...

a = np.ones((3, 3), dtype=int) * 3
b = np.random.random((3, 3))

b *= a # success

a *= b # TypeError at runtime 

The same code written in Kotlin will notify us of an error during the compilation:

// ...

val a = ones<Int>(3, 3) * 3
val b = Random.random(3, 3)

b *= a // success

a *= b // compilation error
// Kotlin: Type mismatch: inferred type is KtNDArray<Double> but KtNDArray<Int> was expected

There are other types of errors that can be prevented during the compilation, for example:

  • Type mismatch of function arguments:

Python:

a = np.array(['a', 'b', 'c'])
np.sin(a)

# TypeError: ufunc 'sin' not supported for the input types, 
# and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Kotlin:

val a = array(arrayOf('a', 'b', 'c'))
sin(a)

// Kotlin: Type parameter bound for T in fun <T : Number> sin(x: KtNDArray<T>): KtNDArray<Double> 
// is not satisfied: inferred type Char is not a subtype of Number

  • Method signatures are defined:

Python:

a = np.array([0, 1], [1, 0])

# TypeError: data type not understood

Kotlin:

val a = array(listOf(0, 1), listOf(1, 0))

// Kotlin: None of the following functions can be called with the arguments supplied: ...

In Python, there are implicit type conversions that can be really tricky. The following code seems to work fine, but it implicitly converts float to int, which the user may not expect. As a result, the output differs from the desired one, but no error is reported.

Python:

a = np.arange(15, dtype=np.int32)
b = np.linspace(0, 1, 15, dtype=np.float64)[::-1]
for i in range(15):
    a[i] = b[i]
print(a)

# [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

In Kotlin, the compiler will tell the user that an explicit conversion is required here.

Kotlin:

val a = arange(15)
val b = linspace<Double>(0, 1, 15)[None..None..-1]
for (i in 0..14) {
    a[i] = b[i] // Error
}

Building

To build the library, you will need GCC, Clang or MSVC.

The build system of this project is Gradle.

First, you should build the native library: run ./gradlew wheelBuild. The library will appear in ./build/libs/ktnumpy. This will also build the wheel which will be in the dist folder.

After building the native library, run ./gradlew assemble.

To run the tests, use ./gradlew test.

To build everything and run tests, run: ./gradlew build.
