diff --git a/.travis.yml b/.travis.yml index d5cdaab..67c0e72 100644 --- a/.travis.yml +++ b/.travis.yml @@ -24,3 +24,5 @@ cache: directories: - $HOME/.gradle/caches/ - $HOME/.gradle/wrapper/ + +dist: trusty \ No newline at end of file diff --git a/README.md b/README.md index e152671..087b0fe 100644 --- a/README.md +++ b/README.md @@ -85,3 +85,9 @@ Publishing Publishing to [Bintray][bintray] is currently done via a dedicated build configuration of an internal TeamCity server. This allows us to deploy a cross-platform version. + +Documentation +---- + +Visit [viktor Documentation](./docs/docs.md) for an extensive feature overview, +instructive code examples and benchmarking data. \ No newline at end of file diff --git a/docs/benchmark.md b/docs/benchmark.md new file mode 100644 index 0000000..b60729d --- /dev/null +++ b/docs/benchmark.md @@ -0,0 +1,120 @@ +# viktor Benchmarks
+
+We designed a series of microbenchmarks to compare the efficiency of `viktor`'s
+native SIMD optimizations with that of a simple Kotlin/Java loop.
+
+The microbenchmark code can be found in the `src/jmh` folder. To run the benchmarks, execute
+the following commands from `viktor`'s root folder:
+```bash
+$ ./gradlew clean assemble benchmarkJar
+$ java -jar ./build/libs/viktor-benchmark.jar
+```
+You can add the usual [JMH](https://openjdk.java.net/projects/code-tools/jmh/)
+command line arguments to the latter command, e.g. `-o` to specify
+the output file or `-t` to control the number of threads.
+
+## Benchmark Environment
+
+We conducted the benchmarking on two machines:
+a laptop with an Intel Core i7-6820HQ CPU at 2.70 GHz
+and a server with an Intel Xeon E5-2690 CPU at 2.90 GHz.
+The following table summarizes the main features of both:
+
+machine | laptop | server
+--------|--------|-------
+CPU | Intel Core i7-6820HQ | Intel Xeon E5-2690
+frequency | 2.70 GHz | 2.90 GHz
+cores[^cores] | 8 | 32
+architecture | `amd64` | `amd64`
+highest SIMD extension[^simd] | `AVX` | `SSE2`
+OS | Ubuntu 18.04.3 LTS | CentOS Linux 7 (Core)
+
+[^cores]: The number of cores shouldn't matter since all benchmarks ran in a single thread.
+
+[^simd]: The most advanced extension that was used by `viktor`. In reality, the laptop
+  had `AVX2` and the server had `SSE4.2` as the highest extension.
+
+## Benchmark Results
+
+The following data is provided for informational purposes only.
+In each benchmark, we considered arrays of size `1000`, `100_000` and `10_000_000`.
+We investigated two metrics:
+* `Array ops/s` is the number of times the operation was performed on the entire array
+  per second. Shown on a logarithmic scale.
+* `FLOPS` is the number of equivalent scalar operations performed per second. It is equal
+to `Array ops/s` multiplied by `Array size`. Shown on a linear scale.
+
+### Math Benchmarks
+
+The task here was to calculate the exponent (or logarithm, `expm1`, `log1p`, respectively)
+of every element of a double array. It was done either with a simple loop
+or with a dedicated `viktor` array method (e.g. `expInPlace`). We also evaluated the
+efficiency of
+[`FastMath`](https://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/util/FastMath.html)
+methods compared to Java's built-in `Math`.
+
+We also measured the performance of the `logAddExp` method for adding
+two logarithmically stored arrays; it was compared with a loop over the scalar
+`logAddExp` function defined in `viktor`.
+ +#### Laptop + +![exp laptop array ops/s](./figures/ExpBenchmark_arrayopss_workstation.png) +![exp laptop flops](./figures/ExpBenchmark_flops_workstation.png) +![expm1 laptop array ops/s](./figures/Expm1Benchmark_arrayopss_workstation.png) +![expm1 laptop flops](./figures/Expm1Benchmark_flops_workstation.png) +![log laptop array ops/s](./figures/LogBenchmark_arrayopss_workstation.png) +![log laptop flops](./figures/LogBenchmark_flops_workstation.png) +![log1p laptop array ops/s](./figures/Log1pBenchmark_arrayopss_workstation.png) +![log1p laptop flops](./figures/Log1pBenchmark_flops_workstation.png) +![logAddExp laptop array ops/s](./figures/LogAddExpBenchmark_arrayopss_workstation.png) +![logAddExp laptop flops](./figures/LogAddExpBenchmark_flops_workstation.png) + +#### Server + +![exp server array ops/s](./figures/ExpBenchmark_arrayopss_server.png) +![exp server flops](./figures/ExpBenchmark_flops_server.png) +![expm1 server array ops/s](./figures/Expm1Benchmark_arrayopss_server.png) +![expm1 server flops](./figures/Expm1Benchmark_flops_server.png) +![log server array ops/s](./figures/LogBenchmark_arrayopss_server.png) +![log server flops](./figures/LogBenchmark_flops_server.png) +![log1p server array ops/s](./figures/Log1pBenchmark_arrayopss_server.png) +![log1p server flops](./figures/Log1pBenchmark_flops_server.png) +![logAddExp server array ops/s](./figures/LogAddExpBenchmark_arrayopss_server.png) +![logAddExp server flops](./figures/LogAddExpBenchmark_flops_server.png) + +### Statistics Benchmarks + +We tested the `sum()`, `sd()` and `logSumExp()` methods here. We also measured +the throughput of a dot product of two arrays (`dot()` method). All these benchmarks +(except for `logSumExp`) don't have a loop-based `FastMath` version since they only +use arithmetic operations in the loop. 
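For reference, the loop-based `logSumExp` baseline can be sketched as follows. This is an illustrative Java version under our assumptions, not the exact benchmark code: it computes `ln(Σ exp(x_i))` stably by factoring out the maximum, so no individual `exp` can overflow.

```java
// A sketch (not the exact benchmark code) of a loop-based logSumExp:
// ln(sum(exp(x_i))) computed stably by factoring out the maximum element.
public class LogSumExpSketch {
    public static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) max = Math.max(max, x);
        if (Double.isInfinite(max)) return max; // empty input or all -inf (or a +inf term)
        double sum = 0.0;
        for (double x : xs) sum += Math.exp(x - max); // every term is <= 1, no overflow
        return max + Math.log(sum);
    }
}
```

The SIMD version measured above performs the same reduction, but over several lanes per instruction.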
+
+#### Laptop
+
+![sum laptop array ops/s](./figures/SumBenchmark_arrayopss_workstation.png)
+![sum laptop flops](./figures/SumBenchmark_flops_workstation.png)
+![sd laptop array ops/s](./figures/SDBenchmark_arrayopss_workstation.png)
+![sd laptop flops](./figures/SDBenchmark_flops_workstation.png)
+![logSumExp laptop array ops/s](./figures/LogSumExpBenchmark_arrayopss_workstation.png)
+![logSumExp laptop flops](./figures/LogSumExpBenchmark_flops_workstation.png)
+![dot laptop array ops/s](./figures/DotBenchmark_arrayopss_workstation.png)
+![dot laptop flops](./figures/DotBenchmark_flops_workstation.png)
+
+#### Server
+
+![sum server array ops/s](./figures/SumBenchmark_arrayopss_server.png)
+![sum server flops](./figures/SumBenchmark_flops_server.png)
+![sd server array ops/s](./figures/SDBenchmark_arrayopss_server.png)
+![sd server flops](./figures/SDBenchmark_flops_server.png)
+![logSumExp server array ops/s](./figures/LogSumExpBenchmark_arrayopss_server.png)
+![logSumExp server flops](./figures/LogSumExpBenchmark_flops_server.png)
+![dot server array ops/s](./figures/DotBenchmark_arrayopss_server.png)
+![dot server flops](./figures/DotBenchmark_flops_server.png)
+
+## Cautious Conclusions
+
+`viktor` seems to perform up to three times better than the
+regular scalar computation approach. The only exception seems to be
+`FastMath.exp()`, which is on par with `viktor`'s `exp()` method on `AVX`
+(and faster than it on `SSE2`).
\ No newline at end of file diff --git a/docs/docs.md b/docs/docs.md new file mode 100644 index 0000000..9c88b8c --- /dev/null +++ b/docs/docs.md @@ -0,0 +1,489 @@ +# viktor Overview
+
+* [General Concepts](#general-concepts)
+* [Creating Arrays and Accessing Elements](#creating-arrays-and-accessing-elements)
+* [Views](#views)
+* [Flattenable and Dense Arrays](#flattenable-and-dense-arrays)
+* [Copying](#copying)
+* [Arithmetic and Mathematics](#arithmetic-and-mathematics)
+* [Statistics and Special Methods](#statistics-and-special-methods)
+* [Logarithmic Storage](#logarithmic-storage)
+* [Efficiency and SIMD](#efficiency-and-simd)
+* [Examples](#examples)
+
+See [README](../README.md) for instructions on how to install `viktor`.
+
+The section [Examples](#examples) contains instructive code examples with explanations
+and can be used as a tutorial.
+
+See [Benchmarks](./benchmark.md) for informational benchmarking data.
+
+## General Concepts
+
+`viktor` is a Kotlin + JNI library that revolves around an n-dimensional array concept
+similar to that of NumPy’s [ndarray](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html).
+
+This array is described by the class `F64Array`. We call 1-dimensional arrays **vectors**
+and 2-dimensional ones **matrices**. Regardless of the number of dimensions, an `F64Array`
+always stores its data in an ordinary `DoubleArray` (`double[]` for Java speakers).
+The elements are laid out in row-major order (also called C-order).
For example,
+for a 3-dimensional array `a` with the **shape** of 2x3x2, the elements will be laid out
+in the following order:
+
+    a[0, 0, 0] === data[0]
+    a[0, 0, 1] === data[1]
+    a[0, 1, 0] === data[2]
+    a[0, 1, 1] === data[3]
+    a[0, 2, 0] === data[4]
+    a[0, 2, 1] === data[5]
+    a[1, 0, 0] === data[6]
+    a[1, 0, 1] === data[7]
+    a[1, 1, 0] === data[8]
+    a[1, 1, 1] === data[9]
+    a[1, 2, 0] === data[10]
+    a[1, 2, 1] === data[11]
+
+It’s easy to see that the element `a[i, j, k]` corresponds to the element
+`data[6 * i + 2 * j + 1 * k]`. The values `6`, `2`, `1` are called _strides_,
+since they are the distances separating the elements with neighboring indices.
+
+Two distinct `F64Array`s may be backed by the same underlying `DoubleArray`.
+They can point to intersecting or non-intersecting portions of the underlying array.
+For example, by calling the viewer getter method `b = a.V[1]` we can obtain
+a view of the second 3x2 matrix slice of `a`:
+
+    b[0, 0] === a[1, 0, 0] === data[6]
+    b[0, 1] === a[1, 0, 1] === data[7]
+    b[1, 0] === a[1, 1, 0] === data[8]
+    b[1, 1] === a[1, 1, 1] === data[9]
+    b[2, 0] === a[1, 2, 0] === data[10]
+    b[2, 1] === a[1, 2, 1] === data[11]
+
+For all intents and purposes, `b` is an `F64Array` in its own right.
+It doesn't keep any reference to `a`. The element `b[i, j]` corresponds to the element
+`data[6 + 2 * i + j]`. The value `6` is called the *offset*. Any in-place operation on `b`
+will actually be performed on the appropriate region of `data` and will thus be visible through `a`.
+
+## Creating Arrays and Accessing Elements
+
+### Accessing elements
+The elements can be accessed through standard get/set methods:
+```kotlin
+println(a[1, 1, 1])
+a[0, 2, 1] = 42.0
+```
+
+The only currently supported element type is `Double` (`double` in Java).
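The row-major index arithmetic described above can be sketched with two helpers. `stridesFor` and `flatIndex` are hypothetical illustrations, not part of `viktor`'s API: the stride of an axis is the product of all trailing axis sizes.

```java
// Hypothetical helpers (not viktor API) illustrating row-major strides.
public class RowMajorSketch {
    public static int[] stridesFor(int[] shape) {
        int[] strides = new int[shape.length];
        strides[shape.length - 1] = 1; // the last axis is contiguous
        for (int i = shape.length - 2; i >= 0; i--) {
            strides[i] = strides[i + 1] * shape[i + 1]; // product of trailing sizes
        }
        return strides;
    }

    public static int flatIndex(int[] indices, int[] strides) {
        int flat = 0;
        for (int i = 0; i < indices.length; i++) flat += indices[i] * strides[i];
        return flat;
    }
}
```

For the 2x3x2 example, `stridesFor` yields `{6, 2, 1}`, so `a[1, 2, 1]` maps to `data[6 + 4 + 1] = data[11]`, matching the layout above.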
+
+### Creating F64Arrays from scratch
+There are several ways to create an `F64Array`:
+```kotlin
+val a = F64Array(2, 3, 2) // zero-filled 2x3x2 array
+val b = F64Array.full(2, 3, 2, init = 3.14) // 2x3x2 array filled with 3.14
+val c = F64Array(8, 8) { i, j -> if (i == j) 1.0 else 0.0 } // create an 8x8 identity matrix; this method is available for vectors, matrices and 3D arrays
+val d = F64Array.of(3.14, 2.78, 1.41) // creates a vector
+```
+
+### Creating F64Arrays from Kotlin/Java arrays
+```kotlin
+val a: DoubleArray = ...
+val b = a.asF64Array() // a wrapper method: creates a vector that uses [a] as data storage; only works for vectors
+val c: Array<Array<DoubleArray>> = ...
+val d = c.toF64Array() // a copying method: allocates a [DoubleArray] for storage and copies the values from [c] to recreate its structure: c[i][j][k] == d[i, j, k]; only works for matrices and above
+```
+### Retrieving Kotlin/Java arrays from an F64Array
+```kotlin
+val a: F64Array = ...
+val b: DoubleArray = a.data // retrieves the underlying storage [DoubleArray]; may or may not represent all of [a]’s elements
+val c: DoubleArray = a.toDoubleArray() // a copying method; converts the vector into a [DoubleArray]: a[i] == c[i]; only works for vectors
+val d: Array<*> = a.toGenericArray() // a copying method; converts the array to a corresponding Kotlin/Java structure ([Array<DoubleArray>] for matrices, [Array<Array<DoubleArray>>] for 3D arrays etc.); only works for matrices and above
+val e: Any = a.toArray() // returns either a [DoubleArray] or an [Array<*>], depending on the number of dimensions of [a]
+```
+
+## Views
+A **view** of an `F64Array` is another `F64Array` which uses the same `DoubleArray`
+for storage. For example, a matrix column is a view of the matrix.
+
+Each `F64Array` has a special property `V`, called the **viewer**. It enables a more
+idiomatic way to obtain views. When indexing through the viewer, you can use the special
+`_I` object to signify "skip this axis"; see the examples below.
+
+```kotlin
+val a = F64Array(2, 3, 2) // 2x3x2 array
+val b = a.view(1) // view the second 3x2 matrix slice of [a]: b[i, j] === a[1, i, j]
+val b1 = a.V[1] // another way of doing the same
+val c = a.view(0, 1) // view a 2x2 matrix such that c[i, j] === a[i, 0, j]
+val c1 = a.V[_I, 0] // another way of doing the same; [_I] indicates that the first axis should be skipped
+val d = a.view(1, 2) // view a 2x3 matrix such that d[i, j] === a[i, j, 1]
+val e = a.along(1) // a sequence of 2x2 views: a.view(0, 1), a.view(1, 1), a.view(2, 1)
+```
+
+The viewer also has setter methods:
+
+```kotlin
+a.V[1] = d // copies the contents of [d] into the second 3x2 matrix slice of [a]
+a.V[_I, 0] = 42.0 // replaces the corresponding elements of [a] with the value 42.0
+a.V[_I] = 3.14 // replaces all elements of [a] with the value 3.14
+```
+
+More sophisticated views can be generated by **slicing**:
+
+```kotlin
+val f = a.slice(from = 0, to = 2, axis = 1) // f[i, j, k] === a[i, j, k], but [f] has a shape 2x2x2
+val g = a.slice(from = 1, to = 3, axis = 1) // g[i, j, k] === a[i, j + 1, k], and [g] has a shape 2x2x2
+val h = a.slice(from = 0, step = 2, axis = 1) // h[i, 0, k] === a[i, 0, k], h[i, 1, k] === a[i, 2, k]
+```
+
+## Flattenable and Dense Arrays
+Let's explore some views from the [previous section](#views) in detail:
+```kotlin
+val a = F64Array(2, 3, 2) // 2x3x2 array
+val b = a.view(1) // view the second 3x2 matrix slice of [a]: b[i, j] === a[1, i, j]
+val c = a.view(0, 1) // view a 2x2 matrix such that c[i, j] === a[i, 0, j]
+val d = a.view(1, 2) // view a 2x3 matrix such that d[i, j] === a[i, j, 1]
+```
+
+These three views have different properties.
+
+    b[0, 0] === a[1, 0, 0] === data[6]
+    b[0, 1] === a[1, 0, 1] === data[7]
+    b[1, 0] === a[1, 1, 0] === data[8]
+    b[1, 1] === a[1, 1, 1] === data[9]
+    b[2, 0] === a[1, 2, 0] === data[10]
+    b[2, 1] === a[1, 2, 1] === data[11]
+
+`b` owns a contiguous region of `data`. It is thus called **dense**.
+ + d[0, 0] === a[0, 0, 1] === data[1] + d[0, 1] === a[0, 1, 1] === data[3] + d[0, 2] === a[0, 2, 1] === data[5] + d[1, 0] === a[1, 0, 1] === data[7] + d[1, 1] === a[1, 1, 1] === data[9] + d[1, 2] === a[1, 2, 1] === data[11] + +`d` doesn't own a contiguous region of `data`, but its entries are equidistant. +This means there is a 6-element vector that owns the exact same elements as `d`. +This vector can be obtained by calling `flatten()`: +```kotlin +val e = d.flatten() +``` + + e[0] === d[0, 0] === a[0, 0, 1] === data[1] + e[1] === d[0, 1] === a[0, 1, 1] === data[3] + e[2] === d[0, 2] === a[0, 2, 1] === data[5] + e[3] === d[1, 0] === a[1, 0, 1] === data[7] + e[4] === d[1, 1] === a[1, 1, 1] === data[9] + e[5] === d[1, 2] === a[1, 2, 1] === data[11] + +`b` and `d` are thus both **flattenable** arrays. It can be confirmed by looking +at the property `isFlattenable`. + + c[0, 0] === a[0, 0, 0] === data[0] + c[0, 1] === a[0, 0, 1] === data[1] + c[1, 0] === a[1, 0, 0] === data[6] + c[1, 1] === a[1, 0, 1] === data[7] + +`c` doesn't own a contiguous region of `data`, and its entries are not equidistant. +It is thus **not flattenable**. `c.isFlattenable` will return `false`, and `c.flatten()` +will fail with an exception. + +Flattenable arrays (especially dense ones) generally allow a more efficient traversal, +since all their elements can be accessed in a single loop. + +### Playing with shape +Flattenable arrays can be easily **reshaped**, provided that their number of elements +stays the same. The reshaped array is a view of the original; it can access +the same elements. + +```kotlin +val a = F64Array(2, 3, 2) +val b = a.reshape(12) // the same as a.flatten() +val c = a.reshape(2, 6) // essentially flattens the 3x2 matrix slices into 6-element vectors +val d = a.reshape(3, 4) // no elegant interpretation, just the same elements in the same order +``` + +## Copying +If desired, you may create a **copy** of an array that is completely separate +from the original. 
The changes to the copy won't propagate to the original (and vice versa),
+because the underlying storage arrays will be different.
+
+```kotlin
+val a = F64Array.full(2, 3, 2, init = 2.78)
+val b = a.copy()
+b.fill(3.14) // doesn't affect [a]
+```
+
+You can also copy the array contents to another array of the same shape:
+
+```kotlin
+b.copyTo(a)
+a.V[_I] = b // has the exact same effect
+```
+
+The copy will always be dense (see [Flattenable and Dense Arrays](#flattenable-and-dense-arrays)),
+even if the original was not.
+
+## Arithmetic and Mathematics
+If two arrays have the exact same shape, you can use ordinary arithmetic operators on them:
+```kotlin
+val a = F64Array.full(2, 3, 2, init = 2.78)
+val b = F64Array.full(2, 3, 2, init = 3.14)
+val c = a + b // elementwise addition
+val d = a / b // elementwise division
+```
+
+These operators are **copying**; they will return a new 2x3x2 array with sums or ratios
+respectively.
+
+```kotlin
+a -= b // in-place elementwise subtraction
+b *= b // in-place elementwise multiplication
+```
+
+These operators are **in-place**; they will modify the elements of the left-hand side.
+
+You can also use the verbose methods, for example:
+
+```kotlin
+val c = a.div(b)
+b.timesAssign(b)
+```
+
+You can also do the same operations with scalar values, i.e. `Double`s:
+
+```kotlin
+val e = a * 42.0
+val f = 1.0 / b // yes, this works! we have an extension method Double.div(F64Array)!
+```
+
+For more sophisticated computations, you have (equally elementwise) mathematical methods
+at your disposal:
+
+```kotlin
+val g = a.exp() // elementwise natural exponent, copying
+g.logInPlace() // elementwise natural logarithm, in-place
+```
+
+The operations `exp`, `expm1`, `log`, `log1p` are available as both copying
+and in-place methods.
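Conceptually, a copying operator allocates a fresh result array, while an in-place one reuses the receiver's storage. A simplified sketch on raw storage (ignoring strides and SIMD, which `viktor` handles internally; these helper names are illustrative, not `viktor` API):

```java
// Simplified sketch of copying vs. in-place elementwise operations on raw
// double[] storage; viktor's real implementation also handles strides and SIMD.
public class ElementwiseSketch {
    public static double[] plus(double[] a, double b) { // copying: allocates a result
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) result[i] = a[i] + b;
        return result;
    }

    public static void plusAssign(double[] a, double b) { // in-place: reuses storage
        for (int i = 0; i < a.length; i++) a[i] += b;
    }
}
```

This is why in-place methods are preferred for long operation chains: they avoid one allocation per step (see [Efficiency and SIMD](#efficiency-and-simd)).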
+
+For vectors, we have a dot product method `dot()`:
+
+```kotlin
+val v1 = F64Array(1000)
+val v2 = F64Array(1000)
+val v1v2: Double = v1.dot(v2)
+```
+
+## Statistics and Special Methods
+You can calculate some statistical values:
+```kotlin
+val a = F64Array(2, 3, 2)
+val s: Double = a.sum() // what it says
+val m: Double = a.mean() // arithmetic mean
+val sd: Double = a.sd() // standard deviation
+val aMax = a.max() // maximum value
+```
+
+For a vector, there is an in-place method for calculating cumulative sums:
+
+```kotlin
+val a = F64Array(1000)
+a.cumSum() // after this, each element a[n] is equal to the sum a[0] + … + a[n] of the original values of [a]
+```
+
+Also for a vector, you can get the index of the maximum / minimum element
+using `argMax()` / `argMin()` respectively, as well as an arbitrary quantile:
+
+```kotlin
+val median = a.quantile(0.5) // note that this is a destructive method, which will shuffle the array
+```
+
+You can rescale an array so that its entries sum to one with the in-place method `rescale()`:
+
+```kotlin
+val a = F64Array.of(3.14, 2.78)
+a.rescale() // equivalent to a /= a.sum() but more efficient and precise
+```
+
+## Logarithmic Storage
+In statistics and machine learning, it’s frequently useful to store the logarithms of values
+instead of the values themselves (e.g. when dealing with distribution density).
+Logarithms make multiplication and division easy (just add or subtract the logarithms),
+but make summation harder. To deal with this, we have a special infix method `logAddExp`:
+
+```kotlin
+val logA = F64Array.full(2, 3, 2, init = ln(3.14))
+val logB = F64Array.full(2, 3, 2, init = ln(2.78))
+val logC = logA logAddExp logB
+```
+
+As the name suggests, `logAddExp` is equivalent to writing
+`(logA.exp() + logB.exp()).log()` but generally more precise, more efficient
+and less memory-consuming. There's also an in-place variant, `logA.logAddExpAssign(logB)`.
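The scalar operation underlying `logAddExp` can be sketched as follows. This is a standard numerically stable formulation, shown here for illustration rather than as `viktor`'s exact implementation: factoring out the larger argument keeps the remaining exponent non-positive, so it never overflows.

```java
// A sketch of a numerically stable scalar logAddExp:
// log(exp(a) + exp(b)) == max(a, b) + log1p(exp(-|a - b|)).
public class LogAddExpSketch {
    public static double logAddExp(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b; // exp(a) == 0
        if (b == Double.NEGATIVE_INFINITY) return a; // exp(b) == 0
        return Math.max(a, b) + Math.log1p(Math.exp(-Math.abs(a - b)));
    }
}
```

A naive `Math.log(Math.exp(a) + Math.exp(b))` would overflow to infinity already for arguments around `710`, while this form stays finite.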
+
+You can also sum the entire logarithmically stored array:
+
+```kotlin
+val logSum: Double = logA.logSumExp()
+```
+
+This is again equivalent to (but better than) `ln(logA.exp().sum())`.
+
+There is also a version of `rescale()` for logarithmically stored arrays:
+
+```kotlin
+logA.logRescale() // equivalent to logA -= logA.logSumExp(), but more efficient and precise
+```
+
+## Efficiency and SIMD
+### Excessive Copying
+You might want to keep the number of copying events to a minimum,
+especially when working with very large arrays. Copying requires allocation and
+(later) disposal of large amounts of memory.
+
+There is one general tip for less copying: use in-place operations whenever possible.
+For example, while
+
+```kotlin
+val b = (a + 1.0) / 2.0
+```
+
+is equivalent to
+
+```kotlin
+val b = a + 1.0
+b /= 2.0
+```
+
+the latter approach avoids one copy allocation and disposal by reusing an array.
+The effect is even more pronounced with a longer operation chain.
+
+### Efficient Traversal
+Flattenable arrays can be traversed in a single loop, so they naturally enjoy
+faster computations. Try to avoid large non-flattenable arrays.
+Moreover, dense arrays benefit from native SIMD optimizations and faster copying,
+so these are the natural choice for efficient calculations.
+See [Flattenable and Dense Arrays](#flattenable-and-dense-arrays) for more details.
+
+### SIMD
+`SIMD` stands for "single instruction, multiple data". It’s a broad family
+of CPU instruction set extensions, including `MMX`, `SSE`, `AVX` and others.
+These extensions allow using extended CPU registers containing multiple data units.
+For example, `AVX` defines 256-bit registers that can operate on
+four 64-bit floating-point numbers with one instruction.
+
+`viktor` supports SIMD extensions by providing a set of binary JNI libraries
+built specifically for several target platforms and extension sets.
+We build and include the following libraries for `amd64` (also called `x86_64`)
+instruction set extensions:
+
+amd64 (x86_64) | SSE2 | AVX
+---------------|------|----
+Windows | + | +
+Linux | + | +
+macOS | + | +
+
+These libraries are only compatible with a 64-bit JVM. In all other cases,
+`viktor` doesn't use JNI, opting for Kotlin/Java operations instead.
+
+## Examples
+
+### Factorials
+
+Suppose you want a 1000-element vector `f` such that `f[i] == i!`, i.e.
+a vector of factorials. Is there an efficient way to do this with `viktor`? Of course!
+
+```kotlin
+val f = F64Array(1000) { it.toDouble() } // fill vector f with values so that f[i] == i
+f[0] = 1.0 // gotta deal with that edge case
+f.logInPlace() // replace all values with their natural logarithms; f[i] == log(i) for i > 0
+f.cumSum() // replace all values with cumulative sums; f[i] == sum_{k=1}^i log(k) for i > 0
+f.expInPlace() // replace all values with their exponents; f[i] == exp(sum_{k=1}^i log(k)) == prod_{k=1}^i k == i!
+```
+
+Bingo! You have a vector of factorials. Sadly, most of these are equal to positive infinity
+due to floating-point overflow. You might consider skipping the last exponentiation
+and keeping the logarithms of the factorials, which are much less prone to overflow.
+See [Logarithmic Storage](#logarithmic-storage) for more details.
+
+### Gaussian Density
+
+Suppose you have a vector `o` of *i.i.d.* random observations, and you want to calculate
+the vector of probability density `d` under the standard normal distribution `N(0, 1)`.
+Since the density formula is `p(x) = exp(-x^2 / 2) / sqrt(2 * pi)`, we can do the following:
+
+```kotlin
+val d = (- o * o / 2.0).exp() / sqrt(2 * PI)
+```
+
+This produces the desired results, but is not very efficient: each of the `-`, `*`, `/`,
+`exp` and `/` creates a copy, only the last of which is retained as `d`.
+That’s four copies too many.
Let’s consider a better approach:
+
+```kotlin
+val d = o * o // d == o^2
+d /= -2.0 // d == -o^2 / 2
+d.expInPlace() // d == exp(-o^2 / 2)
+d /= sqrt(2 * PI) // d == exp(-o^2 / 2) / sqrt(2 * pi)
+```
+
+Voila! No extra copies created. However, if some observations have large absolute values,
+the exponential might underflow, leaving us with a zero density. It’s thus better to store
+and operate on log densities! Since `log p(x) = - x^2 / 2 - 1/2 log(2 * pi)`, we can write:
+
+```kotlin
+val logD = o * o
+logD /= -2.0
+logD -= 0.5 * ln(2 * PI)
+```
+
+What is the total log likelihood of `o` under the standard normal distribution `N(0, 1)`?
+Why, it’s `logD.sum()`, naturally. (Since likelihood is a product of densities,
+log likelihood is a sum of log densities.)
+
+### Gaussian Mixture
+
+We now suspect our observation vector `o` actually came from a uniform mixture
+of two normal distributions, `N(0, 1)` and `N(3, 1)`. What is the total likelihood now?
+What is the most probable sequence of components?
+
+Let's create a matrix `logP` to hold the probabilities of observations
+belonging to components: `logP[i, j] = log(P(o_j, c_j = i))`, where `c_j` is the mixture
+component responsible for the `j`th observation (either `0` or `1`). Using conditional
+probabilities:
+
+    P(o_j, c_j = i) = P(o_j | c_j = i) * P(c_j = i)
+
+Let's also try to use array-wide operations as much as possible.
+
+```kotlin
+val oRow = o.reshape(1, o.size) // create a 1x1000 matrix view of [o]
+val logP = F64Array.concatenate(oRow, oRow) // concatenate into a 2x1000 array; logP[i, j] == o_j
+logP.V[1] -= 3.0 // logP[i, j] == o_j - μ_i
+logP *= logP // logP[i, j] == (o_j - μ_i) ^ 2
+logP /= -2.0 // logP[i, j] == - (o_j - μ_i) ^ 2 / 2
+logP += ln(0.5) - 0.5 * ln(2 * PI) // logP[i, j] == - (o_j - μ_i) ^ 2 / 2 - 1/2 log(2 * pi) + log(1/2) == log(P(o_j | c_j = i) * P(c_j = i))
+```
+
+Now we can answer our questions.
The likelihood of an observation is
+
+    P(o_j) = Σ_i P(o_j, c_j = i)
+
+thus we can easily obtain a vector of log-likelihoods by log-summing the columns of `logP`:
+
+```kotlin
+val logL = logP.V[0] logAddExp logP.V[1]
+```
+
+The total log-likelihood is equal to `logL.sum()`, like in the previous example.
+
+To obtain the most probable sequence of mixture components, we just need to pick the one
+with the greatest probability for each observation:
+
+```kotlin
+val components = logP.along(1).map { it.argMax() } // a sequence of 0s and 1s
+```
+
+This generates a sequence of matrix columns (`along(1)`) and picks the maximum element
+index for each column. \ No newline at end of file diff --git a/docs/figures/DotBenchmark_arrayopss_server.png b/docs/figures/DotBenchmark_arrayopss_server.png new file mode 100644 index 0000000..d8ceadb Binary files /dev/null and b/docs/figures/DotBenchmark_arrayopss_server.png differ diff --git a/docs/figures/DotBenchmark_arrayopss_workstation.png b/docs/figures/DotBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..298828e Binary files /dev/null and b/docs/figures/DotBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/DotBenchmark_flops_server.png b/docs/figures/DotBenchmark_flops_server.png new file mode 100644 index 0000000..853b0fe Binary files /dev/null and b/docs/figures/DotBenchmark_flops_server.png differ diff --git a/docs/figures/DotBenchmark_flops_workstation.png b/docs/figures/DotBenchmark_flops_workstation.png new file mode 100644 index 0000000..b18f7e0 Binary files /dev/null and b/docs/figures/DotBenchmark_flops_workstation.png differ diff --git a/docs/figures/ExpBenchmark_arrayopss_server.png b/docs/figures/ExpBenchmark_arrayopss_server.png new file mode 100644 index 0000000..5bb2f07 Binary files /dev/null and b/docs/figures/ExpBenchmark_arrayopss_server.png differ diff --git a/docs/figures/ExpBenchmark_arrayopss_workstation.png b/docs/figures/ExpBenchmark_arrayopss_workstation.png new file
mode 100644 index 0000000..41412ef Binary files /dev/null and b/docs/figures/ExpBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/ExpBenchmark_flops_server.png b/docs/figures/ExpBenchmark_flops_server.png new file mode 100644 index 0000000..71d3ff1 Binary files /dev/null and b/docs/figures/ExpBenchmark_flops_server.png differ diff --git a/docs/figures/ExpBenchmark_flops_workstation.png b/docs/figures/ExpBenchmark_flops_workstation.png new file mode 100644 index 0000000..179d8f8 Binary files /dev/null and b/docs/figures/ExpBenchmark_flops_workstation.png differ diff --git a/docs/figures/Expm1Benchmark_arrayopss_server.png b/docs/figures/Expm1Benchmark_arrayopss_server.png new file mode 100644 index 0000000..696fbe7 Binary files /dev/null and b/docs/figures/Expm1Benchmark_arrayopss_server.png differ diff --git a/docs/figures/Expm1Benchmark_arrayopss_workstation.png b/docs/figures/Expm1Benchmark_arrayopss_workstation.png new file mode 100644 index 0000000..eaf2215 Binary files /dev/null and b/docs/figures/Expm1Benchmark_arrayopss_workstation.png differ diff --git a/docs/figures/Expm1Benchmark_flops_server.png b/docs/figures/Expm1Benchmark_flops_server.png new file mode 100644 index 0000000..aca71bc Binary files /dev/null and b/docs/figures/Expm1Benchmark_flops_server.png differ diff --git a/docs/figures/Expm1Benchmark_flops_workstation.png b/docs/figures/Expm1Benchmark_flops_workstation.png new file mode 100644 index 0000000..e6842cd Binary files /dev/null and b/docs/figures/Expm1Benchmark_flops_workstation.png differ diff --git a/docs/figures/Log1pBenchmark_arrayopss_server.png b/docs/figures/Log1pBenchmark_arrayopss_server.png new file mode 100644 index 0000000..3236b03 Binary files /dev/null and b/docs/figures/Log1pBenchmark_arrayopss_server.png differ diff --git a/docs/figures/Log1pBenchmark_arrayopss_workstation.png b/docs/figures/Log1pBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..5eb03c7 Binary files /dev/null and 
b/docs/figures/Log1pBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/Log1pBenchmark_flops_server.png b/docs/figures/Log1pBenchmark_flops_server.png new file mode 100644 index 0000000..fb1f7ee Binary files /dev/null and b/docs/figures/Log1pBenchmark_flops_server.png differ diff --git a/docs/figures/Log1pBenchmark_flops_workstation.png b/docs/figures/Log1pBenchmark_flops_workstation.png new file mode 100644 index 0000000..4468142 Binary files /dev/null and b/docs/figures/Log1pBenchmark_flops_workstation.png differ diff --git a/docs/figures/LogAddExpBenchmark_arrayopss_server.png b/docs/figures/LogAddExpBenchmark_arrayopss_server.png new file mode 100644 index 0000000..07e03b0 Binary files /dev/null and b/docs/figures/LogAddExpBenchmark_arrayopss_server.png differ diff --git a/docs/figures/LogAddExpBenchmark_arrayopss_workstation.png b/docs/figures/LogAddExpBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..a343698 Binary files /dev/null and b/docs/figures/LogAddExpBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/LogAddExpBenchmark_flops_server.png b/docs/figures/LogAddExpBenchmark_flops_server.png new file mode 100644 index 0000000..10a8c24 Binary files /dev/null and b/docs/figures/LogAddExpBenchmark_flops_server.png differ diff --git a/docs/figures/LogAddExpBenchmark_flops_workstation.png b/docs/figures/LogAddExpBenchmark_flops_workstation.png new file mode 100644 index 0000000..c546f02 Binary files /dev/null and b/docs/figures/LogAddExpBenchmark_flops_workstation.png differ diff --git a/docs/figures/LogBenchmark_arrayopss_server.png b/docs/figures/LogBenchmark_arrayopss_server.png new file mode 100644 index 0000000..1553909 Binary files /dev/null and b/docs/figures/LogBenchmark_arrayopss_server.png differ diff --git a/docs/figures/LogBenchmark_arrayopss_workstation.png b/docs/figures/LogBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..cfdb272 Binary files /dev/null and 
b/docs/figures/LogBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/LogBenchmark_flops_server.png b/docs/figures/LogBenchmark_flops_server.png new file mode 100644 index 0000000..0c354c9 Binary files /dev/null and b/docs/figures/LogBenchmark_flops_server.png differ diff --git a/docs/figures/LogBenchmark_flops_workstation.png b/docs/figures/LogBenchmark_flops_workstation.png new file mode 100644 index 0000000..254c672 Binary files /dev/null and b/docs/figures/LogBenchmark_flops_workstation.png differ diff --git a/docs/figures/LogSumExpBenchmark_arrayopss_server.png b/docs/figures/LogSumExpBenchmark_arrayopss_server.png new file mode 100644 index 0000000..c949fc6 Binary files /dev/null and b/docs/figures/LogSumExpBenchmark_arrayopss_server.png differ diff --git a/docs/figures/LogSumExpBenchmark_arrayopss_workstation.png b/docs/figures/LogSumExpBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..7526a4a Binary files /dev/null and b/docs/figures/LogSumExpBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/LogSumExpBenchmark_flops_server.png b/docs/figures/LogSumExpBenchmark_flops_server.png new file mode 100644 index 0000000..99223ec Binary files /dev/null and b/docs/figures/LogSumExpBenchmark_flops_server.png differ diff --git a/docs/figures/LogSumExpBenchmark_flops_workstation.png b/docs/figures/LogSumExpBenchmark_flops_workstation.png new file mode 100644 index 0000000..4389782 Binary files /dev/null and b/docs/figures/LogSumExpBenchmark_flops_workstation.png differ diff --git a/docs/figures/SDBenchmark_arrayopss_server.png b/docs/figures/SDBenchmark_arrayopss_server.png new file mode 100644 index 0000000..06c2b86 Binary files /dev/null and b/docs/figures/SDBenchmark_arrayopss_server.png differ diff --git a/docs/figures/SDBenchmark_arrayopss_workstation.png b/docs/figures/SDBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..80fc8f0 Binary files /dev/null and 
b/docs/figures/SDBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/SDBenchmark_flops_server.png b/docs/figures/SDBenchmark_flops_server.png new file mode 100644 index 0000000..ed7e513 Binary files /dev/null and b/docs/figures/SDBenchmark_flops_server.png differ diff --git a/docs/figures/SDBenchmark_flops_workstation.png b/docs/figures/SDBenchmark_flops_workstation.png new file mode 100644 index 0000000..208324b Binary files /dev/null and b/docs/figures/SDBenchmark_flops_workstation.png differ diff --git a/docs/figures/SumBenchmark_arrayopss_server.png b/docs/figures/SumBenchmark_arrayopss_server.png new file mode 100644 index 0000000..3268ade Binary files /dev/null and b/docs/figures/SumBenchmark_arrayopss_server.png differ diff --git a/docs/figures/SumBenchmark_arrayopss_workstation.png b/docs/figures/SumBenchmark_arrayopss_workstation.png new file mode 100644 index 0000000..fe35021 Binary files /dev/null and b/docs/figures/SumBenchmark_arrayopss_workstation.png differ diff --git a/docs/figures/SumBenchmark_flops_server.png b/docs/figures/SumBenchmark_flops_server.png new file mode 100644 index 0000000..4ff7079 Binary files /dev/null and b/docs/figures/SumBenchmark_flops_server.png differ diff --git a/docs/figures/SumBenchmark_flops_workstation.png b/docs/figures/SumBenchmark_flops_workstation.png new file mode 100644 index 0000000..c706fae Binary files /dev/null and b/docs/figures/SumBenchmark_flops_workstation.png differ diff --git a/gradle.properties b/gradle.properties index 60acae2..aef125e 100644 --- a/gradle.properties +++ b/gradle.properties @@ -1 +1 @@ -version=0.5.3 +version=1.0.0 diff --git a/src/jmh/java/org/jetbrains/bio/viktor/AbstractMathBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/AbstractMathBenchmark.java new file mode 100644 index 0000000..ade42c3 --- /dev/null +++ b/src/jmh/java/org/jetbrains/bio/viktor/AbstractMathBenchmark.java @@ -0,0 +1,82 @@ +package org.jetbrains.bio.viktor; + +import 
org.apache.commons.math3.util.Precision; +import org.openjdk.jmh.annotations.Benchmark; +import org.openjdk.jmh.annotations.Setup; +import org.openjdk.jmh.annotations.TearDown; +import org.openjdk.jmh.infra.Blackhole; + +import java.util.function.DoubleUnaryOperator; + +public abstract class AbstractMathBenchmark { + + private double[] src; + private double[] dst; + + final private DoubleUnaryOperator regularOp; + final private DoubleUnaryOperator fastOp; + final private VectorOp vectorOp; + + AbstractMathBenchmark() { + regularOp = getRegularOp(); + fastOp = getFastOp(); + vectorOp = getVectorOp(); + } + + abstract DoubleUnaryOperator getRegularOp(); + abstract DoubleUnaryOperator getFastOp(); + abstract VectorOp getVectorOp(); + abstract int getArraySize(); + + + @Setup + public void generateData() { + src = new double[getArraySize()]; + dst = new double[getArraySize()]; + Internal.sampleUniformGamma(src); + } + + @TearDown + public void checkAnswer() { + for (int i = 0; i < getArraySize(); i++) { + final double expected = regularOp.applyAsDouble(src[i]); + if (!Precision.equals(expected, dst[i], 5)) { + throw new IllegalStateException(String.format( + "f(%s) = %s (instead of %s)", + src[i], dst[i], expected)); + } + } + } + + private void scalar(final Blackhole bh, final DoubleUnaryOperator op) { + System.arraycopy(src, 0, dst, 0, getArraySize()); // let's be fair + for (int i = 0; i < src.length; i++) { + dst[i] = op.applyAsDouble(src[i]); + } + bh.consume(dst); + } + + @Benchmark + public void scalarFastMath(final Blackhole bh) { + scalar(bh, fastOp); + } + + @Benchmark + public void scalarMath(final Blackhole bh) { + scalar(bh, regularOp); + } + + @Benchmark + public void vector(final Blackhole bh) { + System.arraycopy(src, 0, dst, 0, getArraySize()); + vectorOp.apply(dst, 0, getArraySize()); + bh.consume(dst); + } + +} + +interface VectorOp { + + void apply(final double[] array, final int offset, final int size); + +} \ No newline at end of file diff --git 
a/src/jmh/java/org/jetbrains/bio/viktor/DotBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/DotBenchmark.java new file mode 100644 index 0000000..830cf85 --- /dev/null +++ b/src/jmh/java/org/jetbrains/bio/viktor/DotBenchmark.java @@ -0,0 +1,54 @@ +package org.jetbrains.bio.viktor; + +import org.apache.commons.math3.util.Precision; +import org.openjdk.jmh.annotations.*; +import org.openjdk.jmh.infra.Blackhole; + +import java.util.concurrent.TimeUnit; + +@BenchmarkMode(Mode.Throughput) +@OutputTimeUnit(TimeUnit.SECONDS) +@State(Scope.Benchmark) +@Warmup(iterations = 5) +@Measurement(iterations = 10) +@Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") +public class DotBenchmark { + + @Param({"1000", "100000", "1000000"}) + int arraySize; + private double[] src1; + private double[] src2; + private double res; + + @Setup + public void generateData() { + src1 = new double[arraySize]; + Internal.sampleUniformGamma(src1); + src2 = new double[arraySize]; + Internal.sampleUniformGamma(src2); + } + + @TearDown + public void checkAnswer() { + final double stored = res; + scalar(null); + if (!Precision.equals(stored, res, (stored + res) * 1E-12)) { + throw new IllegalStateException(String.format("expected %s, got %s", res, stored)); + } + } + + @Benchmark + public void scalar(final Blackhole bh) { + res = 0.; + for (int i = 0; i < arraySize; i++) { + res += src1[i] * src2[i]; + } + if (bh != null) bh.consume(res); + } + + @Benchmark + public void vector(final Blackhole bh) { + res = NativeSpeedups.INSTANCE.unsafeDot(src1, 0, src2, 0, arraySize); + bh.consume(res); + } +} diff --git a/src/jmh/java/org/jetbrains/bio/viktor/ExpBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/ExpBenchmark.java index a5a5f40..77d1fe3 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/ExpBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/ExpBenchmark.java @@ -1,9 +1,7 @@ package org.jetbrains.bio.viktor; import org.apache.commons.math3.util.FastMath; -import 
org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; -import org.openjdk.jmh.infra.Blackhole; import java.util.concurrent.TimeUnit; import java.util.function.DoubleUnaryOperator; @@ -11,49 +9,31 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 10) -@Measurement(iterations = 5) +@Warmup(iterations = 5) +@Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") -public class ExpBenchmark { - @Param({"1000", "10000", "100000"}) +public class ExpBenchmark extends AbstractMathBenchmark { + + @Param({"1000", "100000", "1000000"}) int arraySize; - double[] src; - double[] dst; - - @Setup - public void generateData() { - src = new double[arraySize]; - dst = new double[arraySize]; - Internal.sampleUniformGamma(src); - } - @TearDown - public void checkAnswer() { - for (int i = 0; i < arraySize; i++) { - final double expected = Math.exp(src[i]); - if (!Precision.equals(expected, dst[i], 5)) { - throw new IllegalStateException(String.format( - "exp(%s) = %s (instead of %s)", - src[i], dst[i], expected)); - } - } + @Override + DoubleUnaryOperator getRegularOp() { + return Math::exp; } - @Benchmark - public void scalarFastMathExp(final Blackhole bh) { - transform(src, dst, FastMath::exp); - bh.consume(dst); + @Override + DoubleUnaryOperator getFastOp() { + return FastMath::exp; } - @Benchmark - public void vectorExp(final Blackhole bh) { - NativeSpeedups.INSTANCE.unsafeExp(src, 0, dst, 0, arraySize); - bh.consume(dst); + @Override + VectorOp getVectorOp() { + return NativeSpeedups.INSTANCE::unsafeExpInPlace; } - private void transform(double[] src, double[] dst, final DoubleUnaryOperator op) { - for (int i = 0; i < src.length; i++) { - dst[i] = op.applyAsDouble(src[i]); - } + @Override + int getArraySize() { + return arraySize; } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/Expm1Benchmark.java 
b/src/jmh/java/org/jetbrains/bio/viktor/Expm1Benchmark.java index 93554c1..b8762ed 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/Expm1Benchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/Expm1Benchmark.java @@ -11,56 +11,32 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 10) -@Measurement(iterations = 5) +@Warmup(iterations = 5) +@Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") -public class Expm1Benchmark { - @Param({"1000", "10000", "100000"}) - int arraySize; - double[] src; - double[] dst; - - @Setup - public void generateData() { - src = new double[arraySize]; - dst = new double[arraySize]; - Internal.sampleUniformGamma(src); - } +public class Expm1Benchmark extends AbstractMathBenchmark { - @TearDown - public void checkAnswer() { - for (int i = 0; i < arraySize; i++) { - final double expected = Math.expm1(src[i]); - if (!Precision.equals(expected, dst[i], 5)) { - throw new IllegalStateException(String.format( - "expm1(%s) = %s (instead of %s)", - src[i], dst[i], expected)); - } - } - } + @Param({"1000", "100000", "1000000"}) + int arraySize; - @Benchmark - public void scalarFastMath(final Blackhole bh) { - transform(src, dst, FastMath::expm1); - bh.consume(dst); + @Override + DoubleUnaryOperator getRegularOp() { + return Math::expm1; } - @Benchmark - public void scalarMath(final Blackhole bh) { - transform(src, dst, Math::expm1); - bh.consume(dst); + @Override + DoubleUnaryOperator getFastOp() { + return FastMath::expm1; } - @Benchmark - public void vector(final Blackhole bh) { - NativeSpeedups.INSTANCE.unsafeExpm1(src, 0, dst, 0, arraySize); - bh.consume(dst); + @Override + VectorOp getVectorOp() { + return NativeSpeedups.INSTANCE::unsafeExpm1InPlace; } - private void transform(double[] src, double[] dst, final DoubleUnaryOperator op) { - for (int i = 0; i < src.length; i++) { - dst[i] = op.applyAsDouble(src[i]); - } + @Override + 
int getArraySize() { + return arraySize; } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/Log1pBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/Log1pBenchmark.java index 3cfc365..cd48f3e 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/Log1pBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/Log1pBenchmark.java @@ -11,55 +11,31 @@ @BenchmarkMode(Mode.Throughput) @OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 10) -@Measurement(iterations = 5) +@Warmup(iterations = 5) +@Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") -public class Log1pBenchmark { - @Param({"100", "1000", "10000"}) - int arraySize; - double[] src; - double[] dst; - - @Setup - public void generateData() { - src = new double[arraySize]; - dst = new double[arraySize]; - Internal.sampleUniformGamma(src); - } +public class Log1pBenchmark extends AbstractMathBenchmark { - @TearDown - public void checkAnswer() { - for (int i = 0; i < arraySize; i++) { - final double expected = Math.log1p(src[i]); - if (!Precision.equals(expected, dst[i], 5)) { - throw new IllegalStateException(String.format( - "log1p(%s) = %s (instead of %s)", - src[i], dst[i], expected)); - } - } - } + @Param({"1000", "100000", "1000000"}) + int arraySize; - @Benchmark - public void scalarFastMath(final Blackhole bh) { - transform(src, dst, FastMath::log1p); - bh.consume(dst); + @Override + DoubleUnaryOperator getRegularOp() { + return Math::log1p; } - @Benchmark - public void scalarMath(final Blackhole bh) { - transform(src, dst, Math::log1p); - bh.consume(dst); + @Override + DoubleUnaryOperator getFastOp() { + return FastMath::log1p; } - @Benchmark - public void vector(final Blackhole bh) { - NativeSpeedups.INSTANCE.unsafeLog1p(src, 0, dst, 0, arraySize); - bh.consume(dst); + @Override + VectorOp getVectorOp() { + return NativeSpeedups.INSTANCE::unsafeLog1pInPlace; } - private void transform(double[] src, double[] dst, final 
DoubleUnaryOperator op) { - for (int i = 0; i < src.length; i++) { - dst[i] = op.applyAsDouble(src[i]); - } + @Override + int getArraySize() { + return arraySize; } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/LogAddExpBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/LogAddExpBenchmark.java index 0f0a28a..da0344d 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/LogAddExpBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/LogAddExpBenchmark.java @@ -1,20 +1,22 @@ package org.jetbrains.bio.viktor; -import org.apache.commons.math3.util.FastMath; import org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; import java.util.concurrent.TimeUnit; +import static org.jetbrains.bio.viktor.MoreMathKt.logAddExp; + @BenchmarkMode(Mode.Throughput) -@OutputTimeUnit(TimeUnit.MILLISECONDS) +@OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 10) +@Warmup(iterations = 5) @Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") public class LogAddExpBenchmark { - @Param({"100", "500", "1000"}) + + @Param({"1000", "100000", "1000000"}) int arraySize; double[] src1; double[] src2; @@ -44,8 +46,9 @@ public void checkAnswer() { @Benchmark public void scalar(final Blackhole bh) { + System.arraycopy(src1, 0, dst, 0, arraySize); // let's be fair for (int i = 0; i < arraySize; i++) { - dst[i] = logAddExp(src1[i], src2[i]); + dst[i] = logAddExp(dst[i], src2[i]); } bh.consume(dst); @@ -53,19 +56,9 @@ public void scalar(final Blackhole bh) { @Benchmark public void vector(final Blackhole bh) { - NativeSpeedups.INSTANCE.unsafeLogAddExp(src1, 0, src2, 0, dst, 0, arraySize); + System.arraycopy(src1, 0, dst, 0, arraySize); + NativeSpeedups.INSTANCE.unsafeLogAddExp(dst, 0, src2, 0, arraySize); bh.consume(dst); } - - private static double logAddExp(final double a, final double b) { - if (Double.isInfinite(a) && a < 0) { - return b; - } - if 
(Double.isInfinite(b) && b < 0) { - return a; - } - - return FastMath.max(a, b) + StrictMath.log1p(FastMath.exp(-Math.abs(a - b))); - } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/LogBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/LogBenchmark.java index 9b6b0db..3976835 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/LogBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/LogBenchmark.java @@ -1,61 +1,43 @@ package org.jetbrains.bio.viktor; +import org.apache.commons.math3.util.FastMath; import org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; import java.util.concurrent.TimeUnit; +import java.util.function.DoubleUnaryOperator; @BenchmarkMode(Mode.Throughput) -@OutputTimeUnit(TimeUnit.MILLISECONDS) +@OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 10) -@Measurement(iterations = 5) +@Warmup(iterations = 5) +@Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") -public class LogBenchmark { - @Param({"1000", "10000"}) +public class LogBenchmark extends AbstractMathBenchmark { + + @Param({"1000", "100000", "1000000"}) int arraySize; - double[] src; - double[] dst; - - @Setup - public void generateData() { - src = new double[arraySize]; - dst = new double[arraySize]; - Internal.sampleUniformGamma(src); - } - @TearDown - public void checkAnswer() { - for (int i = 0; i < arraySize; i++) { - final double expected = Math.log(src[i]); - if (!Precision.equals(expected, dst[i], 5)) { - throw new IllegalStateException(String.format( - "log(%s) = %s (instead of %s)", - src[i], dst[i], expected)); - } - } + @Override + DoubleUnaryOperator getRegularOp() { + return Math::log; } - @Benchmark - public void scalar(final Blackhole bh) { - for (int i = 0; i < src.length; i++) { - dst[i] = Math.log(src[i]); - } - - bh.consume(dst); + @Override + DoubleUnaryOperator getFastOp() { + return FastMath::log; } - @Benchmark 
- public void vectorBoostSimd(final Blackhole bh) { - NativeSpeedups.INSTANCE.unsafeLog(src, 0, dst, 0, arraySize); - bh.consume(dst); + @Override + VectorOp getVectorOp() { + return NativeSpeedups.INSTANCE::unsafeLogInPlace; } - @Benchmark - public void vectorYeppp(final Blackhole bh) { - info.yeppp.Math.Log_V64f_V64f(src, 0, dst, 0, arraySize); - bh.consume(dst); + @Override + int getArraySize() { + return arraySize; } + } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/LogSumExpBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/LogSumExpBenchmark.java index 868797d..f8d8b08 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/LogSumExpBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/LogSumExpBenchmark.java @@ -3,78 +3,97 @@ import org.apache.commons.math3.random.RandomDataGenerator; import org.apache.commons.math3.stat.StatUtils; import org.apache.commons.math3.util.FastMath; +import org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; import java.util.concurrent.TimeUnit; -@BenchmarkMode(Mode.AverageTime) -@OutputTimeUnit(TimeUnit.NANOSECONDS) +@BenchmarkMode(Mode.Throughput) +@OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) -@Warmup(iterations = 20) +@Warmup(iterations = 5) @Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") public class LogSumExpBenchmark { - @Param({"16", "32", "64"}) + + @Param({"1000", "100000", "1000000"}) int arraySize; double[] src; + double res; @Setup public void generateData() { src = new double[arraySize]; - final RandomDataGenerator gen = new RandomDataGenerator(); - for (int i = 0; i < src.length; i++) { - if (i < src.length / 3) { - src[i] = gen.nextUniform(0, 1); - } else { - src[i] = gen.nextGamma(42., 42.); - } + Internal.sampleUniformGamma(src); + } + + @TearDown + public void checkAnswer() { + double sumExp = 0.; + for (double value : src) { + sumExp += Math.exp(value); + } + if 
(!Precision.equals(Math.exp(res), sumExp, 5)) { + throw new IllegalStateException(String.format("expected %s, got %s", sumExp, Math.exp(res))); } } @Benchmark - public void scalarLSE(final Blackhole bh) { + public void scalarMath(final Blackhole bh) { final double offset = StatUtils.max(src); - double sum = 0; + res = 0.; for (double value : src) { - sum += FastMath.exp(value - offset); + res += Math.exp(value - offset); } - - bh.consume(Math.log(sum) + offset); + res = Math.log(res) + offset; + bh.consume(res); } @Benchmark - public void scalarLSEUnrolled2(final Blackhole bh) { + public void scalarMathUnroll2(final Blackhole bh) { final double offset = StatUtils.max(src); - double sum1 = 0; - double sum2 = 0; - for (int i = 0; i < src.length; i+=2) { + double sum1 = 0.; + double sum2 = 0.; + for (int i = 0; i < src.length; i += 2) { sum1 += FastMath.exp(src[i] - offset); sum2 += FastMath.exp(src[i + 1] - offset); } - - bh.consume(Math.log(sum1 + sum2) + offset); + res = Math.log(sum1 + sum2) + offset; + bh.consume(res); } @Benchmark - public void scalarLSEUnrolled4(final Blackhole bh) { + public void scalarMathUnroll4(final Blackhole bh) { final double offset = StatUtils.max(src); - double sum1 = 0; - double sum2 = 0; - double sum3 = 0; - double sum4 = 0; - for (int i = 0; i < src.length; i+=4) { + double sum1 = 0.; + double sum2 = 0.; + double sum3 = 0.; + double sum4 = 0.; + for (int i = 0; i < src.length; i += 4) { sum1 += FastMath.exp(src[i] - offset); sum2 += FastMath.exp(src[i + 1] - offset); sum3 += FastMath.exp(src[i + 2] - offset); sum4 += FastMath.exp(src[i + 3] - offset); } + res = Math.log(sum1 + sum2 + sum3 + sum4) + offset; + bh.consume(res); + } - bh.consume(Math.log(sum1 + sum2 + sum3 + sum4) + offset); + @Benchmark + public void scalarFastMath(final Blackhole bh) { + final double offset = StatUtils.max(src); + res = 0; + for (double value : src) { + res += FastMath.exp(value - offset); + } + res = FastMath.log(res) + offset; + bh.consume(res); } 
@Benchmark public void vectorLSE(final Blackhole bh) { - bh.consume(NativeSpeedups.INSTANCE.unsafeLogSumExp(src, 0, arraySize)); + res = NativeSpeedups.INSTANCE.unsafeLogSumExp(src, 0, arraySize); + bh.consume(res); } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/SDBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/SDBenchmark.java index 2ac8c06..503d5f7 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/SDBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/SDBenchmark.java @@ -1,45 +1,54 @@ package org.jetbrains.bio.viktor; +import org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; +import org.openjdk.jmh.infra.Blackhole; -import java.util.Random; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.Throughput) -@OutputTimeUnit(TimeUnit.MILLISECONDS) +@OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) @Warmup(iterations = 5) @Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") public class SDBenchmark { - @Param({"1000", "10000", "100000"}) + @Param({"1000", "100000", "1000000"}) int arraySize; - double[] values; + private double[] src; + private double res; @Setup public void generateData() { - final Random random = new Random(); - values = random.doubles(arraySize).toArray(); + src = new double[arraySize]; + Internal.sampleUniformGamma(src); } - @Benchmark - public double simpleSD() { - double sum = 0., sumSquares = 0.; - for (int i = 0; i < arraySize; i++) { - sum += values[i]; - sumSquares += values[i] * values[i]; + @TearDown + public void checkAnswer() { + final double stored = res; + scalar(null); + if (!Precision.equals(stored, res, (stored + res) * 1E-12)) { + throw new IllegalStateException(String.format("expected %s, got %s", res, stored)); } - return Math.sqrt((sumSquares - sum * sum / arraySize) / (arraySize - 1)); } @Benchmark - public double javaSD() { - return F64ArrayKt.asF64Array(values, 0, arraySize).sd(); + public void scalar(final Blackhole 
bh) { + double sum = 0., sumSquares = 0.; + for (double value : src) { + sum += value; + sumSquares += value * value; + } + res = Math.sqrt((sumSquares - sum * sum / arraySize) / (arraySize - 1)); + if (bh != null) bh.consume(res); } @Benchmark - public double nativeSD() { - return NativeSpeedups.INSTANCE.sd(values, 0, arraySize); + public void vector(final Blackhole bh) { + res = NativeSpeedups.INSTANCE.unsafeSD(src, 0, arraySize); + bh.consume(res); } + } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/SumBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/SumBenchmark.java index c31266f..091c9b9 100644 --- a/src/jmh/java/org/jetbrains/bio/viktor/SumBenchmark.java +++ b/src/jmh/java/org/jetbrains/bio/viktor/SumBenchmark.java @@ -1,44 +1,52 @@ package org.jetbrains.bio.viktor; +import org.apache.commons.math3.util.Precision; import org.openjdk.jmh.annotations.*; +import org.openjdk.jmh.infra.Blackhole; -import java.util.Random; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.Throughput) -@OutputTimeUnit(TimeUnit.MILLISECONDS) +@OutputTimeUnit(TimeUnit.SECONDS) @State(Scope.Benchmark) @Warmup(iterations = 5) @Measurement(iterations = 10) @Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") public class SumBenchmark { - @Param({"1000", "10000", "100000"}) + + @Param({"1000", "100000", "1000000"}) int arraySize; - double[] values; + private double[] src; + private double res; @Setup public void generateData() { - final Random random = new Random(); - values = random.doubles(arraySize).toArray(); + src = new double[arraySize]; + Internal.sampleUniformGamma(src); } - @Benchmark - public double simpleSum() { - double res = 0.; - for (int i = 0; i < arraySize; i++) { - res += values[i]; + @TearDown + public void checkAnswer() { + final double stored = res; + scalar(null); + if (!Precision.equals(stored, res, (stored + res) * 1E-12)) { + throw new IllegalStateException(String.format("expected %s, got %s", res, stored)); } - return res; } 
@Benchmark - public double javaSum() { - return F64ArrayKt.asF64Array(values, 0, arraySize).balancedSum(); + public void scalar(final Blackhole bh) { + res = 0.; + for (double value : src) { + res += value; + } + if (bh != null) bh.consume(res); } @Benchmark - public double nativeSum() { - return NativeSpeedups.INSTANCE.sum(values, 0, arraySize); + public void vector(final Blackhole bh) { + res = NativeSpeedups.INSTANCE.unsafeSum(src, 0, arraySize); + bh.consume(res); } } diff --git a/src/jmh/java/org/jetbrains/bio/viktor/WeightedSumBenchmark.java b/src/jmh/java/org/jetbrains/bio/viktor/WeightedSumBenchmark.java deleted file mode 100644 index d788b8c..0000000 --- a/src/jmh/java/org/jetbrains/bio/viktor/WeightedSumBenchmark.java +++ /dev/null @@ -1,56 +0,0 @@ -package org.jetbrains.bio.viktor; - -import org.openjdk.jmh.annotations.*; - -import java.util.Random; -import java.util.concurrent.TimeUnit; - -@BenchmarkMode(Mode.Throughput) -@OutputTimeUnit(TimeUnit.MILLISECONDS) -@State(Scope.Benchmark) -@Warmup(iterations = 5) -@Measurement(iterations = 10) -@Fork(value = 2, jvmArgsPrepend = "-Djava.library.path=./build/libs") -public class WeightedSumBenchmark { - @Param({"1000", "10000", "100000"}) - int arraySize; - double[] values; - double[] weights; - short[] shorts; - - @Setup - public void generateData() { - final Random random = new Random(); - values = random.doubles(arraySize).toArray(); - weights = random.doubles(arraySize).toArray(); - shorts = new short[arraySize]; - for (int i = 0; i < arraySize; i++) { - shorts[i] = (short) random.nextInt(1000); - } - } - - @Benchmark - public double simpleDot() { - double res = 0.; - for (int i = 0; i < arraySize; i++) { - res += values[i] * weights[i]; - } - return res; - } - - @Benchmark - public double dot() { - return F64ArrayKt.asF64Array(values, 0, arraySize).dot(F64ArrayKt.asF64Array(weights, 0, arraySize)); - } - - @Benchmark - public double nativeDot() { - return NativeSpeedups.INSTANCE.weightedSum(values, 0, 
weights, 0, arraySize); - } - - @Benchmark - public double shortDot() { - return F64ArrayKt.asF64Array(values, 0, arraySize).dot(shorts); - } - -} diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/DoubleExtensions.kt b/src/main/kotlin/org/jetbrains/bio/viktor/DoubleExtensions.kt index 95d62bf..7f2eabd 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/DoubleExtensions.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/DoubleExtensions.kt @@ -2,9 +2,9 @@ package org.jetbrains.bio.viktor -import org.jetbrains.bio.viktor.NativeSpeedups.unsafeNegate -import org.jetbrains.bio.viktor.NativeSpeedups.unsafePlusScalar -import org.jetbrains.bio.viktor.NativeSpeedups.unsafeScalarDiv +import org.jetbrains.bio.viktor.NativeSpeedups.unsafeNegateInPlace +import org.jetbrains.bio.viktor.NativeSpeedups.unsafePlusScalarAssign +import org.jetbrains.bio.viktor.NativeSpeedups.unsafeScalarDivAssign /** * Operator overloads for [Double] and [F64Array]. @@ -14,12 +14,10 @@ import org.jetbrains.bio.viktor.NativeSpeedups.unsafeScalarDiv private inline fun Double.minusInPlace(other: F64Array) { if (other is F64LargeDenseArray) { - unsafeNegate(other.data, other.offset, - other.data, other.offset, other.size) - unsafePlusScalar(other.data, other.offset, this, - other.data, other.offset, other.size) + unsafeNegateInPlace(other.data, other.offset, other.size) + unsafePlusScalarAssign(other.data, other.offset, other.size, this) } else { - for (pos in 0..other.size - 1) { + for (pos in 0 until other.size) { other.unsafeSet(pos, this - other.unsafeGet(pos)) } } @@ -37,10 +35,9 @@ inline operator fun Double.times(other: F64Array) = other * this private inline fun Double.divInPlace(other: F64Array) { if (other is F64LargeDenseArray) { - unsafeScalarDiv(this, other.data, other.offset, - other.data, other.offset, other.size) + unsafeScalarDivAssign(other.data, other.offset, other.size, this) } else { - for (pos in 0..other.size - 1) { + for (pos in 0 until other.size) { other.unsafeSet(pos, this 
/ other.unsafeGet(pos)) } } diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/F64Array.kt b/src/main/kotlin/org/jetbrains/bio/viktor/F64Array.kt index 05f03fc..7eed3ed 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/F64Array.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/F64Array.kt @@ -3,6 +3,8 @@ package org.jetbrains.bio.viktor import org.apache.commons.math3.util.Precision import java.text.DecimalFormat import java.util.* +import kotlin.math.min +import kotlin.math.sqrt /** * A strided n-dimensional array stored in a [DoubleArray]. @@ -24,9 +26,15 @@ import java.util.* * [1, 4] * ``` * - * Arrays with last stride equal to 1 are called called *dense*. - * The distinction is important because some of the operations - * can be significantly optimized for dense arrays. + * Due to instantiation contracts, an actual instance of this exact class will always be non-flat, i.e. + * have at least two dimensions. A one-dimensional array will be an [F64FlatArray] (or possibly a descendant of it), + * and zero-dimensional (singleton) arrays are not allowed. + * + * Method tags: + * - "in-place": this operation will modify the array + * - "copying": this operation creates a copy fully independent from the original array + * - "viewer": this operation creates a new array which shares data with the original one; + * modifications to one will be seen through the other, and no copying is performed * * @author Sergei Lebedev * @since 0.4.0 @@ -39,7 +47,8 @@ open class F64Array protected constructor( /** Indexing steps along each axis. */ val strides: IntArray, /** Number of elements along each axis. */ - val shape: IntArray) { + val shape: IntArray +) { /** Number of axes in this array. */ val nDim: Int get() = shape.size @@ -48,69 +57,108 @@ open class F64Array protected constructor( val size: Int get() = shape.first() /** - * Returns `true` if this array is dense and `false` otherwise. + * The maximum number of dimensions suitable for unrolling, see [unrollOnce]. 
+ */ + private val unrollDim: Int + + /** + * The size of the maximum unrolled subarray sequence, see [unrollOnce]. + */ + private val unrollSize: Int + + /** + * The stride of the maximum unrolled subarray sequence, see [unrollOnce]. + */ + private val unrollStride: Int + + /** + * Returns `true` if this array can be flattened using [flatten]. * - * Dense arrays are laid out in a single contiguous block - * of memory. + * A flattenable array's elements are laid out with a constant stride. This allows using simple loops when iterating. + * Calling [flatten] on a non-flattenable array will throw an [IllegalStateException]. * - * This allows to use SIMD operations, e.g. when computing the - * sum of elements. + * A particular case of a flattenable array is a dense array whose elements occupy a contiguous block of memory. + * Large dense arrays employ native SIMD optimizations, see [F64LargeDenseArray]. */ - // This is inaccurate, but maybe sufficient for our use-case? - // Check with http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html - val isDense: Boolean get() = strides.last() == 1 + open val isFlattenable get() = (unrollDim == nDim) + + init { + require(strides.size == shape.size) { + "Strides and shape have different sizes (${strides.size} and ${shape.size})" + } + // calculation of unrollDim, unrollStride and unrollSize + var prevStride = 0 + var unrollable = true + var d = 0 + var s = 0 + for (i in strides.indices) { + if (shape[i] == 1) { + if (unrollable) d = i + 1 + continue + } + if (unrollable && (prevStride == 0 || prevStride == strides[i] * shape[i])) { + d = i + 1 + s = strides[i] + } else { + unrollable = false + } + prevStride = strides[i] + } + unrollDim = d + unrollStride = s + unrollSize = shape.slice(0 until d).toIntArray().product() + } /** * Generic getter. * - * Note that it could be at least 1.5x slower than specialized versions. + * Note that it could be at least 1.5x slower than specialized versions for 1, 2 or 3 dimensions. 
*/ operator fun get(vararg indices: Int): Double { - require(indices.size == nDim) { "broadcasting get is not supported" } + check(indices.size == nDim) { "broadcasting get is not supported" } return safeIndex({ indices }) { data[unsafeIndex(indices)] } } operator fun get(pos: Int): Double { - require(nDim == 1) { "broadcasting get is not supported" } + check(nDim == 1) { "broadcasting get is not supported" } return safeIndex({ intArrayOf(pos) }) { data[unsafeIndex(pos)] } } operator fun get(r: Int, c: Int): Double { - require(nDim == 2) { "broadcasting get is not supported" } + check(nDim == 2) { "broadcasting get is not supported" } return safeIndex({ intArrayOf(r, c) }) { data[unsafeIndex(r, c)] } } operator fun get(d: Int, r: Int, c: Int): Double { - require(nDim == 3) { "broadcasting get is not supported" } + check(nDim == 3) { "broadcasting get is not supported" } return safeIndex({ intArrayOf(d, r, c) }) { data[unsafeIndex(d, r, c)] } } /** * Generic setter. * - * Note that it could be at least 1.5x slower than specialized versions. + * Note that it could be at least 1.5x slower than specialized versions for 1, 2 or 3 dimensions. 
*/ operator fun set(vararg indices: Int, value: Double) { - require(indices.size == nDim) { "broadcasting set is not supported" } + check(indices.size == nDim) { "broadcasting set is not supported" } safeIndex({ indices }) { data[unsafeIndex(indices)] = value } } operator fun set(pos: Int, value: Double) { - require(nDim == 1) { "broadcasting set is not supported" } + check(nDim == 1) { "broadcasting set is not supported" } safeIndex({ intArrayOf(pos) }) { data[unsafeIndex(pos)] = value } } operator fun set(r: Int, c: Int, value: Double) { - require(nDim == 2) { "broadcasting set is not supported" } + check(nDim == 2) { "broadcasting set is not supported" } safeIndex({ intArrayOf(r, c) }) { data[unsafeIndex(r, c)] = value } } operator fun set(d: Int, r: Int, c: Int, value: Double) { - require(nDim == 3) { "broadcasting set is not supported" } + check(nDim == 3) { "broadcasting set is not supported" } safeIndex({ intArrayOf(d, r, c) }) { data[unsafeIndex(d, r, c)] = value } } - // XXX required for fallback implementations in [F64FlatVector]. @Suppress("nothing_to_inline") internal inline fun unsafeGet(pos: Int): Double = data[unsafeIndex(pos)] @@ -163,20 +211,25 @@ open class F64Array protected constructor( /** * Returns a sequence of views along the specified [axis]. * - * For example, for a 2-D array `axis = 0` means "for each row", + * For example, for a 2D array `axis = 0` means "for each row", * and `axis = 1` "for each column". + * + * Viewer method. */ - open fun along(axis: Int): Sequence<F64Array> { - return (0..shape[axis] - 1).asSequence() - .map { view(it, axis) } - } + open fun along(axis: Int): Sequence<F64Array> = (0 until shape[axis]).asSequence().map { view(it, axis) } - /** Returns a view of this array along the specified [axis]. */ + /** + * Returns a view of this array along the specified [axis]. + * + * Viewer method.
+ */ fun view(index: Int, axis: Int = 0): F64Array { checkIndex("axis", axis, nDim) checkIndex("index", index, shape[axis]) - return invoke(data, offset + strides[axis] * index, - strides.remove(axis), shape.remove(axis)) + return invoke( + data, offset + strides[axis] * index, + strides.remove(axis), shape.remove(axis) + ) } /** @@ -190,41 +243,170 @@ open class F64Array protected constructor( return indices.fold(this) { m, pos -> m.view(pos) } } - /** A broadcasted viewer for this array. */ + /** + * Unrolls the array down to flattenable subarrays and flattens them. + * + * See [unrollOnce] for a description of unrolling. This method performs recursive unrolling until it arrives + * at flattenable subarrays, at which point it flattens them and returns the resulting sequence. + * In particular, if the original array [isFlattenable], it's flattened and returned as a singleton sequence. + */ + private fun unrollToFlat(): Sequence<F64FlatArray> { + if (isFlattenable) return sequenceOf(flatten()) + return unrollOnce().flatMap { it.unrollToFlat() } + } + + /** + * Unrolls two arrays down to flattenable subarrays in a compatible way and applies [action] to the flattened pairs. + * + * Since two arrays with the same [shape] can have different internal organization (i.e. [strides]), + * it's important to unroll them in a consistent way. This method provides the necessary functionality. + * + * @param action Since this method is always used in a for-each context, we include the corresponding lambda + * in the signature.
+ */ + private fun commonUnrollToFlat( + other: F64Array, + action: (F64FlatArray, F64FlatArray) -> Unit + ) { + checkShape(other) + val commonUnrollDim = min(unrollDim, other.unrollDim) + if (commonUnrollDim == nDim) { + action(flatten(), other.flatten()) + } else { + unrollOnce(commonUnrollDim).zip(other.unrollOnce(commonUnrollDim)).forEach { (a, b) -> + a.commonUnrollToFlat(b, action) + } + } + } + + /** + * Unrolls the array into a sequence of smaller subarrays in a reasonably efficient way. + * + * We can always iterate any array by performing nested loops over all its axes. However, in the vast majority + * of cases this is unnecessarily complicated and at least some of the nested loops can be replaced with + * a single loop. This is called "unrolling". + * + * [unrollDim] describes the maximum number of dimensions suitable for unrolling. When it's equal to [nDim], + * the array [isFlattenable], meaning that all its elements are equidistant and can be visited in a single loop. + * + * This method assumes that the caller knows what they are doing and unrolls over the specified dimensions without + * checking e.g. whether the array [isFlattenable] and thus doesn't need any unrolling. + */ + private fun unrollOnce(n: Int = unrollDim): Sequence<F64Array> { + require(n <= unrollDim) { "can't unroll $n dimensions, only $unrollDim are unrollable" } + val newStrides = strides.slice(n until nDim).toIntArray() + val newShape = shape.slice(n until nDim).toIntArray() + val currentUnrollStride = if (n == unrollDim) unrollStride else run { + var nonTrivialN = n - 1 + while (nonTrivialN >= 0 && shape[nonTrivialN] <= 1) nonTrivialN-- + if (nonTrivialN >= 0) strides[nonTrivialN] else 0 + } + val currentUnrollSize = if (n == unrollDim) unrollSize else shape.slice(0 until n).toIntArray().product() + return (0 until currentUnrollSize).asSequence().map { i -> + invoke(data, offset + currentUnrollStride * i, newStrides, newShape) + } + } + + /** + * A broadcasted viewer for this array.
+ * + * The main difference between + * a[...] + * and + * a.V[...] + * is that the array's getter/setter methods deal with scalar [Double] values, while the viewer's methods + * deal with [F64Array]s. Another difference is that the viewer's methods can skip dimensions by providing + * [_I] object instead of an index. + * + * Consider a matrix (2D array) `a`. Then the following invocations have the following effect: + * a[4] // fails, since it doesn't reference a scalar + * a[4, 2] // returns a Double + * a.V[4] // returns 4th row + * a.V[_I, 2] // returns 2nd column + * a.V[4, 2] // fails, since it doesn't reference an array + */ + @Suppress("PropertyName") @delegate:Transient - val V: Viewer by lazy(LazyThreadSafetyMode.NONE) { Viewer(this) } + val V: Viewer by lazy(LazyThreadSafetyMode.PUBLICATION) { Viewer(this) } class Viewer(private val a: F64Array) { /** - * A less-verbose alias to [view]. + * Returns a subarray with several first indices specified. + * + * A less-verbose alias to [view] and chained [view] calls: + * a.V[i0, i1, i2] == a.view(i0).view(i1).view(i2) + * + * For an appropriately sized `moreIndices` [IntArray], the following invariant holds: + * a[indices + moreIndices] == a.V[indices][moreIndices] + * + * It should be noted that the former invocation is generally much more efficient. * - * Please do NOT abuse this shortcut by double-indexing, i.e. don't - * do `m[i][j]`, write `m[i, j]` instead. + * Viewer method. */ operator fun get(vararg indices: Int) = a.view0(indices) + /** + * Replaces a subarray with several first indices specified with the values from [other]. + * + * After we call: + * a.V[indices] = other + * the following holds for an appropriately sized `moreIndices` [IntArray]: + * a[indices + moreIndices] == other[moreIndices] + * + * In-place method for this and copying method for [other]. 
+ */ operator fun set(vararg indices: Int, other: F64Array) { other.copyTo(a.view0(indices)) } + /** + * Fills a subarray with several first indices specified with [init] value. + * + * After we call: + * a.V[indices] = init + * the following holds for any appropriately sized `moreIndices` [IntArray]: + * a[indices + moreIndices] == init + * + * In-place method. + */ operator fun set(vararg indices: Int, init: Double) { a.view0(indices).fill(init) } - /** A less-verbose alias to [copyTo]. */ + /** + * Replaces the whole array with the values from [other]. + * + * A less-verbose alias to [copyTo]. + * + * In-place method for this and copying method for [other]. + */ @Suppress("unused_parameter") - operator fun set(vararg any: _I, other: F64Array) { - require(any.size < a.nDim) { "too many axes" } + operator fun set(any: _I, other: F64Array) { other.copyTo(a) } /** - * A less-verbose alias to [view]. + * Replaces the whole array with [other] value. + * + * A less-verbose alias to [fill]. * - * Use in conjunction with [_I], e.g. `m[_I, i]`. + * In-place method. + */ + @Suppress("unused_parameter") + operator fun set(any: _I, other: Double) { + a.fill(other) + } + + /** + * Returns a subarray with index 1 specified to [c]. + * + * A less-verbose alias to [view]: + * a.V[_I, c] == a.view(c, 1) + * + * Viewer method. */ // XXX we could generalize this in a way similar to the above method. - // However, after the resulting methods could only be called via + // However, in that case the resulting methods could only be called via // method call syntax with explicit parameter names. E.g. // // get(any: _I, vararg rest: _I, c: Int, other: F64Array) @@ -233,152 +415,168 @@ open class F64Array protected constructor( @Suppress("unused_parameter") operator fun get(any: _I, c: Int) = a.view(c, axis = 1) + /** + * Replaces a subarray with index 1 specified to [c] with the values from [other]. 
+ * + * A less-verbose alias to [view]: + * a.V[_I, c] == a.view(c, 1) + * + * In-place method for this and copying method for [other]. + */ @Suppress("unused_parameter") operator fun set(any: _I, c: Int, other: F64Array) { other.copyTo(a.view(c, axis = 1)) } + /** + * Replaces a subarray with index 1 specified to [c] with the [init] value. + * + * In-place method. + */ @Suppress("unused_parameter") operator fun set(any: _I, c: Int, init: Double) { a.view(c, axis = 1).fill(init) } } - /** Returns a copy of the elements in this array. */ + /** + * Returns a copy of this array. + * + * The copy has the same [shape] as the original, but not necessarily the same [strides], since + * the copy is always flattenable and dense, even if the original array is not. + * + * Copying method. + */ fun copy(): F64Array { val copy = invoke(*shape) copyTo(copy) return copy } - /** Copies elements in this array to [other] array. */ - open fun copyTo(other: F64Array) { - checkShape(other) - if (Arrays.equals(strides, other.strides)) { - System.arraycopy(data, offset, other.data, other.offset, - shape.product()) - } else { - for (r in 0..size - 1) { - V[r].copyTo(other.V[r]) - } - } - } - - /** A less verbose alternative to [copyTo]. */ - operator fun set(vararg any: _I, other: F64Array) = when { - any.size > nDim -> throw IllegalArgumentException("too many axes") - any.size < nDim -> throw IllegalArgumentException("too few axes") - else -> other.copyTo(this) - } - - /** Reshapes this vector into a matrix in row-major order. */ - fun reshape(vararg shape: Int): F64Array { - require(shape.product() == size) { - "total size of the new matrix must be unchanged" - } - - return when { - nDim > 1 -> TODO() - Arrays.equals(this.shape, shape) -> this - else -> { - val reshaped = shape.clone() - reshaped[reshaped.lastIndex] = strides.single() - for (i in reshaped.lastIndex - 1 downTo 0) { - reshaped[i] = reshaped[i + 1] * shape[i + 1] - } + /** + * Copies elements in this array to [other] array.
+ * + * In-place method for [other] and copying method for this. + */ + open fun copyTo(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a.copyTo(b) } - invoke(data, offset, reshaped, shape) - } - } - } + /** + * Reshapes this array. + * + * The original and the new array contain the same elements in the same order, if both are enumerated row-major. + * + * For example, + * F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0).reshape(2, 3) + * produces a 2x3 matrix: + * [ + * [1.0, 2.0, 3.0], + * [4.0, 5.0, 6.0] + * ] + * + * Only supported for flattenable arrays. + * + * Viewer method. + */ + open fun reshape(vararg shape: Int): F64Array = flatten().reshape(*shape) /** * Appends this array to another array. * + * @param axis the axis along which the arrays are appended. * @since 0.2.3 */ fun append(other: F64Array, axis: Int = 0): F64Array { - return F64Array.concatenate(this, other, axis = axis) + return concatenate(this, other, axis = axis) } /** - * Flattens the array into a 1-D view in O(1) time. + * Flattens the array into a 1D view in O(1) time. * - * No data copying is performed, thus the operation is only applicable - * to dense arrays. + * Only implemented for flattenable arrays. + * + * Viewer method. */ - open fun flatten(): F64Array { - check(isDense) { "matrix is not dense" } - return data.asF64Array(offset, shape.product()) - } - - /** An alias for [transpose]. */ - val T: F64Array get() = transpose() - - /** Constructs matrix transpose in O(1) time. */ - open fun transpose(): F64Array { - return invoke(data, offset, strides.reversedArray(), shape.reversedArray()) + open fun flatten(): F64FlatArray { + check(isFlattenable) { "array can't be flattened" } + return F64FlatArray.invoke(data, offset, unrollStride, unrollSize) } /** * Creates a sliced view of this array in O(1) time. * * @param from the first index of the slice (inclusive). - * @param to the last index of the slice (exclusive). + * @param to the last index of the slice (exclusive).
`-1` is treated as "until the end", otherwise [to] must be + * strictly greater than [from] (empty arrays are not allowed). * @param step indexing step. * @param axis to slice along. */ fun slice(from: Int = 0, to: Int = -1, step: Int = 1, axis: Int = 0): F64Array { - val axisTo = if (to == -1) shape[axis] else to - if (from < 0 || axisTo < from || axisTo > shape[axis]) { - throw IndexOutOfBoundsException() + require(step > 0) { "slicing step must be positive, but was $step" } + require(axis in 0 until nDim) { "axis out of bounds: $axis" } + require(from >= 0) { "slicing start index must be non-negative, but was $from" } + val actualTo = if (to != -1) { + require(to > from) { "slicing end index $to must be greater than start index $from" } + check(to <= shape[axis]) { "slicing end index out of bounds: $to > ${shape[axis]}" } + to + } else { + check(shape[axis] > from) { "slicing start index out of bounds: $from >= ${shape[axis]}" } + shape[axis] } val sliceStrides = strides.clone().apply { this[axis] *= step } val sliceShape = shape.clone().apply { - this[axis] = (axisTo - from + step - 1) / step + this[axis] = (actualTo - from + step - 1) / step } return invoke(data, offset + from * strides[axis], sliceStrides, sliceShape) } - open operator fun contains(other: Double): Boolean = flatten().contains(other) + open operator fun contains(other: Double): Boolean = unrollToFlat().any { it.contains(other) } - /** Fills this array with a given [init] value. */ + /** + * Fills this array with a given [init] value. + * + * In-place method. + */ open fun fill(init: Double): Unit = flatten().fill(init) - fun reversed(axis: Int = 0): F64Array { - return invoke(data, offset + strides[axis] * (shape[axis] - 1), - strides.clone().apply { this[axis] *= -1 }, - shape) - } - - /** Applies a given permutation of indices to the elements in the array. */ + /** + * Applies a given permutation of indices to the elements in the array. + * + * In-place method.
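The new extent along the sliced axis, `(actualTo - from + step - 1) / step`, is a ceiling division: the half-open range `[from, actualTo)` stepped by `step` contains `ceil((actualTo - from) / step)` indices. A standalone sketch of this computation (`slicedExtent` is a hypothetical helper, not library API):

```kotlin
// Hypothetical sketch of the sliced-extent computation used by slice():
// the half-open range [from, actualTo) stepped by `step` contains
// ceil((actualTo - from) / step) elements, computed via integer arithmetic.
fun slicedExtent(from: Int, actualTo: Int, step: Int): Int {
    require(step > 0 && actualTo > from && from >= 0)
    return (actualTo - from + step - 1) / step
}
```

For example, `slicedExtent(0, 10, 3)` is 4, covering indices 0, 3, 6 and 9.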
+ */ open fun reorder(indices: IntArray, axis: Int = 0) { - reorderInternal(this, indices, axis, - get = { pos -> view(pos, axis).copy() }, - set = { pos, value -> value.copyTo(view(pos, axis)) }) - } - - /** A less verbose alternative to [fill]. */ - operator fun set(vararg any: _I, value: Double) = when { - any.size > nDim -> throw IllegalArgumentException("too many axes") - any.size < nDim -> throw IllegalArgumentException("too few axes") - else -> fill(value) + reorderInternal( + this, indices, axis, + get = { pos -> view(pos, axis).copy() }, + set = { pos, value -> value.copyTo(view(pos, axis)) } + ) } - /** Computes a dot product between two 1-D arrays. */ + /** + * Computes a dot product between two vectors. + * + * Only implemented for flat arrays. + */ open infix fun dot(other: ShortArray): Double = unsupported() - /** Computes a dot product between two 1-D arrays. */ + /** + * Computes a dot product between two vectors. + * + * Only implemented for flat arrays. + */ open infix fun dot(other: IntArray): Double = unsupported() - /** Computes a dot product between two 1-D arrays. */ + /** + * Computes a dot product between two vectors. + * + * Only implemented for flat arrays. Optimized for dense arrays. + */ infix fun dot(other: DoubleArray): Double = dot(other.asF64Array()) /** - * Computes a dot product between two 1-D arrays. + * Computes a dot product between two vectors. * - * Optimized for dense arrays. + * Only implemented for flat arrays. Optimized for dense arrays. */ open infix fun dot(other: F64Array): Double = unsupported() @@ -399,7 +597,7 @@ open class F64Array protected constructor( open fun sd(): Double { val s = sum() val s2 = dot(this) - return Math.sqrt((s2 - s * s / size) / (size - 1)) + return sqrt((s2 - s * s / size) / (size - 1)) } /** @@ -407,145 +605,165 @@ open class F64Array protected constructor( * * Optimized for dense arrays. 
*/ - open fun sum(): Double = flatten().sum() - - /** - * Returns the sum of the elements using balanced summation. - * - * Optimized for dense arrays. - */ - open fun balancedSum(): Double = flatten().balancedSum() + open fun sum(): Double { + val acc = KahanSum() + unrollToFlat().forEach { acc += it.sum() } + return acc.result() + } /** * Computes cumulative sum of the elements. * - * The operation is done **in place**. - * - * Available only for 1-D arrays. + * In-place method. Only implemented for flat arrays. */ - open fun cumSum() { - check(nDim == 1) - val acc = KahanSum() - for (pos in 0..size - 1) { - acc += unsafeGet(pos) - unsafeSet(pos, acc.result()) - } - } + open fun cumSum(): Unit = unsupported() /** * Returns the maximum element. * + * If any of the array elements is NaN, the result is undefined but will be one of the array elements. + * * Optimized for dense arrays. */ - open fun max(): Double = flatten().max() + open fun max(): Double = unrollToFlat().map { it.max() }.max() ?: Double.NEGATIVE_INFINITY /** - * Returns the unravelled index of the maximum element in the - * flattened array. + * Returns the index of the maximum element. * - * See [ravelMultiIndex] and [unravelIndex] for details. + * If any of the array elements is NaN, the result is undefined but will be a valid index. + * + * Only implemented for flat arrays. */ - open fun argMax(): Int = flatten().argMax() + open fun argMax(): Int = unsupported() /** * Returns the minimum element. * + * If any of the array elements is NaN, the result is undefined but will be one of the array elements. + * * Optimized for dense arrays. */ - open fun min(): Double = flatten().min() + open fun min(): Double = unrollToFlat().map { it.min() }.min() ?: Double.POSITIVE_INFINITY /** - * Returns the unravelled index of the minimum element in the - * flattened array. + * Returns the index of the minimum element. * - * See [ravelMultiIndex] and [unravelIndex] for details.
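The `sum()` above accumulates per-subarray sums in a `KahanSum`. Compensated (Kahan) summation can be sketched standalone; `kahanSum` below is a hypothetical helper for illustration, not the library's actual class:

```kotlin
// Minimal sketch of Kahan (compensated) summation: a running compensation
// term recovers low-order bits that plain summation would discard.
fun kahanSum(values: DoubleArray): Double {
    var sum = 0.0
    var c = 0.0 // compensation for lost low-order bits
    for (v in values) {
        val y = v - c
        val t = sum + y
        c = (t - sum) - y
        sum = t
    }
    return sum
}
```

The compensation term matters when many small values are added to a large running total; for exactly representable inputs it degenerates to a plain loop.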
+ * If any of the array elements is NaN, the result is undefined but will be a valid index. + * + * Only implemented for flat arrays. */ - open fun argMin(): Int = flatten().argMin() + open fun argMin(): Int = unsupported() /** - * Computes the exponent of each element of this array. + * Replaces each element x of this array with its exponent exp(x). * - * Optimized for dense arrays. + * In-place method. Optimized for dense arrays. */ - open fun expInPlace(): Unit = flatten().expInPlace() + open fun expInPlace(): Unit = unrollToFlat().forEach { it.expInPlace() } + /** + * A copying version of [expInPlace]. + * + * Copying method. Optimized for dense arrays. + */ fun exp() = copy().apply { expInPlace() } /** - * Computes exp(x) - 1 for each element of this array. + * Replaces each element x of this array with exp(x) - 1. * - * Optimized for dense arrays. + * In-place method. Optimized for dense arrays. * * @since 0.3.0 */ - open fun expm1InPlace(): Unit = flatten().expm1InPlace() + open fun expm1InPlace(): Unit = unrollToFlat().forEach { it.expm1InPlace() } + /** + * A copying version of [expm1InPlace]. + * + * Copying method. Optimized for dense arrays. + */ fun expm1() = copy().apply { expm1InPlace() } /** - * Computes the natural log of each element of this array. + * Replaces each element x of this array with its natural logarithm log(x). * - * Optimized for dense arrays. + * In-place method. Optimized for dense arrays. */ - open fun logInPlace(): Unit = flatten().logInPlace() + open fun logInPlace(): Unit = unrollToFlat().forEach { it.logInPlace() } + /** + * A copying version of [logInPlace]. + * + * Copying method. Optimized for dense arrays. + */ fun log() = copy().apply { logInPlace() } /** - * Computes log(1 + x) for each element of this array. + * Replaces each element x of this array with log(1 + x). * - * Optimized for dense arrays. + * In-place method. Optimized for dense arrays.
* * @since 0.3.0 */ - open fun log1pInPlace(): Unit = flatten().log1pInPlace() + open fun log1pInPlace(): Unit = unrollToFlat().forEach { it.log1pInPlace() } + /** + * A copying version of [log1pInPlace]. + * + * Copying method. Optimized for dense arrays. + */ fun log1p() = copy().apply { log1pInPlace() } /** * Rescales the elements so that the sum is 1.0. * - * The operation is done **in place**. + * In-place method. */ fun rescale() { this /= sum() + Precision.EPSILON * shape.product().toDouble() } /** - * Rescales the element so that the exponent of the sum is 1.0. - * - * Optimized for dense arrays. + * Rescales the elements so that the sum of their exponents is 1.0. * - * The operation is done **in place**. + * In-place method. */ - open fun logRescale() { + fun logRescale() { this -= logSumExp() } /** * Computes * - * log(exp(v[0]) + ... + exp(v[size - 1])) + * log(Σ_x exp(x)) * * in a numerically stable way. */ - // TODO: add axis. - open fun logSumExp(): Double = flatten().logSumExp() + open fun logSumExp(): Double = unrollToFlat().map { it.logSumExp() }.logSumExp() - infix fun logAddExp(other: F64Array): F64Array { - return copy().apply { logAddExp(other, this) } - } + /** + * Plus-assign for values stored as logarithms. + * + * In other words, the same as invoking + * + * this[*i] = log(exp(this[*i]) + exp(other[*i])) + * + * for every valid i. + * + * In-place method. + */ + open fun logAddExpAssign(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a.logAddExpAssign(b) } /** * Computes elementwise * - * log(exp(this[i]) + exp(other[i])) + * log(exp(this[*i]) + exp(other[*i])) * * in a numerically stable way. + * + * Copying method. 
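Elementwise, a common way to evaluate log(exp(a) + exp(b)) without overflow, which is what `logAddExpAssign` needs, is to factor out the larger argument: log(exp(a) + exp(b)) = max(a, b) + log1p(exp(-|a - b|)). The scalar `logAddExp` below is a hypothetical sketch of this identity, not the library's implementation:

```kotlin
import kotlin.math.abs
import kotlin.math.exp
import kotlin.math.ln1p
import kotlin.math.max

// Hypothetical scalar sketch: log(exp(a) + exp(b)) computed without overflow
// by factoring out the larger argument:
//   log(exp(a) + exp(b)) = max(a, b) + log(1 + exp(-|a - b|))
fun logAddExp(a: Double, b: Double): Double = max(a, b) + ln1p(exp(-abs(a - b)))
```

For a = b = 1000.0 the naive form overflows to infinity, while this version returns 1000.0 + ln 2.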
*/ - open fun logAddExp(other: F64Array, dst: F64Array) { - flatten().logAddExp(checkShape(other).flatten(), checkShape(dst).flatten()) - } + infix fun logAddExp(other: F64Array): F64Array = copy().apply { logAddExpAssign(other) } operator fun unaryPlus() = this @@ -553,92 +771,102 @@ open class F64Array protected constructor( operator fun plus(other: F64Array) = copy().apply { this += other } - open operator fun plusAssign(other: F64Array) { - flatten() += checkShape(other).flatten() - } + open operator fun plusAssign(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a += b } operator fun plus(update: Double) = copy().apply { this += update } - open operator fun plusAssign(update: Double) { - flatten() += update - } + open operator fun plusAssign(update: Double): Unit = unrollToFlat().forEach { it += update } operator fun minus(other: F64Array) = copy().apply { this -= other } - open operator fun minusAssign(other: F64Array) { - flatten() -= checkShape(other).flatten() - } + open operator fun minusAssign(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a -= b } operator fun minus(update: Double) = copy().apply { this -= update } - open operator fun minusAssign(update: Double) { - flatten() -= update - } + open operator fun minusAssign(update: Double): Unit = unrollToFlat().forEach { it -= update } operator fun times(other: F64Array) = copy().apply { this *= other } - open operator fun timesAssign(other: F64Array) { - flatten() *= checkShape(other).flatten() - } + open operator fun timesAssign(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a *= b } operator fun times(update: Double) = copy().apply { this *= update } - open operator fun timesAssign(update: Double) { - flatten() *= update - } + open operator fun timesAssign(update: Double): Unit = unrollToFlat().forEach { it *= update } operator fun div(other: F64Array) = copy().apply { this /= other } - open operator fun divAssign(other: F64Array) { - flatten() /= 
checkShape(other).flatten() - } + open operator fun divAssign(other: F64Array): Unit = commonUnrollToFlat(other) { a, b -> a /= b } operator fun div(update: Double) = copy().apply { this /= update } - open operator fun divAssign(update: Double) { - flatten() /= update - } + open operator fun divAssign(update: Double): Unit = unrollToFlat().forEach { it /= update } /** Ensures a given array has the same dimensions as this array. */ - fun checkShape(other: F64Array): F64Array { + protected fun checkShape(other: F64Array): F64Array { // Could relax this to "broadcastable". - require(this === other || Arrays.equals(shape, other.shape)) { - "operands shapes do not match " + - "${Arrays.toString(shape)} ${Arrays.toString(other.shape)}" + check(this === other || shape.contentEquals(other.shape)) { + "operand shapes do not match: ${shape.contentToString()} vs ${other.shape.contentToString()}" } return other } - // XXX must be overriden in flat array. + /** + * Returns a sequence of all array elements. Useful for testing. + */ + internal open fun asSequence(): Sequence<Double> = unrollToFlat().flatMap { it.asSequence() } + + /** + * Converts this array to a conventional Kotlin structure. + * + * For example, a vector will be converted to a [DoubleArray], a matrix will become `Array<DoubleArray>` etc. + * + * Copying method. + */ open fun toArray(): Any = toGenericArray() - // XXX must be overriden in flat array. + /** + * Converts this array to an [Array]. + * + * For example, a matrix will become `Array<DoubleArray>` etc. + * + * Copying method. Not implemented for flat arrays. + */ open fun toGenericArray(): Array<*> = Array(size) { view(it).toArray() } - // XXX must be overriden in flat array. + /** + * Converts this vector to a [DoubleArray]. + * + * Copying method. Only implemented for flat arrays. + */ open fun toDoubleArray(): DoubleArray = throw UnsupportedOperationException() - // XXX must be overriden in flat array.
- open fun toString(maxDisplay: Int, - format: DecimalFormat = DecimalFormat("#.####")): String { + /** + * Creates a [String] representation of the given array. + * + * At most [maxDisplay] elements are printed for each dimension. + */ + open fun toString( + maxDisplay: Int, + format: DecimalFormat = DecimalFormat("#.####") + ): String { val sb = StringBuilder() sb.append('[') if (maxDisplay < size) { - for (r in 0..maxDisplay / 2 - 1) { + for (r in 0 until maxDisplay / 2) { sb.append(V[r].toString(maxDisplay, format)).append(", ") } sb.append("..., ") val leftover = maxDisplay - maxDisplay / 2 - for (r in size - leftover..size - 1) { + for (r in size - leftover until size) { sb.append(V[r].toString(maxDisplay, format)) if (r < size - 1) { sb.append(", ") } } } else { - for (r in 0..size - 1) { + for (r in 0 until size) { sb.append(V[r].toString(maxDisplay, format)) if (r < size - 1) { sb.append(", ") @@ -652,51 +880,66 @@ open class F64Array protected constructor( override fun toString() = toString(8) - // XXX must be overriden in flat array. override fun equals(other: Any?): Boolean = when { this === other -> true other !is F64Array -> false - !Arrays.equals(shape, other.shape) -> false - else -> (0..size - 1).all { view(it) == other.view(it) } + !shape.contentEquals(other.shape) -> false + else -> (0 until size).all { view(it) == other.view(it) } } - // XXX must be overriden in flat array. - override fun hashCode(): Int = (0..size - 1).fold(1) { acc, r -> + override fun hashCode(): Int = (0 until size).fold(1) { acc, r -> 31 * acc + view(r).hashCode() } companion object { - /** Creates a zero-filled array of a given [shape]. */ + /** + * Creates a zero-filled flat array of a given [shape]. 
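The truncated rendering in `toString(maxDisplay)` above prints the first `maxDisplay / 2` entries, an ellipsis, and the last `maxDisplay - maxDisplay / 2` entries. A standalone sketch over pre-formatted strings (`truncatedDisplay` is a hypothetical helper, not library API):

```kotlin
// Hypothetical sketch of the truncated display logic in toString(maxDisplay):
// first maxDisplay / 2 entries, "...", then the remaining
// maxDisplay - maxDisplay / 2 entries.
fun truncatedDisplay(elements: List<String>, maxDisplay: Int): String {
    if (elements.size <= maxDisplay) {
        return elements.joinToString(", ", "[", "]")
    }
    val head = elements.take(maxDisplay / 2)
    val tail = elements.takeLast(maxDisplay - maxDisplay / 2)
    return (head + "..." + tail).joinToString(", ", "[", "]")
}
```

With five elements and `maxDisplay = 4` this yields `[1, 2, ..., 4, 5]`, matching the loop structure above.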
+ */ operator fun invoke(vararg shape: Int): F64Array { - return F64FlatArray(DoubleArray(shape.product())) - .reshape(*shape) + return F64FlatArray(DoubleArray(shape.product())).reshape(*shape) } - operator inline fun invoke(size: Int, block: (Int) -> Double): F64Array { + /** + * Creates a flat array of a given [size] and fills it using [block]. + */ + inline operator fun invoke(size: Int, block: (Int) -> Double): F64Array { return invoke(size).apply { - for (i in 0..size - 1) { + for (i in 0 until size) { this[i] = block(i) } } } - operator inline fun invoke(numRows: Int, numColumns: Int, - block: (Int, Int) -> Double): F64Array { + /** + * Creates a matrix with a given number of rows and columns and fills it using [block]. + */ + inline operator fun invoke( + numRows: Int, + numColumns: Int, + block: (Int, Int) -> Double + ): F64Array { return invoke(numRows, numColumns).apply { - for (r in 0..numRows - 1) { - for (c in 0..numColumns - 1) { + for (r in 0 until numRows) { + for (c in 0 until numColumns) { this[r, c] = block(r, c) } } } } - operator inline fun invoke(depth: Int, numRows: Int, numColumns: Int, - block: (Int, Int, Int) -> Double): F64Array { + /** + * Creates a 3D array with given dimensions and fills it using [block]. + */ + inline operator fun invoke( + depth: Int, + numRows: Int, + numColumns: Int, + block: (Int, Int, Int) -> Double + ): F64Array { return invoke(depth, numRows, numColumns).apply { - for (d in 0..depth - 1) { - for (r in 0..numRows - 1) { - for (c in 0..numColumns - 1) { + for (d in 0 until depth) { + for (r in 0 until numRows) { + for (c in 0 until numColumns) { this[d, r, c] = block(d, r, c) } } @@ -704,7 +947,9 @@ open class F64Array protected constructor( } } - /** Creates a vector with given elements. */ + /** + * Creates a vector from given elements. 
+ */ fun of(first: Double, vararg rest: Double): F64Array { val data = DoubleArray(rest.size + 1) data[0] = first @@ -712,12 +957,18 @@ open class F64Array protected constructor( return data.asF64Array() } - /** Creates a vector filled with a given [init] element. */ + /** + * Creates a vector filled with a given [init] element. + */ fun full(size: Int, init: Double) = invoke(size).apply { fill(init) } - /** Creates a matrix filled with a given [init] element. */ - fun full(vararg indices: Int, init: Double): F64Array { - return invoke(*indices).apply { fill(init) } + /** + * Creates an array filled with a given [init] element. + * + * Note that [init] must be a named argument. + */ + fun full(vararg shape: Int, init: Double): F64Array { + return invoke(*shape).apply { fill(init) } } /** @@ -727,11 +978,8 @@ open class F64Array protected constructor( */ fun concatenate(first: F64Array, vararg rest: F64Array, axis: Int = 0): F64Array { for (other in rest) { - if (!Arrays.equals(other.shape.remove(axis), - first.shape.remove(axis))) { - throw IllegalArgumentException( - "input array shapes must be exactly equal " + - "for all dimensions except $axis") + require(other.shape.remove(axis).contentEquals(first.shape.remove(axis))) { + "input array shapes must be exactly equal for all dimensions except $axis" } } @@ -752,8 +1000,15 @@ open class F64Array protected constructor( } /** "Smart" constructor. 
*/ - internal operator fun invoke(data: DoubleArray, offset: Int, - strides: IntArray, shape: IntArray): F64Array { + internal operator fun invoke( + data: DoubleArray, + offset: Int, + strides: IntArray, + shape: IntArray + ): F64Array { + require(strides.isNotEmpty()) { "singleton arrays are not supported" } + require(shape.isNotEmpty()) { "singleton arrays are not supported" } + require(strides.size == shape.size) { "strides and shape size don't match" } return if (shape.size == 1) { F64FlatArray(data, offset, strides.single(), shape.single()) } else { @@ -763,12 +1018,29 @@ open class F64Array protected constructor( } } -/** Wraps a given array of elements. The array will not be copied. */ -fun DoubleArray.asF64Array(offset: Int = 0, size: Int = this.size): F64Array { +/** + * Wraps a given array. + * + * Viewer method. + */ +fun DoubleArray.asF64Array(): F64Array { + return F64FlatArray(this, 0, 1, size) +} + +/** + * Wraps a given region of the array. + * + * Viewer method. + */ +fun DoubleArray.asF64Array(offset: Int, size: Int): F64Array { return F64FlatArray(this, offset, 1, size) } -/** Copies the elements of this nested array into [F64Array] of the same shape. */ +/** + * Copies the elements of this nested array into [F64Array] of the same shape. + * + * Copying method. + */ fun Array<*>.toF64Array(): F64Array { val shape = guessShape() return flatten(this).asF64Array().reshape(*shape) @@ -780,7 +1052,7 @@ private fun flatten(a: Array<*>): DoubleArray { when (it) { is DoubleArray -> Arrays.stream(it) is Array<*> -> Arrays.stream(flatten(it)) - else -> unsupported() + else -> unsupported() // unreachable since [guessShape] will fail faster } }.toArray() } @@ -788,10 +1060,44 @@ private fun flatten(a: Array<*>): DoubleArray { /** No validation, therefore "check". 
*/ private fun Array<*>.guessShape(): IntArray { check(isNotEmpty()) - val tip = first() - return when (tip) { - is DoubleArray -> intArrayOf(size, tip.size) - is Array<*> -> intArrayOf(size) + tip.guessShape() + return when (val tip = first()) { + is DoubleArray -> { + (1 until size).forEach { + val el = get(it) + check(el is DoubleArray) { + "array has elements of different types: element 0 is a double array, " + + "but element $it is ${el?.let { it::class.java.name } ?: "null"}" + } + check(el.size == tip.size) { + "array has elements of different sizes: element 0 is ${tip.size} long, " + + "but element $it is ${el.size} long" + } + } + intArrayOf(size, tip.size) + } + is Array<*> -> { + (1 until size).forEach { + val el = get(it) + check(el is Array<*>) { + "array has elements of different types: element 0 is a generic array, " + + "but element $it is ${el?.let { it::class.java.name } ?: "null"}" + } + check(el.size == tip.size) { + "array has elements of different sizes: element 0 is ${tip.size} long, " + + "but element $it is ${el.size} long" + } + } + intArrayOf(size) + tip.guessShape() + } else -> unsupported() } -} \ No newline at end of file +} + +/** + * A special object used to denote all indices. + * + * @since 0.1.1 Renamed to `_I` because all-underscore names are reserved + * for internal use in Kotlin. + */ +@Suppress("ClassName") +object _I diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/F64DenseFlatArray.kt b/src/main/kotlin/org/jetbrains/bio/viktor/F64DenseFlatArray.kt index 2f10a60..19752ca 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/F64DenseFlatArray.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/F64DenseFlatArray.kt @@ -1,12 +1,12 @@ package org.jetbrains.bio.viktor /** - * A contiguous strided vector. + * A contiguous vector. 
* * @author Sergei Lebedev * @since 0.1.0 */ -open class F64DenseFlatArray protected constructor( +sealed class F64DenseFlatArray( data: DoubleArray, offset: Int, size: Int @@ -32,7 +32,7 @@ open class F64DenseFlatArray protected constructor( const val DENSE_SPLIT_SIZE = 16 internal fun create(data: DoubleArray, offset: Int, size: Int): F64DenseFlatArray { - return if (size <= DENSE_SPLIT_SIZE || !Loader.optimizationSupported) { + return if (size <= DENSE_SPLIT_SIZE || !Loader.nativeLibraryLoaded) { F64SmallDenseArray(data, offset, size) } else { F64LargeDenseArray(data, offset, size) @@ -42,7 +42,7 @@ open class F64DenseFlatArray protected constructor( } /** - * A contiguous strided vector of size at most [F64DenseFlatArray.DENSE_SPLIT_SIZE]. + * A contiguous vector of size at most [F64DenseFlatArray.DENSE_SPLIT_SIZE]. * * @author Sergei Lebedev * @since 0.1.0 @@ -65,11 +65,13 @@ class F64LargeDenseArray( size: Int ) : F64DenseFlatArray(data, offset, size) { - override fun sd() = NativeSpeedups.sd(data, offset, size) + override fun sd() = NativeSpeedups.unsafeSD(data, offset, size) - override fun balancedSum() = NativeSpeedups.sum(data, offset, size) + override fun sum() = NativeSpeedups.unsafeSum(data, offset, size) - override fun cumSum() = NativeSpeedups.cumSum(data, offset, data, offset, size) + override fun cumSum() { + if (!NativeSpeedups.unsafeCumSum(data, offset, size)) super.cumSum() + } override fun min() = NativeSpeedups.unsafeMin(data, offset, size) @@ -77,93 +79,85 @@ class F64LargeDenseArray( override fun dot(other: F64Array): Double { return if (other is F64LargeDenseArray) { - require(other.size == size) { "non-conformable arrays" } + checkShape(other) NativeSpeedups.unsafeDot(data, offset, other.data, other.offset, size) } else { super.dot(other) } } - override fun expInPlace() = NativeSpeedups.unsafeExp(data, offset, data, offset, size) - - override fun expm1InPlace() = NativeSpeedups.unsafeExpm1(data, offset, data, offset, size) + override fun 
expInPlace() { + if (!NativeSpeedups.unsafeExpInPlace(data, offset, size)) super.expInPlace() + } - override fun logInPlace() = NativeSpeedups.unsafeLog(data, offset, data, offset, size) + override fun expm1InPlace() { + if (!NativeSpeedups.unsafeExpm1InPlace(data, offset, size)) super.expm1InPlace() + } - override fun log1pInPlace() = NativeSpeedups.unsafeLog1p(data, offset, data, offset, size) + override fun logInPlace() { + if (!NativeSpeedups.unsafeLogInPlace(data, offset, size)) super.logInPlace() + } - override fun logRescale() = NativeSpeedups.unsafeLogRescale(data, offset, data, offset, size) + override fun log1pInPlace(){ + if (!NativeSpeedups.unsafeLog1pInPlace(data, offset, size)) super.log1pInPlace() + } override fun logSumExp() = NativeSpeedups.unsafeLogSumExp(data, offset, size) - override fun logAddExp(other: F64Array, dst: F64Array) { - if (other is F64DenseFlatArray && dst is F64DenseFlatArray) { + override fun logAddExpAssign(other: F64Array) { + if (other is F64LargeDenseArray) { checkShape(other) - checkShape(dst) - NativeSpeedups.unsafeLogAddExp( - data, offset, - other.data, other.offset, - dst.data, dst.offset, size - ) - } else { - super.logAddExp(other, dst) + if (NativeSpeedups.unsafeLogAddExp(data, offset, other.data, other.offset, size)) return } + super.logAddExpAssign(other) } - override fun unaryMinus(): F64Array { - val v = copy() - NativeSpeedups.unsafeNegate(data, offset, v.data, v.offset, v.size) - return v - } + override fun unaryMinus() = copy().apply { NativeSpeedups.unsafeNegateInPlace(data, offset, size) } - override fun plusAssign(update: Double) = - NativeSpeedups.unsafePlusScalar(data, offset, update, data, offset, size) + override fun plusAssign(update: Double) { + if (!NativeSpeedups.unsafePlusScalarAssign(data, offset, size, update)) super.plusAssign(update) + } override fun plusAssign(other: F64Array) { if (other is F64DenseFlatArray) { checkShape(other) - NativeSpeedups.unsafePlus(data, offset, - other.data, 
other.offset, - data, offset, size) - } else { - super.plusAssign(other) + if (NativeSpeedups.unsafePlusAssign(data, offset, other.data, other.offset, size)) return } + super.plusAssign(other) } - override fun minusAssign(update: Double) = - NativeSpeedups.unsafeMinusScalar(data, offset, update, data, offset, size) + override fun minusAssign(update: Double) { + if (!NativeSpeedups.unsafeMinusScalarAssign(data, offset, size, update)) super.minusAssign(update) + } override fun minusAssign(other: F64Array) { if (other is F64DenseFlatArray) { checkShape(other) - NativeSpeedups.unsafeMinus(data, offset, - other.data, other.offset, - data, offset, size) - } else { - super.minusAssign(other) + if (NativeSpeedups.unsafeMinusAssign(data, offset, other.data, other.offset, size)) return } + super.minusAssign(other) } - override fun timesAssign(update: Double) = - NativeSpeedups.unsafeTimesScalar(data, offset, update, data, offset, size) + override fun timesAssign(update: Double) { + if (!NativeSpeedups.unsafeTimesScalarAssign(data, offset, size, update)) super.timesAssign(update) + } override fun timesAssign(other: F64Array) { if (other is F64DenseFlatArray) { - NativeSpeedups.unsafeTimes(data, offset, - other.data, other.offset, - data, offset, size) - } else { - super.timesAssign(other) + checkShape(other) + if (NativeSpeedups.unsafeTimesAssign(data, offset, other.data, other.offset, size)) return } + super.timesAssign(other) } - override fun divAssign(update: Double) = NativeSpeedups.unsafeDivScalar(data, offset, update, data, offset, size) + override fun divAssign(update: Double) { + if (!NativeSpeedups.unsafeDivScalarAssign(data, offset, size, update)) super.divAssign(update) + } override fun divAssign(other: F64Array) { if (other is F64DenseFlatArray) { - NativeSpeedups.unsafeDiv(data, offset, - other.data, other.offset, - data, offset, size) + checkShape(other) + if (NativeSpeedups.unsafeDivAssign(data, offset, other.data, other.offset, size)) return } else { 
super.divAssign(other) } diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/F64FlatArray.kt b/src/main/kotlin/org/jetbrains/bio/viktor/F64FlatArray.kt index f3cc3fb..aa9f71f 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/F64FlatArray.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/F64FlatArray.kt @@ -3,29 +3,27 @@ package org.jetbrains.bio.viktor import org.apache.commons.math3.util.FastMath import org.apache.commons.math3.util.Precision import java.text.DecimalFormat +import kotlin.math.ln +import kotlin.math.ln1p /** * An 1-dimensional specialization of [F64Array]. * * @since 0.4.0 */ -open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, - stride: Int, size: Int) -: - F64Array(data, offset, intArrayOf(stride), intArrayOf(size)) { +open class F64FlatArray protected constructor( + data: DoubleArray, + offset: Int, + stride: Int, + size: Int +) : F64Array(data, offset, intArrayOf(stride), intArrayOf(size)) { - override fun flatten() = this + override val isFlattenable get() = true - /** - * Constructs a column-vector view of this vector in O(1) time. - * - * A column vector is a matrix with [size] rows and a single column, - * e.g. `[1, 2, 3]^T` is `[[1], [2], [3]]`. 
- */ - override fun transpose() = reshape(size, 1) + override fun flatten() = this override fun contains(other: Double): Boolean { - for (pos in 0..size - 1) { + for (pos in 0 until size) { if (unsafeGet(pos) == other) { return true } @@ -38,13 +36,13 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, override fun copyTo(other: F64Array) { checkShape(other) - for (pos in 0..size - 1) { + for (pos in 0 until size) { other.unsafeSet(pos, unsafeGet(pos)) } } override fun fill(init: Double) { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, init) } } @@ -52,8 +50,8 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, override fun reorder(indices: IntArray, axis: Int) { if (axis == 0) { reorderInternal(this, indices, axis, - get = { pos -> unsafeGet(pos) }, - set = { pos, value -> unsafeSet(pos, value) }) + get = { pos -> unsafeGet(pos) }, + set = { pos, value -> unsafeSet(pos, value) }) } else { unsupported() } @@ -65,8 +63,8 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, override fun dot(other: F64Array) = balancedDot { other[it] } - /** See [balancedSum]. */ - private inline fun F64FlatArray.balancedDot(getter: (Int) -> Double): Double { + /** See [sum]. */ + private inline fun balancedDot(getter: (Int) -> Double): Double { var accUnaligned = 0.0 var remaining = size while (remaining % 4 != 0) { @@ -79,10 +77,8 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, var i = 0 while (i < remaining) { // Shift. - var v = (unsafeGet(i) * getter(i) + - unsafeGet(i + 1) * getter(i + 1)) - val w = (unsafeGet(i + 2) * getter(i + 2) + - unsafeGet(i + 3) * getter(i + 3)) + var v = unsafeGet(i) * getter(i) + unsafeGet(i + 1) * getter(i + 1) + val w = unsafeGet(i + 2) * getter(i + 2) + unsafeGet(i + 3) * getter(i + 3) v += w // Reduce. 
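The shift/reduce loop in `balancedDot` above implements pairwise summation iteratively, keeping partial sums of similar magnitude on an explicit stack to reduce floating-point error. A minimal recursive sketch of the same idea (illustrative only; `pairwiseSum` is a hypothetical name, not part of viktor's API):

```kotlin
// Recursive formulation of pairwise summation: split the range in half,
// sum each half, and combine. viktor's balancedDot/sum does the same
// thing iteratively with an explicit stack to avoid recursion overhead.
fun pairwiseSum(a: DoubleArray, from: Int = 0, to: Int = a.size): Double {
    val n = to - from
    if (n <= 4) {
        // Small base case: a plain loop is accurate enough here.
        var acc = 0.0
        for (i in from until to) acc += a[i]
        return acc
    }
    val mid = from + n / 2
    return pairwiseSum(a, from, mid) + pairwiseSum(a, mid, to)
}
```

Compared with naive left-to-right accumulation, the error of pairwise summation grows as O(log n) rather than O(n) in the number of additions.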
@@ -110,123 +106,119 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, * * Dalton et al. "SIMDizing pairwise sums", 2014. */ - override fun balancedSum(): Double { + override fun sum(): Double { var accUnaligned = 0.0 var remaining = size while (remaining % 4 > 0) { - accUnaligned += unsafeGet(--remaining) - } - + accUnaligned += unsafeGet(--remaining) + } val stack = DoubleArray(31 - 2) var p = 0 var i = 0 while (i < remaining) { - // Shift. - var v = unsafeGet(i) + unsafeGet(i + 1) - val w = unsafeGet(i + 2) + unsafeGet(i + 3) - v += w - - // Reduce. - var bitmask = 4 - while (i and bitmask != 0) { - v += stack[--p] - bitmask = bitmask shl 1 + // Shift. + var v = unsafeGet(i) + unsafeGet(i + 1) + val w = unsafeGet(i + 2) + unsafeGet(i + 3) + v += w + + // Reduce. + var bitmask = 4 + while (i and bitmask != 0) { + v += stack[--p] + bitmask = bitmask shl 1 + } + stack[p++] = v + i += 4 } - stack[p++] = v - i += 4 - } - var acc = 0.0 while (p > 0) { - acc += stack[--p] - } - + acc += stack[--p] + } return acc + accUnaligned } - override fun sum() = balancedSum() + override fun cumSum() { + val acc = KahanSum() + for (pos in 0 until size) { + acc += unsafeGet(pos) + unsafeSet(pos, acc.result()) + } + } + override fun min() = unsafeGet(argMin()) override fun argMin(): Int { - require(size > 0) { "no data" } - var minPos = 0 var minValue = Double.POSITIVE_INFINITY - for (pos in 0..size - 1) { + var res = 0 + for (pos in 0 until size) { val value = unsafeGet(pos) - if (value < minValue) { - minPos = pos + if (value <= minValue) { minValue = value + res = pos } } - - return minPos + return res } override fun max() = unsafeGet(argMax()) override fun argMax(): Int { - require(size > 0) { "no data" } - var maxPos = 0 var maxValue = Double.NEGATIVE_INFINITY - for (pos in 0..size - 1) { + var res = 0 + for (pos in 0 until size) { val value = unsafeGet(pos) - if (value > maxValue) { - maxPos = pos + if (value >= maxValue) { maxValue = value + res 
= pos } } - - return maxPos + return res } override fun expInPlace() { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, FastMath.exp(unsafeGet(pos))) } } override fun expm1InPlace() { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, FastMath.expm1(unsafeGet(pos))) } } override fun logInPlace() { - for (pos in 0..size - 1) { - unsafeSet(pos, Math.log(unsafeGet(pos))) + for (pos in 0 until size) { + unsafeSet(pos, ln(unsafeGet(pos))) } } override fun log1pInPlace() { - for (pos in 0..size - 1) { - unsafeSet(pos, Math.log1p(unsafeGet(pos))) + for (pos in 0 until size) { + unsafeSet(pos, ln1p(unsafeGet(pos))) } } override fun logSumExp(): Double { val offset = max() val acc = KahanSum() - for (pos in 0..size - 1) { + for (pos in 0 until size) { acc += FastMath.exp(unsafeGet(pos) - offset) } - return Math.log(acc.result()) + offset + return ln(acc.result()) + offset } - override fun logAddExp(other: F64Array, dst: F64Array) { - checkShape(other) - checkShape(dst) - for (pos in 0..size - 1) { - dst.unsafeSet(pos, unsafeGet(pos) logAddExp other.unsafeGet(pos)) + override fun logAddExpAssign(other: F64Array) { + for (pos in 0 until size) { + unsafeSet(pos, unsafeGet(pos) logAddExp other.unsafeGet(pos)) } } override fun unaryMinus(): F64Array { - // XXX 'v' is always dense but it might be too small to benefit - // from SIMD. 
val v = copy() - for (pos in 0..size - 1) { + for (pos in 0 until size) { v.unsafeSet(pos, -unsafeGet(pos)) } @@ -235,56 +227,77 @@ open class F64FlatArray protected constructor(data: DoubleArray, offset: Int, override fun plusAssign(other: F64Array) { checkShape(other) - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) + other.unsafeGet(pos)) } } override fun plusAssign(update: Double) { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) + update) } } override fun minusAssign(other: F64Array) { checkShape(other) - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) - other.unsafeGet(pos)) } } override fun minusAssign(update: Double) { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) - update) } } override fun timesAssign(other: F64Array) { checkShape(other) - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) * other.unsafeGet(pos)) } } override fun timesAssign(update: Double) { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) * update) } } override fun divAssign(other: F64Array) { checkShape(other) - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) / other.unsafeGet(pos)) } } override fun divAssign(update: Double) { - for (pos in 0..size - 1) { + for (pos in 0 until size) { unsafeSet(pos, unsafeGet(pos) / update) } } + override fun reshape(vararg shape: Int): F64Array { + shape.forEach { require(it > 0) { "Shape must be positive but was $it" } } + check(shape.product() == size) { + "total size of the new array must be unchanged" + } + return when { + this.shape.contentEquals(shape) -> this + else -> { + val reshaped = shape.clone() + reshaped[reshaped.lastIndex] = strides.single() + for (i in reshaped.lastIndex - 1 downTo 0) { + reshaped[i] = reshaped[i + 1] * shape[i + 1] + } + + invoke(data, offset, 
reshaped, shape)
+            }
+        }
+    }
+
+    override fun asSequence(): Sequence<Double> = (0 until size).asSequence().map(this::unsafeGet)
+
     override fun toArray() = toDoubleArray()

     override fun toGenericArray() = unsupported()

@@ -307,21 +320,21 @@
         sb.append('[')

         if (maxDisplay < size) {
-            for (pos in 0..maxDisplay / 2 - 1) {
+            for (pos in 0 until maxDisplay / 2) {
                 sb.append(format.safeFormat(this[pos])).append(", ")
             }

             sb.append("..., ")

             val leftover = maxDisplay - maxDisplay / 2
-            for (pos in size - leftover..size - 1) {
+            for (pos in size - leftover until size) {
                 sb.append(format.safeFormat(this[pos]))
                 if (pos < size - 1) {
                     sb.append(", ")
                 }
             }
         } else {
-            for (pos in 0..size - 1) {
+            for (pos in 0 until size) {
                 sb.append(format.safeFormat(this[pos]))
                 if (pos < size - 1) {
                     sb.append(", ")
@@ -335,23 +348,27 @@
     override fun equals(other: Any?) = when {
         this === other -> true
-        other !is F64Array -> false
+        other !is F64FlatArray -> false // an instance of F64Array can't be flat
         size != other.size -> false
-        else -> (0..size - 1).all {
+        else -> (0 until size).all {
             Precision.equals(unsafeGet(it), other.unsafeGet(it))
         }
     }

-    override fun hashCode() = (0..size - 1).fold(1) { acc, pos ->
+    override fun hashCode() = (0 until size).fold(1) { acc, pos ->
         // XXX calling #hashCode results in boxing, see KT-7571.
31 * acc + java.lang.Double.hashCode(unsafeGet(pos)) } companion object { - internal operator fun invoke(data: DoubleArray, offset: Int = 0, - stride: Int = 1, - size: Int = data.size): F64FlatArray { - require(offset + (size - 1) * stride <= data.size) { "not enough data" } + internal operator fun invoke( + data: DoubleArray, + offset: Int = 0, + stride: Int = 1, + size: Int = data.size + ): F64FlatArray { + // require(offset + (size - 1) * stride < data.size) { "not enough data" } + // this check is not needed since we control all invocations of this internal method return if (stride == 1) { F64DenseFlatArray.create(data, offset, size) } else { diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Internals.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Internals.kt index e671953..5ce93af 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/Internals.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/Internals.kt @@ -1,12 +1,12 @@ package org.jetbrains.bio.viktor @Suppress("nothing_to_inline") -inline fun IntArray.product() = reduce(Int::times) +inline fun IntArray.product() = fold(1, Int::times) internal fun IntArray.remove(pos: Int) = when (pos) { 0 -> sliceArray(1..lastIndex) - lastIndex -> sliceArray(0..lastIndex - 1) - else -> sliceArray(0..pos - 1) + sliceArray(pos + 1..lastIndex) + lastIndex -> sliceArray(0 until lastIndex) + else -> sliceArray(0 until pos) + sliceArray(pos + 1..lastIndex) } @Suppress("nothing_to_inline") diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Loader.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Loader.kt index ee2f585..e86b6a4 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/Loader.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/Loader.kt @@ -37,16 +37,17 @@ internal object Loader { private val LOG = Logger.getLogger(Loader::class.java) /** If `true` vector operations will be SIMD-optimized. 
*/
-    internal var nativeLibraryLoaded = false
+    internal var nativeLibraryLoaded: Boolean = false
+        private set
+
+    private var optimizationSupported = false
+    private var architectureSupported = false
-    internal var optimizationSupported = false
-    internal var architectureSupported = false

     fun ensureLoaded() {}

     private val arch: String get() {
-        val arch = System.getProperty("os.arch").toLowerCase()
-        return when (arch) {
+        return when (val arch = System.getProperty("os.arch").toLowerCase()) {
             "amd64", "x86_64" -> "x86_64"
             else -> error("unsupported architecture: $arch")
         }
@@ -108,7 +109,3 @@ Build viktor for your system from source as described in https://github.com/JetB
 internal external fun isAvxSupported(): Boolean

 internal external fun isSse2Supported(): Boolean
-
-fun main(args: Array<String>) {
-    Loader.ensureLoaded()
-}
diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Magic.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Magic.kt
deleted file mode 100644
index 2114225..0000000
--- a/src/main/kotlin/org/jetbrains/bio/viktor/Magic.kt
+++ /dev/null
@@ -1,49 +0,0 @@
-package org.jetbrains.bio.viktor
-
-import java.util.*
-
-/**
- * A special object used to denote all indices.
- *
- * @since 0.1.1 Renamed to `_I` because all-underscore names are reserved
- * for internal use in Kotlin.
- */
-object _I {}
-
-/**
- * Converts a multi-dimensional index to an index into the flattened array.
- *
- * @since 0.4.0
- * @see unravelIndex
- */
-fun ravelMultiIndex(indices: IntArray, shape: IntArray): Int {
-    var index = 0
-    var stride = 1
-    for (i in shape.indices.reversed()) {
-        index += indices[i] * stride
-        stride *= shape[i]
-    }
-
-    return index
-}
-
-/**
- * The inverse of [ravelMultiIndex].
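The `ravelMultiIndex`/`unravelIndex` pair removed in this hunk converts between a multi-dimensional index and a flat row-major offset. A standalone sketch of the same row-major arithmetic (the names `ravel`/`unravel` are illustrative, not viktor API):

```kotlin
// Row-major flattening: the last axis varies fastest, so strides are
// cumulative products of the trailing shape dimensions.
fun ravel(indices: IntArray, shape: IntArray): Int {
    var index = 0
    var stride = 1
    for (i in shape.indices.reversed()) {
        index += indices[i] * stride
        stride *= shape[i]
    }
    return index
}

// Inverse of ravel: peel off the last axis with % and / repeatedly.
fun unravel(index: Int, shape: IntArray): IntArray {
    var remaining = index
    val indices = IntArray(shape.size)
    for (i in shape.indices.reversed()) {
        indices[i] = remaining % shape[i]
        remaining /= shape[i]
    }
    return indices
}
```

For a `3 x 4` shape, index `(2, 1)` maps to flat offset `2 * 4 + 1 = 9`, and unravelling `9` recovers `(2, 1)`.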
- *
- * @since 0.4.0
- */
-fun unravelIndex(index: Int, shape: IntArray): IntArray {
-    if (index < 0 || index >= shape.product()) {
-        throw IndexOutOfBoundsException(
-                "invalid index for shape ${Arrays.toString(shape)}")
-    }
-
-    var remaining = index
-    val indices = IntArray(shape.size)
-    for (i in shape.indices.reversed()) {
-        indices[i] = remaining % shape[i]
-        remaining /= shape[i]
-    }
-
-    return indices
-}
\ No newline at end of file
diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/MoreMath.kt b/src/main/kotlin/org/jetbrains/bio/viktor/MoreMath.kt
index 6ee54e4..929647d 100644
--- a/src/main/kotlin/org/jetbrains/bio/viktor/MoreMath.kt
+++ b/src/main/kotlin/org/jetbrains/bio/viktor/MoreMath.kt
@@ -1,6 +1,8 @@
 package org.jetbrains.bio.viktor

 import org.apache.commons.math3.util.FastMath
+import kotlin.math.abs
+import kotlin.math.max

 /**
  * Evaluates log(exp(a) + exp(b)) using the following trick
@@ -12,16 +14,19 @@
 infix fun Double.logAddExp(b: Double): Double {
     val a = this
     return when {
-        a.isInfinite() && a < 0 -> b
-        b.isInfinite() && b < 0 -> a
-        else -> Math.max(a, b) + StrictMath.log1p(FastMath.exp(-Math.abs(a - b)))
+        a.isNaN() || b.isNaN() -> Double.NaN
+        a.isInfinite() -> if (a < 0) b else a
+        b.isInfinite() -> if (b < 0) a else b
+        else -> max(a, b) + StrictMath.log1p(FastMath.exp(-abs(a - b)))
     }
 }

+fun Sequence<Double>.logSumExp(): Double = toList().toDoubleArray().asF64Array().logSumExp()
+
 /**
  * Kahan-Babuska summation.
  *
- * See http://cage.ugent.be/~klein/papers/floating-point.pdf for details.
+ * See https://en.wikipedia.org/wiki/Kahan_summation_algorithm for details.
  *
  * @author Alexey Dievsky
  * @since 0.1.0
@@ -32,10 +37,10 @@ class KahanSum @JvmOverloads constructor(private var accumulator: Double = 0.0)
     /** Supplies a number to be added to the accumulator.
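The rewritten `logAddExp` above avoids overflow by factoring out the larger argument: log(exp(a) + exp(b)) = max(a, b) + log1p(exp(-|a - b|)), so only the exponential of a non-positive number is ever evaluated. A self-contained sketch using only `kotlin.math` (mirroring the branch structure, not reproducing the library code):

```kotlin
import kotlin.math.abs
import kotlin.math.exp
import kotlin.math.ln1p
import kotlin.math.max

// Overflow-safe log(exp(a) + exp(b)). Neither exp(a) nor exp(b) is
// computed directly; exp only sees -|a - b| <= 0, which cannot overflow.
fun logAddExp(a: Double, b: Double): Double = when {
    a.isNaN() || b.isNaN() -> Double.NaN
    a.isInfinite() -> if (a < 0) b else a // exp(-inf) = 0, exp(+inf) dominates
    b.isInfinite() -> if (b < 0) a else b
    else -> max(a, b) + ln1p(exp(-abs(a - b)))
}
```

For example, `logAddExp(1000.0, 1000.0)` returns roughly `1000.0 + ln 2`, whereas `ln(exp(1000.0) + exp(1000.0))` would overflow to infinity.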
*/ fun feed(value: Double): KahanSum { val t = accumulator + value - if (Math.abs(accumulator) >= Math.abs(value)) { - compensator += (accumulator - t) + value + compensator += if (abs(accumulator) >= abs(value)) { + (accumulator - t) + value } else { - compensator += (value - t) + accumulator + (value - t) + accumulator } accumulator = t diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/NativeSpeedups.kt b/src/main/kotlin/org/jetbrains/bio/viktor/NativeSpeedups.kt index b2c6ad0..92f3003 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/NativeSpeedups.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/NativeSpeedups.kt @@ -1,82 +1,88 @@ package org.jetbrains.bio.viktor internal object NativeSpeedups { + init { Loader.ensureLoaded() } - external fun unsafePlus(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafePlusAssign( + dst: DoubleArray, + dstOffset: Int, + src: DoubleArray, + srcOffset: Int, + length: Int + ): Boolean - external fun unsafeMinus(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeMinusAssign( + dst: DoubleArray, + dstOffset: Int, + src: DoubleArray, + srcOffset: Int, + length: Int + ): Boolean - external fun unsafeTimes(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeTimesAssign( + dst: DoubleArray, + dstOffset: Int, + src: DoubleArray, + srcOffset: Int, + length: Int + ): Boolean - external fun unsafeDiv(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeDivAssign( + dst: DoubleArray, + dstOffset: Int, + src: DoubleArray, + srcOffset: Int, + length: Int + ): Boolean - external fun unsafeNegate(src1: DoubleArray, srcOffset1: Int, - dst: DoubleArray, dstOffset: Int, 
length: Int) + external fun unsafeNegateInPlace(dst: DoubleArray, dstOffset: Int, length: Int): Boolean - external fun unsafePlusScalar(src1: DoubleArray, srcOffset1: Int, update: Double, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafePlusScalarAssign(dst: DoubleArray, dstOffset: Int, length: Int, update: Double): Boolean - external fun unsafeMinusScalar(src1: DoubleArray, srcOffset1: Int, update: Double, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeMinusScalarAssign(dst: DoubleArray, dstOffset: Int, length: Int, update: Double): Boolean - external fun unsafeTimesScalar(src1: DoubleArray, srcOffset1: Int, update: Double, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeTimesScalarAssign(dst: DoubleArray, dstOffset: Int, length: Int, update: Double): Boolean - external fun unsafeDivScalar(src1: DoubleArray, srcOffset1: Int, update: Double, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeDivScalarAssign(dst: DoubleArray, dstOffset: Int, length: Int, update: Double): Boolean - external fun unsafeScalarDiv(update: Double, src1: DoubleArray, srcOffset1: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeScalarDivAssign(dst: DoubleArray, dstOffset: Int, length: Int, update: Double): Boolean external fun unsafeMin(values: DoubleArray, offset: Int, length: Int): Double external fun unsafeMax(values: DoubleArray, offset: Int, length: Int): Double - external fun unsafeExp(src: DoubleArray, srcOffset: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeExpInPlace(dst: DoubleArray, dstOffset: Int, length: Int): Boolean - external fun unsafeExpm1(src: DoubleArray, srcOffset: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeExpm1InPlace(dst: DoubleArray, dstOffset: Int, length: Int): Boolean - external fun unsafeLog(src: DoubleArray, srcOffset: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun 
unsafeLogInPlace(dst: DoubleArray, dstOffset: Int, length: Int): Boolean - external fun unsafeLog1p(src: DoubleArray, srcOffset: Int, - dst: DoubleArray, dstOffset: Int, length: Int) + external fun unsafeLog1pInPlace(dst: DoubleArray, dstOffset: Int, length: Int): Boolean external fun unsafeLogSumExp(src: DoubleArray, srcOffset: Int, length: Int): Double - external fun unsafeLogAddExp(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, - dst: DoubleArray, dstOffset: Int, length: Int) - - external fun unsafeLogRescale(src: DoubleArray, srcOffset: Int, - dst: DoubleArray, dstOffset: Int, length: Int) - - external fun unsafeDot(src1: DoubleArray, srcOffset1: Int, - src2: DoubleArray, srcOffset2: Int, length: Int): Double - - external fun sum(values: DoubleArray, offset: Int, length: Int): Double + external fun unsafeLogAddExp( + dst: DoubleArray, + dstOffset: Int, + src: DoubleArray, + srcOffset: Int, + length: Int + ): Boolean - external fun weightedSum(values: DoubleArray, valuesOffset: Int, - weights: DoubleArray, weightsOffset: Int, length: Int): Double + external fun unsafeDot( + src1: DoubleArray, + srcOffset1: Int, + src2: DoubleArray, + srcOffset2: Int, + length: Int + ): Double - external fun weightedMean(values: DoubleArray, valuesOffset: Int, - weights: DoubleArray, weightsOffset: Int, length: Int): Double + external fun unsafeSum(values: DoubleArray, offset: Int, length: Int): Double - external fun sd(values: DoubleArray, offset: Int, length: Int): Double + external fun unsafeSD(values: DoubleArray, offset: Int, length: Int): Double - external fun cumSum(source: DoubleArray, sourceOffset: Int, - dest: DoubleArray, destOffset: Int, length: Int) + external fun unsafeCumSum(dest: DoubleArray, destOffset: Int, length: Int): Boolean } diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Random.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Random.kt index dcb0037..d9b27a4 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/Random.kt +++ 
b/src/main/kotlin/org/jetbrains/bio/viktor/Random.kt @@ -2,6 +2,7 @@ package org.jetbrains.bio.viktor import org.apache.commons.math3.random.MersenneTwister import org.apache.commons.math3.random.RandomGenerator +import kotlin.math.floor private val DEFAULT_RANDOM = MersenneTwister() @@ -18,10 +19,14 @@ internal object QuickSelect { * * Invariant: left <= n <= right */ - tailrec fun select(values: F64Array, - left: Int, right: Int, n: Int, - randomGenerator: RandomGenerator): Double { - assert(left <= n && n <= right) + tailrec fun select( + values: F64Array, + left: Int, + right: Int, + n: Int, + randomGenerator: RandomGenerator + ): Double { + // assert(n in left..right) unnecessary since we control all invocations if (left == right) { return values[left] @@ -41,28 +46,32 @@ internal object QuickSelect { * Computes the [q]-th order statistic over this 1-D array. * * The implementation follows that of Commons Math. See JavaDoc of - * [Percentile] for computational details. + * [org.apache.commons.math3.stat.descriptive.rank.Percentile] for computational details. * - * The array is modified in-place. Do a [copy] of the array + * The array is modified in-place. Do a [F64Array.copy] of the array * to avoid mutation if necessary. 
* * @since 0.2.0 */ -fun F64Array.quantile(q: Double = 0.5, - randomGenerator: RandomGenerator = DEFAULT_RANDOM): Double { +fun F64Array.quantile( + q: Double = 0.5, + randomGenerator: RandomGenerator = DEFAULT_RANDOM +): Double { require(size > 0) { "no data" } check1D(this) val pos = (size + 1) * q - val d = pos - Math.floor(pos) + val d = pos - floor(pos) return when { - pos < 1 -> min() + pos < 1 -> min() pos >= size -> max() else -> { val lo = QuickSelect.select( - this, 0, size - 1, pos.toInt() - 1, randomGenerator) + this, 0, size - 1, pos.toInt() - 1, randomGenerator + ) val hi = QuickSelect.select( - this, 0, size - 1, pos.toInt(), randomGenerator) + this, 0, size - 1, pos.toInt(), randomGenerator + ) return lo + d * (hi - lo) } } diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Serialization.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Serialization.kt index 0d29321..173498d 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/Serialization.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/Serialization.kt @@ -10,14 +10,12 @@ fun NpyArray.asF64Array() = asDoubleArray().asF64Array().reshape(*shape) /** Writes a given matrix to [path] in NPY format. */ fun NpyFile.write(path: Path, a: F64Array) { - // We could getaway without doing a double copy of transposed - // matrices here once `npy` supports Fortran order. - val dense = if (a.isDense) a else a.copy() + val dense = if (a.isFlattenable) a else a.copy() write(path, dense.flatten().toDoubleArray(), shape = a.shape) } /** Writes a given array into an NPZ file under the specified [name]. 
*/ fun NpzFile.Writer.write(name: String, a: F64Array) { - val dense = if (a.isDense) a else a.copy() + val dense = if (a.isFlattenable) a else a.copy() write(name, dense.flatten().toDoubleArray(), shape = a.shape) } \ No newline at end of file diff --git a/src/main/kotlin/org/jetbrains/bio/viktor/Sorting.kt b/src/main/kotlin/org/jetbrains/bio/viktor/Sorting.kt index fff676e..8983104 100644 --- a/src/main/kotlin/org/jetbrains/bio/viktor/Sorting.kt +++ b/src/main/kotlin/org/jetbrains/bio/viktor/Sorting.kt @@ -29,22 +29,26 @@ fun F64Array.argSort(reverse: Boolean = false): IntArray { private data class IndexedDoubleValue(val index: Int, val value: Double) : Comparable { override fun compareTo(other: IndexedDoubleValue): Int { - val res = java.lang.Double.compare(value, other.value) + val res = value.compareTo(other.value) return if (res != 0) { res } else { - java.lang.Integer.compare(index, other.index) + index.compareTo(other.index) } } } internal inline fun reorderInternal( - a: F64Array, indices: IntArray, axis: Int, - get: (Int) -> T, set: (Int, T) -> Unit) { + a: F64Array, + indices: IntArray, + axis: Int, + get: (Int) -> T, + set: (Int, T) -> Unit +) { require(indices.size == a.shape[axis]) val copy = indices.clone() - for (pos in 0..a.shape[axis] - 1) { + for (pos in 0 until a.shape[axis]) { val value = get(pos) var j = pos while (true) { @@ -77,7 +81,7 @@ internal inline fun reorderInternal( */ fun F64Array.partition(p: Int) { check1D(this) - require(p >= 0 && p < size) { "p must be in [0, $size)" } + require(p in 0 until size) { "p must be in [0, $size)" } partition(p, 0, size - 1) } @@ -85,8 +89,10 @@ fun F64Array.partition(p: Int) { * Helper [partition] extension. * * Invariants: p = partition(values, left, right, p) - * for all i < p: values[i] < values[p] - * for all i >= p: values[p] >= values[p] + * for all i < p: + * values[i] < values[p] + * for all i >= p: + * values[i] >= values[p] * * @param p the index of the element to partition by. 
* @param left start index (inclusive). @@ -97,7 +103,7 @@ internal fun F64Array.partition(p: Int, left: Int, right: Int): Int { swap(p, right) // move to end. var ptr = left - for (i in left..right - 1) { + for (i in left until right) { if (this[i] < pivot) { swap(i, ptr) ptr++ diff --git a/src/simd/cpp/org_jetbrains_bio_viktor_NativeSpeedups.cpp b/src/simd/cpp/org_jetbrains_bio_viktor_NativeSpeedups.cpp index a7e7f98..fa5de14 100644 --- a/src/simd/cpp/org_jetbrains_bio_viktor_NativeSpeedups.cpp +++ b/src/simd/cpp/org_jetbrains_bio_viktor_NativeSpeedups.cpp @@ -17,92 +17,58 @@ #define JNI_METHOD(rtype, name) \ JNIEXPORT rtype JNICALL Java_org_jetbrains_bio_viktor_NativeSpeedups_##name -JNI_METHOD(void, unsafePlus)(JNIEnv *env, jobject, - jdoubleArray jsrc1, jint src_offset1, - jdoubleArray jsrc2, jint src_offset2, - jdoubleArray jdst, jint dst_offset, - jint length) +template +jboolean transformAssign(JNIEnv *env, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length, BinOp const& op) { - jdouble *src1 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc1, NULL)); - jdouble *src2 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc2, NULL)); + jdouble *src = reinterpret_cast( + env->GetPrimitiveArrayCritical(jsrc, NULL)); + jboolean dst_is_copy = JNI_FALSE; jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src1 + src_offset1, - src1 + src_offset1 + length, - src2 + src_offset2, + env->GetPrimitiveArrayCritical(jdst, &dst_is_copy)); + if (dst_is_copy == JNI_TRUE) return JNI_FALSE; + boost::simd::transform(dst + dst_offset, + dst + dst_offset + length, + src + src_offset, dst + dst_offset, - boost::simd::plus); - env->ReleasePrimitiveArrayCritical(jsrc1, src1, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jsrc2, src2, JNI_ABORT); + op); env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); + return 
JNI_TRUE; } -JNI_METHOD(void, unsafeMinus)(JNIEnv *env, jobject, - jdoubleArray jsrc1, jint src_offset1, - jdoubleArray jsrc2, jint src_offset2, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafePlusAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length) { - jdouble *src1 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc1, NULL)); - jdouble *src2 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc2, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src1 + src_offset1, - src1 + src_offset1 + length, - src2 + src_offset2, - dst + dst_offset, - boost::simd::minus); - env->ReleasePrimitiveArrayCritical(jsrc1, src1, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jsrc2, src2, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformAssign(env, jdst, dst_offset, jsrc, src_offset, length, boost::simd::plus); } -JNI_METHOD(void, unsafeTimes)(JNIEnv *env, jobject, - jdoubleArray jsrc1, jint src_offset1, - jdoubleArray jsrc2, jint src_offset2, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeMinusAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length) { - jdouble *src1 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc1, NULL)); - jdouble *src2 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc2, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src1 + src_offset1, - src1 + src_offset1 + length, - src2 + src_offset2, - dst + dst_offset, - boost::simd::multiplies); - env->ReleasePrimitiveArrayCritical(jsrc1, src1, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jsrc2, src2, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformAssign(env, jdst, dst_offset, jsrc, 
src_offset, length, boost::simd::minus); } -JNI_METHOD(void, unsafeDiv)(JNIEnv *env, jobject, - jdoubleArray jsrc1, jint src_offset1, - jdoubleArray jsrc2, jint src_offset2, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeTimesAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length) { - jdouble *src1 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc1, NULL)); - jdouble *src2 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc2, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src1 + src_offset1, - src1 + src_offset1 + length, - src2 + src_offset2, - dst + dst_offset, - boost::simd::div); - env->ReleasePrimitiveArrayCritical(jsrc1, src1, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jsrc2, src2, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformAssign(env, jdst, dst_offset, jsrc, src_offset, length, boost::simd::multiplies); +} + +JNI_METHOD(jboolean, unsafeDivAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length) +{ + return transformAssign(env, jdst, dst_offset, jsrc, src_offset, length, boost::simd::div); } namespace { @@ -159,201 +125,118 @@ struct scalar_div { } -JNI_METHOD(void, unsafePlusScalar)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdouble update, - jdoubleArray jdst, jint dst_offset, - jint length) +template +jboolean transformInPlace(JNIEnv *env, + jdoubleArray jdst, jint dst_offset, + jint length, UnOp const& f) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); + jboolean is_copy = JNI_FALSE; jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - plus_scalar(update)); - 
env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + env->GetPrimitiveArrayCritical(jdst, &is_copy)); + if (is_copy == JNI_TRUE) return JNI_FALSE; + boost::simd::transform(dst + dst_offset, dst + dst_offset + length, + dst + dst_offset, f); + env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return JNI_TRUE; } -JNI_METHOD(void, unsafeMinusScalar)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdouble update, - jdoubleArray jdst, jint dst_offset, - jint length) +template +jdouble reduce(JNIEnv *env, + jdoubleArray jsrc, jint src_offset, + jint length, Fold const& f, Empty const& e) { jdouble *src = reinterpret_cast( env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - minus_scalar(update)); + jdouble res = boost::simd::reduce( + src + src_offset, src + src_offset + length, e, f, e); env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return res; } -JNI_METHOD(void, unsafeTimesScalar)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdouble update, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafePlusScalarAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length, jdouble update) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - multiplies_scalar(update)); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, plus_scalar(update)); }
-JNI_METHOD(void, unsafeDivScalar)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdouble update, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeMinusScalarAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length, jdouble update) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - div_scalar(update)); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, minus_scalar(update)); } -JNI_METHOD(void, unsafeScalarDiv)(JNIEnv *env, jobject, - jdouble update, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeTimesScalarAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length, jdouble update) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - scalar_div(update)); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, multiplies_scalar(update)); } -JNI_METHOD(void, unsafeNegate)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeDivScalarAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length, jdouble update) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - 
env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - boost::simd::unary_minus); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, div_scalar(update)); +} + +JNI_METHOD(jboolean, unsafeScalarDivAssign)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length, jdouble update) +{ + return transformInPlace(env, jdst, dst_offset, length, scalar_div(update)); +} + +JNI_METHOD(jboolean, unsafeNegateInPlace)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length) +{ + return transformInPlace(env, jdst, dst_offset, length, boost::simd::unary_minus); } JNI_METHOD(jdouble, unsafeMin)(JNIEnv *env, jobject, jdoubleArray jsrc, jint src_offset, jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble min = boost::simd::reduce( - src + src_offset, src + src_offset + length, - boost::simd::Inf(), boost::simd::min, - boost::simd::Inf()); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - return min; + return reduce(env, jsrc, src_offset, length, boost::simd::min, boost::simd::Inf()); } JNI_METHOD(jdouble, unsafeMax)(JNIEnv *env, jobject, jdoubleArray jsrc, jint src_offset, jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble min = boost::simd::reduce( - src + src_offset, src + src_offset + length, - boost::simd::Minf(), boost::simd::max, - boost::simd::Minf()); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - return min; + return reduce(env, jsrc, src_offset, length, boost::simd::max, boost::simd::Minf()); } -JNI_METHOD(void, unsafeExp)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeExpInPlace)(JNIEnv *env, jobject, + 
jdoubleArray jdst, jint dst_offset, + jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, src + src_offset + length, - dst + dst_offset, boost::simd::exp); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, boost::simd::exp); } -JNI_METHOD(void, unsafeExpm1)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeExpm1InPlace)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, src + src_offset + length, - dst + dst_offset, boost::simd::expm1); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, boost::simd::expm1); } -JNI_METHOD(void, unsafeLog)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeLogInPlace)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, src + src_offset + length, - dst + dst_offset, boost::simd::log); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, boost::simd::log); } 
-JNI_METHOD(void, unsafeLog1p)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeLog1pInPlace)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src + src_offset, src + src_offset + length, - dst + dst_offset, boost::simd::log1p); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformInPlace(env, jdst, dst_offset, length, boost::simd::log1p); } @@ -384,45 +267,12 @@ struct logaddexp { } -JNI_METHOD(void, unsafeLogAddExp)(JNIEnv *env, jobject, - jdoubleArray jsrc1, jint src_offset1, - jdoubleArray jsrc2, jint src_offset2, - jdoubleArray jdst, jint dst_offset, - jint length) -{ - jdouble *src1 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc1, NULL)); - jdouble *src2 = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc2, NULL)); - jdouble *dst = reinterpret_cast( - env->GetPrimitiveArrayCritical(jdst, NULL)); - boost::simd::transform(src1 + src_offset1, - src1 + src_offset1 + length, - src2 + src_offset2, - dst + dst_offset, - logaddexp()); - env->ReleasePrimitiveArrayCritical(jsrc1, src1, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jsrc2, src2, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); -} - - -JNI_METHOD(void, unsafeLogRescale)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeLogAddExp)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jdoubleArray jsrc, jint src_offset, + jint length) { - jdouble *src = reinterpret_cast( - env->GetPrimitiveArrayCritical(jsrc, NULL)); - jdouble *dst = reinterpret_cast( - 
env->GetPrimitiveArrayCritical(jdst, NULL)); - double total = simdmath::logsumexp(src + src_offset, length); - boost::simd::transform(src + src_offset, - src + src_offset + length, - dst + dst_offset, - minus_scalar(total)); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + return transformAssign(env, jdst, dst_offset, jsrc, src_offset, length, logaddexp()); } JNI_METHOD(jdouble, unsafeDot)(JNIEnv *env, jobject, @@ -440,68 +290,39 @@ JNI_METHOD(jdouble, unsafeDot)(JNIEnv *env, jobject, return res; } -JNI_METHOD(jdouble, sum)(JNIEnv *env, jobject, - jdoubleArray jvalues, jint offset, jint length) +JNI_METHOD(jdouble, unsafeSum)(JNIEnv *env, jobject, + jdoubleArray jvalues, jint offset, jint length) { - jdouble *values = (jdouble *) env->GetPrimitiveArrayCritical(jvalues, NULL); + jdouble *values = reinterpret_cast( + env->GetPrimitiveArrayCritical(jvalues, NULL)); source_1d f(values + offset, length); double res = balanced_sum(f); env->ReleasePrimitiveArrayCritical(jvalues, values, JNI_ABORT); return res; } -JNI_METHOD(jdouble, weightedSum)(JNIEnv *env, jobject, - jdoubleArray jvalues, jint values_offset, - jdoubleArray jweights, jint weights_offset, - jint length) -{ - jdouble *values = (jdouble *) env->GetPrimitiveArrayCritical(jvalues, NULL); - jdouble *weights = (jdouble *) env->GetPrimitiveArrayCritical(jweights, NULL); - source_1d f(values + values_offset, - weights + weights_offset, - length); - double res = balanced_sum(f); - env->ReleasePrimitiveArrayCritical(jvalues, values, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jweights, weights, JNI_ABORT); - return res; -} - -JNI_METHOD(jdouble, weightedMean)(JNIEnv *env, jobject, - jdoubleArray jvalues, jint values_offset, - jdoubleArray jweights, jint weights_offset, - jint length) -{ - jdouble *values = (jdouble *) env->GetPrimitiveArrayCritical(jvalues, NULL); - jdouble *weights = (jdouble *) env->GetPrimitiveArrayCritical(jweights, 
NULL); - source_2d f(values + values_offset, - weights + weights_offset, - length); - double res = twin_balanced_sum(f); - env->ReleasePrimitiveArrayCritical(jvalues, values, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jweights, weights, JNI_ABORT); - return res; -} - -JNI_METHOD(jdouble, sd)(JNIEnv *env, jobject, - jdoubleArray jvalues, jint offset, - jint length) +JNI_METHOD(jdouble, unsafeSD)(JNIEnv *env, jobject, + jdoubleArray jvalues, jint offset, + jint length) { - jdouble *values = (jdouble *) env->GetPrimitiveArrayCritical(jvalues, NULL); + jdouble *values = reinterpret_cast( + env->GetPrimitiveArrayCritical(jvalues, NULL)); source_2d f(values + offset, length); double res = twin_balanced_sum(f); env->ReleasePrimitiveArrayCritical(jvalues, values, JNI_ABORT); return res; } -JNI_METHOD(void, cumSum)(JNIEnv *env, jobject, - jdoubleArray jsrc, jint src_offset, - jdoubleArray jdst, jint dst_offset, - jint length) +JNI_METHOD(jboolean, unsafeCumSum)(JNIEnv *env, jobject, + jdoubleArray jdst, jint dst_offset, + jint length) { - jdouble *src = (jdouble *) env->GetPrimitiveArrayCritical(jsrc, NULL); - jdouble *dst = (jdouble *) env->GetPrimitiveArrayCritical(jdst, NULL); - source_1d f(src + src_offset, dst + dst_offset, length); + jboolean is_copy = JNI_FALSE; + jdouble *dst = reinterpret_cast( + env->GetPrimitiveArrayCritical(jdst, &is_copy)); + if (is_copy == JNI_TRUE) return JNI_FALSE; + source_1d f(dst + dst_offset, dst + dst_offset, length); cum_sum(f); - env->ReleasePrimitiveArrayCritical(jsrc, src, JNI_ABORT); - env->ReleasePrimitiveArrayCritical(jdst, dst, JNI_ABORT); + env->ReleasePrimitiveArrayCritical(jdst, dst, is_copy == JNI_TRUE ? 
0 : JNI_ABORT); + return JNI_TRUE; } diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/BalancedSumTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/BalancedSumTests.kt index ada2b0f..3ddaa5a 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/BalancedSumTests.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/BalancedSumTests.kt @@ -13,11 +13,11 @@ class BalancedSumTest(private val size: Int) { val v = Random().doubles(size.toLong()).toArray().asF64Array() val expected = KahanSum() - for (i in 0..v.size - 1) { + for (i in 0 until v.size) { expected.feed(v[i]) } - assertEquals(expected.result(), v.balancedSum(), 1e-8) + assertEquals(expected.result(), v.sum(), 1e-8) } companion object { @@ -34,7 +34,36 @@ class BalancedDotTest(private val size: Int) { val w = r.doubles(size.toLong()).toArray().asF64Array() val expected = KahanSum() - for (i in 0..v.size - 1) { + for (i in 0 until v.size) { + expected.feed(v[i] * w[i]) + } + + assertEquals(expected.result(), v.dot(w), 1e-8) + } + + @Test fun intAccuracy() { + val r = Random() + val v = r.doubles(size.toLong()).toArray().asF64Array() + val w = r.ints(-1000, 1000).limit(size.toLong()).toArray() + + val expected = KahanSum() + for (i in 0 until v.size) { + expected.feed(v[i] * w[i]) + } + + assertEquals(expected.result(), v.dot(w), 1e-8) + } + + @Test fun shortAccuracy() { + val r = Random() + val v = r.doubles(size.toLong()).toArray().asF64Array() + // there are no short streams + val w = ShortArray(size) + for (i in 0 until size) { + w[i] = (r.nextInt(2000) - 1000).toShort() + } + val expected = KahanSum() + for (i in 0 until v.size) { expected.feed(v[i] * w[i]) } diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/DoubleExtensionsTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/DoubleExtensionsTest.kt index db9d44d..e08c647 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/DoubleExtensionsTest.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/DoubleExtensionsTest.kt @@ -18,7 +18,7 @@ class 
DoubleExtensionsTest { @Test fun minusVector() { val v = F64Array(10) { it.toDouble() } - val reversed = v.reversed().copy() + val reversed = F64Array(10) { (9 - it).toDouble() } assertEquals(reversed, 9.0 - v) } diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayCreationTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayCreationTest.kt index aba05f6..ea0bf58 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayCreationTest.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayCreationTest.kt @@ -1,8 +1,8 @@ package org.jetbrains.bio.viktor import org.apache.commons.math3.util.Precision -import org.junit.Assert.assertEquals import org.junit.Assert.assertArrayEquals +import org.junit.Assert.assertEquals import org.junit.Test import kotlin.test.assertTrue @@ -14,18 +14,22 @@ class F64ArrayCreationTest { } @Test fun of() { - assertArrayEquals(doubleArrayOf(1.0), - F64Array.of(1.0).toDoubleArray(), Precision.EPSILON) - assertArrayEquals(doubleArrayOf(1.0, 2.0), - F64Array.of(1.0, 2.0).toDoubleArray(), Precision.EPSILON) - assertArrayEquals(doubleArrayOf(1.0, 2.0, 3.0), - F64Array.of(1.0, 2.0, 3.0).toDoubleArray(), Precision.EPSILON) + assertArrayEquals( + doubleArrayOf(1.0), F64Array.of(1.0).toDoubleArray(), Precision.EPSILON + ) + assertArrayEquals( + doubleArrayOf(1.0, 2.0), F64Array.of(1.0, 2.0).toDoubleArray(), Precision.EPSILON + ) + assertArrayEquals( + doubleArrayOf(1.0, 2.0, 3.0), F64Array.of(1.0, 2.0, 3.0).toDoubleArray(), Precision.EPSILON + ) } @Test fun asF64Array() { assertEquals(F64Array.of(1.0), doubleArrayOf(1.0).asF64Array()) - assertEquals(F64Array.of(3.0), - doubleArrayOf(1.0, 2.0, 3.0).asF64Array(offset = 2, size = 1)) + assertEquals( + F64Array.of(3.0), doubleArrayOf(1.0, 2.0, 3.0).asF64Array(offset = 2, size = 1) + ) } @Test fun asF64ArrayView() { @@ -36,33 +40,59 @@ class F64ArrayCreationTest { } @Test fun toF64Array() { - assertEquals(F64Array.of(1.0, 2.0).reshape(1, 2), - arrayOf(doubleArrayOf(1.0, 
2.0)).toF64Array()) - assertEquals(F64Array.of(1.0, 2.0).reshape(2, 1), - arrayOf(doubleArrayOf(1.0), - doubleArrayOf(2.0)).toF64Array()) - - assertEquals(F64Array.of(1.0, 2.0, 3.0, - 4.0, 5.0, 6.0).reshape(2, 3), - arrayOf(doubleArrayOf(1.0, 2.0, 3.0), - doubleArrayOf(4.0, 5.0, 6.0)).toF64Array()) - assertEquals(F64Array.of(1.0, 2.0, 3.0, - 4.0, 5.0, 6.0).reshape(3, 2), - arrayOf(doubleArrayOf(1.0, 2.0), - doubleArrayOf(3.0, 4.0), - doubleArrayOf(5.0, 6.0)).toF64Array()) - - assertEquals(F64Array.of(1.0, 2.0, 3.0, 4.0, - 5.0, 6.0, 7.0, 8.0).reshape(2, 2, 2), - arrayOf(arrayOf(doubleArrayOf(1.0, 2.0), - doubleArrayOf(3.0, 4.0)), - arrayOf(doubleArrayOf(5.0, 6.0), - doubleArrayOf(7.0, 8.0))).toF64Array()) + assertEquals( + F64Array.of(1.0, 2.0).reshape(1, 2), + arrayOf(doubleArrayOf(1.0, 2.0)).toF64Array() + ) + assertEquals( + F64Array.of(1.0, 2.0).reshape(2, 1), + arrayOf(doubleArrayOf(1.0), doubleArrayOf(2.0)).toF64Array() + ) + + assertEquals( + F64Array.of( + 1.0, 2.0, 3.0, + 4.0, 5.0, 6.0 + ).reshape(2, 3), + arrayOf( + doubleArrayOf(1.0, 2.0, 3.0), + doubleArrayOf(4.0, 5.0, 6.0) + ).toF64Array() + ) + assertEquals( + F64Array.of( + 1.0, 2.0, 3.0, + 4.0, 5.0, 6.0 + ).reshape(3, 2), + arrayOf( + doubleArrayOf(1.0, 2.0), + doubleArrayOf(3.0, 4.0), + doubleArrayOf(5.0, 6.0) + ).toF64Array() + ) + + assertEquals( + F64Array.of( + 1.0, 2.0, 3.0, 4.0, + 5.0, 6.0, 7.0, 8.0 + ).reshape(2, 2, 2), + arrayOf( + arrayOf( + doubleArrayOf(1.0, 2.0), + doubleArrayOf(3.0, 4.0) + ), + arrayOf( + doubleArrayOf(5.0, 6.0), + doubleArrayOf(7.0, 8.0) + ) + ).toF64Array()) } @Test fun invoke() { - assertEquals(F64Array.of(1.0, 2.0, 3.0), - F64Array(3) { it + 1.0 }) + assertEquals( + F64Array.of(1.0, 2.0, 3.0), + F64Array(3) { it + 1.0 } + ) } @Test fun full() { @@ -72,43 +102,54 @@ class F64ArrayCreationTest { } @Test fun concatenateFlat() { - assertEquals(F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0), - F64Array.concatenate(F64Array.of(1.0, 2.0), - F64Array.of(3.0), - F64Array.of(4.0, 
5.0))) + assertEquals( + F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0), + F64Array.concatenate( + F64Array.of(1.0, 2.0), + F64Array.of(3.0), + F64Array.of(4.0, 5.0) + ) + ) } @Test fun appendFlat() { - assertEquals(F64Array.of(1.0, 2.0), - F64Array.of(1.0, 2.0).append(F64Array(0))) - assertEquals(F64Array.of(1.0, 2.0), - F64Array(0).append(F64Array.of(1.0, 2.0))) - assertEquals(F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0), - F64Array.of(1.0, 2.0).append(F64Array.of(3.0, 4.0, 5.0))) + assertEquals( + F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0), + F64Array.of(1.0, 2.0).append(F64Array.of(3.0, 4.0, 5.0)) + ) } @Test fun appendMatrix0() { - assertEquals(F64Array.of(1.0, 2.0, - 3.0, 4.0, - 42.0, 42.0).reshape(3, 2), - F64Array.of(1.0, 2.0, - 3.0, 4.0).reshape(2, 2) - .append(F64Array.of(42.0, 42.0).reshape(1, 2))) + assertEquals( + F64Array.of( + 1.0, 2.0, + 3.0, 4.0, + 42.0, 42.0 + ).reshape(3, 2), + F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ).reshape(2, 2).append(F64Array.of(42.0, 42.0).reshape(1, 2))) } @Test fun appendMatrix1() { - assertEquals(F64Array.of(1.0, 2.0, 42.0, - 3.0, 4.0, 42.0).reshape(2, 3), - F64Array.of(1.0, 2.0, - 3.0, 4.0).reshape(2, 2) - .append(F64Array.of(42.0, 42.0).reshape(2, 1), - axis = 1)) + assertEquals( + F64Array.of( + 1.0, 2.0, 42.0, + 3.0, 4.0, 42.0 + ).reshape(2, 3), + F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ).reshape(2, 2).append(F64Array.of(42.0, 42.0).reshape(2, 1), axis = 1) + ) } @Test(expected = IllegalArgumentException::class) fun appendMismatch() { - F64Array.of(1.0, 2.0, - 3.0, 4.0).reshape(2, 2) - .append(F64Array.of(42.0, 42.0).reshape(2, 1), axis = 0) + F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ).reshape(2, 2).append(F64Array.of(42.0, 42.0).reshape(2, 1), axis = 0) } @Test fun copy() { @@ -118,4 +159,26 @@ class F64ArrayCreationTest { v[0] = 42.0 assertEquals(1.0, copy[0], Precision.EPSILON) } + + @Test fun reshapeNonFlat() { + val v = F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ) + assertEquals(v.reshape(4, 1), v.reshape(2, 2).reshape(4, 1)) + } + + 
@Test(expected = IllegalStateException::class) fun reshapeSizeMismatch() { + F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ).reshape(3, 2) + } + + @Test(expected = IllegalArgumentException::class) fun reshapeNegativeShape() { + F64Array.of( + 1.0, 2.0, + 3.0, 4.0 + ).reshape(-2, -2) + } } \ No newline at end of file diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayGetSetTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayGetSetTests.kt index 6832123..b124dbe 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayGetSetTests.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayGetSetTests.kt @@ -10,15 +10,17 @@ import kotlin.test.assertFailsWith import kotlin.test.assertNotEquals @RunWith(Parameterized::class) -class F64FlatArrayGetSetTest(private val values: DoubleArray, - private val offset: Int, - size: Int, - private val stride: Int) { +class F64FlatArrayGetSetTest( + private val values: DoubleArray, + private val offset: Int, + size: Int, + private val stride: Int +) { private val v = F64FlatArray(values, offset, stride, size) @Test fun get() { - for (i in 0..v.size - 1) { + for (i in 0 until v.size) { assertEquals(values[offset + i * stride], v[i], Precision.EPSILON) } } @@ -28,13 +30,13 @@ class F64FlatArrayGetSetTest(private val values: DoubleArray, } @Test fun set() { - for (i in 0..v.size - 1) { + for (i in 0 until v.size) { val copy = v.copy() copy[i] = 42.0 assertEquals(42.0, copy[i], Precision.EPSILON) // Ensure all other elements are unchanged. 
- for (j in 0..v.size - 1) { + for (j in 0 until v.size) { if (j == i) { continue } @@ -50,7 +52,7 @@ class F64FlatArrayGetSetTest(private val values: DoubleArray, @Test fun setMagicScalar() { val copy = v.copy() - copy[_I] = 42.0 + copy.V[_I] = 42.0 assertEquals(F64Array.full(copy.size, 42.0), copy) } @@ -58,7 +60,7 @@ class F64FlatArrayGetSetTest(private val values: DoubleArray, @Test fun setMagicVector() { val other = F64Array.full(v.size, 42.0) val copy = v.copy() - copy[_I] = other + copy.V[_I] = other assertEquals(other, copy) } @@ -66,20 +68,21 @@ class F64FlatArrayGetSetTest(private val values: DoubleArray, companion object { @Parameters(name = "F64Array({1}, {2}, {3})") @JvmStatic fun `data`() = listOf( - // Normal case. - arrayOf(doubleArrayOf(1.0, 2.0, 3.0), 0, 3, 1), - // Offset and stride. - arrayOf(doubleArrayOf(1.0, 2.0, 3.0), 1, 1, 3), - // Empty array. - arrayOf(doubleArrayOf(1.0, 2.0, 3.0), 3, 0, 1)) + // Normal case. + arrayOf(doubleArrayOf(1.0, 2.0, 3.0), 0, 3, 1), + // Offset and stride. 
+ arrayOf(doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), 1, 1, 2) + ) } } class F64ArrayGetSetTest { @Test fun get() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) assertEquals(0.0, m[0, 0], Precision.EPSILON) assertEquals(1.0, m[0, 1], Precision.EPSILON) @@ -94,9 +97,11 @@ class F64ArrayGetSetTest { } @Test fun set() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) val copy = m.copy() copy[0, 1] = 42.0 assertEquals(42.0, copy[0, 1], Precision.EPSILON) @@ -107,75 +112,98 @@ class F64ArrayGetSetTest { } @Test fun setMagicRowScalar() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) val copy = m.copy() copy.V[0] = 42.0 assertEquals(F64Array.full(copy.shape[1], 42.0), copy.V[0]) } @Test fun setMagicRowVector() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) val copy = m.copy() val v = F64Array.full(copy.shape[1], 42.0) copy.V[0] = v assertEquals(v, copy.V[0]) - for (r in 1..copy.shape[0] - 1) { + for (r in 1 until copy.shape[0]) { assertNotEquals(v, copy.V[r]) assertEquals(m.V[r], copy.V[r]) } } @Test fun setMagicColumnScalar() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) val copy = m.copy() copy.V[_I, 0] = 42.0 assertEquals(F64Array.full(copy.shape[0], 42.0), copy.V[_I, 0]) } @Test fun setMagicColumnVector() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) val copy = m.copy() val v = F64Array.full(copy.shape[0], 42.0) copy.V[_I, 0] = v assertEquals(v, copy.V[_I, 0]) - for (c in 
1..copy.shape[1] - 1) { + for (c in 1 until copy.shape[1]) { assertNotEquals(v, copy.V[_I, c]) assertEquals(m.V[_I, c], copy.V[_I, c]) } } @Test fun setMagicMatrix() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0) - .reshape(3, 1, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 1, 2) val copy = m.copy() val replacement = F64Array.full(m.shape[1], m.shape[2], init = 42.0) copy.V[0] = replacement assertEquals(replacement, copy.V[0]) - for (d in 1..m.shape[0] - 1) { + for (d in 1 until m.shape[0]) { assertNotEquals(replacement, copy.V[d]) assertEquals(m.V[d], copy.V[d]) } } + @Test fun setMagicArray() { + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 1, 2) + val copy = m.copy() + val replacement = m.copy().apply { fill(42.0) } + copy.V[_I] = replacement + assertEquals(replacement, copy) + assertNotEquals(replacement, m) + } + @Test fun setMagicMatrixViaScalar() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0) - .reshape(3, 1, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 1, 2) val copy1 = m.copy() copy1.V[0] = 42.0 @@ -183,4 +211,17 @@ class F64ArrayGetSetTest { copy2.V[0] = F64Array.full(m.shape[1], m.shape[2], init = 42.0) assertEquals(copy1, copy2) } + + @Test fun setMagicArrayViaScalar() { + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 1, 2) + val copy = m.copy() + copy.V[_I] = 42.0 + assertNotEquals(copy, m) + m.fill(42.0) + assertEquals(copy, m) + } } \ No newline at end of file diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayOpsTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayOpsTests.kt index 83d7716..f755464 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayOpsTests.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArrayOpsTests.kt @@ -13,6 +13,7 @@ import java.util.stream.DoubleStream import java.util.stream.IntStream import kotlin.math.sqrt import kotlin.test.assertNotEquals +import 
kotlin.test.assertNotNull import kotlin.test.assertTrue class F64FlatArrayAgainstRTest { @@ -20,7 +21,6 @@ class F64FlatArrayAgainstRTest { val v = VALUES.asF64Array() assertEquals(18.37403, v.sum(), 1E-5) assertEquals(1.837403, v.mean(), 1E-6) - assertEquals(18.37403, v.balancedSum(), 1E-5) assertEquals(0.8286257, v.sd(), 1E-7) } @@ -28,7 +28,6 @@ class F64FlatArrayAgainstRTest { val v = VALUES.asF64Array(offset = 3, size = 4) assertEquals(8.292786, v.sum(), 1E-6) assertEquals(2.073197, v.mean(), 1E-6) - assertEquals(8.292786, v.balancedSum(), 1E-6) assertEquals(1.016512, v.sd(), 1E-6) } @@ -64,20 +63,27 @@ class F64FlatArrayAgainstRTest { @RunWith(Parameterized::class) class F64FlatArrayOpsTest(private val v: F64Array) { @Test fun contains() { - for (i in 0 until v.size) { - assertTrue(v[i] in v) + v.asSequence().forEach { + assertTrue(it in v) } } @Test fun equals() { assertEquals(v, v) - assertEquals(v, v.toDoubleArray().asF64Array()) + assertEquals(v, v.copy()) + assertEquals(v.copy(), v) + + if (v.nDim == 1) { + assertEquals(v, v.toDoubleArray().asF64Array()) + } else { + assertEquals(v, v.toGenericArray().toF64Array()) + } + assertNotEquals(v, gappedArray(2..4)) assertNotEquals(v, gappedArray(1..30)) } - @Test fun _toString() { - assertEquals("[]", F64Array(0).toString()) + @Test fun toStringNormal() { assertEquals("[42]", F64Array.of(42.0).toString()) assertEquals("[0, 1, 2, 3]", gappedArray(0..3).toString()) } @@ -99,48 +105,35 @@ class F64FlatArrayOpsTest(private val v: F64Array) { @Test fun fill() { val copy = v.copy() copy.fill(42.0) - assertEquals(F64Array.full(copy.size, 42.0), copy) - } - - @Test fun reversed() { - assertEquals( - F64Array.of(3.0, 2.0, 1.0), - F64Array.of(1.0, 2.0, 3.0).reversed() - ) - assertEquals( - F64Array.of(4.0, 5.0, 6.0, 1.0, 2.0, 3.0).reshape(2, 3), - F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0).reshape(2, 3).reversed() - ) - assertEquals( - F64Array.of(3.0, 2.0, 1.0, 6.0, 5.0, 4.0).reshape(2, 3), - F64Array.of(1.0, 2.0, 3.0, 
4.0, 5.0, 6.0).reshape(2, 3).reversed(axis = 1) - ) - } - - @Test fun reversedReversed() { - val v = F64Array.of(3.0, 2.0, 1.0) - assertEquals(v, v.reversed().reversed()) + assertEquals(F64Array.full(*v.shape, init = 42.0), copy) } @Test fun dot() { + if (v.nDim != 1) return // only applicable to flat arrays assertEquals((0 until v.size).sumByDouble { v[it] * v[it] }, v.dot(v), 1E-10) } + @Test fun dotWithNotDense() { + if (v.nDim != 1) return // only applicable to flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + assertEquals((0 until v.size).sumByDouble { v[it] * other[it] }, v.dot(other), 1E-10) + } + @Test fun mean() { - assertEquals( - (0 until v.size).sumByDouble { v[it] } / v.size, v.mean(), Precision.EPSILON - ) + assertEquals(v.asSequence().sumByDouble { it } / v.shape.product(), v.mean(), 1E-10) } @Test fun sd() { + if (v.nDim != 1) return // only applicable to flat arrays assertEquals(sqrt(StatUtils.variance(v.toDoubleArray())), v.sd(), 1E-10) } @Test fun sum() { - assertEquals((0 until v.size).sumByDouble { v[it] }, v.sum(), 1E-10) + assertEquals(v.asSequence().sumByDouble { it }, v.sum(), 1E-10) } @Test fun cumSum() { + if (v.nDim != 1) return // only applicable to flat arrays val copy = v.copy() copy.cumSum() @@ -151,17 +144,6 @@ class F64FlatArrayOpsTest(private val v: F64Array) { } } - @Test fun argMinMax() { - val values = v.toDoubleArray() - val min = values.min()!! - assertEquals(min, v.min(), Precision.EPSILON) - assertEquals(values.indexOf(min), v.argMin()) - - val max = values.max()!! 
- assertEquals(v.max(), max, Precision.EPSILON) - assertEquals(values.indexOf(max), v.argMax()) - } - companion object { @Parameters(name = "{0}") @JvmStatic fun `data`() = CASES @@ -190,12 +172,13 @@ class F64ArrayOpsTest { assertEquals(m, m) assertEquals(m, m.copy()) assertNotEquals(m, m.exp()) + assertNotEquals(m.toArray(), m) + assertNotEquals(m, m.flatten()) } - @Test fun _toString2() { - assertEquals("[]", F64Array(0, 0).toString()) - assertEquals("[[]]", F64Array(1, 0).toString()) + @Test fun toString2() { assertEquals("[[0], [0]]", F64Array(2, 1).toString()) + assertEquals("[[0, 0]]", F64Array(1, 2).toString()) } @Test fun toString2Large() { @@ -210,8 +193,7 @@ class F64ArrayOpsTest { ) } - @Test fun _toString3() { - assertEquals("[]", F64Array(0, 0, 0).toString()) + @Test fun toString3() { assertEquals("[[[0]]]", F64Array(1, 1, 1).toString()) assertEquals( "[[[0], [0]], [[0], [0]], [[0], [0]]]", @@ -223,30 +205,32 @@ class F64ArrayOpsTest { @RunWith(Parameterized::class) class F64VectorMathTest(private val v: F64Array) { @Test fun exp() { - val expV = (v / v.max()).exp() - for (i in 0 until v.size) { - assertEquals(FastMath.exp(v[i] / v.max()), expV[i], 1E-8) + val vMax = v.max() + val expV = (v / vMax).exp() + v.asSequence().zip(expV.asSequence()).forEach { (vx, expVx) -> + assertEquals(FastMath.exp(vx / vMax), expVx, 1E-8) } } @Test fun expm1() { - val expm1V = (v / v.max()).expm1() - for (i in 0 until v.size) { - assertEquals(FastMath.expm1(v[i] / v.max()), expm1V[i], 1E-8) + val vMax = v.max() + val expm1V = (v / vMax).expm1() + v.asSequence().zip(expm1V.asSequence()).forEach { (vx, expm1Vx) -> + assertEquals(FastMath.expm1(vx / vMax), expm1Vx, 1E-8) } } @Test fun log() { val logV = v.log() - for (i in 0 until v.size) { - assertEquals(FastMath.log(v[i]), logV[i], 1E-8) + v.asSequence().zip(logV.asSequence()).forEach { (vx, logVx) -> + assertEquals(FastMath.log(vx), logVx, 1E-8) } } @Test fun log1p() { val log1pV = v.log1p() - for (i in 0 until v.size) 
{ - assertEquals(FastMath.log1p(v[i]), log1pV[i], 1E-8) + v.asSequence().zip(log1pV.asSequence()).forEach { (vx, log1pVx) -> + assertEquals(FastMath.log1p(vx), log1pVx, 1E-8) } } @@ -264,14 +248,23 @@ class F64VectorMathTest(private val v: F64Array) { @Test fun logAddExp() { val vLaeV = v logAddExp v - for (i in 0 until v.size) { - assertEquals(v[i] logAddExp v[i], vLaeV[i], 1e-8) + v.asSequence().zip(vLaeV.asSequence()).forEach { (vx, vLaeVx) -> + assertEquals(vx logAddExp vx, vLaeVx, 1e-8) + } + } + + @Test fun logAddExpWithNotDense() { + if (v.nDim != 1) return // only applicable to flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + val vLaeO = v logAddExp other + (0 until v.size).forEach { pos -> + assertEquals(v[pos] logAddExp other[pos], vLaeO[pos], Precision.EPSILON) } } @Test fun logSumExp() { assertEquals( - (0 until v.size).sumByDouble { FastMath.exp(v[it]) }, FastMath.exp(v.logSumExp()), 1e-6 + v.asSequence().sumByDouble { FastMath.exp(it) }, FastMath.exp(v.logSumExp()), 1e-6 ) } @@ -279,8 +272,28 @@ class F64VectorMathTest(private val v: F64Array) { val copy = v.copy() v.expInPlace() v.logInPlace() - (0 until v.size).forEach { i -> - assertEquals(copy[i], v[i], 1e-6) + v.asSequence().zip(copy.asSequence()).forEach { (vx, copyx) -> + assertEquals(copyx, vx, 1e-6) + } + } + + @Test fun min() { + val sequenceMin = v.asSequence().min() + assertNotNull(sequenceMin, "Sequential min of an array was null") + assertEquals(sequenceMin, v.min(), 0.0) + if (v.nDim == 1) { + // argMin is only applicable to flat arrays + assertEquals(v[v.argMin()], v.min(), 0.0) + } + } + + @Test fun max() { + val sequenceMax = v.asSequence().max() + assertNotNull(sequenceMax, "Sequential max of an array was null") + assertEquals(sequenceMax, v.max(), 0.0) + if (v.nDim == 1) { + // argMax is only applicable to flat arrays + assertEquals(v[v.argMax()], v.max(), 0.0) } } @@ -297,58 +310,96 @@ class F64FlatVectorArithTest(private val v: F64Array) 
{ @Test fun unaryMinus() = assertEquals(v, -(-v)) @Test fun plus() { - val u = v + v - for (pos in 0 until v.size) { - assertEquals(v[pos] + v[pos], u[pos], Precision.EPSILON) + val u = v.copy().apply { fill(42.0) } + v + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx + 42.0, ux, Precision.EPSILON) } } @Test fun plusScalar() { val u = v + 42.0 - for (pos in 0 until v.size) { - assertEquals(v[pos] + 42, u[pos], Precision.EPSILON) + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx + 42, ux, Precision.EPSILON) + } + } + + @Test fun plusWithNotDense() { + if (v.nDim != 1) return // this is a test for flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + val u = v + other + (0 until v.size).forEach { pos -> + assertEquals(v[pos] + other[pos], u[pos], Precision.EPSILON) } } @Test fun minus() { - val u = v - F64Array.full(v.size, 42.0) - for (pos in 0 until v.size) { - assertEquals(v[pos] - 42.0, u[pos], Precision.EPSILON) + val u = v.copy().apply { fill(42.0) } - v + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(42.0 - vx, ux, Precision.EPSILON) } } @Test fun minusScalar() { val u = v - 42.0 - for (pos in 0 until v.size) { - assertEquals(v[pos] - 42.0, u[pos], Precision.EPSILON) + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx - 42.0, ux, Precision.EPSILON) + } + } + + @Test fun minusWithNotDense() { + if (v.nDim != 1) return // this is a test for flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + val u = v - other + (0 until v.size).forEach { pos -> + assertEquals(v[pos] - other[pos], u[pos], Precision.EPSILON) } } @Test fun times() { - val u = v * v - for (pos in 0 until v.size) { - assertEquals(v[pos] * v[pos], u[pos], Precision.EPSILON) + val u = v.copy().apply { fill(42.0) } * v + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx * 42.0, ux, Precision.EPSILON) 
} } @Test fun timesScalar() { val u = v * 42.0 - for (pos in 0 until v.size) { - assertEquals(v[pos] * 42, u[pos], Precision.EPSILON) + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx * 42, ux, Precision.EPSILON) + } + } + + @Test fun timesWithNotDense() { + if (v.nDim != 1) return // this is a test for flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + val u = v * other + (0 until v.size).forEach { pos -> + assertEquals(v[pos] * other[pos], u[pos], Precision.EPSILON) } } @Test fun div() { - val u = v / F64Array.full(v.size, 42.0) - for (pos in 0 until v.size) { - assertEquals(v[pos] / 42.0, u[pos], Precision.EPSILON) + // since the left argument is replaced by a copy, and we want to test + // non-standard arrays, `v` goes to the right side of div. + val u = v.copy().apply { fill(42.0) } / v + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(42.0 / vx, ux, Precision.EPSILON) } } @Test fun divScalar() { val u = v / 42.0 - for (pos in 0 until v.size) { - assertEquals(v[pos] / 42.0, u[pos], Precision.EPSILON) + v.asSequence().zip(u.asSequence()).forEach { (vx, ux) -> + assertEquals(vx / 42.0, ux, Precision.EPSILON) + } + } + + @Test fun divWithNotDense() { + if (v.nDim != 1) return // this is a test for flat arrays + val other = F64Array(v.size, 2) { _, _ -> Random().nextDouble() }.V[_I, 0] + val u = v / other + (0 until v.size).forEach { pos -> + assertEquals(v[pos] / other[pos], u[pos], Precision.EPSILON) } } @@ -367,7 +418,9 @@ private val CASES = listOf( Random().doubles(F64DenseFlatArray.DENSE_SPLIT_SIZE + 1L).toArray().asF64Array(), // Dense large subarray. 
Random().doubles(3 * (F64DenseFlatArray.DENSE_SPLIT_SIZE + 1L)).toArray() - .asF64Array(F64DenseFlatArray.DENSE_SPLIT_SIZE + 1, F64DenseFlatArray.DENSE_SPLIT_SIZE + 1) + .asF64Array(F64DenseFlatArray.DENSE_SPLIT_SIZE + 1, F64DenseFlatArray.DENSE_SPLIT_SIZE + 1), + // Non-flattenable array + Random().doubles(4 * 3 * 2).toArray().asF64Array().reshape(4, 3, 2).view(1, 1) ) private fun gappedArray(r: IntRange): F64Array { @@ -375,7 +428,7 @@ private fun gappedArray(r: IntRange): F64Array { // // 1. to ensure 'offset' and 'stride' are used correctly, // 2. to force the use of fallback implementation. - val values = IntStream.range(r.start, r.endInclusive + 1) + val values = IntStream.range(r.first, r.last + 1) .mapToDouble(Int::toDouble) .flatMap { DoubleStream.of(Double.NaN, it) } .toArray() diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArraySlicingTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArraySlicingTest.kt index 2ef1c6d..7cadf5a 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/F64ArraySlicingTest.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/F64ArraySlicingTest.kt @@ -1,21 +1,11 @@ package org.jetbrains.bio.viktor -import org.junit.Assert import org.junit.Assert.assertArrayEquals import org.junit.Assert.assertEquals import org.junit.Test -import kotlin.test.assertEquals -import kotlin.test.assertFails import kotlin.test.assertFailsWith class F64FlatArraySlicingTest { - @Test fun transpose() { - assertEquals(F64Array.of(1.0), F64Array.of(1.0).T.V[_I, 0]) - assertEquals(F64Array.of(1.0, 2.0), - F64Array.of(1.0, 2.0).T.V[_I, 0]) - assertEquals(F64Array.of(1.0, 2.0, 3.0), - F64Array.of(1.0, 2.0, 3.0).T.V[_I, 0]) - } @Test fun slice() { val v = F64Array.of(1.0, 2.0, 3.0) @@ -28,18 +18,32 @@ class F64FlatArraySlicingTest { } @Test fun sliceMatrix() { - val m = F64Array.of(1.0, 2.0, 3.0, - 4.0, 5.0, 6.0).reshape(2, 3) - assertEquals(F64Array.of(1.0, 2.0, 3.0).reshape(1, 3), - m.slice(0, 1)) - assertEquals(F64Array.of(4.0, 5.0, 
6.0).reshape(1, 3), - m.slice(1, 2)) - assertEquals(F64Array.of(1.0, - 4.0).reshape(2, 1), - m.slice(0, 1, axis = 1)) - assertEquals(F64Array.of(2.0, 3.0, - 5.0, 6.0).reshape(2, 2), - m.slice(1, 3, axis = 1)) + val m = F64Array.of( + 1.0, 2.0, 3.0, + 4.0, 5.0, 6.0 + ).reshape(2, 3) + assertEquals( + F64Array.of(1.0, 2.0, 3.0).reshape(1, 3), + m.slice(0, 1) + ) + assertEquals( + F64Array.of(4.0, 5.0, 6.0).reshape(1, 3), + m.slice(1, 2) + ) + assertEquals( + F64Array.of( + 1.0, + 4.0 + ).reshape(2, 1), + m.slice(0, 1, axis = 1) + ) + assertEquals( + F64Array.of( + 2.0, 3.0, + 5.0, 6.0 + ).reshape(2, 2), + m.slice(1, 3, axis = 1) + ) } @Test fun sliceWithStep() { @@ -66,34 +70,31 @@ class F64FlatArraySlicingTest { } } - @Test(expected = IndexOutOfBoundsException::class) fun sliceOutOfBounds() { - F64Array(0).slice(0, 42) + @Test(expected = IllegalStateException::class) fun sliceOutOfBounds() { + F64Array(7).slice(10, 42) } -} -class F64ArraySlicing { - private val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + @Test(expected = IllegalArgumentException::class) fun sliceFromNegative() { + F64Array(7).slice(-1, 5) + } - @Test fun transposeUnit() { - val m = F64Array(1, 1) - assertEquals(m, m.T) + @Test(expected = IllegalArgumentException::class) fun sliceToBeforeFrom() { + F64Array(7).slice(3, 1) } - @Test fun transpose() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) - assertEquals(F64Array.of(0.0, 2.0, 4.0, - 1.0, 3.0, 5.0).reshape(2, 3), - m.T) + @Test(expected = IllegalArgumentException::class) fun sliceStepNegative() { + F64Array(7).slice(3, 5, -1) } +} + +class F64ArraySlicing { @Test fun rowView() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) assertEquals(F64Array.of(0.0, 1.0), m.V[0]) assertEquals(F64Array.of(2.0, 3.0), m.V[1]) assertEquals(F64Array.of(4.0, 5.0), m.V[2]) @@ -102,9 +103,11 @@ class F64ArraySlicing { } 
@Test fun columnView() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 2) assertEquals(F64Array.of(0.0, 2.0, 4.0), m.V[_I, 0]) assertEquals(F64Array.of(1.0, 3.0, 5.0), m.V[_I, 1]) @@ -113,9 +116,11 @@ class F64ArraySlicing { } @Test fun view() { - val m = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0).reshape(3, 1, 2) + val m = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ).reshape(3, 1, 2) assertEquals(F64Array.of(0.0, 1.0).reshape(1, 2), m.V[0]) assertEquals(F64Array.of(2.0, 3.0).reshape(1, 2), m.V[1]) @@ -126,75 +131,117 @@ class F64ArraySlicing { @Test fun reshape2() { val v = F64Array.of(0.0, 1.0, 2.0, 3.0, 4.0, 5.0) - assertArrayEquals(arrayOf(doubleArrayOf(0.0, 1.0, 2.0), - doubleArrayOf(3.0, 4.0, 5.0)), - v.reshape(2, 3).toGenericArray()) - assertArrayEquals(arrayOf(doubleArrayOf(0.0, 1.0), - doubleArrayOf(2.0, 3.0), - doubleArrayOf(4.0, 5.0)), - v.reshape(3, 2).toGenericArray()) + assertArrayEquals( + arrayOf( + doubleArrayOf(0.0, 1.0, 2.0), + doubleArrayOf(3.0, 4.0, 5.0) + ), + v.reshape(2, 3).toGenericArray() + ) + assertArrayEquals( + arrayOf( + doubleArrayOf(0.0, 1.0), + doubleArrayOf(2.0, 3.0), + doubleArrayOf(4.0, 5.0) + ), + v.reshape(3, 2).toGenericArray() + ) } @Test fun reshape2WithStride() { - val v = F64FlatArray(doubleArrayOf(0.0, 1.0, 2.0, 3.0, - 4.0, 5.0, 6.0, 7.0), - 0, size = 4, stride = 2) - assertArrayEquals(arrayOf(doubleArrayOf(0.0, 2.0), - doubleArrayOf(4.0, 6.0)), - v.reshape(2, 2).toGenericArray()) + val v = F64FlatArray( + doubleArrayOf( + 0.0, 1.0, 2.0, 3.0, + 4.0, 5.0, 6.0, 7.0 + ), + 0, size = 4, stride = 2 + ) + assertArrayEquals( + arrayOf(doubleArrayOf(0.0, 2.0), doubleArrayOf(4.0, 6.0)), + v.reshape(2, 2).toGenericArray() + ) } @Test fun reshape3() { - val v = F64Array.of(0.0, 1.0, - 2.0, 3.0, - 4.0, 5.0) - assertArrayEquals(arrayOf(arrayOf(doubleArrayOf(0.0, 1.0)), - arrayOf(doubleArrayOf(2.0, 3.0)), - 
arrayOf(doubleArrayOf(4.0, 5.0))), - v.reshape(3, 1, 2).toGenericArray()) - assertArrayEquals(arrayOf(arrayOf(doubleArrayOf(0.0), - doubleArrayOf(1.0)), - arrayOf(doubleArrayOf(2.0), - doubleArrayOf(3.0)), - arrayOf(doubleArrayOf(4.0), - doubleArrayOf(5.0))), - v.reshape(3, 2, 1).toGenericArray()) + val v = F64Array.of( + 0.0, 1.0, + 2.0, 3.0, + 4.0, 5.0 + ) + assertArrayEquals( + arrayOf( + arrayOf(doubleArrayOf(0.0, 1.0)), + arrayOf(doubleArrayOf(2.0, 3.0)), + arrayOf(doubleArrayOf(4.0, 5.0)) + ), + v.reshape(3, 1, 2).toGenericArray() + ) + assertArrayEquals( + arrayOf( + arrayOf(doubleArrayOf(0.0), doubleArrayOf(1.0)), + arrayOf(doubleArrayOf(2.0), doubleArrayOf(3.0)), + arrayOf(doubleArrayOf(4.0), doubleArrayOf(5.0)) + ), + v.reshape(3, 2, 1).toGenericArray() + ) } @Test fun reshape3WithStride() { val v = F64FlatArray(doubleArrayOf(0.0, 1.0, 2.0, 3.0, - 4.0, 5.0, 6.0, 7.0), - 0, size = 4, stride = 2) - assertArrayEquals(arrayOf(arrayOf(doubleArrayOf(0.0, 2.0)), - arrayOf(doubleArrayOf(4.0, 6.0))), - v.reshape(2, 1, 2).toGenericArray()) - assertArrayEquals(arrayOf(arrayOf(doubleArrayOf(0.0), - doubleArrayOf(2.0)), - arrayOf(doubleArrayOf(4.0), - doubleArrayOf(6.0))), - v.reshape(2, 2, 1).toGenericArray()) + 4.0, 5.0, 6.0, 7.0), + 0, size = 4, stride = 2) + assertArrayEquals( + arrayOf( + arrayOf(doubleArrayOf(0.0, 2.0)), + arrayOf(doubleArrayOf(4.0, 6.0)) + ), + v.reshape(2, 1, 2).toGenericArray() + ) + assertArrayEquals( + arrayOf( + arrayOf(doubleArrayOf(0.0), doubleArrayOf(2.0)), + arrayOf(doubleArrayOf(4.0), doubleArrayOf(6.0)) + ), + v.reshape(2, 2, 1).toGenericArray() + ) } @Test fun along0() { - val a = F64Array.of(1.0, 2.0, 3.0, - 4.0, 5.0, 6.0).reshape(2, 3) + val a = F64Array.of( + 1.0, 2.0, 3.0, + 4.0, 5.0, 6.0 + ).reshape(2, 3) a.along(0).forEach { it /= it[0] } - assertEquals(F64Array.of(1.0, 2.0, 3.0, - 1.0, 5.0 / 4.0, 6.0 / 4.0) - .reshape(2, 3), - a) + assertEquals( + F64Array.of( + 1.0, 2.0, 3.0, + 1.0, 5.0 / 4.0, 6.0 / 4.0 + ).reshape(2, 3), + 
a + ) } @Test fun along1() { val a = F64Array.of(1.0, 2.0, 3.0, - 4.0, 5.0, 6.0).reshape(2, 3) + 4.0, 5.0, 6.0).reshape(2, 3) a.along(1).forEach { it /= it[1] } - assertEquals(F64Array.of(1.0 / 4.0, 2.0 / 5.0, 3.0 / 6.0, - 1.0, 1.0, 1.0) - .reshape(2, 3), - a) + assertEquals( + F64Array.of( + 1.0 / 4.0, 2.0 / 5.0, 3.0 / 6.0, + 1.0, 1.0, 1.0 + ).reshape(2, 3), + a + ) + } + + @Test fun view2() { + val a = DoubleArray(8) { it.toDouble() }.asF64Array().reshape(2, 2, 2) + val aView = a.view(0, 1) + assertEquals(F64Array.of(0.0, 1.0, 4.0, 5.0).reshape(2, 2), aView) + aView.expInPlace() + assertEquals(F64Array.of(2.0, 3.0, 6.0, 7.0).reshape(2, 2), a.view(1, 1)) } } \ No newline at end of file diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/F64MagicTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/F64MagicTest.kt deleted file mode 100644 index 87f0037..0000000 --- a/src/test/kotlin/org/jetbrains/bio/viktor/F64MagicTest.kt +++ /dev/null @@ -1,40 +0,0 @@ -package org.jetbrains.bio.viktor - -import org.junit.Assert.assertArrayEquals -import org.junit.Test -import kotlin.test.assertEquals - -class F64MagicTest { - @Test fun ravelUnravel() { - for (indices in arrayOf(intArrayOf(1, 2, 3), - intArrayOf(3, 2, 1), - intArrayOf(1, 1, 1), - intArrayOf(0, 0, 0))) { - val shape = intArrayOf(4, 5, 6) - assertArrayEquals(indices, unravelIndex(ravelMultiIndex(indices, shape), shape)) - } - } - - @Test(expected = IndexOutOfBoundsException::class) fun unravelImproper() { - unravelIndex(421, intArrayOf(4, 5, 6)) - } - - // See examples in `numpy.unravel_index` docstring. - @Test fun unravelAgainstNumPy() { - assertArrayEquals(intArrayOf(3, 4), unravelIndex(22, intArrayOf(7, 6))) - assertArrayEquals(intArrayOf(6, 1), unravelIndex(37, intArrayOf(7, 6))) - assertArrayEquals(intArrayOf(6, 5), unravelIndex(41, intArrayOf(7, 6))) - - assertArrayEquals(intArrayOf(3, 1, 4, 1), - unravelIndex(1621, intArrayOf(6, 7, 8, 9))) - } - - // See examples in `numpy.ravel_multi_index` docstring. 
- @Test fun ravelAgainstNumPy() { - assertEquals(22, ravelMultiIndex(intArrayOf(3, 4), intArrayOf(7, 6))) - assertEquals(37, ravelMultiIndex(intArrayOf(6, 1), intArrayOf(7, 6))) - assertEquals(41, ravelMultiIndex(intArrayOf(6, 5), intArrayOf(7, 6))) - - assertEquals(1621, ravelMultiIndex(intArrayOf(3, 1, 4, 1), intArrayOf(6, 7, 8, 9))) - } -} \ No newline at end of file diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/MoreMathTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/MoreMathTests.kt index d928c45..d1f3153 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/MoreMathTests.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/MoreMathTests.kt @@ -2,17 +2,37 @@ package org.jetbrains.bio.viktor import org.junit.Test import java.util.* +import kotlin.math.abs import kotlin.test.assertEquals import kotlin.test.assertTrue class MoreMathTest { @Test fun testLogAddExpEdgeCases() { val r = Random() - val logx = -Math.abs(r.nextDouble()) + val logx = -abs(r.nextDouble()) + assertEquals(logx, Double.NEGATIVE_INFINITY logAddExp logx) assertEquals(logx, logx logAddExp Double.NEGATIVE_INFINITY) - assertEquals(Double.NEGATIVE_INFINITY, - Double.NEGATIVE_INFINITY logAddExp Double.NEGATIVE_INFINITY) + assertEquals(Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY logAddExp Double.NEGATIVE_INFINITY) + + assertEquals(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY logAddExp logx) + assertEquals(Double.POSITIVE_INFINITY, logx logAddExp Double.POSITIVE_INFINITY) + assertEquals(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY logAddExp Double.POSITIVE_INFINITY) + + assertEquals(Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY logAddExp Double.POSITIVE_INFINITY) + assertEquals(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY logAddExp Double.NEGATIVE_INFINITY) + + assertIsNan(Double.NaN logAddExp logx) + assertIsNan(Double.NaN logAddExp Double.NEGATIVE_INFINITY) + assertIsNan(Double.NaN logAddExp Double.POSITIVE_INFINITY) + assertIsNan(Double.NaN logAddExp 
Double.NaN) + assertIsNan(logx logAddExp Double.NaN) + assertIsNan(Double.NEGATIVE_INFINITY logAddExp Double.NaN) + assertIsNan(Double.POSITIVE_INFINITY logAddExp Double.NaN) + } + + private fun assertIsNan(x: Double) { + assertTrue(x.isNaN(), "Expected NaN but got $x") } } @@ -25,16 +45,18 @@ class KahanSumTest { val oneDth = 1.0 / d val preciseSum = KahanSum() var impreciseSum = 0.0 - for (i in 0..bigNumber * d - 1) { + for (i in 0 until bigNumber * d) { preciseSum += oneDth impreciseSum += oneDth } - val imprecision = Math.abs(impreciseSum - bigNumber) - val precision = Math.abs(preciseSum.result() - bigNumber) - assertTrue(imprecision >= precision, - "Kahan's algorithm yielded worse precision than ordinary summation: " + - "$precision is greater than $imprecision") + val imprecision = abs(impreciseSum - bigNumber) + val precision = abs(preciseSum.result() - bigNumber) + assertTrue( + imprecision >= precision, + "Kahan's algorithm yielded worse precision than ordinary summation: " + + "$precision is greater than $imprecision" + ) } } } \ No newline at end of file diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/NativeSpeedupTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/NativeSpeedupTest.kt index 01300f1..4f23235 100644 --- a/src/test/kotlin/org/jetbrains/bio/viktor/NativeSpeedupTest.kt +++ b/src/test/kotlin/org/jetbrains/bio/viktor/NativeSpeedupTest.kt @@ -7,7 +7,7 @@ class NativeSpeedupTest { @Test fun nativeSpeedupEnabled() { assertTrue( - Loader.optimizationSupported, + Loader.nativeLibraryLoaded, """ Native optimizations disabled. If running as a project: have you added -Djava.library.path=./build/libs to JVM options? 
diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/RandomTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/RandomTests.kt
index bec37df..0afd717 100644
--- a/src/test/kotlin/org/jetbrains/bio/viktor/RandomTests.kt
+++ b/src/test/kotlin/org/jetbrains/bio/viktor/RandomTests.kt
@@ -13,7 +13,7 @@ import java.util.*
 class QuickSelectTest {
     @Test fun quantileRandom() {
         val values = Random().doubles(1024).toArray().asF64Array()
-        for (i in 0..values.size - 1) {
+        for (i in 0 until values.size) {
             val q = (i.toDouble() + 1) / values.size
             assertEquals(StatUtils.percentile(values.data, q * 100),
                          values.quantile(q), Precision.EPSILON)
@@ -34,6 +34,8 @@ class QuickSelectTest {
 
     @Test fun quantileLarge() {
         val values = Random().doubles(1 shl 16).toArray().asF64Array()
         assertEquals(values.max(), values.quantile(1.0), Precision.EPSILON)
+        assertEquals(values.min(), values.quantile(0.0), Precision.EPSILON)
+        assertEquals(values.quantile(0.5), values.quantile(), Precision.EPSILON)
     }
 }
@@ -85,19 +87,26 @@ class ShuffleTest {
         val values = doubleArrayOf(0.0, 1.0, 2.0, 3.0).asF64Array()
         val counts = HashMap<F64Array, Int>()
 
-        val `n!` = CombinatoricsUtils.factorial(4).toInt()
-        for (i in 0..5000 * `n!`) {
+        val nFactorial = CombinatoricsUtils.factorial(4).toInt()
+        for (i in 0..5000 * nFactorial) {
             values.shuffle()
             val p = values.copy()
             counts[p] = (counts[p] ?: 0) + 1
         }
 
-        assertEquals(`n!`, counts.size)
+        assertEquals(nFactorial, counts.size)
 
         val total = counts.values.sum()
         for (count in counts.values) {
             val p = count.toDouble() / total
-            assertEquals(1.0 / `n!`, p, 1e-2)
+            assertEquals(1.0 / nFactorial, p, 1e-2)
         }
     }
+
+    @Test fun trivial() {
+        val values = F64Array.of(3.14)
+        val stored = values.copy()
+        values.shuffle()
+        assertEquals(stored, values)
+    }
 }
\ No newline at end of file
diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/SearchingTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/SearchingTests.kt
index 405198e..4c32aad 100644
--- a/src/test/kotlin/org/jetbrains/bio/viktor/SearchingTests.kt
+++ b/src/test/kotlin/org/jetbrains/bio/viktor/SearchingTests.kt
@@ -4,10 +4,6 @@ import org.junit.Test
 import kotlin.test.assertEquals
 
 class SearchingTests {
-    @Test fun empty() {
-        assertEquals(0, F64Array(0).searchSorted(42.0))
-    }
-
     @Test fun singleLess() {
         assertEquals(0, F64Array.of(42.0).searchSorted(0.0))
         assertEquals(0, F64Array.of(42.0, 42.0, 42.0).searchSorted(0.0))
diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/SerializationTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/SerializationTests.kt
index 05a728c..df15d0f 100644
--- a/src/test/kotlin/org/jetbrains/bio/viktor/SerializationTests.kt
+++ b/src/test/kotlin/org/jetbrains/bio/viktor/SerializationTests.kt
@@ -24,26 +24,41 @@ class TestReadWriteNpy {
         NpyFile.write(path, m)
         assertEquals(m, NpyFile.read(path).asF64Array())
     }
+
+    @Test fun nonFlattenable() = withTempFile("nf", ".npy") { path ->
+        val m = F64Array.of(
+            1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
+        ).reshape(2, 2, 2).view(1, 1)
+        NpyFile.write(path, m)
+        assertEquals(m, NpyFile.read(path).asF64Array())
+    }
 }
 
 class TestReadWriteNpz {
     @Test fun combined() {
         val v = F64Array.of(1.0, 2.0, 3.0, 4.0)
         val m2 = F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0).reshape(2, 3)
-        val m3 = F64Array.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
-                .reshape(1, 4, 2)
+        val m3 = F64Array.of(
+            1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
+        ).reshape(1, 4, 2)
+
+        val nf = F64Array.of(
+            1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
+        ).reshape(2, 2, 2).view(1, 1)
 
         withTempFile("vm2m3", ".npz") { path ->
             NpzFile.write(path).use {
                 it.write("v", v)
                 it.write("m2", m2)
                 it.write("m3", m3)
+                it.write("nf", nf)
             }
 
             NpzFile.read(path).use {
                 assertEquals(v, it["v"].asF64Array())
                 assertEquals(m2, it["m2"].asF64Array())
                 assertEquals(m3, it["m3"].asF64Array())
+                assertEquals(nf, it["nf"].asF64Array())
             }
         }
     }
diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/SortingTests.kt b/src/test/kotlin/org/jetbrains/bio/viktor/SortingTests.kt
index 8774f9f..aac4eef 100644
--- a/src/test/kotlin/org/jetbrains/bio/viktor/SortingTests.kt
+++ b/src/test/kotlin/org/jetbrains/bio/viktor/SortingTests.kt
@@ -11,21 +11,22 @@ class SortingTests {
     @Test fun partition() {
         assertEquals(
                 F64Array.of(1.0, 2.0, 3.0, 4.0),
-                F64Array.of(3.0, 4.0, 2.0, 1.0).apply { partition(2) })
+                F64Array.of(3.0, 4.0, 2.0, 1.0).apply { partition(2) }
+        )
     }
 
     @Test fun partitionInternal() {
         val values = F64Array.of(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
         val length = values.size
-        for (p in 0..length - 1) {
+        for (p in 0 until length) {
             values.shuffle()
             val pivot = values[p]
             val split = values.partition(p, 0, length - 1)
-            for (i in 0..split - 1) {
+            for (i in 0 until split) {
                 assertTrue(values[i] < pivot, "<")
             }
-            for (i in split..length - 1) {
+            for (i in split until length) {
                 assertTrue(values[i] >= pivot, ">=")
             }
         }
@@ -66,16 +67,22 @@ class SortingTests {
         val indices = values.asF64Array().argSort()
         assertArrayEquals(intArrayOf(2, 3, 1, 5, 4, 0), indices)
 
-        val v = F64FlatArray(doubleArrayOf(Double.NaN, Double.NaN, // Prefix.
-                                           42.0, Double.NaN, 2.0,
-                                           Double.NaN, -1.0, Double.NaN,
-                                           0.0, Double.NaN, 4.0,
-                                           Double.NaN, 2.0),
-                             offset = 2, size = values.size, stride = 2)
+        val v = F64FlatArray(
+            doubleArrayOf(
+                Double.NaN, Double.NaN, // Prefix.
+                42.0, Double.NaN, 2.0, Double.NaN, -1.0, Double.NaN,
+                0.0, Double.NaN, 4.0,
+                Double.NaN, 2.0
+            ),
+            offset = 2, size = values.size, stride = 2
+        )
 
         v.reorder(indices)
 
-        assertArrayEquals(doubleArrayOf(-1.0, 0.0, 2.0, 2.0, 4.0, 42.0),
-                          v.toDoubleArray(), Precision.EPSILON)
+        assertArrayEquals(
+            doubleArrayOf(-1.0, 0.0, 2.0, 2.0, 4.0, 42.0),
+            v.toDoubleArray(), Precision.EPSILON
+        )
     }
 
     @Test fun reorderFlat() {
@@ -85,20 +92,40 @@ class SortingTests {
     }
 
     @Test fun reorderMatrix0() {
-        val m = F64Array.of(1.0, 2.0, 3.0,
-                            4.0, 5.0, 6.0).reshape(2, 3)
+        val m = F64Array.of(
+            1.0, 2.0, 3.0,
+            4.0, 5.0, 6.0
+        ).reshape(2, 3)
         m.reorder(intArrayOf(1, 0))
-        assertEquals(F64Array.of(4.0, 5.0, 6.0,
-                                 1.0, 2.0, 3.0).reshape(2, 3),
-                     m)
+        assertEquals(
+            F64Array.of(
+                4.0, 5.0, 6.0,
+                1.0, 2.0, 3.0
+            ).reshape(2, 3),
+            m
+        )
     }
 
     @Test fun reorderMatrix1() {
-        val m = F64Array.of(1.0, 2.0, 3.0,
-                            4.0, 5.0, 6.0).reshape(2, 3)
+        val m = F64Array.of(
+            1.0, 2.0, 3.0,
+            4.0, 5.0, 6.0
+        ).reshape(2, 3)
         m.reorder(intArrayOf(2, 1, 0), axis = 1)
-        assertEquals(F64Array.of(3.0, 2.0, 1.0,
-                                 6.0, 5.0, 4.0).reshape(2, 3),
-                     m)
+        assertEquals(
+            F64Array.of(
+                3.0, 2.0, 1.0,
+                6.0, 5.0, 4.0
+            ).reshape(2, 3),
+            m
+        )
+    }
+
+    @Test(expected = IllegalArgumentException::class) fun reorderMatrix2() {
+        val m = F64Array.of(
+            1.0, 2.0, 3.0,
+            4.0, 5.0, 6.0
+        ).reshape(2, 3)
+        m.reorder(intArrayOf(2, 1, 0), axis = 0)
     }
 }
diff --git a/src/test/kotlin/org/jetbrains/bio/viktor/UnsupportedOpTest.kt b/src/test/kotlin/org/jetbrains/bio/viktor/UnsupportedOpTest.kt
new file mode 100644
index 0000000..7b6b735
--- /dev/null
+++ b/src/test/kotlin/org/jetbrains/bio/viktor/UnsupportedOpTest.kt
@@ -0,0 +1,70 @@
+package org.jetbrains.bio.viktor
+
+import org.junit.Test
+
+class UnsupportedOpTest {
+
+    @Test(expected = UnsupportedOperationException::class)
+    fun invalidArray() {
+        arrayOf("foo", "bar").toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun emptyArray() {
+        emptyArray<DoubleArray>().toF64Array()
+    }
+
+    @Test(expected = UnsupportedOperationException::class)
+    fun flatArrayToGeneric() {
+        F64Array.of(1.0, 2.0, 3.0).toGenericArray()
+    }
+
+    @Test(expected = UnsupportedOperationException::class)
+    fun flatArrayAlong() {
+        F64Array.of(1.0, 2.0, 3.0).along(0)
+    }
+
+    @Test(expected = UnsupportedOperationException::class)
+    fun flatArrayReorder() {
+        F64Array.of(1.0, 2.0, 3.0).reorder(intArrayOf(2, 1, 0), 1)
+    }
+
+    @Test(expected = IllegalArgumentException::class)
+    fun indicesSizeReorder() {
+        F64Array.of(1.0, 2.0, 3.0).reorder(intArrayOf(1, 0))
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun mismatchedArray1() {
+        arrayOf(doubleArrayOf(1.0), doubleArrayOf(2.0, 3.0)).toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun mismatchedArray2() {
+        arrayOf(
+            arrayOf(doubleArrayOf(1.0)),
+            arrayOf(doubleArrayOf(2.0), doubleArrayOf(3.0))
+        ).toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun mixedTypeArray1() {
+        arrayOf(doubleArrayOf(1.0), arrayOf(doubleArrayOf(2.0, 3.0))).toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun mixedTypeArray2() {
+        arrayOf(arrayOf(doubleArrayOf(1.0, 2.0)), doubleArrayOf(3.0)).toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun nullsArray1() {
+        arrayOf(doubleArrayOf(1.0, 2.0, 3.0), null).toF64Array()
+    }
+
+    @Test(expected = IllegalStateException::class)
+    fun nullsArray2() {
+        arrayOf(arrayOf(doubleArrayOf(1.0, 2.0, 3.0)), null).toF64Array()
+    }
+
+}