Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More features for NDArray #74

Closed
peastman opened this issue Oct 16, 2018 · 13 comments
Closed

More features for NDArray #74

peastman opened this issue Oct 16, 2018 · 13 comments

Comments

@peastman
Copy link
Contributor

Matrix has a lot of features that would be equally appropriate for NDArray. This includes methods like min(), max(), mean(), toString(), etc. Also a lot of the math functions defined in matrixfuncs.kt. If you agree, I'd like to start implementing some of them.

@kyonifer
Copy link
Owner

Yes this makes sense. Those functions being limited to Matrix are historical as NDArray is a relatively new addition. Feel free to make PRs and we can sort through any issues that come up.

Note that many of the top-level functions such as those in matrixfuncs.kt are Double only currently. It may be worth considering if those should be code-generated for each primitive and/or generically. If so, then supporting NDArray might be a simple matter of a few more generation targets being added which output NDArray extensions functions.

@kyonifer
Copy link
Owner

kyonifer commented Oct 17, 2018

Opened #75 to capture my above note as a separate ticket. It may be possible to close both this issue and that one at once.

@peastman
Copy link
Contributor Author

I think I've come up with a strategy that will allow all the different versions to be defined without needing code generation. I first create a function

inline fun <reified T> NDArray<T>.mapIfNumeric(crossinline f: (Double) -> Double): NDArray<Double> {
    when (T::class) {
        Double::class -> return (this as NDArray<Double>).map { f(it) }
        Float::class -> return (this as NDArray<Float>).map { f(it.toDouble()) }
        Long::class -> return (this as NDArray<Long>).map { f(it.toDouble()) }
        Int::class -> return (this as NDArray<Int>).map { f(it.toDouble()) }
        Short::class -> return (this as NDArray<Short>).map { f(it.toDouble()) }
        Byte::class -> return (this as NDArray<Byte>).map { f(it.toDouble()) }
        else -> throw IllegalArgumentException("This NDArray does not contain numeric data.")
    }
}

Lots of functions can easily be defined by using that. For example,

inline fun <reified T> acos(arr: NDArray<T>): NDArray<Double> {
    return arr.mapIfNumeric { kotlin.math.acos(it) }
}

I decompiled it and verified it wasn't doing any boxing.

@kyonifer
Copy link
Owner

I like that idea. Looks good to me for functions that map through to an underlying function which expects a scalar Double (which is a lot of them).

@peastman
Copy link
Contributor Author

The one issue with that implementation is that the return type is always Double. If the input array has type Float, I would expect it to return an array of Floats. I need to think about possibilities.

Some functions will just need specialized implementations, due to their quirks. floor(), ceil(), and round() are only meaningful for Float and Double. Then there's abs(), which in kotlin.math is defined for Int, Long, Float, and Double. But the operation is equally applicable to Byte and Short, though the asymmetry at the lower end of the range is more likely to be a problem for them. (abs(Int.MIN_VALUE) is negative!) Perhaps it should promote those types to Int?

@peastman
Copy link
Contributor Author

You know, I think I'm trying to hard to be clever. It's simpler to just write two functions,

fun acos(arr: NDArray<Float>): NDArray<Float> {
    return arr.map { kotlin.math.acos(it) }
}

fun acos(arr: NDArray<Double>): NDArray<Double> {
    return arr.map { kotlin.math.acos(it) }
}

The amount of code isn't that much more, it's easier to understand, and it avoids complications about return types.

@peastman
Copy link
Contributor Author

I have a question about code generation. Some of the extension functions I'll be adding only apply to certain types. For example, mean() and sum() are only meaningful for numeric types. They shouldn't be defined for generic arrays. Given how codegen is implemented, it isn't obvious how to handle that. What do you suggest?

@kyonifer
Copy link
Owner

I'm traveling today but I'll look into this soon. Given the apparent recent changes to how inlining works, I'd like to explore the available options and see if we can find a way to clean up the codegen as you were attempting to do here.

@kyonifer
Copy link
Owner

I played with this a bit and regrettably couldn't come up with something I hated less than codegen, so we might need to stick with that for now.

Some of the extension functions I'll be adding only apply to certain types. For example, mean() and sum() are only meaningful for numeric types. They shouldn't be defined for generic arrays. Given how codegen is implemented, it isn't obvious how to handle that. What do you suggest?

Take a look at the convertGetter and floatersOnly. The former inserts different strings depending on whether or not dtype is the generic type or one of the primitive types. The latter adds allClose to only the floating point types, by checking dtype and returning an empty string unless dtype is Float or Double. Something similar could be done for mean and sum on NDArray.

Alternatively, you could also just add another copy { } block to genCode and generate brand new files by iterating over just the dtypes you need. There's no need for everything to go into the same set of files.

Also, feel free to just PR the extension functions directly if you'd rather not deal with it, and I can update the codegen afterwards.

@peastman
Copy link
Contributor Author

One option to consider is switching to a more powerful templating engine. The one built into Gradle is pretty limited. A lot of other ones can handle conditional blocks, loops, etc., which would make cases like this easy. It also would let you put all the logic into the template, rather than splitting it between the template and the build script.

@kyonifer
Copy link
Owner

Fair enough. The only reason for using gradle's is to avoid another dependency. However, we could probably use something on the jvm like pebble and grab it directly from maven in the buildscript block or in a buildSrc project.

@kyonifer
Copy link
Owner

Opened #76 to track that idea.

@kyonifer
Copy link
Owner

Closed by #78.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants