Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added sliding-window functions sliding and slidingWithSizeAndStep #212

Closed
wants to merge 3 commits into from

Conversation

FloWi
Copy link

@FloWi FloWi commented Apr 13, 2021

Added the functions sliding and slidingWithSizeAndStep to allow sliding-window computations. In order to realise the slidingWithSizeAndStep I added rangeWithStep.

I decided to have the functions return empty arrays in case the user gave them nonsensical arguments. Can change that to Maybe if that's preferred.

Thanks for the review, @sigma-andex.

@FloWi
Copy link
Author

FloWi commented Apr 13, 2021

Hi maintainers!
I'm quite new to the purescript-community (coming from scala) and was missing a function that comes in handy when analysing time series data. So I created a PR that adds these functions to arrays. Let me know what you think and if I can improve my code.

@JordanMartinez
Copy link
Contributor

Just a few initial thoughts about this:

  • I don't see why we wouldn't add this. Adding a sliding view of an array is helpful. However, this sort of opens a can of worms because if we do so here, then we should likely add such API to both lists and their non-empty/lazy counterparts (though I'm not entirely sure about the lazy counterparts). That being said, I don't think we should wait until PRs adding that to other places are open before adding it here.
  • we could implement this without trying to be performant. For some use cases, this is "good enough", but we should document at least the Big O notation for these algorithms. If we want to be as fast as possible, then FFI would be preferred. However, that comes at the cost of making other backends implement this code since STFn as a concept can't be used in core libraries yet. If we used just ST to implement this, I'm not sure whether it would be more or less performant than the current implementation.
  • On naming things, I think slidingWithSizeAndStep is a very long name and hope we can agree upon a shorter name that still communicates its idea. That might be as simple as renaming it to sliding', a version of sliding with more configuration options shown. This name follows general conventions/patterns for naming things.
  • slidingWithSizeAndStep should likely return Array (NonEmptyArray a) because when step <= 0, an empty array will be returned. Changing the return type to NonEmptyArray means Foldable1 and its counterparts can be used in addition to Foldable.

@FloWi
Copy link
Author

FloWi commented Apr 13, 2021

Hi Jordan!

I started using purescript around one month ago and I love your great website with all the examples. Used it a lot while getting familiar with the language and it helped me a lot. 👍

Thanks for your comprehensive input on the PR.

tl;dr Here's my plan - what do you think?

  • introduce records to give the arguments names instead of currying? (if this is common for APIs in purescript). Don't like this signature-part -> Int -> Int -> Int
  • create rangeWithStep' :: forall a. (Int -> a) -> Int -> Int -> Int -> Array a to use ST.push to avoid unnecessary copying
  • implement rangeWithStep = rangeWithStep' identity
  • rename slidingWithSizeAndStep to sliding'
  • implement sliding and sliding' to use rangeWithStep' and pass the call to slice into it.
  • change return type of sliding' to Array (NonEmptyArray a)

  • I don't see why we wouldn't add this. Adding a sliding view of an array is helpful. However, this sort of opens a can of worms because if we do so here, then we should likely add such API to both lists and their non-empty/lazy counterparts (though I'm not entirely sure about the lazy counterparts). That being said, I don't think we should wait until PRs adding that to other places are open before adding it here.

I think the runtime of a list-implementation would be quite horrible due to the lack of index-lookups - if the windows overlap. In that case it could be faster to copy the list into an array, perform the function and convert the result back to a list. So +1 for waiting until people ask for it. ;-)

  • we could implement this without trying to be performant. For some use cases, this is "good enough", but we should document at least the Big O notation for these algorithms. If we want to be as fast as possible, then FFI would be preferred. However, that comes at the cost of making other backends implement this code since STFn as a concept can't be used in core libraries yet.

I never heard of STFn. Do you mean this? https://github.com/purescript/purescript-st I also can't judge that part, since I'm new to purescript.

If we used just ST to implement this, I'm not sure whether it would be more or less performant than the current implementation.

I use snoc for both creating my array of the starting-indices, and then again while mapping over the indices to collect the slices into the resulting array. Just saw that snoc always creates a mutable copy of the array. That is quite inefficient indeed. So, I like your idea of using ST to work on mutable arrays here. I think I'll rewrite it to use St.push on a mutable array. I can even rewrite my rangeWithStep function to also accept a (Int -> a) and push the result into the mutable array. That way I can reuse it with slice function and avoid the allocation of the intermediate array.

  • On naming things, I think slidingWithSizeAndStep is a very long name and hope we can agree upon a shorter name that still communicates its idea. That might be as simple as renaming it to sliding', a version of sliding with more configuration options shown. This name follows general conventions/patterns for naming things.

Good to know, I like the idea. Is it ok to use a record to give the parameters names? Or is currying more common in purescript?

  • slidingWithSizeAndStep should likely return Array (NonEmptyArray a) because when step <= 0, an empty array will be returned. Changing the return type to NonEmptyArray means Foldable1 and its counterparts can be used in addition to Foldable.

Great idea. Haven't thought about that.

@hdgarrood
Copy link
Contributor

I think we should avoid FFI in core because of portability - it should ideally not be any harder than it needs to be to get the core libraries running on a new backend.

I also prefer slidingWithStepAndSize over sliding'. I think primed names are fine for local variable names but not suitable for function names exported from a library, especially not a core library. All a prime tells you is that the function is slightly different from the non-primed version, but that’s not very helpful if you can’t remember how it’s different.

@JordanMartinez
Copy link
Contributor

I started using purescript around one month ago and I love your great website with all the examples. Used it a lot while getting familiar with the language and it helped me a lot. +1

Glad to hear it helped!

introduce records to give the arguments names instead of currying? (if this is common for APIs in purescript). Don't like this signature-part
...
Good to know, I like the idea. Is it ok to use a record to give the parameters names? Or is currying more common in purescript?

I think what you have is good (i.e. currying). While it does make it harder to understand which argument is which, the alternatives aren't much better. Below are a few examples that highlight their problems:

  1. One could pass in a record, but is it worth it to create a record object before consuming it pretty much immediately in the function just to fix this documentation issue? Probably not. Adding docs that indicate what the arguments are would likely be better.
  2. One could define type alias that documents at the type signature what each type is (e.g. type InclusiveStartIndex = Int). However, type aliases sometimes do more harm than good because now you have the new problem of figuring out what InclusiveStartIndex is. For example, is that a zero-based index or a one-based index? Moreover, you need to export the type alias, but it doesn't really do anything. Perhaps this issue would be less important if a core library provided a number of common type aliases that others could use in their code for documentation purposes, but I'm not sure whether the tradeoffs are worth it. Lastly, at the end of the day, it's still just an Int, so one does not gain any type safety from it.

I think the runtime of a list-implementation would be quite horrible due to the lack of index-lookups - if the windows overlap. In that case it could be faster to copy the list into an array, perform the function and convert the result back to a list. So +1 for waiting until people ask for it. ;-)

Yeah, terrible performance would likely cause us to go in that direction (i.e. transforming to an array first, sliding, and transforming back to a list).

I never heard of STFn. Do you mean this? https://github.com/purescript/purescript-st I also can't judge that part, since I'm new to purescript.

STFn doesn't actually exist, but the idea does. There is Effect, that curries its arguments, and EffectFn that uncurries its arguments. For example, using arrow syntax for functions from JavaScript.

  • Effect a translates to () => computationProducingA().
  • a -> b -> c -> Effect d translates to (a) => (b) => (c) => () => computationProducingD(a, b, c), or a curried function, which isn't performance.
  • To get an uncurried version of the above code, we'd use EffectFn4 a b c d. That translates to (a, b, c) => () => computationProducingD(a, b, c).
  • ST h a translates to () => mutableComputationProducingA()
  • a -> b -> c -> ST h d translates to (a) => (b) => (c) => () => mutableComputationProducingD(a, b, c)
  • We don't have an uncurried version of ST, which would be called STFn4 h a b c d in this context.

That's what the discussion around purescript/purescript-st#31 involves (though there is more to it than just that). If that idea was implemented, we would get something as (nearly?) fast as FFI, but without the backend portability issues that Harry mentioned. However, this idea is more nuanced than what I'm summarizing here, so it's not just an "obvious" fix that we haven't yet implemented.

Because ST is curried, it's not as fast as it could be if we got rid of the unneeded closures. I ran into a similar issue with performance when I tried implementing everything in ST in #203. I imagine yours would benefit if ST was used, but likely not as much as if an STFn-implementation could be used. I'd be curious to see the before/after benchmarks of your current implementation with an ST-based one.

I think we should avoid FFI in core because of portability - it should ideally not be any harder than it needs to be to get the core libraries running on a new backend.

Also, because I don't think I was clear earlier, I agree with Harry on this point. Someone who really wants the same thing in a very performant way could write the FFI for their specific project.

I also prefer slidingWithStepAndSize over sliding'. I think primed names are fine for local variable names but not suitable for function names exported from a library, especially not a core library.

I agree with this on principle, but I'd still like to see if we could make the name shorter. If we dropped with with and and part, would slidingStepSize be a good tradeoff? Or should we just stick with slidingWithStepAndSize?

src/Data/Array.purs Outdated Show resolved Hide resolved
@FloWi
Copy link
Author

FloWi commented Apr 14, 2021

Thanks for your input, @JordanMartinez and @hdgarrood. I really appreciate it. Always a pleasure when the first contact with the community of a programming language is positive.

I'm going to measure the current implementation and come up with improvements probably on the weekend and get back to you with results. Currently running a bit low on energy after work, since I have allergies and my archenemy (birch tree) is in full bloom.

@JordanMartinez
Copy link
Contributor

Sounds good!

@FloWi
Copy link
Author

FloWi commented Apr 18, 2021

It took me a while to figure out how to use STA.new andSTA.push, since I wasn't able to write the signature of the array. Compiler was also complaining about scoping issues with the h parameter. I put my helper function now inside the STA.run and omitted the type-signature (type-inference ftw :)). Let me know if anything can be improved.

I skipped the measurement-part of the implementation, since it now operates on a mutable array and avoids copying and I didn't want to open another rabbit hole tbh ;-) Still at the stage in my learning curve where I struggle with the syntax from time to time.

Copy link
Contributor

@JordanMartinez JordanMartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Just need to clean up a few docs and some of the code.

src/Data/Array.purs Outdated Show resolved Hide resolved
src/Data/Array.purs Show resolved Hide resolved
src/Data/Array.purs Outdated Show resolved Hide resolved
src/Data/Array.purs Outdated Show resolved Hide resolved
src/Data/Array.purs Outdated Show resolved Hide resolved
test/Test/Data/Array.purs Show resolved Hide resolved
src/Data/Array.purs Show resolved Hide resolved
@FloWi
Copy link
Author

FloWi commented Apr 18, 2021

Thanks again for your input, @JordanMartinez and @hdgarrood. Made the documentation much clearer imo.

src/Data/Array.purs Outdated Show resolved Hide resolved
src/Data/Array.purs Outdated Show resolved Hide resolved
@FloWi FloWi force-pushed the sliding-window branch 3 times, most recently from 4994b9f to 25dd1cd Compare April 18, 2021 20:45
Copy link
Contributor

@JordanMartinez JordanMartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let other core team members weigh in on this PR.

One minor change that could also be done is to replace void $ with _ <-. Not sure how much of a difference it makes. For example, in rangeWithStep'

arr <- STA.new
-- Original
-- void $ helper start arr

-- Updated
_ <- helper start arr
pure arr

then []
else STA.run
( do
let
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a stylistic change. The parenthesis after the ST.run can be removed if you follow it with do

else ST.run do
  let
    helper current acc = {- code... -}
    
  arr <- STA.new

Just to clarify, normally ST.run $ codeBlock will work. However, because of the Rank-2 type in ST.run (i.e. the h type parameter), using $ in this way will always produce a compiler error. However, using do creates a new block, so the parenthesis are no longer needed.

Other than that, I think this is ready to merge

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found that out the hard way when I tried to move the helper function out of the ST.run block. Never heard about rank-2 types before - at least not consciously ;-)

@FloWi
Copy link
Author

FloWi commented Apr 19, 2021

The build failed because of an unused variable, but I haven't touched that place in the code. 🤔
Were there some changes to the compiler-settings in CI? Couldn't find anything in the commits and the build runs fine on my machine.

@thomashoneyman
Copy link
Member

PureScript 0.14.1 was released yesterday and we have CI on core libraries run on the latest compiler. The build is failing because of a newly-introduced warning. It's fixed by #213, so once that merges your build will be fixed as well.

@thomashoneyman
Copy link
Member

You can merge the default branch to fix the build now. Thanks!

@FloWi
Copy link
Author

FloWi commented Apr 20, 2021

Thanks, @thomashoneyman, worked like a charm!

@hdgarrood
Copy link
Contributor

Sorry that I didn't raise this sooner, but I am actually a bit hesitant to merge this at this moment in time. We ought to be really careful about adding things to the core libraries, because if we don't get the APIs for functions right the first time, it means we need to make breaking changes to fix them, and breaking changes in libraries that the overwhelming majority of the ecosystem depends on are much more painful than breaking changes that only a small section of the ecosystem depends on. In practice, I've found that it's almost impossible to get APIs right the first time, especially if we have not considered a variety of situations where the API might be useful to have.

Making sure the core libraries continue to improve and don't stagnate at the same time as avoiding unnecessary breakage and ecosystem churn is a very delicate balancing act. I'll suggest some rules of thumb for when something should be added to the core libraries based on my experience:

  • There should be evidence of demand for API to be used relatively frequently, by different people in different situations,
  • We should validate that the chosen design is appropriate in as many of these different situations as possible,
  • We should have considered alternative designs for the API and be able to explain why the chosen design is more appropriate than those alternatives.

For an example of the last point - as I recall, the last time I wanted a sliding view over an array, I wanted something more like this:

type Window a = { previous :: Maybe a, current :: a, next :: Maybe a }
windows :: Array a -> Array (Window a)

where length (windows xs) == length xs.

Another reason to be wary about adding things to the core libraries is that it adds more of a maintenance burden on the core team. This is why I am often in favour of letting APIs be trialled in third party packages first - it is much easier to iterate on the design when the API lives in a separate package (because breaking changes can be made much more frequently), and it's also easier to provide evidence that the API is in demand if we see other packages start to depend on it. (as an aside: I think it might be nice to adopt a system like Rust's stability annotations which would make it much easier to iterate on things within the core libraries without introducing too much ecosystem churn, but that's probably quite a big project.)

I'm not saying we shouldn't merge this, I just think we should consider these points a bit more carefully before merging. I personally think I am spreading myself a bit too thin, and I'd like to focus more on the language and compiler, so in this specific case I am going to step back and let others in the core team decide (and I'll be happy with whatever your decision is).

@JordanMartinez
Copy link
Contributor

@hdgarrood Good thoughts!

as an aside: I think it might be nice to adopt a system like Rust's stability annotations which would make it much easier to iterate on things within the core libraries without introducing too much ecosystem churn, but that's probably quite a big project.

This sounds like a larger discussion to be had somewhere outside of this repo. I'll open an issue in the purescript/governance repo where we can discuss that more.

For an example of the last point - as I recall, the last time I wanted a sliding view over an array, I wanted something more like this

Hmm... The Zipper-like design you propose seems similar to the approach I took when writing purescript-arrays-zipper. But it seems like your approach is more flexible in that the resulting zipper could focus one element at a time or focus one window as done in this PR at a time.

I personally think I am spreading myself a bit too thin, and I'd like to focus more on the language and compiler, so in this specific case I am going to step back and let others in the core team decide (and I'll be happy with whatever your decision is).

Thanks! I think the direction I would like to take here is to continue with what you proposed, however, and think through this design more. In other words, I think it would make more sense to open a new issue on this repo, talk about the problem these functions try to solve, the various ways they could be designed, which approach makes the most sense, and whether that approach should exist as a separate library or be merged into here.

@JordanMartinez
Copy link
Contributor

@FloWi I'm going to open an issue where we can discuss the design of this idea more. I'll backlink to this PR in the issue.

@JordanMartinez
Copy link
Contributor

@FloWi See #214

@sigma-andex
Copy link

Can this be merged?

@sigma-andex
Copy link

@JordanMartinez can we merge this? I think it is better to have it than not and it implements a well established function that is also part of the standard libs in other programming languages, like e.g. Scala https://www.scala-lang.org/api/3.1.2/scala/collection/immutable/IndexedSeq.html#sliding-fffff156

@JordanMartinez
Copy link
Contributor

@sigma-andex Did you look at the discussion in #214? Could you continue the conversation there?

@FloWi FloWi closed this Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants