added sliding-window functions sliding and slidingWithSizeAndStep #212

FloWi · 2021-04-13T07:40:50Z

Added the functions sliding and slidingWithSizeAndStep to allow sliding-window computations. In order to realise the slidingWithSizeAndStep I added rangeWithStep.

I decided to have the functions return empty arrays in case the user gave them nonsensical arguments. Can change that to Maybe if that's preferred.

Thanks for the review, @sigma-andex.

FloWi · 2021-04-13T11:29:53Z

Hi maintainers!
I'm quite new to the purescript-community (coming from scala) and was missing a function that comes in handy when analysing time series data. So I created a PR that adds these functions to arrays. Let me know what you think and if I can improve my code.

JordanMartinez · 2021-04-13T19:33:41Z

Just a few initial thoughts about this:

I don't see why we wouldn't add this. Adding a sliding view of an array is helpful. However, this sort of opens a can of worms because if we do so here, then we should likely add such API to both lists and their non-empty/lazy counterparts (though I'm not entirely sure about the lazy counterparts). That being said, I don't think we should wait until PRs adding that to other places are open before adding it here.
we could implement this without trying to be performant. For some use cases, this is "good enough", but we should document at least the Big O notation for these algorithms. If we want to be as fast as possible, then FFI would be preferred. However, that comes at the cost of making other backends implement this code since STFn as a concept can't be used in core libraries yet. If we used just ST to implement this, I'm not sure whether it would be more or less performant than the current implementation.
On naming things, I think slidingWithSizeAndStep is a very long name and hope we can agree upon a shorter name that still communicates its idea. That might be as simple as renaming it to sliding', a version of sliding with more configuration options shown. This name follows general conventions/patterns for naming things.
slidingWithSizeAndStep should likely return Array (NonEmptyArray a) because when step <= 0, an empty array will be returned. Changing the return type to NonEmptyArray means Foldable1 and its counterparts can be used in addition to Foldable.

FloWi · 2021-04-13T22:09:55Z

Hi Jordan!

I started using purescript around one month ago and I love your great website with all the examples. Used it a lot while getting familiar with the language and it helped me a lot. 👍

Thanks for your comprehensive input on the PR.

tl;dr Here's my plan - what do you think?

introduce records to give the arguments names instead of currying? (if this is common for APIs in purescript). Don't like this signature-part -> Int -> Int -> Int
create rangeWithStep' :: forall a. (Int -> a) -> Int -> Int -> Int -> Array a to use ST.push to avoid unnecessary copying
implement rangeWithStep = rangeWithStep' identity
rename slidingWithSizeAndStep to sliding'
implement sliding and sliding' to use rangeWithStep' and pass the call to slice into it.
change return type of sliding' to Array (NonEmptyArray a)

I don't see why we wouldn't add this. Adding a sliding view of an array is helpful. However, this sort of opens a can of worms because if we do so here, then we should likely add such API to both lists and their non-empty/lazy counterparts (though I'm not entirely sure about the lazy counterparts). That being said, I don't think we should wait until PRs adding that to other places are open before adding it here.

I think the runtime of a list-implementation would be quite horrible due to the lack of index-lookups - if the windows overlap. In that case it could be faster to copy the list into an array, perform the function and convert the result back to a list. So +1 for waiting until people ask for it. ;-)

we could implement this without trying to be performant. For some use cases, this is "good enough", but we should document at least the Big O notation for these algorithms. If we want to be as fast as possible, then FFI would be preferred. However, that comes at the cost of making other backends implement this code since STFn as a concept can't be used in core libraries yet.

I never heard of STFn. Do you mean this? https://github.com/purescript/purescript-st I also can't judge that part, since I'm new to purescript.

If we used just ST to implement this, I'm not sure whether it would be more or less performant than the current implementation.

I use snoc for both creating my array of the starting-indices, and then again while mapping over the indices to collect the slices into the resulting array. Just saw that snoc always creates a mutable copy of the array. That is quite inefficient indeed. So, I like your idea of using ST to work on mutable arrays here. I think I'll rewrite it to use St.push on a mutable array. I can even rewrite my rangeWithStep function to also accept a (Int -> a) and push the result into the mutable array. That way I can reuse it with slice function and avoid the allocation of the intermediate array.

On naming things, I think slidingWithSizeAndStep is a very long name and hope we can agree upon a shorter name that still communicates its idea. That might be as simple as renaming it to sliding', a version of sliding with more configuration options shown. This name follows general conventions/patterns for naming things.

Good to know, I like the idea. Is it ok to use a record to give the parameters names? Or is currying more common in purescript?

slidingWithSizeAndStep should likely return Array (NonEmptyArray a) because when step <= 0, an empty array will be returned. Changing the return type to NonEmptyArray means Foldable1 and its counterparts can be used in addition to Foldable.

Great idea. Haven't thought about that.

hdgarrood · 2021-04-13T23:48:19Z

I think we should avoid FFI in core because of portability - it should ideally not be any harder than it needs to be to get the core libraries running on a new backend.

I also prefer slidingWithStepAndSize over sliding'. I think primed names are fine for local variable names but not suitable for function names exported from a library, especially not a core library. All a prime tells you is that the function is slightly different from the non-primed version, but that’s not very helpful if you can’t remember how it’s different.

JordanMartinez · 2021-04-14T01:08:12Z

I started using purescript around one month ago and I love your great website with all the examples. Used it a lot while getting familiar with the language and it helped me a lot. +1

Glad to hear it helped!

introduce records to give the arguments names instead of currying? (if this is common for APIs in purescript). Don't like this signature-part
...
Good to know, I like the idea. Is it ok to use a record to give the parameters names? Or is currying more common in purescript?

I think what you have is good (i.e. currying). While it does make it harder to understand which argument is which, the alternatives aren't much better. Below are a few examples that highlight their problems:

One could pass in a record, but is it worth it to create a record object before consuming it pretty much immediately in the function just to fix this documentation issue? Probably not. Adding docs that indicate what the arguments are would likely be better.
One could define type alias that documents at the type signature what each type is (e.g. type InclusiveStartIndex = Int). However, type aliases sometimes do more harm than good because now you have the new problem of figuring out what InclusiveStartIndex is. For example, is that a zero-based index or a one-based index? Moreover, you need to export the type alias, but it doesn't really do anything. Perhaps this issue would be less important if a core library provided a number of common type aliases that others could use in their code for documentation purposes, but I'm not sure whether the tradeoffs are worth it. Lastly, at the end of the day, it's still just an Int, so one does not gain any type safety from it.

I think the runtime of a list-implementation would be quite horrible due to the lack of index-lookups - if the windows overlap. In that case it could be faster to copy the list into an array, perform the function and convert the result back to a list. So +1 for waiting until people ask for it. ;-)

Yeah, terrible performance would likely cause us to go in that direction (i.e. transforming to an array first, sliding, and transforming back to a list).

I never heard of STFn. Do you mean this? https://github.com/purescript/purescript-st I also can't judge that part, since I'm new to purescript.

STFn doesn't actually exist, but the idea does. There is Effect, that curries its arguments, and EffectFn that uncurries its arguments. For example, using arrow syntax for functions from JavaScript.

Effect a translates to () => computationProducingA().
a -> b -> c -> Effect d translates to (a) => (b) => (c) => () => computationProducingD(a, b, c), or a curried function, which isn't performance.
To get an uncurried version of the above code, we'd use EffectFn4 a b c d. That translates to (a, b, c) => () => computationProducingD(a, b, c).
ST h a translates to () => mutableComputationProducingA()
a -> b -> c -> ST h d translates to (a) => (b) => (c) => () => mutableComputationProducingD(a, b, c)
We don't have an uncurried version of ST, which would be called STFn4 h a b c d in this context.

That's what the discussion around purescript/purescript-st#31 involves (though there is more to it than just that). If that idea was implemented, we would get something as (nearly?) fast as FFI, but without the backend portability issues that Harry mentioned. However, this idea is more nuanced than what I'm summarizing here, so it's not just an "obvious" fix that we haven't yet implemented.

Because ST is curried, it's not as fast as it could be if we got rid of the unneeded closures. I ran into a similar issue with performance when I tried implementing everything in ST in #203. I imagine yours would benefit if ST was used, but likely not as much as if an STFn-implementation could be used. I'd be curious to see the before/after benchmarks of your current implementation with an ST-based one.

I think we should avoid FFI in core because of portability - it should ideally not be any harder than it needs to be to get the core libraries running on a new backend.

Also, because I don't think I was clear earlier, I agree with Harry on this point. Someone who really wants the same thing in a very performant way could write the FFI for their specific project.

I also prefer slidingWithStepAndSize over sliding'. I think primed names are fine for local variable names but not suitable for function names exported from a library, especially not a core library.

I agree with this on principle, but I'd still like to see if we could make the name shorter. If we dropped with with and and part, would slidingStepSize be a good tradeoff? Or should we just stick with slidingWithStepAndSize?

src/Data/Array.purs

FloWi · 2021-04-14T21:45:47Z

Thanks for your input, @JordanMartinez and @hdgarrood. I really appreciate it. Always a pleasure when the first contact with the community of a programming language is positive.

I'm going to measure the current implementation and come up with improvements probably on the weekend and get back to you with results. Currently running a bit low on energy after work, since I have allergies and my archenemy (birch tree) is in full bloom.

JordanMartinez · 2021-04-15T13:37:30Z

Sounds good!

FloWi · 2021-04-18T10:54:35Z

It took me a while to figure out how to use STA.new andSTA.push, since I wasn't able to write the signature of the array. Compiler was also complaining about scoping issues with the h parameter. I put my helper function now inside the STA.run and omitted the type-signature (type-inference ftw :)). Let me know if anything can be improved.

I skipped the measurement-part of the implementation, since it now operates on a mutable array and avoids copying and I didn't want to open another rabbit hole tbh ;-) Still at the stage in my learning curve where I struggle with the syntax from time to time.

JordanMartinez

Overall looks good. Just need to clean up a few docs and some of the code.

src/Data/Array.purs

test/Test/Data/Array.purs

src/Data/Array.purs

FloWi · 2021-04-18T16:16:15Z

Thanks again for your input, @JordanMartinez and @hdgarrood. Made the documentation much clearer imo.

src/Data/Array.purs

JordanMartinez

I'll let other core team members weigh in on this PR.

One minor change that could also be done is to replace void $ with _ <-. Not sure how much of a difference it makes. For example, in rangeWithStep'

arr <- STA.new
-- Original
-- void $ helper start arr

-- Updated
_ <- helper start arr
pure arr

JordanMartinez · 2021-04-19T00:21:51Z

src/Data/Array.purs

+  then []
+  else STA.run
+   ( do
+      let


Just a stylistic change. The parenthesis after the ST.run can be removed if you follow it with do

else ST.run do let helper current acc = {- code... -} arr <- STA.new

Just to clarify, normally ST.run $ codeBlock will work. However, because of the Rank-2 type in ST.run (i.e. the h type parameter), using $ in this way will always produce a compiler error. However, using do creates a new block, so the parenthesis are no longer needed.

Other than that, I think this is ready to merge

I think I found that out the hard way when I tried to move the helper function out of the ST.run block. Never heard about rank-2 types before - at least not consciously ;-)

FloWi · 2021-04-19T07:56:09Z

The build failed because of an unused variable, but I haven't touched that place in the code. 🤔
Were there some changes to the compiler-settings in CI? Couldn't find anything in the commits and the build runs fine on my machine.

thomashoneyman · 2021-04-19T23:22:06Z

PureScript 0.14.1 was released yesterday and we have CI on core libraries run on the latest compiler. The build is failing because of a newly-introduced warning. It's fixed by #213, so once that merges your build will be fixed as well.

thomashoneyman · 2021-04-19T23:26:52Z

You can merge the default branch to fix the build now. Thanks!

FloWi · 2021-04-20T05:46:56Z

Thanks, @thomashoneyman, worked like a charm!

… slidingWithSizeAndStep

…a mapping function

hdgarrood · 2021-04-25T16:00:29Z

Sorry that I didn't raise this sooner, but I am actually a bit hesitant to merge this at this moment in time. We ought to be really careful about adding things to the core libraries, because if we don't get the APIs for functions right the first time, it means we need to make breaking changes to fix them, and breaking changes in libraries that the overwhelming majority of the ecosystem depends on are much more painful than breaking changes that only a small section of the ecosystem depends on. In practice, I've found that it's almost impossible to get APIs right the first time, especially if we have not considered a variety of situations where the API might be useful to have.

Making sure the core libraries continue to improve and don't stagnate at the same time as avoiding unnecessary breakage and ecosystem churn is a very delicate balancing act. I'll suggest some rules of thumb for when something should be added to the core libraries based on my experience:

There should be evidence of demand for API to be used relatively frequently, by different people in different situations,
We should validate that the chosen design is appropriate in as many of these different situations as possible,
We should have considered alternative designs for the API and be able to explain why the chosen design is more appropriate than those alternatives.

For an example of the last point - as I recall, the last time I wanted a sliding view over an array, I wanted something more like this:

type Window a = { previous :: Maybe a, current :: a, next :: Maybe a }
windows :: Array a -> Array (Window a)

where length (windows xs) == length xs.

Another reason to be wary about adding things to the core libraries is that it adds more of a maintenance burden on the core team. This is why I am often in favour of letting APIs be trialled in third party packages first - it is much easier to iterate on the design when the API lives in a separate package (because breaking changes can be made much more frequently), and it's also easier to provide evidence that the API is in demand if we see other packages start to depend on it. (as an aside: I think it might be nice to adopt a system like Rust's stability annotations which would make it much easier to iterate on things within the core libraries without introducing too much ecosystem churn, but that's probably quite a big project.)

I'm not saying we shouldn't merge this, I just think we should consider these points a bit more carefully before merging. I personally think I am spreading myself a bit too thin, and I'd like to focus more on the language and compiler, so in this specific case I am going to step back and let others in the core team decide (and I'll be happy with whatever your decision is).

JordanMartinez · 2021-04-25T20:44:43Z

@hdgarrood Good thoughts!

as an aside: I think it might be nice to adopt a system like Rust's stability annotations which would make it much easier to iterate on things within the core libraries without introducing too much ecosystem churn, but that's probably quite a big project.

This sounds like a larger discussion to be had somewhere outside of this repo. I'll open an issue in the purescript/governance repo where we can discuss that more.

For an example of the last point - as I recall, the last time I wanted a sliding view over an array, I wanted something more like this

Hmm... The Zipper-like design you propose seems similar to the approach I took when writing purescript-arrays-zipper. But it seems like your approach is more flexible in that the resulting zipper could focus one element at a time or focus one window as done in this PR at a time.

I personally think I am spreading myself a bit too thin, and I'd like to focus more on the language and compiler, so in this specific case I am going to step back and let others in the core team decide (and I'll be happy with whatever your decision is).

Thanks! I think the direction I would like to take here is to continue with what you proposed, however, and think through this design more. In other words, I think it would make more sense to open a new issue on this repo, talk about the problem these functions try to solve, the various ways they could be designed, which approach makes the most sense, and whether that approach should exist as a separate library or be merged into here.

JordanMartinez · 2021-04-27T13:28:14Z

@FloWi I'm going to open an issue where we can discuss the design of this idea more. I'll backlink to this PR in the issue.

JordanMartinez · 2021-04-27T14:14:01Z

@FloWi See #214

sigma-andex · 2022-03-19T14:10:04Z

Can this be merged?

sigma-andex · 2022-06-28T09:49:00Z

@JordanMartinez can we merge this? I think it is better to have it than not and it implements a well established function that is also part of the standard libs in other programming languages, like e.g. Scala https://www.scala-lang.org/api/3.1.2/scala/collection/immutable/IndexedSeq.html#sliding-fffff156

JordanMartinez · 2022-06-30T03:41:40Z

@sigma-andex Did you look at the discussion in #214? Could you continue the conversation there?

JordanMartinez reviewed Apr 14, 2021

View reviewed changes

src/Data/Array.purs Outdated Show resolved Hide resolved

JordanMartinez suggested changes Apr 18, 2021

View reviewed changes

FloWi force-pushed the sliding-window branch from c71cbaf to 79b800f Compare April 18, 2021 16:12

JordanMartinez reviewed Apr 18, 2021

View reviewed changes

src/Data/Array.purs Outdated Show resolved Hide resolved

JordanMartinez reviewed Apr 18, 2021

View reviewed changes

src/Data/Array.purs Outdated Show resolved Hide resolved

FloWi force-pushed the sliding-window branch 3 times, most recently from 4994b9f to 25dd1cd Compare April 18, 2021 20:45

JordanMartinez approved these changes Apr 19, 2021

View reviewed changes

FloWi force-pushed the sliding-window branch from 25dd1cd to c33db7b Compare April 19, 2021 07:47

FloWi added 3 commits April 20, 2021 11:01

added creator-function rangeWithStep and window-functions sliding and…

82a50c0

… slidingWithSizeAndStep

introduced rangeWithStep', that works on a mutable array and takes …

54782cd

…a mapping function

improved docs and tests based on code review

0660bde

FloWi force-pushed the sliding-window branch from c33db7b to 0660bde Compare April 20, 2021 09:01

JordanMartinez mentioned this pull request Apr 25, 2021

Clarify guidelines for when to accept new features into core libraries purescript/governance#12

Open

JordanMartinez mentioned this pull request Apr 27, 2021

Add API: sliding and slidingSizeStep #214

Open

FloWi closed this Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added sliding-window functions sliding and slidingWithSizeAndStep #212

added sliding-window functions sliding and slidingWithSizeAndStep #212

FloWi commented Apr 13, 2021

FloWi commented Apr 13, 2021

JordanMartinez commented Apr 13, 2021

FloWi commented Apr 13, 2021

hdgarrood commented Apr 13, 2021

JordanMartinez commented Apr 14, 2021

FloWi commented Apr 14, 2021

JordanMartinez commented Apr 15, 2021

FloWi commented Apr 18, 2021

JordanMartinez left a comment

FloWi commented Apr 18, 2021

JordanMartinez left a comment

JordanMartinez Apr 19, 2021

FloWi Apr 19, 2021

FloWi commented Apr 19, 2021

thomashoneyman commented Apr 19, 2021

thomashoneyman commented Apr 19, 2021

FloWi commented Apr 20, 2021 •

edited

Loading

hdgarrood commented Apr 25, 2021

JordanMartinez commented Apr 25, 2021

JordanMartinez commented Apr 27, 2021

JordanMartinez commented Apr 27, 2021

sigma-andex commented Mar 19, 2022

sigma-andex commented Jun 28, 2022

JordanMartinez commented Jun 30, 2022

added sliding-window functions sliding and slidingWithSizeAndStep #212

added sliding-window functions sliding and slidingWithSizeAndStep #212

Conversation

FloWi commented Apr 13, 2021

FloWi commented Apr 13, 2021

JordanMartinez commented Apr 13, 2021

FloWi commented Apr 13, 2021

hdgarrood commented Apr 13, 2021

JordanMartinez commented Apr 14, 2021

FloWi commented Apr 14, 2021

JordanMartinez commented Apr 15, 2021

FloWi commented Apr 18, 2021

JordanMartinez left a comment

Choose a reason for hiding this comment

FloWi commented Apr 18, 2021

JordanMartinez left a comment

Choose a reason for hiding this comment

JordanMartinez Apr 19, 2021

Choose a reason for hiding this comment

FloWi Apr 19, 2021

Choose a reason for hiding this comment

FloWi commented Apr 19, 2021

thomashoneyman commented Apr 19, 2021

thomashoneyman commented Apr 19, 2021

FloWi commented Apr 20, 2021 • edited Loading

hdgarrood commented Apr 25, 2021

JordanMartinez commented Apr 25, 2021

JordanMartinez commented Apr 27, 2021

JordanMartinez commented Apr 27, 2021

sigma-andex commented Mar 19, 2022

sigma-andex commented Jun 28, 2022

JordanMartinez commented Jun 30, 2022

FloWi commented Apr 20, 2021 •

edited

Loading