Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation Proposals #2046

Closed
zm711 opened this issue Sep 27, 2023 · 34 comments
Closed

Documentation Proposals #2046

zm711 opened this issue Sep 27, 2023 · 34 comments
Labels
documentation Improvements or additions to documentation

Comments

@zm711
Copy link
Collaborator

zm711 commented Sep 27, 2023

As I was looking over the docs I was thinking of two changes that I think would be beneficial, but since they would be a bit of work to implement (I'm happy to do 1 or 2 PRs for this), I wanted to make sure no one was absolutely against them.

  1. Switch import statements to follow module aliases, for example in the curation module docs it says:
from spikeinterface.curation import MergeUnitsSorting, get_potential_auto_merge

sorting = run_sorter('kilosort', recording)

we = extract_waveforms(recording, sorting, folder='wf_folder')

# merges is a list of lists, with unit_ids to be merged.
merges = get_potential_auto_merge(we, minimum_spikes=1000,  maximum_distance_um=150.,
                                  peak_sign="neg", bin_ms=0.25, window_ms=100.,
                                  corr_diff_thresh=0.16, template_diff_thresh=0.25,
                                  censored_period_ms=0., refractory_period_ms=1.0,
                                  contamination_threshold=0.2, num_channels=5, num_shift=5,
                                  firing_contamination_balance=1.5)

# here we apply the merges
clean_sorting = MergeUnitsSorting(sorting, merges)

I would propose instead to use

import spikeinterface as si
import spikeinterface.curation as scur
import spikeinterface.sorters as ss

sorting = ss.run_sorter('kilosort', recording)
we = si.extract_waveforms(recording, sorting, folder = 'wf_folder')
...
clean_sorting = scur.MergeUnitSorting(sorting, merges)

Although an advanced user can import whatever they want, I think helping beginners to use the namespace would be beneficial. And in the docs seeing that run_sorter is part of ss which is the alias of sorters will help them find their way around the documentation by knowing which module to go to if they aren't sure where a function came from.

  1. The documentation makes extensive usage of positional arguments rather than keyword arguments. I think adding in keyword arguments would help users understand what they are putting into a function and reduce the risk of putting the wrong data into a function. in the same example I would add
sorting = ss.run_sorter(sorter_name='kilosort', recording=recording)
we = si.extract_waveforms(recording=recording, sorting=sorting, folder='wf_folder')
...

This is a bit more verbose, but it encourages good function practices and if something goes wrong it will help them generate issues on the tracker/ search through the source code to see what is going on.

But do let me know if anyone is completely against adding these features to the docs. Otherwise I'll get started working on some of those updates when I have some free time.

@zm711 zm711 added the documentation Improvements or additions to documentation label Sep 27, 2023
@samuelgarcia
Copy link
Member

Hi Zach.
On point 1.
On this Alessio will be your best friend.
But not me.

I do not like this submodule with alias in documentation in general.
I like alias for package (np, pd, si, ...) but not for sub package (scipy.signal, spikeinterface.preprocessing).
I do not have any explanation on this feeling.

In spikeinterface there is 3 ways:

import spikeinterface.full as si
si.bandpass_filter()

from spikeinterface.preprocessing import bandpass_filter
bandpass_filter()

import spikeinterface.preprocessing as pre
pre.bandpass_filter()

My feeling is that this is good to have a bit of the 3 situations in the documentations and not impose a unique way of writting.

Also I am not totally convince that a beginner should need to remember from which module a function come from.

This is my very personnal point of view!
Alessio, Herberto, Aurelien should disagree.
I am so impatient to read threre feeling on this :)

On point 2.
I am totally agree. Lets make it more verbose and explicit.

@alejoe91
Copy link
Member

  1. I think there are 2 people in the planet that can use full without any problems, and one of them doesn't use it any more for sake of clarity ;)

Importing modules and aliases makes everything easier to understand and tidier. Especially for the documentation, we should use this approach.

So I vote for submodule imports + aliases

  1. Yes let's use keywords! I agree it's a little more verbose but definitely less ambiguous.

@h-mayorquin
Copy link
Collaborator

I agree on point 2. Keywords are better than positional. I also would like to use that within the library and not just in the documentation.

For the first point. I share the aesthethics of Sam:

I do not like this submodule with alias in documentation in general.

I also don't have an explanation for this. I am willing to change my mind based on usability / design arguments for this. If somebody has a good sourrce where that is discussed that would be great.

I am glad Sam enumerated the options we have here currently:

import spikeinterface.full as si
si.bandpass_filter()

from spikeinterface.preprocessing import bandpass_filter
bandpass_filter()

import spikeinterface.preprocessing as pre
pre.bandpass_filter()

I always use 2 personally. I think that 1 is error prone and I don't trust it I would prefer to drop it. My biggest problem with 3 is that we are using acronmys there instead of full names (pre instead of preprocessing, sw instead of widgets, etc). I dislike being forced to learn ad-hoc acronmys for every library that I use and I don't find them natural at all. Why force people to learn even more stuff? If we go with 3 in the documentation I would like to address that.

My feelings about something being tidy. For me just importing the specific functions that I will use -and as I use them- seems like the tydiest approach. It also provides the benefit of making it explict where does the function comes from. The only problem is when people chose some function name that shadows another function in another library (.e.g from this_random_library import sort). I also have the feelnig that is easier to support full lazy access with a workflow like 2.

That said, I would like to have consistency in the documentation. If we managed to get consensus about something I would go behind that. Just a majority and not a super-majority would do it for me.

@zm711
Copy link
Collaborator Author

zm711 commented Sep 28, 2023

Cool. This is why I figured this one needed a proposal. People have their styles. But it sounds like like for point 2 we all agree so I can start on that as a PR as the debate rages on point 1.

I (personally) do not like importing individual functions because it pollutes the name space (the sort example above). And although not for me, I can image a user who has a different bandpass_filter that is also imported into the space (to be more si specific). This is a bad practice to encourage, though is at the discretion of the user. So for me I prefer to encourage either the first spikeinterface.full or spikeinterface.submodule, but to be honest not both. I think sometimes it is better to give the users one path to do things rather than giving them a lot of different ways to do things. (Basically I'm just quoting the Zen of Python at this point). With documentation it is nice to bring the user from beginning to end (which the how_to guides do a good job of), but if each module doc uses a different notation (although maybe good for longer term retention in a pedagogical sense), it is disorienting and confusing to the end reader.

I also find the knowing where a function came from (for me at least) is clearer with an acronym. If I see a function came from spre then I know it is preprocessing, whereas if I write a script or get down to the bottom of a page of documentation I will just see a function bandpass_filter but (on the docs) I will have the scour the page (it's not always at the top) to find where I imported that from. As the experts you and designers you put functions into modules that make sense for you but why is pca where it is? It could be post it could be qm, maybe as a user I think that I would only use pca for curation so I assume its curation.

On the level of acronyms most of them are just first letters so if I want widgets I need [s]pikeinterface.[w]idgets-> sw. I think people catch onto that pretty quick. Put I agree spre or scur don't follow the general pattern.

For point 1 then, I will await consensus should it ever arrive, but I think we need to at least add some additional import statements because currently the docs use functions from other modules without 'importing' them and so the user has no way to connect a function being used in the current docs to other sections of the docs within the examples. So something needs to be added (even if we choose to still retain all three styles).

@samuelgarcia
Copy link
Member

@h-mayorquin : happy to be your friend on point 1.

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 28, 2023

I agree with the one way only, at least on the docs.

On the level of acronyms most of them are just first letters so if I want widgets I need [s]pikeinterface.[w]idgets-> sw. I think people catch onto that pretty quick. But I agree spre or scur don't follow the general pattern.

The difficulty is on the reverse inference, all abbreviations makes sense when you see them next to the full word. But if I see sw.some_function in the middle of some script how do I know where that comes from? This happens to me with the areas of spikeinterface that I am not familiar with. The problem is recall not recognition and spikeinterface is not the only library that users will import in a script. Why not just use the full name?

Concerning the pollution of the namespace: very short stuff like a sw is also likely to clash with some other library as someone else might decide to alias their library or something else with sw:

See examples:
https://github.com/spyder-ide/spyder/blob/fa82bcab9a17b44df15c0874b1d0c0e459b3593e/spyder/plugins/tours/tours.py#L14
https://github.com/achael/eht-imaging/blob/2612ee9da599dd09b099792e4e2b3b8d9d73fa69/examples/example_starwarps.py#L21
https://github.com/statsmodels/statsmodels/blob/01b19d7d111b29c183f620ff0a949ef6391ff8ee/statsmodels/regression/linear_model.py#L2186

Namespace clashing becomes less of a problem the more specific you are like using a function, having a more specific module name or reflecting the nesting structure of your library in the alias:

si.preprocessing.bandpass_filter
preprocessing.bandpass_filter

Plus, the latter follows the logic that explicit is better tha implicit kind of thing. The argument against of course is verbosity.

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 28, 2023

I actually find it interesting that this "branded short import alias" (for lack of a better name) is some kind of scientific library pattern (numpy -> np, pandas -> pd, seaborn -> sns). Probably because people work a lot in notebooks and it somehow convenient (not clear to me)? Or it could be just an historical accident As a counter example, scikit-learn just does not have this type of branded imports in their documentation (they just import full functions) and they seem to be doing well.

On the other hand, people do feel like having this type of brand is really important, check out this guy feeling the pull to fill the pattern:
https://stackoverflow.com/questions/55562269/scikit-learn-import-convention

Why compete for namespace as a brand like that? The only suggestion the PEP 8 is that you juse alises to avoid namespace clash.

Edit: Scipy started importing submodules because of heavy import costs. I have the memory than scipy is older than most other modules, maybe it comes from there.

@zm711
Copy link
Collaborator Author

zm711 commented Sep 28, 2023

I have to run in a minute, (so I'll be back with fuller thoughts later), but I agree. (And to be honest a few of spikeinterface abbreviations have other well known meanings in US English--for example si-> is the abbreviation for sports illustrated a well known magazine). I would never argue for branded namespace, I just argue for namespace in general to reduce potential for clash. And people follow the recommended namespace (I don't know why seaborn is sns, but I definitely use it).

The difficulty is on the reverse inference, all abbreviations makes sense when you see them next to the full word. But if I see sw.some_function in the middle of some script how do I know where that comes from?

I find this to be the most compelling point. I would say look to the top of the script to see the import, but then you could argue that also works for the pure function import. The current problem is the docs don't do either for some functions, so I think that needs to be fixed separate from style. I guess for my I prefer an alias.function and I as the user control the alias whereas you and Sam (but correct me if I'm wrong), prefer the user to control the function important and keep track of not importing clashing functions.

Just to address one more point before I run (but not a fully formed thought yet)

Namespace clashing becomes less of a problem the more specific you are (like using a function), having a more specific module name or reflecting the nesting structure of your library in the alias:

I agree doing the full is better for a script. I at least don't do that because I usually test something in the terminal see that I like it and then once I know my flow save it into a script for later. So I alias so I don't have to type as much, but then when I copy that into a script I save the alias function rather than the full function. I think better practice for a script would be to save full name to be as explicit as possible, but (mostly from laziness), I don't do that. Not to say it is the right thing, but it is just common. People want to do less and get the same results.

But unfortunately with 4 people in this debate (2-2) I don't think a majority is currently possible. Haha.

@h-mayorquin
Copy link
Collaborator

I guess for my I prefer an alias.function and I as the user control the alias whereas you and Sam (but correct me if I'm wrong), prefer the user to control the function important and keep track of not importing clashing functions.

That's how I work but I will be fine with using alias.function and also moving the documentation there. I don't feel very strong about this and I think consistency is more important. I was just pointing out the reasons why I don't like the current candidate for aliases (sw, ss !, sc, sqm etc). I would rather have the aliases be the fulll names of the modules.

We can wait for @DradeAW to break the tie but I also think that the person who volunteers to do the unification should have a double vote : )

@DradeAW
Copy link
Contributor

DradeAW commented Sep 28, 2023

Interesting discussion!
I'll give my two cents:

  1. I always use si / se / scur / spre / spost ..., because I really don't like importing individual functions/classes (in the code you never know where they come from, when you want to use something new you need to scroll to the top to add it). Since I tend to use a lot of functions, I prefer using the abbreviations, but for the purpose of documentation the full names (e.g. si.preprocessing.filter instead of spre.filter) can make sense to be clearer.
  2. I agree that in general, it's better to use keywork arguments, however I find that there is an exception: when the variable name is identical (or near identical) to the keywork argument (e.g recording=recording, sorting=sorting). I find that it makes things harder to read, especially with long variable names (e.g. refractory_period_ms=refractory_period, censor_period_ms=censor_period).

@alejoe91
Copy link
Member

Is it a tiebreaker? :)

@alejoe91
Copy link
Member

Maybe we can get more opinions, like in #1989

@zm711
Copy link
Collaborator Author

zm711 commented Sep 28, 2023

Although I do have a preference, I would support whatever the community wants in end (similarly to @h-mayorquin). So the more votes the merrier in my opinion.

@alejoe91
Copy link
Member

Me too! Again, I vote for module aliases (this is what we also use "internally"). We will need to disambiguate comparison and curation, so maybe we could go with scmp / scur.

@JoeZiminski @TomBugnon @bendichter @yger what do you guys think?

@h-mayorquin
Copy link
Collaborator

@alejoe91 your link is death for me.

@alejoe91
Copy link
Member

@alejoe91 your link is death for me.

Wrong one, fixed

@JoeZiminski
Copy link
Collaborator

JoeZiminski commented Oct 2, 2023

Good discussion! I typically prefer importing the module e.g. import spikeinterface.preprocessing as pre. The main reason is that, if I am unfamiliar with a package, I end up spending a lot of time CTRL+F function names to figure out where they have come from. For short documentation this is not much of a problem, but is more of a problem for longer tutorials or general use. However, there are also some very good points raised to the counter.

As mentioned above there is no consensus between the main packages. (e.g. from scipy import signal, import numpy as np, from sklearn.ensemble import RandomForestClassifier, so that is no help).

  1. module-level imports indicate to new users what package certain functions come from.
  2. it is easier to track what package / module functions came from using package or module-level import.
  3. module level imports use many confusing acronyms, I have found these hard to track in SI and means 1) is reduced because the module is not clear from the acronym anyway.
  4. All three import methods can result in polluted namespaces.

The only solution that solves all of these problems is to use very explicit module import names and fix these somewhere in the documentation. e.g. from import spikeinterface.preprocessing as si_prepro, import spikeinterface.sorting as si_sorting. But, this is quite verbose. Nonetheless I would probably favour this approach as verbosity is the least problematic of all the raised points.

It is also worth considering if this is just for documentation / tutorials, or as general practice in SI itself. If it is just for documentation / tutorials, then I think constraint 1) can be relaxed. I don't think it is so important for new users to worry about what module a function has come from, and a search of the documentation will indicate this. For documentation only, I would consider the import spikeinterface as si; si.bandpass_filter() pattern the best. But, for day-to-day use this is incredibly annoying, because programmers have to keep searching the docs to find the module a function is from before inspecting it. And, it is probably not good to have an import method in the docs that no one actually uses in the package.

So, overall I think fixing a set of explicit module names used for imports is the best of available solutions for all use-cases (documentation, tutorials, day-to-day programming), albeit not perfect.

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Oct 23, 2023

Related to point 2:

https://discuss.python.org/t/syntactic-sugar-to-encourage-use-of-named-arguments/36217/15

EDIT (And to @DradeAW comments!)

Apparently the verbosity of long keyword arguments annoys people enough that Guido is willing to sponsor a PEP for adding syntactic suggar for this.

I am weird in how little this verbosity annoys me in this regard then and should change my beliefs to be more aware of this. I think is becaus black just collapses everything into:

def(
long_argument_1 = long_argument_1,
long_argument_2 = long_argument_2,
...
long_argumnet_n = long_argumnet_n
)

Which does not bother me. But I guess in non-blacked code bases it is harder to look for the argument as they look like

def(long_argumnet_1 = long_argument_1, long_argumnet_2=long_argumnet_2, ..., long_argumnet_n=long_argumnet_n)

And then I guess is more difficult to get comas or typos.

@zm711
Copy link
Collaborator Author

zm711 commented Oct 23, 2023

I agree the verbosity doesn't annoy me. I think black helps a lot with that.

As far as point 1, which has been left unresolved (although I think submodule imports are slightly in the lead :) ), maybe going forward I will just make sure there is at least some import at the top of a documentation page.

I think the worst thing would be to have documents where people have no clue where a function is coming from, so at least adding explicit import statements in the docs is something I think we can all agree on?

@TomBugnon
Copy link
Contributor

FWIW for 1- I vote for module-level aliases as well, and I don't think it's a big deal if they are not so explicit (ie scmp instead of si_comp)

For 2- I think lengthy & explicit is good as well

@h-mayorquin
Copy link
Collaborator

I think that everyeon agreed with the full keywords and with @TomBugnon we also have a vote for the aliases now, right?

@zm711
Copy link
Collaborator Author

zm711 commented Nov 2, 2023

Yeah it looks like aliases have won based on pure democracy. @samuelgarcia and @alejoe91 (but mostly Sam), any final comments or can I officially put that on my docs todo list?

@samuelgarcia
Copy link
Member

Here we have a weird style of democracy. the voted are ponderated by the age.
I have lot of inluence so.

I am really not sure that unify the way we import is a good stuff.
As there are 3 ways to import we should put the 3 ways in the doc. And not the worst one all over the place.
The end user will be happy to choose the good one (with few aliases).

@TomBugnon
Copy link
Contributor

omg what have I done

@samuelgarcia
Copy link
Member

traitre.

@zm711
Copy link
Collaborator Author

zm711 commented Nov 2, 2023

I think at the beginning of the docs we still say there are three ways so users can always choose to do the Sam right way of the three ways. So the reference wouldn't be completely removed. But for other examples it could be more regularized.

@JoeZiminski
Copy link
Collaborator

JoeZiminski commented Nov 2, 2023

Agree it cannot hurt to let the user know about the different ways of importing, and also to keep this as a distinct section (e.g. 'ways to import functions in spikeinterface') in only one place in the docs. For the rest of the docs, I would keep the imports consistent, so the reader can focus on the specific content that is aimed to be taught in that part section of the docs.

The website diataxis is a great resource that tries to take a systematic approach to documentation writing. In their tutorials page they have a section 'Ignore options and alternatives, which I think is relevant for this dicsussion.

@h-mayorquin
Copy link
Collaborator

I beg of you that we don't start a tutorial by saying:

  • These three conventions for importing that you can use to accomplish the following task.

@zm711
Copy link
Collaborator Author

zm711 commented Nov 3, 2023

No, no. There will be a whole tutorial that explains how to import:

  • Today we will learn three different ways to import the library in order to accomplish importing the library. :P

@samuelgarcia
Copy link
Member

@zm711
Copy link
Collaborator Author

zm711 commented Nov 3, 2023

Something like this : https://spikeinterface.readthedocs.io/en/latest/modules/core.html#import-rules

Although that is missing the from module import function approach that some prefer.

@JoeZiminski
Copy link
Collaborator

This is looking very nice! I would still advocate for more explicit import names recommended in the documentation. At the moment the list is quite long (as it must be) but it means there are lots of random abbreviations there that I think are:

a) confusing / intimidating to new users
b) confusing to read in someone elses code

If I am reading someone's code and they have followed the docs, I'm going to have no idea what is going on with all of these module names and will keep having to refer back to the documentation to decode. I would suggest recommending import names such as si_extractors, si_preprocessing etc. Some can be shorted si_quality vs. si_qualitymetrics`. I think I am in the minority here but believe this would better reflect the Python idiom explicit is better than implicit.

@JoeZiminski
Copy link
Collaborator

JoeZiminski commented May 2, 2024

I just reread the thread and realised how I forgot this has already been discussed to death and resolved in a PR. Please ignore my last comment 😅. Maybe this can be closed with the merging of #2168?

@zm711
Copy link
Collaborator Author

zm711 commented May 10, 2024

I agree @JoeZiminski , we can close this now and move to your new issue #2800 for further docs work :)

@zm711 zm711 closed this as completed May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

7 participants