Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ruff rule to enforce the existence of docstrings in public methods #1063

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

h-mayorquin
Copy link
Collaborator

Should come after #1062

This was the most annoying one.

@h-mayorquin h-mayorquin changed the title Using ruff rule to enforce the existence of docsrings in public functions Using ruff rule to enforce the existence of docsrings in public methods Sep 7, 2024
@h-mayorquin h-mayorquin self-assigned this Sep 7, 2024
Base automatically changed from more_ruff_rule to main September 10, 2024 04:49
@h-mayorquin h-mayorquin changed the title Using ruff rule to enforce the existence of docsrings in public methods Using ruff rule to enforce the existence of docstrings in public methods Sep 10, 2024
@h-mayorquin h-mayorquin marked this pull request as ready for review September 10, 2024 20:17
Copy link
Member

@pauladkisson pauladkisson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @h-mayorquin for working through all of this!

One point of discussion is should we include Returns sections for standard methods like get_metadata, get_metadata_schema, etc.? Or should we just have them be 1-liners.


Warnings
--------
Ensure that the `.session.mat` file is correctly located in the expected session path, or the method will not generate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worthwhile to explicitly specify the expected path: self.session_path / f"{self.session_id}.session.mat"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, will be up in the next commit.

return self.imaging_extractor.frame_to_time(frames=np.arange(stop=self.imaging_extractor.get_num_frames()))

def set_aligned_timestamps(self, aligned_timestamps: np.ndarray):
"""Replace all timestamps for this interface with those aligned to the common session start time."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a Parameters section for aligned_timestamps

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to noqa, this docstring is identical and should be inherited from the base class.

return self.segmentation_extractor.frame_to_time(
frames=np.arange(stop=self.segmentation_extractor.get_num_frames())
)

def set_aligned_timestamps(self, aligned_timestamps: np.ndarray):
"""set the aligned timestamps for the segmentation extractor."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again here, needs Parameters section for aligned_timestamps

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to noqa, this docstring is identical and should be inherited from the base class:

@abstractmethod
def set_aligned_timestamps(self, aligned_timestamps: np.ndarray) -> None:
"""
Replace all timestamps for this interface with those aligned to the common session start time.
Must be in units seconds relative to the common 'session_start_time'.
Parameters
----------
aligned_timestamps : numpy.ndarray
The synchronized timestamps for data in this interface.
"""
raise NotImplementedError(
"The protocol for synchronizing the timestamps of this interface has not been specified!"
)

by default False.
stub_frames : int, optional
The number of frames to include in the subset if `stub_test` is True, by default 100.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -28,6 +29,7 @@ def get_streams(
folder_path: DirectoryPath,
plane_separation_type: Literal["contiguous", "disjoint"] = None,
) -> dict:
"""get streams for the Bruker TIFF imaging data."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs Parameters section for folder_path and plane_separation_type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -87,6 +88,7 @@ class ScanImageLegacyImagingInterface(BaseImagingExtractorInterface):

@classmethod
def get_source_schema(cls) -> dict:
""" " "Get the source schema for the ScanImage legacy imaging interface."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a typo with the extra "s

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Whether to include the centroids of regions of interest (ROIs) in the data, by default True.
include_roi_acceptance : bool, optional
Whether to include acceptance status of ROIs, by default True.
mask_type : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask_type should include the full info from basesegmentationinterface:

mask_type : str, default: 'image'
            There are three types of ROI masks in NWB, 'image', 'pixel', and 'voxel'.

            * 'image' masks have the same shape as the reference images the segmentation was applied to, and weight each pixel
              by its contribution to the ROI (typically boolean, with 0 meaning 'not in the ROI').
            * 'pixel' masks are instead indexed by ROI, with the data at each index being the shape of the image by the number
              of pixels in each ROI.
            * 'voxel' masks are instead indexed by ROI, with the data at each index being the shape of the volume by the number
              of voxels in each ROI.

            Specify your choice between these two as mask_type='image', 'pixel', 'voxel', or None.
            If None, the mask information is not written to the NWB file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, should be done in the next commit.

"""
Get metadata for the time intervals.

Returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This get_metadata fn has a Returns section, but all the others do not.

We should probably pick one style and stick with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also changing the noqa, would be easier to unify at the base class and then propagate speicifities if needed.

conversion_options = conversion_options or dict()
for interface_name, data_interface in self.data_interface_objects.items():
data_interface.add_to_nwbfile(
nwbfile=nwbfile, metadata=metadata, **conversion_options.get(interface_name, dict())
)

return nwbfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you added a return value to add_to_nwbfile (it may need one, idk), but it definitely should not be in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -50,6 +50,8 @@ def estimate_default_chunk_shape(chunk_mb: float, maxshape: tuple[int, ...], dty
def estimate_default_buffer_shape(
buffer_gb: float, chunk_shape: tuple[int, ...], maxshape: tuple[int, ...], dtype: np.dtype
) -> tuple[int, ...]:
""" "Add docstring to this"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets do it then

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x D

Yep!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, will noqa this as well. I don't have the expertise to write the docstring of this and I don't want to clash with the version upstream.

@h-mayorquin
Copy link
Collaborator Author

ne point of discussion is should we include Returns sections for standard methods like get_metadata, get_metadata_schema, etc.? Or should we just have them be 1-liners.

I am fine with adding it, but this is just a minimal effort to get the ruff passing and then we can improve piecewise and unify.

@bendichter
Copy link
Contributor

When a method in a child class is overriding a parent class and does not have a docstring, it takes on the docstring of the parent. I think this works fine and in many cases makes it unnecessary to create a new docstring. However, it looks like this ruff inspector does not respect this and requires docstrings to be defined even on child classes where the parent has an appropriate docstring defined. Is that correct? If that's the case, I think we should be able to write a custom check that will work better for us.

@bendichter
Copy link
Contributor

e.g.

import importlib
import inspect
import pkgutil

def check_package_docstrings(package_name):
    """
    Check if all public methods of public classes in public modules of a package have docstrings.

    Args:
    package_name (str): The name of the package to check.

    Returns:
    dict: A dictionary containing information about missing docstrings.
    """
    missing_docstrings = {}
    package = importlib.import_module(package_name)
    
    for _, module_name, _ in pkgutil.walk_packages(package.__path__, package.__name__ + '.'):
        try:
            module = importlib.import_module(module_name)
            
            for name, obj in inspect.getmembers(module):
                if inspect.isclass(obj) and not name.startswith('_'):
                    class_missing = check_class_docstrings(obj)
                    if class_missing:
                        missing_docstrings[f"{module_name}.{name}"] = class_missing
        
        except ImportError as e:
            print(f"Error importing module {module_name}: {e}")
    
    return missing_docstrings

def check_class_docstrings(cls):
    """
    Check if all public methods of a class have docstrings.

    Args:
    cls (type): The class to check.

    Returns:
    list: A list of method names without docstrings.
    """
    missing = []
    for name, method in inspect.getmembers(cls, inspect.isfunction):
        if not name.startswith('_') and not method.__doc__:
            missing.append(name)
    return missing

# Example usage
if __name__ == "__main__":
    package_name = "your_package_name"
    result = check_package_docstrings(package_name)
    
    if result:
        print("Missing docstrings found:")
        for class_name, methods in result.items():
            print(f"  {class_name}: {', '.join(methods)}")
    else:
        print("All public methods have docstrings.")

@h-mayorquin
Copy link
Collaborator Author

h-mayorquin commented Sep 26, 2024

When a method in a child class is overriding a parent class and does not have a docstring, it takes on the docstring of the parent. I think this works fine and in many cases makes it unnecessary to create a new docstring. However, it looks like this ruff inspector does not respect this and requires docstrings to be defined even on child classes where the parent has an appropriate docstring defined. Is that correct? If that's the case, I think we should be able to write a custom check that will work better for us.

Yeah, that's a downside. The approach that I am taking here is just annotating it like this so it gets ignored:

image

This is good because it forces one to be explicit about it (you add it when you know the docstrings should be inherited) and is less complex than implementing something on our own. Although, @pauladkisson already implemented an action that I think accomplishes this if you prefer to avoid having to add #noqa annotations

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.58%. Comparing base (36464df) to head (716d400).
Report is 16 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1063      +/-   ##
==========================================
+ Coverage   90.44%   90.58%   +0.13%     
==========================================
  Files         129      129              
  Lines        8055     8164     +109     
==========================================
+ Hits         7285     7395     +110     
+ Misses        770      769       -1     
Flag Coverage Δ
unittests 90.58% <100.00%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/neuroconv/basedatainterface.py 95.34% <ø> (+0.82%) ⬆️
src/neuroconv/baseextractorinterface.py 100.00% <100.00%> (ø)
...nv/datainterfaces/behavior/audio/audiointerface.py 87.95% <100.00%> (ø)
...ces/behavior/deeplabcut/deeplabcutdatainterface.py 93.18% <100.00%> (ø)
...nterfaces/behavior/fictrac/fictracdatainterface.py 91.78% <100.00%> (ø)
...s/behavior/lightningpose/lightningposeconverter.py 96.82% <100.00%> (ø)
...havior/lightningpose/lightningposedatainterface.py 95.32% <100.00%> (ø)
...atainterfaces/behavior/medpc/medpcdatainterface.py 94.31% <100.00%> (ø)
...faces/behavior/miniscope/miniscopedatainterface.py 100.00% <100.00%> (ø)
...aces/behavior/neuralynx/neuralynx_nvt_interface.py 98.36% <100.00%> (ø)
... and 50 more

... and 1 file with indirect coverage changes

@bendichter
Copy link
Contributor

I tend to side more on not adding chores for devs. They tend to pile up in a way that creates a lot of busy work and makes it difficult to onboard people and is unwelcoming to new contributors. In this particular case I suppose I could go either way

pyproject.toml Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants