Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use numpy datetime64 instead of datetime.date #77 #84

Merged
merged 17 commits into from
Nov 7, 2024

Conversation

rlskoeser
Copy link
Member

@rlskoeser rlskoeser commented Jun 6, 2024

ref #77

  • new local date object that lets us interact with numpy.datetime64 like a python datetime.date

previous notes

I implemented this as spike to see how hard it would be to use numpy's datetime64 instead of python's builtin datetime.date. Using np.datetime64 gets us way past the year limitation of datetime.date ([2.5e16 BC, 2.5e16 AD] for day-precision dates).

We can make this an optional dependency so you would only install numpy if you need the expanded year range. Making it optional makes the code and the testing more complicated. Do you think numpy as a dependency would be a concern for folks doing e.g web-based projects?

(type check is failing because I haven't bothered updating the types yet)

Summary by CodeRabbit

  • New Features

    • Added a DEVELOPER_NOTES.md file providing comprehensive contribution guidelines and setup instructions.
    • Introduced Timedelta and Date classes for enhanced date and time handling.
    • Expanded Python version support in testing workflows.
  • Documentation

    • Updated DEVELOPER_NOTES.md with detailed setup and testing instructions.
    • Added a new .pythonversion file for specifying the Python version.
  • Chores

    • Transitioned the project configuration to use hatchling as the build system and updated dependency management.

@rlskoeser rlskoeser force-pushed the experiment/numpy-datetime64 branch from 81299bb to 3306bde Compare June 6, 2024 21:09
@rlskoeser rlskoeser marked this pull request as draft June 6, 2024 21:17
@rlskoeser rlskoeser changed the title Experiment: use numpy datetime64 instead of datetime.date Experiment: use numpy datetime64 instead of datetime.date #77 Jun 7, 2024
@rlskoeser rlskoeser marked this pull request as ready for review August 16, 2024 21:56
@rlskoeser rlskoeser changed the title Experiment: use numpy datetime64 instead of datetime.date #77 Use numpy datetime64 instead of datetime.date #77 Aug 19, 2024
@rlskoeser
Copy link
Member Author

@coderabbitai full review

Copy link
Contributor

coderabbitai bot commented Aug 30, 2024

Actions performed

Full review triggered.

Copy link
Contributor

coderabbitai bot commented Aug 30, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes encompass updates to several configuration files, workflow definitions, and documentation files for the undate project. Key modifications include updates to the unit test workflow to support new Python versions, the introduction of a .pythonversion file for version management, enhancements to the DEVELOPER_NOTES.md for contributor guidance, and a comprehensive restructuring of the pyproject.toml file to include project metadata and dependencies. Additionally, new classes for date handling have been introduced in the codebase.

Changes

File(s) Change Summary
.github/workflows/unit_tests.yml Updated Python version to 3.12, expanded job matrix to include 3.13, modified cache path, and changed testing commands from tox to pytest.
.pythonversion New file created specifying Python version as Python 3.12.7.
DEVELOPER_NOTES.md Added detailed instructions for setup, git flow, virtual environment, testing, and documentation building.
pyproject.toml Restructured to use hatchling, added project metadata, dependencies, optional dependencies, and URLs.
src/undate/date.py Introduced Timedelta and Date classes for enhanced date handling, along with a DatePrecision enum.
src/undate/dateformat/base.py Replaced lru_cache with cache for the import_formatters method in BaseDateFormat.

Possibly related PRs

Suggested reviewers

  • ColeDCrawford

🐰 In the garden of code, we hop and play,
With new contributors joining the fray.
Workflows updated, and docs refined,
A brighter future for all we find!
So let us celebrate, with joy and cheer,
For every change brings us closer, my dear! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Outside diff range, codebase verification and nitpick comments (2)
setup.cfg (1)

44-44: Approved: Addition of NumPy as a Dependency

The inclusion of numpy in the install_requires section is crucial for the project's functionality, especially given the shift to using numpy.datetime64. However, consider the implications of this addition on installation size and dependency management, particularly for users in constrained environments.

Consider making numpy an optional dependency if feasible, to accommodate users who may not require the extended functionality.

src/undate/undate.py (1)

Line range hint 287-316: Review the duration method implementation.

The duration method has been significantly changed to accommodate the new Date class and potentially return numpy.timedelta64. Ensure that the logic correctly calculates the duration based on the date precision and handles all edge cases, especially with unknown or partially known dates.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fe7c795 and 329fa3d.

Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • setup.cfg (1 hunks)
  • src/undate/date.py (1 hunks)
  • src/undate/dateformat/iso8601.py (1 hunks)
  • src/undate/undate.py (9 hunks)
  • tests/test_date.py (1 hunks)
  • tests/test_undate.py (3 hunks)
Additional context used
Ruff
tests/test_date.py

1-1: numpy imported but unused

Remove unused import: numpy

(F401)


3-3: undate.date.ONE_DAY imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_YEAR imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_MONTH_MAX imported but unused

Remove unused import

(F401)


24-24: Redefinition of unused test_init_year_month from line 18

(F811)


42-42: Comparison to None should be cond is None

Replace with cond is None

(E711)

src/undate/date.py

4-4: typing.Dict imported but unused

Remove unused import

(F401)


4-4: typing.Union imported but unused

Remove unused import

(F401)


68-68: Use self.dtype != "datetime64[Y]" instead of not self.dtype == "datetime64[Y]"

Replace with != operator

(SIM201)

src/undate/undate.py

6-6: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


8-8: numpy imported but unused

Remove unused import: numpy

(F401)


9-9: numpy.typing.ArrayLike imported but unused

Remove unused import

(F401)


9-9: numpy.typing.DTypeLike imported but unused

Remove unused import

(F401)

Additional comments not posted (3)
pyproject.toml (1)

8-9: Approved: Enhanced Type Checking for NumPy

The addition of the [tool.mypy] section with the NumPy typing plugin is a significant improvement for the project. It ensures enhanced type safety and better integration with NumPy, which is crucial given the project's new dependency on this library.

src/undate/undate.py (2)

267-267: Static method usage.

The from_datetime_date method is correctly updated to handle the datetime.date input and convert it to an Undate object. This method serves as a bridge between the old datetime.date usage and the new Date class, facilitating easier transition and integration.


128-129: Initialization of Date objects.

The earliest and latest attributes of the Undate class are now initialized using the custom Date class. Ensure that the Date class constructor correctly handles the extreme values and any edge cases that might arise from the broad range of years.

Verification successful

Initialization of Date objects is well-handled.

The Date class uses NumPy's datetime64 for date handling, and test cases cover various initialization scenarios, ensuring robustness against edge cases. The __new__ method in Date handles year, month, and day parameters effectively. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the Date class constructor with extreme values.
ast-grep --lang python --pattern $'class Date {
  $$$
  __init__($_, $_, $_) {
    $$$
  }
  $$$  
}'

Length of output: 447


Script:

#!/bin/bash
# Correct the pattern to search for the Date class and its __init__ method in Python.
ast-grep --lang python --pattern $'class Date:\n  def __init__(self, $_, $_, $_):\n    $$$'

Length of output: 90


Script:

#!/bin/bash
# Search for any class definitions related to dates in the codebase.
rg 'class\s+\w+' -A 2

# Search for any imports related to date functionality.
rg 'import\s+\w+' -A 2

Length of output: 11688


Script:

#!/bin/bash
# Extract the __init__ method of the Date class from src/undate/date.py to review its implementation.
ast-grep --lang python --pattern $'class Date(np.ndarray):\n  def __init__(self, $_, $_, $_):\n    $$$'

Length of output: 102


Script:

#!/bin/bash
# Search for method definitions within the Date class in src/undate/date.py.
rg 'def\s+\w+' src/undate/date.py -A 2

# Search for test cases that use the Date class to check for handling of extreme values.
rg 'Date\(' tests/ -A 2

Length of output: 1921

assert d.dtype == "datetime64[M]"
assert str(d) == "2010-05"

def test_init_year_month(self):
Copy link
Contributor

@coderabbitai coderabbitai bot Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct Redefinition of Test Function

The function test_init_year_month is defined twice (lines 18 and 24), which is likely an oversight. This redefinition could cause one of the tests to be skipped, leading to incomplete test coverage.

Consider renaming one of the functions or combining them if they are intended to test different scenarios. Here is a proposed fix to rename the second function to test_init_year_month_day:

-    def test_init_year_month(self):
+    def test_init_year_month_day(self):

Also applies to: 18-18

Tools
Ruff

24-24: Redefinition of unused test_init_year_month from line 18

(F811)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, I think I found and fixed this in a different branch when I was working on it yesterday

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlskoeser, thanks for the update! Since you've already fixed this in another branch, no further action is needed here regarding the duplicate test function. I'll proceed with the rest of the review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines 1 to 3
import numpy as np

from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Unused Imports

The static analysis has correctly identified several unused imports:

  • numpy is imported but not used directly in the tests.
  • ONE_DAY, ONE_YEAR, and ONE_MONTH_MAX are imported but not used.

Consider removing these to clean up the code and reduce confusion about dependencies.

Apply this diff to remove the unused imports:

-import numpy as np
-from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
+from undate.date import Date, DatePrecision
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import numpy as np
from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
from undate.date import Date, DatePrecision
Tools
Ruff

1-1: numpy imported but unused

Remove unused import: numpy

(F401)


3-3: undate.date.ONE_DAY imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_YEAR imported but unused

Remove unused import

(F401)


3-3: undate.date.ONE_MONTH_MAX imported but unused

Remove unused import

(F401)


def test_properties_day(self):
assert Date(2001).day is None
assert Date(2010, 5).day == None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use is for None Comparisons

The comparison to None using == is not recommended. Using is is the correct practice in Python for clarity and correctness.

Apply this diff to correct the comparison:

-        assert Date(2010, 5).day == None
+        assert Date(2010, 5).day is None
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert Date(2010, 5).day == None
assert Date(2010, 5).day is None
Tools
Ruff

42-42: Comparison to None should be cond is None

Replace with cond is None

(E711)

Comment on lines 1 to 4
from enum import IntEnum

# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused imports.

The imports for Dict and Union from the typing module are not used in the file. Removing these will clean up the code and reduce confusion about dependencies.

Apply this diff to remove the unused imports:

-from typing import Optional, Dict, Union
+from typing import Optional
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from enum import IntEnum
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
from enum import IntEnum
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional
Tools
Ruff

4-4: typing.Dict imported but unused

Remove unused import

(F401)


4-4: typing.Union imported but unused

Remove unused import

(F401)

Comment on lines 61 to 75
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))

@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if not self.dtype == "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])

@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refine logical expressions in property methods.

The properties month and day use a less readable form of checking the data type. Replacing not self.dtype == "datetime64[Y]" with self.dtype != "datetime64[Y]" will improve the readability and maintainability of the code.

Apply this diff to refine the logical expression:

-        if not self.dtype == "datetime64[Y]":
+        if self.dtype != "datetime64[Y]":
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))
@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if not self.dtype == "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])
@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
@property
def year(self):
return int(str(self.astype("datetime64[Y]")))
@property
def month(self):
# if date unit is year, don't return a month (only M/D)
if self.dtype != "datetime64[Y]":
return int(str(self.astype("datetime64[M]")).split("-")[-1])
@property
def day(self):
# only return a day if date unit is in days
if self.dtype == "datetime64[D]":
return int(str(self.astype("datetime64[D]")).split("-")[-1])
Tools
Ruff

68-68: Use self.dtype != "datetime64[Y]" instead of not self.dtype == "datetime64[Y]"

Replace with != operator

(SIM201)

Comment on lines 257 to 263
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
# is precision sufficient for comparing partially known dates?
self.precision > other.precision,
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review precision comparison in __contains__ method.

The comparison of self.precision > other.precision in the __contains__ method assumes a certain hierarchy or ordering of precision values. This logic might not be intuitive or could lead to incorrect results if the precision levels are not strictly comparable in a linear fashion.


# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None
from typing import Optional, Dict, Union
from typing import Optional, Dict, Union, Any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused import.

The import for typing.Any is flagged as unused. Removing unused imports helps maintain the cleanliness and readability of the code.

Proposed removal:

-from typing import Optional, Dict, Union, Any
+from typing import Optional, Dict, Union
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Optional, Dict, Union, Any
from typing import Optional, Dict, Union
Tools
Ruff

6-6: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

Comment on lines 8 to 9
import numpy as np
from numpy.typing import ArrayLike, DTypeLike
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused imports.

The imports for numpy, ArrayLike, and DTypeLike are flagged as unused by static analysis tools. If these are not used in the current implementation, they should be removed to clean up the code.

Proposed removal:

-import numpy as np
-from numpy.typing import ArrayLike, DTypeLike
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import numpy as np
from numpy.typing import ArrayLike, DTypeLike
Tools
Ruff

8-8: numpy imported but unused

Remove unused import: numpy

(F401)


9-9: numpy.typing.ArrayLike imported but unused

Remove unused import

(F401)


9-9: numpy.typing.DTypeLike imported but unused

Remove unused import

(F401)

Comment on lines 64 to 68
elif date_portion == "month":
date_parts.append("%02d" % undate.earliest.month)
elif date_portion == "day":
date_parts.append("%02d" % undate.earliest.day)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhanced date portion handling in to_string method.

The modifications to explicitly handle "month" and "day" portions improve the method's ability to format dates accurately. The use of string formatting ensures that these portions are always represented with two digits, which enhances consistency.

However, the removal of the previously commented-out code (lines 69-74) reduces the method's flexibility to handle other date portions dynamically. If flexibility in handling various date formats is a requirement, consider reintroducing or refactoring this part of the code.

Comment on lines 289 to 303
# assert isinstance(day_duration, timedelta)
assert day_duration.astype("int") == 1

january_duration = Undate(2022, 1).duration()
assert january_duration.days == 31
assert january_duration.astype("int") == 31
feb_duration = Undate(2022, 2).duration()
assert feb_duration.days == 28
assert feb_duration.astype("int") == 28
# next leap year will be 2024
leapyear_feb_duration = Undate(2024, 2).duration()
assert leapyear_feb_duration.days == 29
assert leapyear_feb_duration.astype("int") == 29

year_duration = Undate(2022).duration()
assert year_duration.days == 365
assert year_duration.astype("int") == 365
leapyear_duration = Undate(2024).duration()
assert leapyear_duration.days == 366
assert leapyear_duration.astype("int") == 366
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated assertions in test_duration method.

The replacement of timedelta assertions with astype("int") reflects a change in the duration method's return type. This update simplifies the assertions and aligns them with the new method implementation.

However, it would be beneficial to add comments or documentation in the Undate class to clarify why the duration method's return type was changed and how it should be used.

Would you like me to help draft the necessary documentation or comments?

@ColeDCrawford
Copy link
Collaborator

ColeDCrawford commented Aug 30, 2024

Observations on coderabbit (will keep updating):

  • It used ruff even though we're using black / flake8. ruff is definitely worth moving too but worth noting that it didn't align with the project's conventions. I think this is going to be one of the negative externalities of gen AI code tools - they will work well on high profile languages / frameworks with lots of existing examples and much less well on those outside of that scope. In this case if you want to use a different formatter or linter that doesn't have support in their agent, then you are SOL.
  • Memaid sequence diagram generation is cool, if not particularly groundbreaking or even useful here.

# adapted from https://stackoverflow.com/a/27129510/9706217

def __new__(cls, year: int, month: Optional[int] = None, day: Optional[int] = None):
if isinstance(year, np.datetime64):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's my non-existing knowledge about typing in python, but shouldn't year be of type int per line 24? Or is datetime64 a subclass of int?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question is if the type of year should be a different one :)

Copy link
Collaborator

@ColeDCrawford ColeDCrawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are in setup.cfg around NumPy - mostly just thinking through the implications of adding it as a dependency. Overall I think I am sold as it expands the capability of the undate package quite a bit. It would be useful to add some benchmarks to the release notes for the next version to detail the previous and new installed sizes for undate, and the previous / new runtimes for some operations to see how numpy impacts performance and payload.

assert d.dtype == "datetime64[M]"
assert str(d) == "2010-05"

def test_init_year_month(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

Comment on lines 70 to 74
# numpy datetime is stored as 64-bit integer, so min/max
# depends on the time unit; assume days for now
# See https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units
max_year = int(2.5e16)
min_year = int(-2.5e16)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that I agree with the suggestion that this be configurable or what the advantages of that would be, if we're initializing a 64bit int I don't think making the min/max smaller would save any memory. If we're committing to NumPy then we should use it consistently.

setup.cfg Outdated
@@ -41,6 +41,7 @@ python_requires = >=3.8
install_requires =
python-dateutil
lark
numpy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge #88 first, we'll have to move this to the pyproject.toml dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to limit this to NumPy v2, which was released in May? That was the first major release since 2006. If we do limit it to v2, we should also drop support for Python < 3.10. It looks like NumPy v2 requires Python >= 3.10.

I'm pretty sure that there are no direct Python dependencies, just some optional dependencies for speeding up operations. pip should provide pre-built wheels.

NumPy looks like it will add 30-80MB to the payload. For most applications that's not an issue. But maybe we should benchmark the current version of undate and the size with numpy? The environments that I'm typically space-constrained in are Lambdas. I think there is still a 50MB (zipped) and 250MB (unzipped) limit for Lambda functions, so this could take up a significant chunk of that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still leaning towards using NumPy given the need to support dates outside of 9999. The other way that we could go is with struct_time, which is what python-edtf uses. But it's a little clunky and you'd still have to have a shim class (like you add here) or a conversion function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to make NumPy optional seems like a nightmare

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it would be nice to move to Python 3.10+ anyway, it gets us improved python type hinting and 3.9 is getting old enough at this point that it seems reasonable to drop.

@rlskoeser
Copy link
Member Author

I'll update this and reconcile with the other merged PRs, address the flagged comments about type hints, and then do some testing and documentation of benchmarking as suggested before I merge this.

Copy link

@kypso-io kypso-io bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Review Summary

  • Number of files reviewed: 7
  • Number of comments: 3
  • Number of suggestions: 0

📚 File changes

File path File changes
pyproject.toml Added configuration for mypy to include numpy typing plugin.
setup.cfg Added numpy as a required dependency in the install_requires section.
src/undate/date.py Introduced a custom Date class for enhanced date handling with properties for year, month, and day.
src/undate/dateformat/iso8601.py Refactored the date formatting logic to directly append formatted strings for year, month, and day instead of using strftime.
src/undate/undate.py No summary.
tests/test_date.py Added tests for the new Date and DatePrecision classes.
tests/test_undate.py Removed the DatePrecision class and its associated tests, and updated duration tests to use astype('int') instead of checking for timedelta.

src/undate/date.py Outdated Show resolved Hide resolved
src/undate/date.py Show resolved Hide resolved
src/undate/date.py Show resolved Hide resolved
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (29)
docs/readme.md (1)

1-4: Excellent use of content inclusion!

The use of the {include} directive to embed the main README content is a great approach to maintain consistency between the project's root documentation and the docs folder. This method ensures that updates to the main README are automatically reflected in the documentation.

A few suggestions for improvement:

  1. Consider adding a comment explaining the purpose of this file for future maintainers.
  2. The :relative-images: parameter is empty. If it's intentional, you might want to add a comment explaining why, or remove it if it's not needed.
  3. It might be helpful to add error handling in case the included file is missing, perhaps using a try-except block or a conditional include if your documentation tool supports it.

Here's a suggested improvement:

+# This file includes the main README.md content in the documentation
+# to ensure consistency and avoid duplication.
 ```{include} ../README.md
 :relative-docs: /
-:relative-images:
+:relative-images: /

This change adds a helpful comment and sets the `:relative-images:` parameter to the root, matching the `:relative-docs:` parameter. If this isn't the intended behavior, please adjust accordingly or remove the parameter if it's not needed.

</blockquote></details>
<details>
<summary>docs/_static/custom.css (1)</summary><blockquote>

`1-7`: **LGTM! Consider a minor improvement for consistency.**

The CSS rule effectively styles the "powered by" links in the Sphinx sidebar, removing text decoration and bottom borders while centering the text. This creates a clean and unobtrusive appearance for these links.



For consistency with common CSS formatting practices, consider adding a space after the colon in each property declaration. Here's a suggested refinement:

```diff
 div.sphinxsidebar .powered_by a {
-	text-decoration: none;
-	border-bottom: none;
-	text-align: center;
+	text-decoration: none;
+	border-bottom: none;
+	text-align: center;
 }

This minor change improves readability and aligns with widely-used CSS formatting conventions.

examples/README.md (1)

1-7: LGTM! Well-structured README with clear content.

The README provides a concise and informative overview of the examples directory. The structure is clear, with a title, description, and two well-defined sections for use cases and notebooks. The content aligns well with the purpose of introducing example data and projects using the undate library.

Consider adding a brief introduction to undate.

To make this README even more helpful, especially for newcomers, consider adding a brief one-sentence description of what undate is at the beginning of the file. This would provide immediate context for readers who might not be familiar with the library.

Here's a suggested addition after line 2:

# undate examples

+ `undate` is a Python library for handling uncertain dates in historical data.

Example data and projects with use cases for uncertain date logic and 
example code notebooks using undate.
docs/index.rst (1)

13-16: LGTM: Improved documentation structure with a minor suggestion.

The changes to the table of contents entries enhance the documentation structure:

  1. Adding CONTRIBUTING and DEVELOPER_NOTES improves guidance for potential contributors.
  2. Replacing "authors" with CONTRIBUTORS aligns with a more collaborative attribution model.
  3. Changing "license" to LICENSE maintains consistency with file naming conventions.

These updates align well with best practices for open-source project documentation.

Consider adding a brief description next to each new entry for better clarity, like this:

   CONTRIBUTING
   DEVELOPER_NOTES
   CONTRIBUTORS
   LICENSE

   CONTRIBUTING
      Guidelines for contributing to the project
   DEVELOPER_NOTES
      Additional information for developers
   CONTRIBUTORS
      List of project contributors
   LICENSE
      Project license information

This would provide users with a quick overview of each document's content without needing to navigate to it.

.github/workflows/unit_tests.yml (1)

33-33: Approve the update to pyproject.toml and suggest a minor improvement.

The change from setup.cfg to pyproject.toml for caching dependencies is a good modernization step, aligning with current Python packaging standards. This change is consistent with the project's transition to pyproject.toml for configuration.

Consider adding poetry.lock to the cache dependency path if you're using Poetry for dependency management:

cache-dependency-path: |
  **/pyproject.toml
  **/poetry.lock

This ensures that the cache is invalidated when either file changes, providing more precise dependency caching.

.all-contributorsrc (1)

3-3: Approved: Good practice to use a separate file for contributors

Moving the contributors list to a separate CONTRIBUTORS.md file is a good practice. It keeps the README focused on essential project information while still maintaining a dedicated space for acknowledging contributors.

Consider updating the project documentation to reflect this change, ensuring that any references to the contributors list point to the new CONTRIBUTORS.md file.

CONTRIBUTING.md (2)

11-13: Good updates to contribution guidelines.

The changes to use the develop branch for pull requests and emphasize including documentation and tests are excellent improvements to the contribution process. These practices will help maintain code quality and streamline the review process.

Consider adding a comma after "barrier" in the sentence "Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier please let us know when you submit a pull request." This small change would improve readability:

-Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier please let us know when you submit a pull request.
+Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier, please let us know when you submit a pull request.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...

(IF_PLEASE_COMMA)


20-23: Excellent addition of Use Cases and Example Data section.

The new section encouraging contributors to submit use cases and example data is a valuable addition. This will help in understanding real-world applications of the project and potentially guide future development.

Consider adding a brief explanation of how these examples and use cases will be utilized. This could encourage more contributions. For example:

 Example data can be added to the [examples/](https://github.com/dh-tech/undate-python/tree/main/examples/) folder by a pull request. 
+These examples will be used to improve the project's functionality and documentation, ensuring undate meets real-world needs.
examples/use-cases/shakespeare-and-company-project/README.md (3)

5-9: Consider enhancing the project link visibility.

The overview and date format explanation are well-written and informative. To improve user experience, consider making the project link more prominent, perhaps by placing it on a separate line or using bold formatting.

Suggested change:

-The [Shakespeare and Company Project](https://shakespeareandco.princeton.edu/) publishes an events dataset that documents lending library activities such as subscribing, renewing, and borrowing and returning books. Some of the dates for those events are only partially known.
+# [Shakespeare and Company Project](https://shakespeareandco.princeton.edu/)
+
+This project publishes an events dataset that documents lending library activities such as subscribing, renewing, and borrowing and returning books. Some of the dates for those events are only partially known.

9-11: Clarify the explanation of the special case for borrow events.

The explanation of how event dates are used is informative. However, the description of the special case for borrow events with unknown years could be clearer.

Consider rephrasing the special case explanation for better clarity. For example:

-One special case is borrow events with unknown years: the Project assumes that the return was the same or successive year, so even though the years are unknown the duration can be calculated.
+One special case is borrow events with unknown years: the Project assumes that the return occurred in either the same year or the following year. This assumption allows for duration calculation even when the specific years are unknown.
🧰 Tools
🪛 LanguageTool

[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...

(EN_COMPOUNDS_HUMAN_READABLE)


13-15: Address language issues and improve readability.

This section provides valuable context about the project's technical aspects and research applications. However, there are some minor language issues that could be addressed to improve readability.

Consider the following changes:

  1. Add hyphens to compound adjectives:
-The dataset includes durations for subscriptions and borrow events in human readable format and number of days.
+The dataset includes durations for subscriptions and borrow events in human-readable format and number of days.
  1. Improve sentence structure and clarity:
-Data analysis tasks that take these kinds of dates into account currently require either ignoring partially known dates or coercing them to a specific date, e.g. the first day of the month or year.
+Data analysis tasks that consider these types of dates currently require either ignoring partially known dates or coercing them to a specific date (e.g., the first day of the month or year).
  1. Use more concise language:
-Our hope is that in the future, `undate` will make it possible to work with those partial dates instead of ignoring them.
+In the future, we anticipate that `undate` will enable working with partial dates rather than ignoring them.
🧰 Tools
🪛 LanguageTool

[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...

(EN_COMPOUNDS_HUMAN_READABLE)


[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...

(EN_COMPOUNDS_MACHINE_READABLE)


[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future, undate will make ...

(MY_HOPE_IS)

pyproject.toml (3)

1-3: Approved: Good choice of build system.

The transition to hatchling as the build system is a positive change. Hatchling is a modern, flexible build system for Python projects.

Consider adding a version constraint to the hatchling requirement:

-requires = ["hatchling"]
+requires = ["hatchling>=1.0.0"]

This ensures compatibility with future versions of hatchling.


82-87: Approved: Appropriate sdist build configuration.

The inclusion of Python files, .lark files, and tests in the source distribution is appropriate.

Consider adding a exclude section to explicitly exclude any files or directories that shouldn't be included in the distribution, such as:

exclude = [
    "tests/**/__pycache__",
    "**/*.pyc",
    "**/*.pyo",
    "**/*.pyd",
]

This helps ensure that no unnecessary files are included in the distribution.


89-98: Approved: Good test and type checking configurations.

The pytest and mypy configurations are well-set for the project's needs. The custom markers for pytest and the numpy typing plugin for mypy are good practices.

Consider adding a testpaths option to the pytest configuration to explicitly specify where to look for tests:

[tool.pytest.ini_options]
testpaths = ["tests"]

This can help pytest find your tests more efficiently, especially in larger projects.

docs/conf.py (1)

35-39: LGTM: Sphinx extension updated to MyST Parser.

The change from "m2r2" to "myst_parser" is a good improvement, as MyST Parser is a more modern and feature-rich Markdown parser for Sphinx.

Consider adding the "sphinx.ext.napoleon" extension to support Google-style docstrings, which can improve code documentation readability. You can add it like this:

 extensions = [
     "sphinx.ext.autodoc",
     "sphinx.ext.intersphinx",
     "myst_parser",
+    "sphinx.ext.napoleon",
 ]
DEVELOPER_NOTES.md (6)

1-12: LGTM! Clear and concise introduction and setup instructions.

The introduction and setup instructions are well-structured and easy to follow. The use of markdown formatting enhances readability.

Consider changing "checkout" to "check out" in the subheading on line 7 for grammatical correctness:

-### Use git to checkout a copy of the repository
+### Use git to check out a copy of the repository
🧰 Tools
🪛 LanguageTool

[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...

(NOUN_VERB_CONFUSION)


13-31: LGTM! Comprehensive git-flow instructions with a manual alternative.

The git-flow instructions are well-explained and cover all necessary steps. The inclusion of a manual alternative is thoughtful and accommodating.

Consider adding a brief explanation of why git-flow is recommended for this project. This could help developers understand the benefits and encourage adoption.


48-55: LGTM! Clear instructions for setting up pre-commit hooks.

The instructions for installing pre-commit hooks are concise and easy to follow. The explanation of why pre-commit is used is helpful for new contributors.

Consider adding a link to the pre-commit documentation (https://pre-commit.com/) for developers who want to learn more about the tool.

🧰 Tools
🪛 LanguageTool

[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]" ### Install pre-commit hooks ```sh pre-commit inst...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


56-76: LGTM! Comprehensive instructions for tests, type checking, and documentation.

The instructions for running unit tests, checking Python types, and building documentation are clear and detailed. The inclusion of specific commands and examples is very helpful.

Consider adding a brief explanation of why type checking is important and how it benefits the project. This could help developers understand the value of maintaining proper type annotations.


77-89: LGTM! Clear instructions for building documentation.

The instructions for building documentation with Sphinx are clear and include all necessary steps. The indication of where to find the generated documentation is helpful.

To address the Markdownlint issue, consider specifying the language for the fenced code blocks. For example:

-```
+```sh
pip install -e ".[docs]"

- +sh
sphinx-build docs docs/_build


1-89: Excellent developer notes! A few minor suggestions for improvement.

The DEVELOPER_NOTES.md file provides comprehensive and well-structured instructions for contributing to the undate project. It covers all major aspects of the development process, from initial setup to running tests and building documentation.

Consider the following minor improvements to enhance the document further:

  1. Add a table of contents at the beginning of the file for easier navigation.
  2. Include information about the project's coding style guidelines or link to a separate style guide document.
  3. Mention any specific requirements for commit messages or pull request descriptions.
  4. Add a section on how to report issues or suggest improvements to the project.

These additions would make the document even more comprehensive and helpful for new contributors.

🧰 Tools
🪛 LanguageTool

[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...

(NOUN_VERB_CONFUSION)


[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]" ### Install pre-commit hooks ```sh pre-commit inst...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Markdownlint

35-35: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

CONTRIBUTORS.md (1)

9-45: LGTM: Well-structured contributors table with a minor suggestion

The contributors table is well-implemented and provides comprehensive information about each contributor, including their avatars, names, GitHub profiles, and contribution types. The use of All Contributors and markdown-specific comments ensures proper rendering and maintenance.

Consider adding a brief legend explaining the contribution type icons (e.g., 💻 for code, 👀 for reviews) to improve clarity for new contributors or visitors unfamiliar with the All Contributors specification.

README.md (6)

Line range hint 1-21: Excellent improvements to the README introduction!

The changes to the title, addition of the logo, and expanded introduction greatly enhance the presentation and clarity of the project. The badges provide valuable information about the project's status, documentation, and code quality.

Consider formatting the warning message for better visibility:

> ⚠️ **WARNING:** This is pre-alpha software and is **NOT** feature complete! Use with caution. ⚠️

This blockquote format will make the warning stand out more prominently.


33-46: Excellent code examples for creating Undate objects!

The code examples effectively demonstrate the various ways to create Undate objects, including fully known dates, partially known dates, and dates with labels. This showcases the flexibility of the Undate class.

Consider adding a brief comment to explain the "19XX" and "1X" syntax for partially known dates:

# 'X' represents an unknown digit in year or month
partially_known_year = Undate("19XX")  # A year in the 1900s
partially_known_month = Undate(2022, "1X")  # A month in 2022 starting with 1 (e.g., 10, 11, or 12)

This will help users understand the syntax for representing partially known dates more quickly.


47-67: Great examples for string conversion and duration calculation!

The examples effectively demonstrate how to convert Undate objects to strings and calculate their durations. The inclusion of various date precisions and special cases like leap years provides a comprehensive overview of the library's capabilities.

Consider adding an example of custom string formatting for dates, if the library supports it. For instance:

# Example of custom string formatting (if supported)
custom_format = "YYYY-MM-DD"
print(november7.format(custom_format))  # Output: 2000-11-07

This would showcase the flexibility of the library in terms of date representation.


70-118: Excellent examples for date comparisons and sorting!

The examples effectively demonstrate how to check if one date falls within another, compare and sort Undate objects, and handle comparisons with Python's built-in datetime.date objects. The inclusion of error handling for ambiguous comparisons is particularly valuable.

Consider adding a brief explanation of how the library determines the earliest and latest possible dates for imprecise dates. For example:

# Explanation of earliest and latest possible dates for imprecise dates
partially_known_year = Undate("19XX")
print(f"Earliest possible date: {partially_known_year.earliest}")  # Output: 1900-01-01
print(f"Latest possible date: {partially_known_year.latest}")      # Output: 1999-12-31

This would provide users with a clearer understanding of how the library handles imprecise dates in comparisons.


120-154: Great examples for UndateInterval and date parsing!

The examples effectively demonstrate how to create and use UndateInterval objects, including open-ended intervals and labeled intervals. The date parsing examples using the ISO8601 formatter showcase the library's flexibility in handling different date string formats.

Consider adding an example of how to check if a date falls within an interval:

# Example of checking if a date falls within an interval
interval = UndateInterval(Undate(1900), Undate(2000))
date_to_check = Undate(1950)
print(date_to_check in interval)  # Output: True

This would demonstrate another useful feature of the UndateInterval class.


28-28: Minor suggestion for improved readability

Consider adding a comma after "information" to improve readability:

-Read [Contributors](CONTRIBUTORS.md) for detailed contribution information.
+Read [Contributors](CONTRIBUTORS.md) for detailed contribution information,

This small change can help separate the two main clauses in the sentence.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...

(AI_HYDRA_LEO_MISSING_COMMA)

examples/notebooks/shxco_partial_date_durations.ipynb (1)

44-44: Approved: Improved data loading with local file and performance optimization.

The changes to the data loading process are beneficial:

  1. Using a relative path to a local copy of the dataset improves portability and reduces dependency on internet access.
  2. Adding low_memory=False helps prevent potential issues with mixed data types in the CSV file.

Consider adding a comment explaining why low_memory=False is used, and possibly include a link to the original dataset for reference. For example:

# Load the local copy of the S&co events dataset (v1.2)
# Original dataset: https://doi.org/10.34770/nz90-ym25
events_df = pd.read_csv("../use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv", 
                        low_memory=False)  # Prevents mixed type inference warnings

Also applies to: 291-291, 299-300

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 329fa3d and 282432e.

⛔ Files ignored due to path filters (6)
  • _static/undate_logo.png is excluded by !**/*.png
  • docs/_static/dhtech-logo.png is excluded by !**/*.png
  • docs/_static/dhtech-logo.svg is excluded by !**/*.svg
  • docs/_static/logo.png is excluded by !**/*.png
  • docs/_static/undate_logo.png is excluded by !**/*.png
  • examples/use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv is excluded by !**/*.csv
📒 Files selected for processing (30)
  • .all-contributorsrc (5 hunks)
  • .github/workflows/check.yml (2 hunks)
  • .github/workflows/unit_tests.yml (1 hunks)
  • .gitignore (1 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • CONTRIBUTING.md (1 hunks)
  • CONTRIBUTORS.md (1 hunks)
  • DEVELOPER_NOTES.md (1 hunks)
  • README.md (2 hunks)
  • docs/CONTRIBUTING.md (1 hunks)
  • docs/CONTRIBUTORS.md (1 hunks)
  • docs/DEVELOPER_NOTES.md (1 hunks)
  • docs/LICENSE.md (1 hunks)
  • docs/_static/custom.css (1 hunks)
  • docs/_templates/sidebar_dhtech.html (1 hunks)
  • docs/authors.rst (0 hunks)
  • docs/conf.py (3 hunks)
  • docs/index.rst (1 hunks)
  • docs/license.rst (0 hunks)
  • docs/readme.md (1 hunks)
  • docs/readme.rst (0 hunks)
  • docs/requirements.txt (0 hunks)
  • docs/undate.rst (1 hunks)
  • examples/README.md (1 hunks)
  • examples/notebooks/shxco_partial_date_durations.ipynb (2 hunks)
  • examples/use-cases/shakespeare-and-company-project/README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • pytest.ini (0 hunks)
  • setup.cfg (0 hunks)
  • setup.py (0 hunks)
💤 Files not reviewed due to no reviewable changes (7)
  • docs/authors.rst
  • docs/license.rst
  • docs/readme.rst
  • docs/requirements.txt
  • pytest.ini
  • setup.cfg
  • setup.py
✅ Files skipped from review due to trivial changes (5)
  • docs/CONTRIBUTING.md
  • docs/CONTRIBUTORS.md
  • docs/DEVELOPER_NOTES.md
  • docs/LICENSE.md
  • docs/undate.rst
🧰 Additional context used
🪛 actionlint
.github/workflows/unit_tests.yml

37-37: property "python-cache" is not defined in object type {}

(expression)

🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~5-~5: Possible missing comma found.
Context: ...date Hey there! If you found your way here that probably means you are curious abo...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~5-~5: Consider a different adverb to strengthen your wording.
Context: ... to this project. This is great! We are always looking for new contributors. If you ca...

(ALWAYS_CONSTANTLY)


[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...

(IF_PLEASE_COMMA)


[uncategorized] ~16-~16: Possible missing preposition found.
Context: ...Requests If you find a bug or can think a feature you would really like to see be...

(AI_EN_LECTOR_MISSING_PREPOSITION)

DEVELOPER_NOTES.md

[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...

(NOUN_VERB_CONFUSION)


[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]" ### Install pre-commit hooks ```sh pre-commit inst...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

README.md

[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...

(AI_HYDRA_LEO_MISSING_COMMA)

examples/use-cases/shakespeare-and-company-project/README.md

[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...

(EN_COMPOUNDS_HUMAN_READABLE)


[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...

(EN_COMPOUNDS_HUMAN_READABLE)


[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...

(EN_COMPOUNDS_MACHINE_READABLE)


[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future, undate will make ...

(MY_HOPE_IS)

🪛 Markdownlint
CONTRIBUTORS.md

49-49: null
Link fragments should be valid

(MD051, link-fragments)

DEVELOPER_NOTES.md

35-35: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/use-cases/shakespeare-and-company-project/README.md

22-22: null
Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (40)
docs/_templates/sidebar_dhtech.html (1)

1-8: Overall assessment: Good addition with room for minor improvements

The new HTML snippet for the sidebar is a good addition, providing clear attribution to the DHTech Community. The structure is generally sound, and basic accessibility features are in place. With the suggested improvements to semantic structure and accessibility, this will be an excellent enhancement to the documentation sidebar.

docs/index.rst (2)

1-1: LGTM: Title change improves consistency.

The change from "Undate documentation" to "undate documentation" aligns the title with the lowercase project name mentioned in the description. This improves overall consistency in the documentation.


8-8: Verify the impact of reduced ToC depth.

The maximum depth of the table of contents has been reduced from 3 to 2. While this can simplify the documentation structure, it might also hide important subsections from the main table of contents. Please ensure that this change doesn't negatively impact the discoverability of crucial information.

To assess the impact, you can run the following script to check the structure of the linked documents:

This script will help you visualize the heading structure of the linked documents and determine if important subsections are now hidden due to the reduced ToC depth.

.gitignore (1)

43-44: LGTM! Good addition for Jupyter notebook users.

The addition of ".ipynb_checkpoints/" to the .gitignore file is a good practice. This prevents Jupyter notebook checkpoint files from being tracked by Git, which is beneficial because:

  1. These files are temporary and don't need version control.
  2. It keeps the repository clean and focused on actual project files.
  3. It avoids potential conflicts caused by these auto-generated files.

This change will be helpful for any contributors using Jupyter notebooks with this project.

.pre-commit-config.yaml (2)

15-15: Good addition of trailing-whitespace hook.

The inclusion of the trailing-whitespace hook is a positive change. It helps maintain clean code by removing unnecessary whitespace at the end of lines.


16-20: Excellent addition of mypy for static type checking.

The inclusion of mypy (v1.11.2) is a great improvement for ensuring type safety in the codebase. This aligns well with the project's goals and can help catch potential type-related issues early in the development process.

.github/workflows/check.yml (6)

16-19: Excellent update to Python 3.12!

Upgrading to Python 3.12 is a great move. It allows the project to leverage the latest language features, performance improvements, and security updates.


22-23: Improved package installation step.

The renaming of the step to "Install package with development dependencies" and the updated installation command pip install -e ".[dev]" provide better clarity and ensure that development dependencies are included. This is a positive change for maintaining a consistent development environment.


Line range hint 1-40: Overall, these changes significantly improve the workflow.

The updates to this GitHub Actions workflow file demonstrate a commitment to using up-to-date tools and practices. Key improvements include:

  1. Upgrading to Python 3.12
  2. Shifting to pyproject.toml for project configuration
  3. Clarifying the package installation process
  4. Simplifying the documentation build step
  5. Adding checks for Jupyter notebooks

These changes should lead to a more robust and maintainable CI process. However, please consider the following points:

  1. Clarify the decision to comment out the --fail-on-warning flag in the documentation build step.
  2. Ensure that the project's development documentation is updated to reflect these changes, particularly the use of 'treon' for notebook checking.

Great work on modernizing this workflow!

🧰 Tools
🪛 actionlint

24-24: property "python-cache" is not defined in object type {}

(expression)


38-40: Great addition of Jupyter notebook checks!

Adding a step to check Jupyter notebooks using 'treon' is an excellent practice. This will help ensure that any example notebooks in the project remain functional and up-to-date.

To further improve this addition:

  1. Consider adding a comment explaining the purpose of this check for future maintainers.
  2. It might be helpful to document the 'treon' tool and its usage in the project's development documentation.

Let's verify the presence of Jupyter notebooks in the project:

#!/bin/bash
# Check for the presence of Jupyter notebooks
notebook_count=$(find . -name "*.ipynb" | wc -l)

if [ "$notebook_count" -gt 0 ]; then
    echo "Found $notebook_count Jupyter notebook(s) in the project."
    echo "Notebook locations:"
    find . -name "*.ipynb"
else
    echo "No Jupyter notebooks found in the project."
    echo "Consider removing the treon check if notebooks are not used."
fi

32-32: Updated documentation build process, but consider warning handling.

The switch from tox -e docs to sphinx-build docs docs/_build simplifies the documentation build process by directly using Sphinx. This is generally a good change for clarity and directness.

However, I noticed that the --fail-on-warning flag is commented out.

Could you clarify the reasoning behind commenting out this flag? It's often beneficial to fail the build on documentation warnings to maintain high-quality documentation. If there's a specific reason for this change, it might be worth documenting it in a comment.

Let's check if there are any existing documentation warnings:

#!/bin/bash
# Run sphinx-build with warnings enabled and capture the output
sphinx_output=$(sphinx-build -W -b html docs docs/_build 2>&1)

# Check if there were any warnings
if echo "$sphinx_output" | grep -q "WARNING"; then
    echo "Warnings detected in documentation build:"
    echo "$sphinx_output" | grep "WARNING"
else
    echo "No warnings detected in documentation build."
fi

21-21: Correct update to use pyproject.toml for caching.

The change from 'setup.cfg' to 'pyproject.toml' for the cache dependency path is appropriate and aligns with modern Python packaging standards.

Let's verify that the project has indeed moved to using pyproject.toml:

✅ Verification successful

Let's re-verify the presence of pyproject.toml and the absence of setup.cfg with a more straightforward script:


Cache dependency path correctly updated to pyproject.toml.

The presence of pyproject.toml and the absence of setup.cfg confirm that the project has transitioned to using pyproject.toml for configuration. The cache path update aligns with the current project setup.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of pyproject.toml and absence of setup.cfg
if [ -f "pyproject.toml" ] && [ ! -f "setup.cfg" ]; then
    echo "Confirmed: pyproject.toml exists and setup.cfg is absent."
else
    echo "Warning: Expected pyproject.toml to exist and setup.cfg to be absent."
    ls -l pyproject.toml setup.cfg 2>/dev/null
fi

Length of output: 442


Script:

#!/bin/bash
# Verify the existence of pyproject.toml and absence of setup.cfg

if [ -f "pyproject.toml" ]; then
    echo "pyproject.toml exists."
else
    echo "pyproject.toml does not exist."
fi

if [ -f "setup.cfg" ]; then
    echo "setup.cfg exists."
else
    echo "setup.cfg does not exist."
fi

Length of output: 166

.github/workflows/unit_tests.yml (3)

41-41: Approve the transition to pytest and suggest additional considerations.

The change from tox to directly using pytest for test execution simplifies the CI process. The explicit coverage command for the undate package is also a good improvement.

Consider the following enhancements:

  1. Add pytest configuration:
    Create a pytest.ini or add a [tool.pytest.ini_options] section in pyproject.toml to centralize pytest configurations.

  2. Consistent pytest options:
    Consider adding --verbose and --capture=no options to both test commands for more detailed output:

    run: pytest --verbose --capture=no
    run: pytest --verbose --capture=no --cov=undate
  3. Coverage report:
    Add a coverage report generation step after the coverage run:

    - name: Generate coverage report
      run: |
        pytest --cov=undate --cov-report=xml
        coverage report
      if: ${{ matrix.python == env.COV_PYTHON_VERSION }}

These suggestions can enhance the visibility and consistency of your test results.

Also applies to: 46-46


Line range hint 1-51: Summary of workflow changes and their impact.

The updates to this workflow file represent a significant modernization of the testing process:

  1. Transition from setup.cfg to pyproject.toml for dependency management.
  2. Focus on test-specific dependencies instead of broader development dependencies.
  3. Direct use of pytest instead of tox for test execution.
  4. Explicit coverage configuration for the undate package.

These changes should result in a more streamlined, faster, and more maintainable CI process. The workflow now aligns better with modern Python practices and provides a more direct approach to testing.

To fully leverage these improvements, consider implementing the suggested enhancements for pytest configuration, consistent options, and coverage reporting. Also, verify the cache step to ensure smooth execution of the workflow.

Overall, these changes are a positive step towards a more efficient and effective testing process for the project.

🧰 Tools
🪛 actionlint

37-37: property "python-cache" is not defined in object type {}

(expression)


37-37: Verify the cache step and its outputs.

The static analysis tool flagged an undefined property "python-cache". This might be a false positive, as it's common to check cache hits in GitHub Actions. However, it's worth verifying that the cache step is correctly defined earlier in the workflow.

Please run the following script to check for the cache step definition:

#!/bin/bash
# Description: Verify the cache step in the workflow

# Test: Check for the cache step definition
rg 'uses: actions/cache@v\d+' -A 5 .github/workflows/unit_tests.yml

Ensure that there's a step using actions/cache with an id of python-cache. If it's missing, consider adding it before the "Install package with dependencies" step:

- uses: actions/cache@v3
  id: python-cache
  with:
    path: ${{ env.pythonLocation }}
    key: ${{ env.pythonLocation }}-${{ hashFiles('**/pyproject.toml') }}

If the cache step is present and correctly defined, you can safely ignore the static analysis hint.

🧰 Tools
🪛 actionlint

37-37: property "python-cache" is not defined in object type {}

(expression)

.all-contributorsrc (5)

17-18: Approved: Updated contributions for ColeDCrawford

The addition of "test" and "ideas" to ColeDCrawford's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their efforts.


42-44: Approved: Updated contributions for robcast

The addition of "data", "ideas", and "review" to robcast's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their diverse contributions.


56-57: Approved: Updated contributions for jdamerow

The addition of "eventOrganizing" and "ideas" to jdamerow's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their involvement in both practical and conceptual aspects of the project.


Line range hint 1-81: Overall: Excellent updates to contributor recognition

The changes made to .all-contributorsrc are well-considered and beneficial to the project. They include:

  1. Moving the contributors list to a dedicated file.
  2. Updating and expanding contributor roles for several team members.
  3. Adding an All Contributors badge for visual representation.

These updates improve the project's ability to recognize and showcase its contributors, which is crucial for open-source projects. Great job on maintaining and enhancing the contributor information!


79-80: Approved: Added All Contributors badge template

The addition of the "badgeTemplate" property is a good enhancement. It will provide a visual representation of the number of contributors in the project's README, which is excellent for showcasing community involvement.

Please verify that the badge appears correctly in the README. You can use the following script to check:

✅ Verification successful

Verified: All Contributors badge is correctly displayed

The All Contributors badge is present in the README.md, correctly showcasing the number of contributors.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the All Contributors badge in the README

# Test: Check if the badge is present in the README
rg --type md "!\[All Contributors\]" README.md

# If the badge is not found in README.md, check CONTRIBUTORS.md
if [ $? -ne 0 ]; then
  echo "Badge not found in README.md. Checking CONTRIBUTORS.md..."
  rg --type md "!\[All Contributors\]" CONTRIBUTORS.md
fi

Length of output: 183

CONTRIBUTING.md (3)

1-1: Title change looks good.

The simplified title "Contributing to undate" is more concise and aligns well with the project name. It's a good improvement that maintains clarity while reducing redundancy.


26-26: Good update to help-seeking guidelines.

The recommendation to join the DHTech Slack for help is a positive change. It encourages community engagement and provides a platform for real-time support, which can be more efficient than GitHub issues for many types of questions or problems.


Line range hint 1-30: Overall, excellent updates to the contribution guidelines.

The changes to this file significantly improve the clarity and effectiveness of the contribution process. Key improvements include:

  1. Simplified title
  2. Updated setup instructions
  3. Clear guidelines for submitting changes
  4. New section for use cases and example data
  5. Improved help-seeking process

These updates should enhance the contributor experience and promote better quality contributions to the project.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: Possible missing comma found.
Context: ...date Hey there! If you found your way here that probably means you are curious abo...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~5-~5: Consider a different adverb to strengthen your wording.
Context: ... to this project. This is great! We are always looking for new contributors. If you ca...

(ALWAYS_CONSTANTLY)


[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...

(IF_PLEASE_COMMA)


[uncategorized] ~16-~16: Possible missing preposition found.
Context: ...Requests If you find a bug or can think a feature you would really like to see be...

(AI_EN_LECTOR_MISSING_PREPOSITION)


[style] ~29-~29: Using many exclamation marks might seem excessive (in this case: 5 exclamation marks for a text that’s 2528 characters long)
Context: ...oining](https://dh-tech.github.io/join/)!

(EN_EXCESSIVE_EXCLAMATION)

examples/use-cases/shakespeare-and-company-project/README.md (2)

1-3: LGTM: Clear title and author information.

The title and author information are well-presented, providing clear context for the document.


1-24: Overall, well-written and informative README with minor suggestions for improvement.

This README provides a comprehensive overview of the Shakespeare and Company Project dataset, its usage, and the challenges associated with partial dates. The content is informative and well-structured. The suggested improvements are minor and focus on enhancing readability, clarifying certain points, and improving Markdown formatting.

Key strengths:

  1. Clear explanation of the dataset and its unique features
  2. Discussion of data analysis challenges and potential solutions
  3. Proper citation information and DOI link

Areas for improvement:

  1. Minor language and formatting enhancements
  2. Clearer explanation of special cases
  3. More prominent presentation of important links and sections

These suggestions will further improve an already solid documentation file.

🧰 Tools
🪛 LanguageTool

[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...

(EN_COMPOUNDS_HUMAN_READABLE)


[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...

(EN_COMPOUNDS_HUMAN_READABLE)


[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...

(EN_COMPOUNDS_MACHINE_READABLE)


[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future, undate will make ...

(MY_HOPE_IS)

🪛 Markdownlint

22-22: null
Bare URL used

(MD034, no-bare-urls)

pyproject.toml (3)

79-80: Approved: Good version management practice.

Setting the version to be read from src/undate/__init__.py is a good practice for single-sourcing the version number.


43-69: ⚠️ Potential issue

Consider refining optional dependencies.

The categorization of optional dependencies is clear and logical. However, there are a few points to consider:

  1. Including 'undate' as a dev dependency might lead to circular dependencies.
  2. Some packages are missing version constraints, which could lead to compatibility issues in the future.

Consider applying these changes:

  1. Remove 'undate' from the dev dependencies to avoid circular dependencies.
  2. Add version constraints to packages without them.

Example diff:

 dev = [
   "black>=22.10.0",
   "pre-commit>=2.20.0",
-  "twine",
-  "wheel",
-  "build",
-  "mypy",
-  "treon",
-  "undate",
-  "undate[docs]",
-  "undate[test]",  
+  "twine>=4.0.0",
+  "wheel>=0.40.0",
+  "build>=0.10.0",
+  "mypy>=1.0.0",
+  "treon>=0.1.3",
 ]

Let's check if 'undate' is used as a dependency elsewhere in the project:

#!/bin/bash
# Check for 'undate' usage as a dependency
rg "undate" --type python

71-76: ⚠️ Potential issue

Approved: Comprehensive project URLs with a minor issue.

Providing these URLs is beneficial for users and contributors. However, there's a small issue with the changelog URL.

The changelog URL appears to have an incorrect path. Please update it as follows:

-Changelog = "https://github.com/dh-tech/undate/main/master/CHANGELOG.md"
+Changelog = "https://github.com/dh-tech/undate-python/blob/main/CHANGELOG.md"

Let's verify the existence of the CHANGELOG.md file:

#!/bin/bash
# Check for the existence of CHANGELOG.md
fd -t f "CHANGELOG.md"
docs/conf.py (5)

20-21: LGTM: Project information updated correctly.

The project name has been updated to lowercase "undate" for consistency, and the copyright year has been correctly updated to 2024.


49-52: LGTM: Improved file type parsing configuration.

The change from a list to a dictionary for source_suffix is a good improvement. It provides more explicit control over how different file types are parsed, which can prevent ambiguities and improve the documentation build process.


59-59: LGTM: HTML theme updated to Alabaster.

The change from "sphinx_rtd_theme" to "alabaster" is fine. Alabaster is the default theme for Sphinx and is known for its clean, minimal design.

Please verify that the new theme meets your documentation's visual requirements. You may want to build the documentation locally to check its appearance.


66-67: Clarify the logo configuration.

The logo configuration has been commented out, but there's a logo setting in the html_theme_options below. This inconsistency might lead to confusion.

Please clarify the intended logo configuration:

  1. If you want to use the logo, uncomment this line and ensure it's consistent with the html_theme_options.
  2. If you don't want to use the logo, remove this commented line and the logo setting from html_theme_options.

69-76: LGTM: Appropriate Alabaster theme options added.

The new html_theme_options configuration is well-structured and includes useful settings for the Alabaster theme. The GitHub repository information and sidebar settings will enhance the documentation's usability.

DEVELOPER_NOTES.md (2)

32-38: LGTM! Clear instructions for setting up a Python virtual environment.

The instructions for creating a Python virtual environment are concise and easy to follow. The recommendation to use a recent version of Python 3 is appropriate.

🧰 Tools
🪛 Markdownlint

35-35: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)


40-46: LGTM! Clear instructions for installing the package with development dependencies.

The instructions for installing the local version of undate with development dependencies are clear and use the correct pip command. The use of an editable install is appropriate for development.

CONTRIBUTORS.md (2)

1-8: LGTM: Clear introduction and well-implemented badge

The introduction and badge section are well-structured and follow the All Contributors specification. The badge provides a visual representation of the number of contributors, which is a great addition to the project.


1-50: Excellent addition: Comprehensive contributor recognition

This CONTRIBUTORS.md file is an excellent addition to the project. It effectively recognizes all contributors and their various types of contributions, which is crucial for fostering a healthy open-source community. The use of the All Contributors specification ensures a standardized and visually appealing way to present this information.

Key strengths:

  1. Clear introduction explaining the purpose of using All Contributors
  2. Visual badge showing the number of contributors
  3. Detailed table with contributor information and contribution types
  4. Section for related blog posts, providing additional context

The minor issue with the link fragment has been addressed in a previous comment. Once that's fixed, this file will be a valuable asset to the project, encouraging recognition and potentially attracting new contributors.

🧰 Tools
🪛 Markdownlint

49-49: null
Link fragments should be valid

(MD051, link-fragments)

README.md (3)

22-25: Good update to the contributors section!

The all-contributors badge and the link to the CONTRIBUTORS.md file improve the visibility of contributors and make it easier for users to find detailed contribution information.


27-31: Great addition of the Example Usage section!

The introduction to the Example Usage section effectively explains the need for handling imprecise temporal information in humanities and cultural data. It provides valuable context for potential users and clearly introduces the Undate object and its capabilities.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...

(AI_HYDRA_LEO_MISSING_COMMA)


157-165: Good organization of documentation and license information!

The documentation section effectively directs users to the ReadTheDocs page and the separate developer notes file. The license section clearly states the project's licensing terms. This organization keeps the README concise while providing easy access to important information for users and potential contributors.

docs/_templates/sidebar_dhtech.html Outdated Show resolved Hide resolved
.pre-commit-config.yaml Outdated Show resolved Hide resolved
.github/workflows/unit_tests.yml Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
docs/conf.py Outdated Show resolved Hide resolved
CONTRIBUTORS.md Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (6)
src/undate/dateformat/iso8601.py (1)

Line range hint 1-69: Address TODO comments and type checking issues

There are several TODO comments and type checking issues throughout the file that need attention:

  1. Line 26: TODO for handling full ISO format dates with time.
  2. Line 27: TODO for handling invalid formats.
  3. Line 54: TODO for handling too many parts in date parsing.
  4. Line 60: TODO for error handling when year and day are known but month is missing.
  5. Line 76: TODO for fixing a type error with join() method.

Additionally, there are type checking issues mentioned in comments, such as on line 54.

These TODOs and type issues should be addressed to improve the robustness and type safety of the code.

Would you like assistance in resolving any of these TODOs or type checking issues? I can help generate solutions or open GitHub issues to track these tasks.

src/undate/date.py (1)

56-58: Evaluate the necessity of the Export method.

The Export method currently returns self without additional processing. If it does not serve a specific purpose or align with an interface requirement, consider removing it to simplify the class.

src/undate/undate.py (4)

267-269: Add return type annotation to from_datetime_date static method.

Adding a return type annotation enhances type checking and code clarity.

Apply this diff to specify the return type:

 @staticmethod
-def from_datetime_date(dt_date: datetime.date):
+def from_datetime_date(dt_date: datetime.date) -> "Undate":
     """Initialize an :class:`Undate` object from a :class:`datetime.date`"""
     return Undate(dt_date.year, dt_date.month, dt_date.day)

287-287: Specify return type annotation for the duration method in Undate class.

Including the return type improves readability and assists static type checkers.

Apply this diff to add the return type:

-def duration(self):
+def duration(self) -> np.timedelta64:
     """What is the duration of this date?
     Calculate based on earliest and latest date within range,
     taking into account the precision of the date even if not all
     parts of the date are known."""

305-309: Remove commented-out code to improve code cleanliness.

The commented code is unnecessary and can be safely removed to enhance readability.

Apply this diff to remove the commented lines:

     latest = Date(self.earliest.year, self.latest.month, self.latest.day)

-    # latest = datetime.date(
-    #     self.earliest.year, self.latest.month, self.latest.day
-    # )

396-396: Add return type annotation to duration method in UndateInterval class.

Specifying the return type enhances code clarity and type safety.

Apply this diff to include the return type:

-def duration(self):
+def duration(self) -> np.timedelta64:
     """Calculate the duration between two undates.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 282432e and 2a7de40.

📒 Files selected for processing (16)
  • .pre-commit-config.yaml (2 hunks)
  • CONTRIBUTORS.md (1 hunks)
  • README.md (2 hunks)
  • docs/conf.py (3 hunks)
  • pyproject.toml (1 hunks)
  • src/undate/date.py (1 hunks)
  • src/undate/dateformat/base.py (1 hunks)
  • src/undate/dateformat/edtf/parser.py (0 hunks)
  • src/undate/dateformat/edtf/transformer.py (1 hunks)
  • src/undate/dateformat/iso8601.py (2 hunks)
  • src/undate/undate.py (9 hunks)
  • tests/test_date.py (1 hunks)
  • tests/test_dateformat/edtf/test_edtf_parser.py (0 hunks)
  • tests/test_dateformat/edtf/test_edtf_transformer.py (1 hunks)
  • tests/test_dateformat/test_base.py (0 hunks)
  • tests/test_undate.py (3 hunks)
💤 Files not reviewed due to no reviewable changes (3)
  • src/undate/dateformat/edtf/parser.py
  • tests/test_dateformat/edtf/test_edtf_parser.py
  • tests/test_dateformat/test_base.py
✅ Files skipped from review due to trivial changes (3)
  • src/undate/dateformat/base.py
  • src/undate/dateformat/edtf/transformer.py
  • tests/test_dateformat/edtf/test_edtf_transformer.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • docs/conf.py
🧰 Additional context used
🪛 Markdownlint
CONTRIBUTORS.md

51-51: Expected: 0; Actual: 2
Unordered list indentation

(MD007, ul-indent)


50-50: null
Link fragments should be valid

(MD051, link-fragments)

🪛 Ruff
tests/test_undate.py

1-1: datetime.timedelta imported but unused

Remove unused import: datetime.timedelta

(F401)

🔇 Additional comments (30)
.pre-commit-config.yaml (4)

3-8: Excellent choice in transitioning to ruff!

The replacement of black with ruff is a great improvement. Ruff combines linting and formatting, which can significantly speed up your pre-commit hooks. The configuration looks well thought out:

  1. Using ruff for linting with import sorting (--select I) and auto-fixing.
  2. Using ruff-format for code formatting, which is compatible with black's style.

This setup should provide faster and more comprehensive code quality checks.


17-17: Good decision to keep the trailing-whitespace hook.

Retaining this hook ensures consistent whitespace across the codebase, complementing the ruff configuration.


18-22: Great addition of mypy for type checking!

Adding mypy to the pre-commit hooks is an excellent practice for catching type-related issues early in development. The inclusion of types-python-dateutil is appropriate for projects using dateutil.

However, regarding the numpy dependency:

As mentioned in a previous comment, consider making numpy an optional dependency. This approach would allow users to install numpy only if they require the extended year range, as mentioned in the PR summary.

You could implement this by:

  1. Moving the numpy dependency to a separate optional-dependencies section in your project configuration.
  2. Updating the mypy configuration to conditionally include numpy types when it's installed.
  3. Using conditional imports in your code to handle cases where numpy isn't installed.

This change would require careful testing to ensure the code works correctly both with and without numpy installed.


Line range hint 1-22: Overall, excellent improvements to the pre-commit configuration!

The changes in this file significantly enhance the project's code quality checks:

  1. Transitioning from black to ruff combines linting and formatting, potentially speeding up pre-commit hooks.
  2. Retaining the trailing-whitespace hook ensures consistent whitespace.
  3. Adding mypy for type checking helps catch type-related issues early.

These improvements will lead to more robust and consistent code across the project. The only consideration is to evaluate making numpy an optional dependency, as discussed in the mypy configuration comment.

tests/test_date.py (10)

1-2: LGTM: Import statements are correct and necessary.

The import statements are now correctly utilized in the test file. The numpy import is used for creating a datetime64 object, and both Date and DatePrecision are used throughout the tests.


5-11: LGTM: TestDatePrecision class looks good.

The TestDatePrecision class contains two well-structured test methods:

  1. test_str: Correctly verifies the string representation of DatePrecision.YEAR.
  2. test_precision_comparison: Properly checks the ordering of precision levels.

These tests provide good coverage for the DatePrecision enum functionality.


15-19: LGTM: test_init_year method is well-structured.

The test_init_year method effectively tests the initialization of a Date object with only a year:

  1. It verifies that the created object is an instance of Date.
  2. It checks that the dtype is correctly set to "datetime64[Y]".
  3. It ensures the string representation is correct.

These assertions provide good coverage for year-only initialization.


21-25: LGTM: test_init_year_np_datetime64 method is well-implemented.

The test_init_year_np_datetime64 method effectively tests the initialization of a Date object with a numpy datetime64 object:

  1. It correctly creates a numpy datetime64 object for the year 2024.
  2. It verifies that the created object is an instance of Date.
  3. It checks that the dtype is correctly set to "datetime64[Y]".
  4. It ensures the string representation is correct.

These assertions provide good coverage for initialization with a numpy datetime64 object.


27-31: LGTM: test_init_year_month method is correctly implemented.

The test_init_year_month method effectively tests the initialization of a Date object with a year and month:

  1. It verifies that the created object is an instance of Date.
  2. It checks that the dtype is correctly set to "datetime64[M]".
  3. It ensures the string representation is correct, including both year and month.

These assertions provide good coverage for year-month initialization.


33-37: LGTM: test_init_year_month_day method is properly implemented.

The test_init_year_month_day method effectively tests the initialization of a Date object with a full date (year, month, and day):

  1. It verifies that the created object is an instance of Date.
  2. It checks that the dtype is correctly set to "datetime64[D]".
  3. It ensures the string representation is correct, including year, month, and day.

These assertions provide good coverage for full date initialization.


39-42: LGTM: test_properties_year method covers all scenarios.

The test_properties_year method effectively tests the 'year' property of Date objects:

  1. It checks the 'year' property for a year-only initialization.
  2. It verifies the 'year' property for a year-month initialization.
  3. It tests the 'year' property for a full date initialization.

These assertions ensure that the 'year' property works correctly across all initialization scenarios.


44-47: LGTM: test_properties_month method is comprehensive.

The test_properties_month method effectively tests the 'month' property of Date objects:

  1. It correctly checks that the 'month' property is None for a year-only initialization.
  2. It verifies the 'month' property for a year-month initialization.
  3. It tests the 'month' property for a full date initialization.

These assertions ensure that the 'month' property works correctly across all initialization scenarios.


49-52: LGTM: test_properties_day method is well-implemented.

The test_properties_day method effectively tests the 'day' property of Date objects:

  1. It correctly checks that the 'day' property is None for a year-only initialization.
  2. It verifies that the 'day' property is None for a year-month initialization.
  3. It tests the 'day' property for a full date initialization.

These assertions ensure that the 'day' property works correctly across all initialization scenarios. The use of 'is' for None comparisons is the correct practice.


1-52: Overall, the test file is well-structured and provides comprehensive coverage.

The tests/test_date.py file effectively tests the Date and DatePrecision classes:

  1. It covers all initialization scenarios for the Date class.
  2. It tests all properties (year, month, day) across different precision levels.
  3. It verifies the behavior of the DatePrecision enum.

The test methods are well-organized, and the assertions are thorough. This test suite should provide good confidence in the correctness of the Date and DatePrecision implementations.

pyproject.toml (5)

1-3: LGTM: Build system configuration is correct and modern.

The build system configuration using hatchling is appropriate and follows modern Python packaging practices.


5-45: LGTM: Comprehensive project metadata.

The project metadata is well-structured and provides essential information. The Python version requirement, dependencies, and classifiers are appropriate and informative.


19-19: Clarify the commented DHTech author line.

There's a commented line regarding a DHTech author. Please clarify if this should be included or if the comment can be removed.

#!/bin/bash
# Check if DHTech is mentioned elsewhere in the project
rg "DHTech" --type-not toml

72-76: LGTM: Correct hatch configuration.

The hatch configuration for versioning and source distribution building is correct and follows best practices.


78-86: LGTM: Appropriate test and type checking configurations.

The pytest and mypy configurations are well-defined:

  • The pytest setup includes custom markers and sets the correct Python path.
  • The mypy configuration properly includes the numpy typing plugin.

These settings will enhance the project's testing and type checking capabilities.

src/undate/dateformat/iso8601.py (1)

65-68: Improved ISO8601 date formatting

The changes enhance the to_string method by explicitly handling "month" and "day" portions with zero-padding. This improvement ensures consistent two-digit representation for month and day, which aligns perfectly with the ISO8601 standard for date formatting.

These modifications resolve the concern raised in the previous review about reduced flexibility. The new implementation maintains the method's ability to handle various date formats while improving consistency and readability.

CONTRIBUTORS.md (3)

1-7: LGTM: Clear introduction and proper badge implementation.

The introduction clearly explains the use of All Contributors, and the badge is correctly implemented using the recommended HTML comment structure.


9-45: LGTM: Well-structured contributors table.

The contributors table is well-implemented, following the All Contributors specification. It clearly displays each contributor's information and contributions, and includes a footer with instructions for adding new contributions.


1-51: Overall: Well-structured and informative contributors file.

This CONTRIBUTORS.md file effectively acknowledges project contributors using the All Contributors specification. It includes a clear introduction, a properly implemented badge, a comprehensive contributors table, and a section for related blog posts. With the suggested fixes for the blog posts section, this file will serve as an excellent resource for recognizing and encouraging contributions to the project.

🧰 Tools
🪛 Markdownlint

51-51: Expected: 0; Actual: 2
Unordered list indentation

(MD007, ul-indent)


50-50: null
Link fragments should be valid

(MD051, link-fragments)

tests/test_undate.py (4)

288-302: Acknowledge duration return type change and request documentation

The changes from timedelta assertions to astype("int") checks are consistent with the project's objective to use numpy.datetime64. However, as mentioned in a previous review, it would be beneficial to add comments or documentation in the Undate class to clarify why the duration method's return type was changed and how it should be used.

Would you like assistance in drafting the necessary documentation or comments for the Undate class?


306-316: LGTM: Consistent updates to partially known duration tests

The changes in this section are consistent with the modifications made in the test_duration method. The use of astype("int") for assertions maintains consistency across the test file.


396-418: Approve consistent changes and request clarification on commented assertion

The updates to use astype("int") for assertions are consistent with the changes made throughout the file. However, there's a commented-out assertion on line 418 that suggests a change in behavior:

# assert month_noyear_duration.astype("int") == 32
# this seems wrong, but we currently count both start and dates

This comment indicates a potential issue or change in how durations are calculated. Could you please clarify:

  1. Why was this assertion commented out?
  2. Is the current behavior of counting both start and end dates intentional?
  3. If this is a known issue, should it be documented or addressed in a future update?

426-426: LGTM: Consistent update to duration assertion

This change is consistent with the modifications made throughout the file, updating the assertion to use astype("int").

src/undate/date.py (2)

70-74: Incomplete return in month property method.

The month property does not have a return statement if self.dtype == "datetime64[Y]". This could lead to returning None implicitly, which may not be the intended behavior. Consider raising an exception or returning a default value instead.


75-79: Incomplete return in day property method.

The day property does not have a return statement if self.dtype != "datetime64[D]". This could lead to returning None implicitly, which may not be the intended behavior. Consider raising an exception or returning a default value instead.

src/undate/undate.py (1)

420-421: Handle potential issues with negative durations.

When calculating the duration, the check duration.astype("int") < 0 assumes that negative durations indicate the need to adjust the end date by adding ONE_YEAR. This logic may not account for all edge cases, such as when dates span multiple years or the precise year is unknown.

Consider adding additional validation or handling to ensure that duration calculations are accurate across all scenarios.

Run the following script to identify any occurrences where the duration could be negative and assess whether the current logic handles them properly:

✅ Verification successful

Negative Duration Handling Verified Successfully.

No instances of UndateInterval with earliest after latest were found, ensuring that negative durations are effectively handled by the existing logic.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find cases where duration calculation may result in negative values.

# Test: Search for usages of `UndateInterval` where `earliest` is after `latest`.
# Expect: No such cases should exist.

rg --type python 'UndateInterval\(\s*earliest\s*=\s*(.+?),\s*latest\s*=\s*(.+?)\)' -A 5

Length of output: 120


Script:

#!/bin/bash
# Description: Find cases where duration calculation may result in negative values.

# Test: Search for usages of `UndateInterval` where `earliest` is after `latest`.
# Expect: No such cases should exist.

rg --type py 'UndateInterval\(\s*earliest\s*=\s*(.+?),\s*latest\s*=\s*(.+?)\)' -A 5

Length of output: 85

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
CONTRIBUTORS.md Outdated Show resolved Hide resolved
tests/test_undate.py Outdated Show resolved Hide resolved
src/undate/date.py Show resolved Hide resolved
Comment on lines +254 to +263
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
# is precision sufficient for comparing partially known dates?
# checking based on less precise /less granular time unit,
# e.g. a day or month could be contained in a year
# but not the reverse
self.precision < other.precision,
]
Copy link
Contributor

@coderabbitai coderabbitai bot Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Review precision comparison logic in the __contains__ method.

The comparison self.precision < other.precision may not reflect the intended logic. In the DatePrecision enumeration, higher values represent more precise dates (e.g., DAY is more precise than MONTH). Therefore, comparing self.precision < other.precision means that a more precise date is considered to contain a less precise one, which may be counterintuitive.

Consider revising the comparison to self.precision > other.precision if the intention is that a less precise date (e.g., a year) contains a more precise date (e.g., a specific day within that year).

Apply this diff to adjust the comparison:

 return all(
     [
         self.earliest <= other.earliest,
         self.latest >= other.latest,
-        self.precision < other.precision,
+        self.precision > other.precision,
     ]
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
# is precision sufficient for comparing partially known dates?
# checking based on less precise /less granular time unit,
# e.g. a day or month could be contained in a year
# but not the reverse
self.precision < other.precision,
]
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
# is precision sufficient for comparing partially known dates?
# checking based on less precise /less granular time unit,
# e.g. a day or month could be contained in a year
# but not the reverse
self.precision > other.precision,
]
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is wrong, but it's causing me to think about it quite a bit which makes me wonder if there's a simpler or more intuitive way to implement this check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlskoeser, I understand. Let me know if I can assist further.

(_/)
( •_•)
('')('')

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
src/undate/date.py (4)

19-20: Consider removing the Export method if unused.

The Export method simply returns self, which doesn't provide any additional functionality. Unless this method is used elsewhere in the codebase for consistency with other classes, consider removing it to simplify the class.


32-37: LGTM: Well-defined constants for common time intervals.

The constants ONE_DAY, ONE_YEAR, and ONE_MONTH_MAX are well-named and their purposes are clear. They provide convenient access to commonly used time intervals using the Timedelta class.

Consider adding a comment for ONE_YEAR to clarify that it represents a non-leap year:

#: timedelta for a single year (365 days, non-leap year)
ONE_YEAR = Timedelta(365)

80-81: Consider removing the Export method if unused.

The Export method simply returns self, which doesn't provide any additional functionality. Unless this method is used elsewhere in the codebase for consistency with other classes, consider removing it to simplify the class.


120-140: LGTM: DatePrecision enum is well-implemented and documented.

The DatePrecision enum effectively represents different levels of date precision. The ordering of enum members is clearly explained in the comments, which is excellent for maintainability. The custom str method provides a convenient string representation.

Regarding the note about harmonizing with numpy date units:
Consider creating a separate issue or TODO item to track the potential harmonization with numpy date units. This would ensure that the idea is not lost and can be addressed in future updates.

tests/test_undate.py (2)

117-120: Create an issue to track leap day handling in unknown years.

The TODO comment highlights an important edge case that needs to be addressed. To ensure this doesn't get overlooked, it would be beneficial to create a GitHub issue to track this improvement.

Would you like me to create a GitHub issue to track the implementation of leap day handling for unknown years in the Undate class?


428-441: Approve new test cases and suggest future improvement.

The new test cases for UndateInterval durations with partially known dates are excellent additions:

  1. They cover important edge cases, such as intervals spanning a year and intervals crossing February in unknown years.
  2. The comment explaining the non-leap year assumption for unknown years is crucial for understanding the expected behavior.

These additions significantly improve the test coverage and documentation of the UndateInterval class behavior.

Consider adding a TODO comment to explore handling leap years in unknown year durations in the future. Would you like me to draft a TODO comment or create a GitHub issue for this potential improvement?

src/undate/undate.py (1)

129-131: Offer assistance for handling Feb 29 with unknown year.

There's a TODO comment indicating the need to handle cases where the date is February 29 with an unknown year, requiring the min_year and max_year to be adjusted to known leap years.

Would you like assistance in implementing this special case? I can help by providing code to ensure that when the day is February 29 and the year is unknown, the min_year and max_year are set to known leap years.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2a7de40 and 039ff0e.

📒 Files selected for processing (5)
  • examples/notebooks/shxco_partial_date_durations.ipynb (112 hunks)
  • src/undate/date.py (1 hunks)
  • src/undate/undate.py (10 hunks)
  • tests/test_date.py (1 hunks)
  • tests/test_undate.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/notebooks/shxco_partial_date_durations.ipynb
  • tests/test_date.py
🧰 Additional context used
🪛 Ruff
tests/test_undate.py

2-2: datetime.timedelta imported but unused

Remove unused import: datetime.timedelta

(F401)


4-4: numpy imported but unused

Remove unused import: numpy

(F401)

🔇 Additional comments (11)
src/undate/date.py (4)

1-6: LGTM: Import statements are appropriate and well-documented.

The import statements are correct and necessary for the implemented functionality. The comment about Union usage for Python versions before 3.10 is informative and helps maintainers understand the code's compatibility considerations.


9-29: LGTM: Timedelta class implementation is correct and follows numpy subclassing practices.

The Timedelta class effectively extends np.ndarray to provide a more datetime.timedelta-like interface. The new method correctly handles both int and np.timedelta64 inputs, and the days property provides a convenient way to access the number of days.


40-115: LGTM: Date class implementation is correct and follows numpy subclassing practices.

The Date class effectively extends np.ndarray to provide a more datetime.date-like interface. The new method handles various input formats, and the year/month/day properties correctly handle different date precisions. The sub method appropriately returns a Timedelta object when subtracting dates.


1-140: Overall, the implementation is solid with room for minor improvements.

The custom Date and Timedelta classes effectively extend numpy's functionality to provide a more familiar datetime-like interface. The code is well-structured and follows numpy subclassing practices. Here are the main points for improvement:

  1. Add input validation for month and day in the Date.new method.
  2. Handle potential None returns in the month and day properties of the Date class.
  3. Consider removing the Export methods if they're not used elsewhere in the codebase.
  4. Create a separate issue to track the potential harmonization of DatePrecision with numpy date units.

Addressing these points will further enhance the robustness and usability of the classes.

tests/test_undate.py (5)

295-302: Approve changes and commend improved test coverage.

The updates to the test_duration method are well-implemented:

  1. The assertion now correctly checks for the Timedelta type, aligning with the updated Undate implementation.
  2. The addition of a leap year test case for February improves the coverage of edge cases.

These changes enhance the robustness of the test suite.


399-401: Approve addition of minimum year leap year check.

The new test_min_year_non_leapyear method is a valuable addition to the test suite. It ensures that the minimum allowable year in the Undate class is not a leap year, which is crucial for consistent date calculations, especially when dealing with partially known dates.

This test helps prevent potential issues with leap year calculations at the lower bound of the allowable date range.


406-406: Approve Timedelta type check in UndateInterval duration test.

The update to check if week_duration is an instance of Timedelta is consistent with the earlier changes in the Undate class tests. This change ensures that the UndateInterval class is correctly using the custom Timedelta implementation for duration calculations.


Line range hint 1-441: Summary of changes and overall approval.

The updates to the test suite in test_undate.py are well-implemented and align with the project objectives:

  1. The transition to using the custom Timedelta class is consistent throughout the file.
  2. New test cases have been added to cover edge cases, particularly for partially known dates and leap year scenarios.
  3. The test coverage for both Undate and UndateInterval classes has been improved.
  4. Comments have been added to explain important assumptions and limitations.

These changes enhance the robustness of the test suite and provide better documentation of the expected behavior of the Undate and UndateInterval classes.

There are a few minor issues to address, such as removing an unused import and a duplicate assertion, but overall, these updates significantly improve the quality of the test suite.


313-313: ⚠️ Potential issue

Remove duplicate assertion.

There appears to be a duplicate assertion in the test_partiallyknown_duration method. The same assertion is written twice, which is unnecessary.

Please remove one of the duplicate assertions:

-        # assert Undate(day=5).duration().days == 1
        assert Undate(day=5).duration().days == 1

Likely invalid or redundant comment.

src/undate/undate.py (2)

324-325: Verify the use of astype(int) on delta.

In the condition if delta.astype(int) > 31:, the astype method is used on delta. Ensure that delta is a numpy.timedelta64 object that supports the astype method and that this operation works as intended.

Please confirm that delta.astype(int) correctly returns the duration in days as an integer. If delta is not a numpy.timedelta64, consider using an alternative method to obtain the integer value of the duration.


263-272: ⚠️ Potential issue

Correct the precision comparison logic in the __contains__ method.

The comparison self.precision < other.precision may not reflect the intended logic. Since higher DatePrecision values represent more precise dates (e.g., DatePrecision.DAY is more precise than DatePrecision.MONTH), using < means a more precise date is considered to contain a less precise one, which is counterintuitive.

Consider changing the comparison to self.precision > other.precision to indicate that a less precise date (e.g., a year) contains a more precise date (e.g., a specific day within that year).

Apply this diff to adjust the comparison:

 return all(
     [
         self.earliest <= other.earliest,
         self.latest >= other.latest,
-        self.precision < other.precision,
+        self.precision > other.precision,
     ]
 )

Likely invalid or redundant comment.

src/undate/date.py Show resolved Hide resolved
src/undate/date.py Outdated Show resolved Hide resolved
tests/test_undate.py Outdated Show resolved Hide resolved
@rlskoeser
Copy link
Member Author

@ColeDCrawford I was hoping I could just clean this PR up and merge without further review, but an error in the S&co example notebook caused me to find some problems with how I was handling duration logic in the new numpy-based implementation, which resulted in a bit more substantial change (and effort!) than I'd anticipated. I'm going to tag you for review on the updates.

I've resolved all the coderabbitai comments that I've either addressed or don't care about; one interesting/tricky one left open for now. It flagged a lot of things, which I have mixed feelings about - glad to have things cleaner and in better shape, but it felt like it resulted in more work for me.

FWIW I also went ahead and switched from black to ruff in the pre-commit hook. It doesn't seem to be correcting unused imports automatically, so if you have a set of options you prefer maybe you could open a PR to add them after this is merged.

Copy link
Collaborator

@ColeDCrawford ColeDCrawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again and sorry for leaving this one hanging for ages!

Comment on lines 36 to 37
python3 -m venv undate
source undate/bin/activate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default it looks like undate isn't gitignored. Do we want to change this to .venv, or add undate to .gitignore? uv uses .venv by default and that's in most .gitignore files (including ours).

It might also be helpful to either specify a Python version here (e.g. python3.12 -m venv undate) or include a .pythonversion file so folks using pyenv would install a virtual environment with our preferred version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this. I typically put my virtualenvs elsewhere and I know other tools have specific preferences.

I like the suggestions to switch to .venv and create a pythonversion file as a default for development.


### Check python types

Python typing is currently enforced on pull requests as part of a GitHub Actions Continuous Integration check using `mypy`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also during pre-commit hooks if those are installed, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, probably added after this was documented - I'll update, thanks for flagging

README.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting out the development config into a separate file makes this a lot cleaner and more approachable for end users 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, documenting the dataset is definitely helpful for any researchers trying to understand why they might use the package

setup.py Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good riddance

# custom properties to access year, month, day

@property
def year(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the properties to have return types?

Comment on lines 124 to 134
# NOTE: values MUST be ordered based on the relative size or
# precison of the time unit. That is, the smaller the unit, the more precise
# it is: a day is more precise than a month, a month is more precise than a year,
# (DatePrecision.year < DatePrecision.month)

#: year
YEAR = 1
#: month
MONTH = 2
#: day
DAY = 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential issue with ordering it this way is that it's going to be harder to extend. I guess we could do DECADE = 0? My understanding is that we can't get precise than DAY (since we're not recording times) but there are potentially future cases where we might want to be less precise than YEAR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the numbers would change as long as they are sequenced properly. The numbers only need to be consistent within a given installation of undate, right? (we're not storing them in a db or anything).

We could add decade and century now if you think it would be helpful, but was waiting until we had use cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly thinking of the conversation earlier today where we heard that even decade precision is still very useful and even century is much better than nothing.

If we were to update this kind of core config it should probably be in at least a minor release? Also - is this "the PR" for v1.0, or at least v0.3? Looks like the last release was April?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add century and decade to try to future proof a little bit, but I don't think the actual numbers should matter - they should never be referenced as numbers, only as constants, and they are only defined within the context of an installed version of undate. Am I thinking about that wrong?

@@ -15,9 +15,8 @@
import importlib
import logging
import pkgutil
from typing import Dict, Type
from functools import lru_cache # functools.cache not available until 3.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use this now, right? We dropped 3.8 support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching - although not sure why this was in the change set, sorry for such an enormous PR!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (11)
src/undate/dateformat/base.py (2)

Line range hint 42-65: Consider a more robust way to handle base module exclusion

The current implementation uses string matching (not modname.endswith(".base")) to exclude the base module. This approach could be fragile if the module structure changes.

Consider using a more explicit approach:

-            if not modname.endswith(".base"):
+            # Get the module name without the package prefix
+            module_name = modname[len(formatter_prefix):]
+            # Explicitly exclude the current module
+            if module_name != "base":
                 importlib.import_module(modname)
                 import_count += 1

Line range hint 29-35: Document numpy datetime64 support in method docstrings

Since this PR introduces numpy datetime64 support, it would be helpful to document the expected types in the method docstrings, even if type hints can't be added due to circular imports.

Add documentation like this:

     def parse(self, value: str):
-        # can't add type hint here because of circular import
-        # should return an undate or undate interval
+        """Parse a string value into an Undate or UndateInterval.
+        
+        The implementation should support both datetime.date and numpy.datetime64
+        based dates to maintain compatibility.
+        
+        Args:
+            value: String representation of the date
+
+        Returns:
+            Undate or UndateInterval object
+        """
         raise NotImplementedError
pyproject.toml (3)

31-44: Add numpy-related classifiers.

Since numpy is now a core dependency, consider adding relevant classifiers:

 classifiers = [
     "Development Status :: 2 - Pre-Alpha",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
     "Intended Audience :: Developers",
     "License :: OSI Approved :: Apache Software License",
     "Operating System :: OS Independent",
     "Topic :: Software Development :: Libraries :: Python Modules",
     "Topic :: Utilities",
     "Typing :: Typed",
+    "Topic :: Scientific/Engineering",
+    "Intended Audience :: Science/Research",
 ]

12-12: Consider making numpy an optional dependency.

As discussed in the PR objectives, consider moving numpy to optional dependencies to allow users who don't need the extended year range functionality to avoid the dependency. This would make the package lighter for web-based projects.

-dependencies = ["python-dateutil", "lark", "numpy"]
+dependencies = ["python-dateutil", "lark"]

 [project.optional-dependencies]
+extended-dates = ["numpy"]
 docs = ["sphinx>=7.0.0", "alabaster", "myst-parser", "myst-parser[linkify]"]

79-82: Fix typo in pytest marker description.

There's a typo in the "first" marker description:

 markers = [
     "last : run marked tests after all others",
-    "first : run marked tests efore all others",
+    "first : run marked tests before all others",
 ]
DEVELOPER_NOTES.md (4)

7-11: Fix grammar: "checkout" should be "check out"

The word "checkout" is being used as a verb here, which should be written as two words: "check out".

-### Use git to checkout a copy of the repository
+### Use git to check out a copy of the repository
🧰 Tools
🪛 LanguageTool

[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...

(NOUN_VERB_CONFUSION)


28-31: Fix numbering in manual git flow instructions

The step numbering jumps from 1 to 3, which appears to be a typo.

 If you cannot or prefer not to install git flow, you can do the equivalent manually.
 1. Check out the develop branch: `git checkout develop`
-3. Create new feature manually from the develop branch: `git checkout -b feature/xxx-name`
+2. Create new feature manually from the develop branch: `git checkout -b feature/xxx-name`

37-40: Add language specification to code block

For better syntax highlighting and consistency, specify the shell language for this code block.

-```
+```sh
 python3 -m venv .venv
 source .venv/bin/activate

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 Markdownlint</summary>

37-37: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

---

`60-67`: **Consider adding code coverage information**

The testing instructions are clear, but it might be helpful to add information about viewing code coverage reports, especially since the PR objectives mention benchmarking and testing requirements.

Consider adding:
```diff
 To test cases by method name, use `-k`: `pytest -k test_str`
+
+To run tests with coverage reporting:
+```sh
+pytest --cov=undate --cov-report=html
+```
+The coverage report will be generated in the `htmlcov` directory.
src/undate/date.py (2)

10-11: Typo in Docstring: Capitalize 'Python'

In the docstring, 'python' should be capitalized to 'Python' to adhere to proper noun conventions.


127-127: Typo in Comment: Correct 'precison' to 'precision'

There's a typographical error in the comment; 'precison' should be corrected to 'precision'.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 039ff0e and ed3f493.

📒 Files selected for processing (6)
  • .github/workflows/unit_tests.yml (2 hunks)
  • .pythonversion (1 hunks)
  • DEVELOPER_NOTES.md (1 hunks)
  • pyproject.toml (1 hunks)
  • src/undate/date.py (1 hunks)
  • src/undate/dateformat/base.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .pythonversion
🧰 Additional context used
🪛 actionlint
.github/workflows/unit_tests.yml

37-37: property "python-cache" is not defined in object type {}

(expression)

🪛 LanguageTool
DEVELOPER_NOTES.md

[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...

(NOUN_VERB_CONFUSION)


[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]" ### Install pre-commit hooks We use [pre-commit](h...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Markdownlint
DEVELOPER_NOTES.md

37-37: null
Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (8)
.github/workflows/unit_tests.yml (5)

14-14: LGTM! Python version update aligns with the check workflow.

The update to Python 3.12 for coverage reporting is consistent with the suggestion to use 3.12 as the default version.


33-33: LGTM! Cache configuration updated correctly.

The cache dependency path has been properly updated to reflect the move to pyproject.toml.


41-41: LGTM! Test execution simplified and coverage properly scoped.

The changes from tox to direct pytest commands simplify the workflow, and the coverage is correctly scoped to the 'undate' package.

Also applies to: 46-46


36-36: Verify [test] extra dependencies.

The installation command now uses .[test] instead of .[dev]. Let's verify the test dependencies are properly defined.

#!/bin/bash
# Description: Verify test dependencies configuration

# Test 1: Check if [test] extra exists in pyproject.toml
rg '\[project\.optional-dependencies\]' -A 10 pyproject.toml

# Test 2: Verify pytest and pytest-cov are included
rg '(pytest|pytest-cov)' pyproject.toml

21-21: Verify numpy compatibility with Python 3.13.

Since this PR introduces numpy as a dependency, we should verify its compatibility with Python 3.13 (currently in alpha).

src/undate/dateformat/base.py (1)

18-18: LGTM: Good modernization using cache

The switch from lru_cache to cache is a good modernization step, as cache is a simpler alternative introduced in Python 3.9 that's equivalent to lru_cache(maxsize=None).

pyproject.toml (1)

84-85: Verify numpy type hints usage.

The mypy numpy plugin is correctly configured. Let's verify that numpy type hints are being used in the codebase:

✅ Verification successful

Let me gather more information about numpy type hints usage by checking for additional numpy type annotation patterns.


NumPy type hints are properly used in the codebase

The codebase demonstrates appropriate usage of NumPy type hints:

  • Custom classes Date and Timedelta properly extend np.ndarray
  • Type annotations use np.datetime64 and np.timedelta64 correctly
  • Function parameters and return types are properly annotated with NumPy types combined with Python's typing module (Union)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for numpy type hint usage in the codebase
rg "numpy\.(datetime64|ndarray|typing)" src/

Length of output: 128


Script:

#!/bin/bash
# Check for more numpy type hint patterns
rg "np\.(ndarray|datetime64|typing)" src/

# Check for type annotations with numpy
rg "(:|->)\s*(np\.|numpy\.)" src/

# Look at the content of date.py to understand the context
cat src/undate/date.py

Length of output: 4974

DEVELOPER_NOTES.md (1)

42-56: LGTM! Clear installation instructions

The installation instructions are well-structured and include both package installation and pre-commit hooks setup as discussed in the previous review.

🧰 Tools
🪛 LanguageTool

[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]" ### Install pre-commit hooks We use [pre-commit](h...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

- name: Install package with dependencies
run: |
pip install -e ".[dev]"
python -m pip install tox tox-gh-actions
pip install -e ".[test]"
if: steps.python-cache.outputs.cache-hit != 'true'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix undefined step output reference.

The condition steps.python-cache.outputs.cache-hit references an undefined step. The step for caching is using actions/setup-python@v4 which doesn't create this output.

Remove the condition since the cache is handled by actions/setup-python:

- if: steps.python-cache.outputs.cache-hit != 'true'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if: steps.python-cache.outputs.cache-hit != 'true'
🧰 Tools
🪛 actionlint

37-37: property "python-cache" is not defined in object type {}

(expression)

@rlskoeser rlskoeser changed the base branch from main to develop November 7, 2024 22:26
@rlskoeser rlskoeser merged commit 5c97e4b into develop Nov 7, 2024
6 checks passed
@rlskoeser rlskoeser deleted the experiment/numpy-datetime64 branch November 7, 2024 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants