-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use numpy datetime64 instead of datetime.date #77 #84
Conversation
81299bb
to
3306bde
Compare
@coderabbitai full review |
Actions performedFull review triggered. |
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes encompass updates to several configuration files, workflow definitions, and documentation files for the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
Outside diff range, codebase verification and nitpick comments (2)
setup.cfg (1)
44-44
: Approved: Addition of NumPy as a DependencyThe inclusion of
numpy
in theinstall_requires
section is crucial for the project's functionality, especially given the shift to usingnumpy.datetime64
. However, consider the implications of this addition on installation size and dependency management, particularly for users in constrained environments.Consider making
numpy
an optional dependency if feasible, to accommodate users who may not require the extended functionality.src/undate/undate.py (1)
Line range hint
287-316
: Review theduration
method implementation.The
duration
method has been significantly changed to accommodate the newDate
class and potentially returnnumpy.timedelta64
. Ensure that the logic correctly calculates the duration based on the date precision and handles all edge cases, especially with unknown or partially known dates.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (7)
- pyproject.toml (1 hunks)
- setup.cfg (1 hunks)
- src/undate/date.py (1 hunks)
- src/undate/dateformat/iso8601.py (1 hunks)
- src/undate/undate.py (9 hunks)
- tests/test_date.py (1 hunks)
- tests/test_undate.py (3 hunks)
Additional context used
Ruff
tests/test_date.py
1-1:
numpy
imported but unusedRemove unused import:
numpy
(F401)
3-3:
undate.date.ONE_DAY
imported but unusedRemove unused import
(F401)
3-3:
undate.date.ONE_YEAR
imported but unusedRemove unused import
(F401)
3-3:
undate.date.ONE_MONTH_MAX
imported but unusedRemove unused import
(F401)
24-24: Redefinition of unused
test_init_year_month
from line 18(F811)
42-42: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
src/undate/date.py
4-4:
typing.Dict
imported but unusedRemove unused import
(F401)
4-4:
typing.Union
imported but unusedRemove unused import
(F401)
68-68: Use
self.dtype != "datetime64[Y]"
instead ofnot self.dtype == "datetime64[Y]"
Replace with
!=
operator(SIM201)
src/undate/undate.py
6-6:
typing.Any
imported but unusedRemove unused import:
typing.Any
(F401)
8-8:
numpy
imported but unusedRemove unused import:
numpy
(F401)
9-9:
numpy.typing.ArrayLike
imported but unusedRemove unused import
(F401)
9-9:
numpy.typing.DTypeLike
imported but unusedRemove unused import
(F401)
Additional comments not posted (3)
pyproject.toml (1)
8-9
: Approved: Enhanced Type Checking for NumPyThe addition of the
[tool.mypy]
section with the NumPy typing plugin is a significant improvement for the project. It ensures enhanced type safety and better integration with NumPy, which is crucial given the project's new dependency on this library.src/undate/undate.py (2)
267-267
: Static method usage.The
from_datetime_date
method is correctly updated to handle thedatetime.date
input and convert it to anUndate
object. This method serves as a bridge between the olddatetime.date
usage and the newDate
class, facilitating easier transition and integration.
128-129
: Initialization ofDate
objects.The
earliest
andlatest
attributes of theUndate
class are now initialized using the customDate
class. Ensure that theDate
class constructor correctly handles the extreme values and any edge cases that might arise from the broad range of years.Verification successful
Initialization of
Date
objects is well-handled.The
Date
class uses NumPy'sdatetime64
for date handling, and test cases cover various initialization scenarios, ensuring robustness against edge cases. The__new__
method inDate
handles year, month, and day parameters effectively. No further action is needed.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify the Date class constructor with extreme values. ast-grep --lang python --pattern $'class Date { $$$ __init__($_, $_, $_) { $$$ } $$$ }'Length of output: 447
Script:
#!/bin/bash # Correct the pattern to search for the Date class and its __init__ method in Python. ast-grep --lang python --pattern $'class Date:\n def __init__(self, $_, $_, $_):\n $$$'Length of output: 90
Script:
#!/bin/bash # Search for any class definitions related to dates in the codebase. rg 'class\s+\w+' -A 2 # Search for any imports related to date functionality. rg 'import\s+\w+' -A 2Length of output: 11688
Script:
#!/bin/bash # Extract the __init__ method of the Date class from src/undate/date.py to review its implementation. ast-grep --lang python --pattern $'class Date(np.ndarray):\n def __init__(self, $_, $_, $_):\n $$$'Length of output: 102
Script:
#!/bin/bash # Search for method definitions within the Date class in src/undate/date.py. rg 'def\s+\w+' src/undate/date.py -A 2 # Search for test cases that use the Date class to check for handling of extreme values. rg 'Date\(' tests/ -A 2Length of output: 1921
tests/test_date.py
Outdated
assert d.dtype == "datetime64[M]" | ||
assert str(d) == "2010-05" | ||
|
||
def test_init_year_month(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct Redefinition of Test Function
The function test_init_year_month
is defined twice (lines 18 and 24), which is likely an oversight. This redefinition could cause one of the tests to be skipped, leading to incomplete test coverage.
Consider renaming one of the functions or combining them if they are intended to test different scenarios. Here is a proposed fix to rename the second function to test_init_year_month_day
:
- def test_init_year_month(self):
+ def test_init_year_month_day(self):
Also applies to: 18-18
Tools
Ruff
24-24: Redefinition of unused
test_init_year_month
from line 18(F811)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah, I think I found and fixed this in a different branch when I was working on it yesterday
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rlskoeser, thanks for the update! Since you've already fixed this in another branch, no further action is needed here regarding the duplicate test function. I'll proceed with the rest of the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
tests/test_date.py
Outdated
import numpy as np | ||
|
||
from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove Unused Imports
The static analysis has correctly identified several unused imports:
numpy
is imported but not used directly in the tests.ONE_DAY
,ONE_YEAR
, andONE_MONTH_MAX
are imported but not used.
Consider removing these to clean up the code and reduce confusion about dependencies.
Apply this diff to remove the unused imports:
-import numpy as np
-from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX
+from undate.date import Date, DatePrecision
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import numpy as np | |
from undate.date import Date, DatePrecision, ONE_DAY, ONE_YEAR, ONE_MONTH_MAX | |
from undate.date import Date, DatePrecision |
Tools
Ruff
1-1:
numpy
imported but unusedRemove unused import:
numpy
(F401)
3-3:
undate.date.ONE_DAY
imported but unusedRemove unused import
(F401)
3-3:
undate.date.ONE_YEAR
imported but unusedRemove unused import
(F401)
3-3:
undate.date.ONE_MONTH_MAX
imported but unusedRemove unused import
(F401)
tests/test_date.py
Outdated
|
||
def test_properties_day(self): | ||
assert Date(2001).day is None | ||
assert Date(2010, 5).day == None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use is
for None Comparisons
The comparison to None
using ==
is not recommended. Using is
is the correct practice in Python for clarity and correctness.
Apply this diff to correct the comparison:
- assert Date(2010, 5).day == None
+ assert Date(2010, 5).day is None
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
assert Date(2010, 5).day == None | |
assert Date(2010, 5).day is None |
Tools
Ruff
42-42: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
src/undate/date.py
Outdated
from enum import IntEnum | ||
|
||
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None | ||
from typing import Optional, Dict, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports.
The imports for Dict
and Union
from the typing
module are not used in the file. Removing these will clean up the code and reduce confusion about dependencies.
Apply this diff to remove the unused imports:
-from typing import Optional, Dict, Union
+from typing import Optional
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from enum import IntEnum | |
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None | |
from typing import Optional, Dict, Union | |
from enum import IntEnum | |
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None | |
from typing import Optional |
Tools
Ruff
4-4:
typing.Dict
imported but unusedRemove unused import
(F401)
4-4:
typing.Union
imported but unusedRemove unused import
(F401)
src/undate/date.py
Outdated
@property | ||
def year(self): | ||
return int(str(self.astype("datetime64[Y]"))) | ||
|
||
@property | ||
def month(self): | ||
# if date unit is year, don't return a month (only M/D) | ||
if not self.dtype == "datetime64[Y]": | ||
return int(str(self.astype("datetime64[M]")).split("-")[-1]) | ||
|
||
@property | ||
def day(self): | ||
# only return a day if date unit is in days | ||
if self.dtype == "datetime64[D]": | ||
return int(str(self.astype("datetime64[D]")).split("-")[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refine logical expressions in property methods.
The properties month
and day
use a less readable form of checking the data type. Replacing not self.dtype == "datetime64[Y]"
with self.dtype != "datetime64[Y]"
will improve the readability and maintainability of the code.
Apply this diff to refine the logical expression:
- if not self.dtype == "datetime64[Y]":
+ if self.dtype != "datetime64[Y]":
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@property | |
def year(self): | |
return int(str(self.astype("datetime64[Y]"))) | |
@property | |
def month(self): | |
# if date unit is year, don't return a month (only M/D) | |
if not self.dtype == "datetime64[Y]": | |
return int(str(self.astype("datetime64[M]")).split("-")[-1]) | |
@property | |
def day(self): | |
# only return a day if date unit is in days | |
if self.dtype == "datetime64[D]": | |
return int(str(self.astype("datetime64[D]")).split("-")[-1]) | |
@property | |
def year(self): | |
return int(str(self.astype("datetime64[Y]"))) | |
@property | |
def month(self): | |
# if date unit is year, don't return a month (only M/D) | |
if self.dtype != "datetime64[Y]": | |
return int(str(self.astype("datetime64[M]")).split("-")[-1]) | |
@property | |
def day(self): | |
# only return a day if date unit is in days | |
if self.dtype == "datetime64[D]": | |
return int(str(self.astype("datetime64[D]")).split("-")[-1]) |
Tools
Ruff
68-68: Use
self.dtype != "datetime64[Y]"
instead ofnot self.dtype == "datetime64[Y]"
Replace with
!=
operator(SIM201)
src/undate/undate.py
Outdated
return all( | ||
[ | ||
self.earliest <= other.earliest, | ||
self.latest >= other.latest, | ||
# is precision sufficient for comparing partially known dates? | ||
self.precision > other.precision, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review precision comparison in __contains__
method.
The comparison of self.precision > other.precision
in the __contains__
method assumes a certain hierarchy or ordering of precision values. This logic might not be intuitive or could lead to incorrect results if the precision levels are not strictly comparable in a linear fashion.
src/undate/undate.py
Outdated
|
||
# Pre 3.10 requires Union for multiple types, e.g. Union[int, None] instead of int | None | ||
from typing import Optional, Dict, Union | ||
from typing import Optional, Dict, Union, Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import.
The import for typing.Any
is flagged as unused. Removing unused imports helps maintain the cleanliness and readability of the code.
Proposed removal:
-from typing import Optional, Dict, Union, Any
+from typing import Optional, Dict, Union
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from typing import Optional, Dict, Union, Any | |
from typing import Optional, Dict, Union |
Tools
Ruff
6-6:
typing.Any
imported but unusedRemove unused import:
typing.Any
(F401)
src/undate/undate.py
Outdated
import numpy as np | ||
from numpy.typing import ArrayLike, DTypeLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports.
The imports for numpy
, ArrayLike
, and DTypeLike
are flagged as unused by static analysis tools. If these are not used in the current implementation, they should be removed to clean up the code.
Proposed removal:
-import numpy as np
-from numpy.typing import ArrayLike, DTypeLike
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import numpy as np | |
from numpy.typing import ArrayLike, DTypeLike |
Tools
Ruff
8-8:
numpy
imported but unusedRemove unused import:
numpy
(F401)
9-9:
numpy.typing.ArrayLike
imported but unusedRemove unused import
(F401)
9-9:
numpy.typing.DTypeLike
imported but unusedRemove unused import
(F401)
src/undate/dateformat/iso8601.py
Outdated
elif date_portion == "month": | ||
date_parts.append("%02d" % undate.earliest.month) | ||
elif date_portion == "day": | ||
date_parts.append("%02d" % undate.earliest.day) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhanced date portion handling in to_string
method.
The modifications to explicitly handle "month" and "day" portions improve the method's ability to format dates accurately. The use of string formatting ensures that these portions are always represented with two digits, which enhances consistency.
However, the removal of the previously commented-out code (lines 69-74) reduces the method's flexibility to handle other date portions dynamically. If flexibility in handling various date formats is a requirement, consider reintroducing or refactoring this part of the code.
tests/test_undate.py
Outdated
# assert isinstance(day_duration, timedelta) | ||
assert day_duration.astype("int") == 1 | ||
|
||
january_duration = Undate(2022, 1).duration() | ||
assert january_duration.days == 31 | ||
assert january_duration.astype("int") == 31 | ||
feb_duration = Undate(2022, 2).duration() | ||
assert feb_duration.days == 28 | ||
assert feb_duration.astype("int") == 28 | ||
# next leap year will be 2024 | ||
leapyear_feb_duration = Undate(2024, 2).duration() | ||
assert leapyear_feb_duration.days == 29 | ||
assert leapyear_feb_duration.astype("int") == 29 | ||
|
||
year_duration = Undate(2022).duration() | ||
assert year_duration.days == 365 | ||
assert year_duration.astype("int") == 365 | ||
leapyear_duration = Undate(2024).duration() | ||
assert leapyear_duration.days == 366 | ||
assert leapyear_duration.astype("int") == 366 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated assertions in test_duration
method.
The replacement of timedelta
assertions with astype("int")
reflects a change in the duration
method's return type. This update simplifies the assertions and aligns them with the new method implementation.
However, it would be beneficial to add comments or documentation in the Undate
class to clarify why the duration
method's return type was changed and how it should be used.
Would you like me to help draft the necessary documentation or comments?
Observations on
|
# adapted from https://stackoverflow.com/a/27129510/9706217 | ||
|
||
def __new__(cls, year: int, month: Optional[int] = None, day: Optional[int] = None): | ||
if isinstance(year, np.datetime64): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's my non-existing knowledge about typing in python, but shouldn't year be of type int
per line 24? Or is datetime64
a subclass of int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is if the type of year
should be a different one :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are in setup.cfg
around NumPy - mostly just thinking through the implications of adding it as a dependency. Overall I think I am sold as it expands the capability of the undate
package quite a bit. It would be useful to add some benchmarks to the release notes for the next version to detail the previous and new installed sizes for undate
, and the previous / new runtimes for some operations to see how numpy
impacts performance and payload.
tests/test_date.py
Outdated
assert d.dtype == "datetime64[M]" | ||
assert str(d) == "2010-05" | ||
|
||
def test_init_year_month(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
src/undate/undate.py
Outdated
# numpy datetime is stored as 64-bit integer, so min/max | ||
# depends on the time unit; assume days for now | ||
# See https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units | ||
max_year = int(2.5e16) | ||
min_year = int(-2.5e16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that I agree with the suggestion that this be configurable or what the advantages of that would be, if we're initializing a 64bit int I don't think making the min/max smaller would save any memory. If we're committing to NumPy then we should use it consistently.
setup.cfg
Outdated
@@ -41,6 +41,7 @@ python_requires = >=3.8 | |||
install_requires = | |||
python-dateutil | |||
lark | |||
numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we merge #88 first, we'll have to move this to the pyproject.toml
dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to limit this to NumPy v2, which was released in May? That was the first major release since 2006. If we do limit it to v2, we should also drop support for Python < 3.10. It looks like NumPy v2 requires Python >= 3.10.
I'm pretty sure that there are no direct Python dependencies, just some optional dependencies for speeding up operations. pip
should provide pre-built wheels.
NumPy looks like it will add 30-80MB to the payload. For most applications that's not an issue. But maybe we should benchmark the current version of undate
and the size with numpy
? The environments that I'm typically space-constrained in are Lambdas. I think there is still a 50MB (zipped) and 250MB (unzipped) limit for Lambda functions, so this could take up a significant chunk of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm still leaning towards using NumPy
given the need to support dates outside of 9999. The other way that we could go is with struct_time
, which is what python-edtf
uses. But it's a little clunky and you'd still have to have a shim class (like you add here) or a conversion function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make NumPy optional seems like a nightmare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it would be nice to move to Python 3.10+ anyway, it gets us improved python type hinting and 3.9 is getting old enough at this point that it seems reasonable to drop.
I'll update this and reconcile with the other merged PRs, address the flagged comments about type hints, and then do some testing and documentation of benchmarking as suggested before I merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📋 Review Summary
- Number of files reviewed: 7
- Number of comments: 3
- Number of suggestions: 0
📚 File changes
File path | File changes |
---|---|
pyproject.toml | Added configuration for mypy to include numpy typing plugin. |
setup.cfg | Added numpy as a required dependency in the install_requires section. |
src/undate/date.py | Introduced a custom Date class for enhanced date handling with properties for year, month, and day. |
src/undate/dateformat/iso8601.py | Refactored the date formatting logic to directly append formatted strings for year, month, and day instead of using strftime . |
src/undate/undate.py | No summary. |
tests/test_date.py | Added tests for the new Date and DatePrecision classes. |
tests/test_undate.py | Removed the DatePrecision class and its associated tests, and updated duration tests to use astype('int') instead of checking for timedelta . |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (29)
docs/readme.md (1)
1-4
: Excellent use of content inclusion!The use of the
{include}
directive to embed the main README content is a great approach to maintain consistency between the project's root documentation and the docs folder. This method ensures that updates to the main README are automatically reflected in the documentation.A few suggestions for improvement:
- Consider adding a comment explaining the purpose of this file for future maintainers.
- The
:relative-images:
parameter is empty. If it's intentional, you might want to add a comment explaining why, or remove it if it's not needed.- It might be helpful to add error handling in case the included file is missing, perhaps using a try-except block or a conditional include if your documentation tool supports it.
Here's a suggested improvement:
+# This file includes the main README.md content in the documentation +# to ensure consistency and avoid duplication. ```{include} ../README.md :relative-docs: / -:relative-images: +:relative-images: /This change adds a helpful comment and sets the `:relative-images:` parameter to the root, matching the `:relative-docs:` parameter. If this isn't the intended behavior, please adjust accordingly or remove the parameter if it's not needed. </blockquote></details> <details> <summary>docs/_static/custom.css (1)</summary><blockquote> `1-7`: **LGTM! Consider a minor improvement for consistency.** The CSS rule effectively styles the "powered by" links in the Sphinx sidebar, removing text decoration and bottom borders while centering the text. This creates a clean and unobtrusive appearance for these links. For consistency with common CSS formatting practices, consider adding a space after the colon in each property declaration. Here's a suggested refinement: ```diff div.sphinxsidebar .powered_by a { - text-decoration: none; - border-bottom: none; - text-align: center; + text-decoration: none; + border-bottom: none; + text-align: center; }
This minor change improves readability and aligns with widely-used CSS formatting conventions.
examples/README.md (1)
1-7
: LGTM! Well-structured README with clear content.The README provides a concise and informative overview of the examples directory. The structure is clear, with a title, description, and two well-defined sections for use cases and notebooks. The content aligns well with the purpose of introducing example data and projects using the
undate
library.Consider adding a brief introduction to
undate
.To make this README even more helpful, especially for newcomers, consider adding a brief one-sentence description of what
undate
is at the beginning of the file. This would provide immediate context for readers who might not be familiar with the library.Here's a suggested addition after line 2:
# undate examples + `undate` is a Python library for handling uncertain dates in historical data. Example data and projects with use cases for uncertain date logic and example code notebooks using undate.docs/index.rst (1)
13-16
: LGTM: Improved documentation structure with a minor suggestion.The changes to the table of contents entries enhance the documentation structure:
- Adding CONTRIBUTING and DEVELOPER_NOTES improves guidance for potential contributors.
- Replacing "authors" with CONTRIBUTORS aligns with a more collaborative attribution model.
- Changing "license" to LICENSE maintains consistency with file naming conventions.
These updates align well with best practices for open-source project documentation.
Consider adding a brief description next to each new entry for better clarity, like this:
CONTRIBUTING DEVELOPER_NOTES CONTRIBUTORS LICENSE CONTRIBUTING Guidelines for contributing to the project DEVELOPER_NOTES Additional information for developers CONTRIBUTORS List of project contributors LICENSE Project license informationThis would provide users with a quick overview of each document's content without needing to navigate to it.
.github/workflows/unit_tests.yml (1)
33-33
: Approve the update to pyproject.toml and suggest a minor improvement.The change from
setup.cfg
topyproject.toml
for caching dependencies is a good modernization step, aligning with current Python packaging standards. This change is consistent with the project's transition topyproject.toml
for configuration.Consider adding
poetry.lock
to the cache dependency path if you're using Poetry for dependency management:cache-dependency-path: | **/pyproject.toml **/poetry.lockThis ensures that the cache is invalidated when either file changes, providing more precise dependency caching.
.all-contributorsrc (1)
3-3
: Approved: Good practice to use a separate file for contributorsMoving the contributors list to a separate
CONTRIBUTORS.md
file is a good practice. It keeps the README focused on essential project information while still maintaining a dedicated space for acknowledging contributors.Consider updating the project documentation to reflect this change, ensuring that any references to the contributors list point to the new
CONTRIBUTORS.md
file.CONTRIBUTING.md (2)
11-13
: Good updates to contribution guidelines.The changes to use the
develop
branch for pull requests and emphasize including documentation and tests are excellent improvements to the contribution process. These practices will help maintain code quality and streamline the review process.Consider adding a comma after "barrier" in the sentence "Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier please let us know when you submit a pull request." This small change would improve readability:
-Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier please let us know when you submit a pull request. +Ideally contributions should include documentation and tests for the proposed changes, but if that is a barrier, please let us know when you submit a pull request.🧰 Tools
🪛 LanguageTool
[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...(IF_PLEASE_COMMA)
20-23
: Excellent addition of Use Cases and Example Data section.The new section encouraging contributors to submit use cases and example data is a valuable addition. This will help in understanding real-world applications of the project and potentially guide future development.
Consider adding a brief explanation of how these examples and use cases will be utilized. This could encourage more contributions. For example:
Example data can be added to the [examples/](https://github.com/dh-tech/undate-python/tree/main/examples/) folder by a pull request. +These examples will be used to improve the project's functionality and documentation, ensuring undate meets real-world needs.
examples/use-cases/shakespeare-and-company-project/README.md (3)
5-9
: Consider enhancing the project link visibility.The overview and date format explanation are well-written and informative. To improve user experience, consider making the project link more prominent, perhaps by placing it on a separate line or using bold formatting.
Suggested change:
-The [Shakespeare and Company Project](https://shakespeareandco.princeton.edu/) publishes an events dataset that documents lending library activities such as subscribing, renewing, and borrowing and returning books. Some of the dates for those events are only partially known. +# [Shakespeare and Company Project](https://shakespeareandco.princeton.edu/) + +This project publishes an events dataset that documents lending library activities such as subscribing, renewing, and borrowing and returning books. Some of the dates for those events are only partially known.
9-11
: Clarify the explanation of the special case for borrow events.The explanation of how event dates are used is informative. However, the description of the special case for borrow events with unknown years could be clearer.
Consider rephrasing the special case explanation for better clarity. For example:
-One special case is borrow events with unknown years: the Project assumes that the return was the same or successive year, so even though the years are unknown the duration can be calculated. +One special case is borrow events with unknown years: the Project assumes that the return occurred in either the same year or the following year. This assumption allows for duration calculation even when the specific years are unknown.🧰 Tools
🪛 LanguageTool
[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...(EN_COMPOUNDS_HUMAN_READABLE)
13-15
: Address language issues and improve readability.This section provides valuable context about the project's technical aspects and research applications. However, there are some minor language issues that could be addressed to improve readability.
Consider the following changes:
- Add hyphens to compound adjectives:
-The dataset includes durations for subscriptions and borrow events in human readable format and number of days. +The dataset includes durations for subscriptions and borrow events in human-readable format and number of days.
- Improve sentence structure and clarity:
-Data analysis tasks that take these kinds of dates into account currently require either ignoring partially known dates or coercing them to a specific date, e.g. the first day of the month or year. +Data analysis tasks that consider these types of dates currently require either ignoring partially known dates or coercing them to a specific date (e.g., the first day of the month or year).
- Use more concise language:
-Our hope is that in the future, `undate` will make it possible to work with those partial dates instead of ignoring them. +In the future, we anticipate that `undate` will enable working with partial dates rather than ignoring them.🧰 Tools
🪛 LanguageTool
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...(EN_COMPOUNDS_MACHINE_READABLE)
[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future,undate
will make ...(MY_HOPE_IS)
pyproject.toml (3)
1-3
: Approved: Good choice of build system.The transition to hatchling as the build system is a positive change. Hatchling is a modern, flexible build system for Python projects.
Consider adding a version constraint to the hatchling requirement:
-requires = ["hatchling"] +requires = ["hatchling>=1.0.0"]This ensures compatibility with future versions of hatchling.
82-87
: Approved: Appropriate sdist build configuration.The inclusion of Python files, .lark files, and tests in the source distribution is appropriate.
Consider adding a
exclude
section to explicitly exclude any files or directories that shouldn't be included in the distribution, such as:exclude = [ "tests/**/__pycache__", "**/*.pyc", "**/*.pyo", "**/*.pyd", ]This helps ensure that no unnecessary files are included in the distribution.
89-98
: Approved: Good test and type checking configurations.The pytest and mypy configurations are well-set for the project's needs. The custom markers for pytest and the numpy typing plugin for mypy are good practices.
Consider adding a
testpaths
option to the pytest configuration to explicitly specify where to look for tests:[tool.pytest.ini_options] testpaths = ["tests"]This can help pytest find your tests more efficiently, especially in larger projects.
docs/conf.py (1)
35-39
: LGTM: Sphinx extension updated to MyST Parser.The change from "m2r2" to "myst_parser" is a good improvement, as MyST Parser is a more modern and feature-rich Markdown parser for Sphinx.
Consider adding the "sphinx.ext.napoleon" extension to support Google-style docstrings, which can improve code documentation readability. You can add it like this:
extensions = [ "sphinx.ext.autodoc", "sphinx.ext.intersphinx", "myst_parser", + "sphinx.ext.napoleon", ]
DEVELOPER_NOTES.md (6)
1-12
: LGTM! Clear and concise introduction and setup instructions.The introduction and setup instructions are well-structured and easy to follow. The use of markdown formatting enhances readability.
Consider changing "checkout" to "check out" in the subheading on line 7 for grammatical correctness:
-### Use git to checkout a copy of the repository +### Use git to check out a copy of the repository🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...(NOUN_VERB_CONFUSION)
13-31
: LGTM! Comprehensive git-flow instructions with a manual alternative.The git-flow instructions are well-explained and cover all necessary steps. The inclusion of a manual alternative is thoughtful and accommodating.
Consider adding a brief explanation of why git-flow is recommended for this project. This could help developers understand the benefits and encourage adoption.
48-55
: LGTM! Clear instructions for setting up pre-commit hooks.The instructions for installing pre-commit hooks are concise and easy to follow. The explanation of why pre-commit is used is helpful for new contributors.
Consider adding a link to the pre-commit documentation (https://pre-commit.com/) for developers who want to learn more about the tool.
🧰 Tools
🪛 LanguageTool
[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...sh pip install -e ".[dev]"
### Install pre-commit hooks ```sh pre-commit inst...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
56-76
: LGTM! Comprehensive instructions for tests, type checking, and documentation.The instructions for running unit tests, checking Python types, and building documentation are clear and detailed. The inclusion of specific commands and examples is very helpful.
Consider adding a brief explanation of why type checking is important and how it benefits the project. This could help developers understand the value of maintaining proper type annotations.
77-89
: LGTM! Clear instructions for building documentation.The instructions for building documentation with Sphinx are clear and include all necessary steps. The indication of where to find the generated documentation is helpful.
To address the Markdownlint issue, consider specifying the language for the fenced code blocks. For example:
-``` +```sh pip install -e ".[docs]"-
+
sh
sphinx-build docs docs/_build
1-89
: Excellent developer notes! A few minor suggestions for improvement.The DEVELOPER_NOTES.md file provides comprehensive and well-structured instructions for contributing to the undate project. It covers all major aspects of the development process, from initial setup to running tests and building documentation.
Consider the following minor improvements to enhance the document further:
- Add a table of contents at the beginning of the file for easier navigation.
- Include information about the project's coding style guidelines or link to a separate style guide document.
- Mention any specific requirements for commit messages or pull request descriptions.
- Add a section on how to report issues or suggest improvements to the project.
These additions would make the document even more comprehensive and helpful for new contributors.
🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...(NOUN_VERB_CONFUSION)
[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...sh pip install -e ".[dev]"
### Install pre-commit hooks ```sh pre-commit inst...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Markdownlint
35-35: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
CONTRIBUTORS.md (1)
9-45
: LGTM: Well-structured contributors table with a minor suggestionThe contributors table is well-implemented and provides comprehensive information about each contributor, including their avatars, names, GitHub profiles, and contribution types. The use of All Contributors and markdown-specific comments ensures proper rendering and maintenance.
Consider adding a brief legend explaining the contribution type icons (e.g., 💻 for code, 👀 for reviews) to improve clarity for new contributors or visitors unfamiliar with the All Contributors specification.
README.md (6)
Line range hint
1-21
: Excellent improvements to the README introduction!The changes to the title, addition of the logo, and expanded introduction greatly enhance the presentation and clarity of the project. The badges provide valuable information about the project's status, documentation, and code quality.
Consider formatting the warning message for better visibility:
> ⚠️ **WARNING:** This is pre-alpha software and is **NOT** feature complete! Use with caution. ⚠️
This blockquote format will make the warning stand out more prominently.
33-46
: Excellent code examples for creating Undate objects!The code examples effectively demonstrate the various ways to create
Undate
objects, including fully known dates, partially known dates, and dates with labels. This showcases the flexibility of theUndate
class.Consider adding a brief comment to explain the "19XX" and "1X" syntax for partially known dates:
# 'X' represents an unknown digit in year or month partially_known_year = Undate("19XX") # A year in the 1900s partially_known_month = Undate(2022, "1X") # A month in 2022 starting with 1 (e.g., 10, 11, or 12)This will help users understand the syntax for representing partially known dates more quickly.
47-67
: Great examples for string conversion and duration calculation!The examples effectively demonstrate how to convert
Undate
objects to strings and calculate their durations. The inclusion of various date precisions and special cases like leap years provides a comprehensive overview of the library's capabilities.Consider adding an example of custom string formatting for dates, if the library supports it. For instance:
# Example of custom string formatting (if supported) custom_format = "YYYY-MM-DD" print(november7.format(custom_format)) # Output: 2000-11-07This would showcase the flexibility of the library in terms of date representation.
70-118
: Excellent examples for date comparisons and sorting!The examples effectively demonstrate how to check if one date falls within another, compare and sort
Undate
objects, and handle comparisons with Python's built-indatetime.date
objects. The inclusion of error handling for ambiguous comparisons is particularly valuable.Consider adding a brief explanation of how the library determines the earliest and latest possible dates for imprecise dates. For example:
# Explanation of earliest and latest possible dates for imprecise dates partially_known_year = Undate("19XX") print(f"Earliest possible date: {partially_known_year.earliest}") # Output: 1900-01-01 print(f"Latest possible date: {partially_known_year.latest}") # Output: 1999-12-31This would provide users with a clearer understanding of how the library handles imprecise dates in comparisons.
120-154
: Great examples for UndateInterval and date parsing!The examples effectively demonstrate how to create and use
UndateInterval
objects, including open-ended intervals and labeled intervals. The date parsing examples using the ISO8601 formatter showcase the library's flexibility in handling different date string formats.Consider adding an example of how to check if a date falls within an interval:
# Example of checking if a date falls within an interval interval = UndateInterval(Undate(1900), Undate(2000)) date_to_check = Undate(1950) print(date_to_check in interval) # Output: TrueThis would demonstrate another useful feature of the
UndateInterval
class.
28-28
: Minor suggestion for improved readabilityConsider adding a comma after "information" to improve readability:
-Read [Contributors](CONTRIBUTORS.md) for detailed contribution information. +Read [Contributors](CONTRIBUTORS.md) for detailed contribution information,This small change can help separate the two main clauses in the sentence.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...(AI_HYDRA_LEO_MISSING_COMMA)
examples/notebooks/shxco_partial_date_durations.ipynb (1)
44-44
: Approved: Improved data loading with local file and performance optimization.The changes to the data loading process are beneficial:
- Using a relative path to a local copy of the dataset improves portability and reduces dependency on internet access.
- Adding
low_memory=False
helps prevent potential issues with mixed data types in the CSV file.Consider adding a comment explaining why
low_memory=False
is used, and possibly include a link to the original dataset for reference. For example:# Load the local copy of the S&co events dataset (v1.2) # Original dataset: https://doi.org/10.34770/nz90-ym25 events_df = pd.read_csv("../use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv", low_memory=False) # Prevents mixed type inference warningsAlso applies to: 291-291, 299-300
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (6)
_static/undate_logo.png
is excluded by!**/*.png
docs/_static/dhtech-logo.png
is excluded by!**/*.png
docs/_static/dhtech-logo.svg
is excluded by!**/*.svg
docs/_static/logo.png
is excluded by!**/*.png
docs/_static/undate_logo.png
is excluded by!**/*.png
examples/use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv
is excluded by!**/*.csv
📒 Files selected for processing (30)
- .all-contributorsrc (5 hunks)
- .github/workflows/check.yml (2 hunks)
- .github/workflows/unit_tests.yml (1 hunks)
- .gitignore (1 hunks)
- .pre-commit-config.yaml (1 hunks)
- CONTRIBUTING.md (1 hunks)
- CONTRIBUTORS.md (1 hunks)
- DEVELOPER_NOTES.md (1 hunks)
- README.md (2 hunks)
- docs/CONTRIBUTING.md (1 hunks)
- docs/CONTRIBUTORS.md (1 hunks)
- docs/DEVELOPER_NOTES.md (1 hunks)
- docs/LICENSE.md (1 hunks)
- docs/_static/custom.css (1 hunks)
- docs/_templates/sidebar_dhtech.html (1 hunks)
- docs/authors.rst (0 hunks)
- docs/conf.py (3 hunks)
- docs/index.rst (1 hunks)
- docs/license.rst (0 hunks)
- docs/readme.md (1 hunks)
- docs/readme.rst (0 hunks)
- docs/requirements.txt (0 hunks)
- docs/undate.rst (1 hunks)
- examples/README.md (1 hunks)
- examples/notebooks/shxco_partial_date_durations.ipynb (2 hunks)
- examples/use-cases/shakespeare-and-company-project/README.md (1 hunks)
- pyproject.toml (1 hunks)
- pytest.ini (0 hunks)
- setup.cfg (0 hunks)
- setup.py (0 hunks)
💤 Files not reviewed due to no reviewable changes (7)
- docs/authors.rst
- docs/license.rst
- docs/readme.rst
- docs/requirements.txt
- pytest.ini
- setup.cfg
- setup.py
✅ Files skipped from review due to trivial changes (5)
- docs/CONTRIBUTING.md
- docs/CONTRIBUTORS.md
- docs/DEVELOPER_NOTES.md
- docs/LICENSE.md
- docs/undate.rst
🧰 Additional context used
🪛 actionlint
.github/workflows/unit_tests.yml
37-37: property "python-cache" is not defined in object type {}
(expression)
🪛 LanguageTool
CONTRIBUTING.md
[uncategorized] ~5-~5: Possible missing comma found.
Context: ...date Hey there! If you found your way here that probably means you are curious abo...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~5-~5: Consider a different adverb to strengthen your wording.
Context: ... to this project. This is great! We are always looking for new contributors. If you ca...(ALWAYS_CONSTANTLY)
[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...(IF_PLEASE_COMMA)
[uncategorized] ~16-~16: Possible missing preposition found.
Context: ...Requests If you find a bug or can think a feature you would really like to see be...(AI_EN_LECTOR_MISSING_PREPOSITION)
DEVELOPER_NOTES.md
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...(NOUN_VERB_CONFUSION)
[style] ~48-~48: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...sh pip install -e ".[dev]"
### Install pre-commit hooks ```sh pre-commit inst...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
README.md
[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...(AI_HYDRA_LEO_MISSING_COMMA)
examples/use-cases/shakespeare-and-company-project/README.md
[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...(EN_COMPOUNDS_MACHINE_READABLE)
[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future,undate
will make ...(MY_HOPE_IS)
🪛 Markdownlint
CONTRIBUTORS.md
49-49: null
Link fragments should be valid(MD051, link-fragments)
DEVELOPER_NOTES.md
35-35: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
examples/use-cases/shakespeare-and-company-project/README.md
22-22: null
Bare URL used(MD034, no-bare-urls)
🔇 Additional comments (40)
docs/_templates/sidebar_dhtech.html (1)
1-8
: Overall assessment: Good addition with room for minor improvementsThe new HTML snippet for the sidebar is a good addition, providing clear attribution to the DHTech Community. The structure is generally sound, and basic accessibility features are in place. With the suggested improvements to semantic structure and accessibility, this will be an excellent enhancement to the documentation sidebar.
docs/index.rst (2)
1-1
: LGTM: Title change improves consistency.The change from "Undate documentation" to "undate documentation" aligns the title with the lowercase project name mentioned in the description. This improves overall consistency in the documentation.
8-8
: Verify the impact of reduced ToC depth.The maximum depth of the table of contents has been reduced from 3 to 2. While this can simplify the documentation structure, it might also hide important subsections from the main table of contents. Please ensure that this change doesn't negatively impact the discoverability of crucial information.
To assess the impact, you can run the following script to check the structure of the linked documents:
This script will help you visualize the heading structure of the linked documents and determine if important subsections are now hidden due to the reduced ToC depth.
.gitignore (1)
43-44
: LGTM! Good addition for Jupyter notebook users.The addition of ".ipynb_checkpoints/" to the .gitignore file is a good practice. This prevents Jupyter notebook checkpoint files from being tracked by Git, which is beneficial because:
- These files are temporary and don't need version control.
- It keeps the repository clean and focused on actual project files.
- It avoids potential conflicts caused by these auto-generated files.
This change will be helpful for any contributors using Jupyter notebooks with this project.
.pre-commit-config.yaml (2)
15-15
: Good addition of trailing-whitespace hook.The inclusion of the trailing-whitespace hook is a positive change. It helps maintain clean code by removing unnecessary whitespace at the end of lines.
16-20
: Excellent addition of mypy for static type checking.The inclusion of mypy (v1.11.2) is a great improvement for ensuring type safety in the codebase. This aligns well with the project's goals and can help catch potential type-related issues early in the development process.
.github/workflows/check.yml (6)
16-19
: Excellent update to Python 3.12!Upgrading to Python 3.12 is a great move. It allows the project to leverage the latest language features, performance improvements, and security updates.
22-23
: Improved package installation step.The renaming of the step to "Install package with development dependencies" and the updated installation command
pip install -e ".[dev]"
provide better clarity and ensure that development dependencies are included. This is a positive change for maintaining a consistent development environment.
Line range hint
1-40
: Overall, these changes significantly improve the workflow.The updates to this GitHub Actions workflow file demonstrate a commitment to using up-to-date tools and practices. Key improvements include:
- Upgrading to Python 3.12
- Shifting to pyproject.toml for project configuration
- Clarifying the package installation process
- Simplifying the documentation build step
- Adding checks for Jupyter notebooks
These changes should lead to a more robust and maintainable CI process. However, please consider the following points:
- Clarify the decision to comment out the
--fail-on-warning
flag in the documentation build step.- Ensure that the project's development documentation is updated to reflect these changes, particularly the use of 'treon' for notebook checking.
Great work on modernizing this workflow!
🧰 Tools
🪛 actionlint
24-24: property "python-cache" is not defined in object type {}
(expression)
38-40
: Great addition of Jupyter notebook checks!Adding a step to check Jupyter notebooks using 'treon' is an excellent practice. This will help ensure that any example notebooks in the project remain functional and up-to-date.
To further improve this addition:
- Consider adding a comment explaining the purpose of this check for future maintainers.
- It might be helpful to document the 'treon' tool and its usage in the project's development documentation.
Let's verify the presence of Jupyter notebooks in the project:
#!/bin/bash # Check for the presence of Jupyter notebooks notebook_count=$(find . -name "*.ipynb" | wc -l) if [ "$notebook_count" -gt 0 ]; then echo "Found $notebook_count Jupyter notebook(s) in the project." echo "Notebook locations:" find . -name "*.ipynb" else echo "No Jupyter notebooks found in the project." echo "Consider removing the treon check if notebooks are not used." fi
32-32
: Updated documentation build process, but consider warning handling.The switch from
tox -e docs
tosphinx-build docs docs/_build
simplifies the documentation build process by directly using Sphinx. This is generally a good change for clarity and directness.However, I noticed that the
--fail-on-warning
flag is commented out.Could you clarify the reasoning behind commenting out this flag? It's often beneficial to fail the build on documentation warnings to maintain high-quality documentation. If there's a specific reason for this change, it might be worth documenting it in a comment.
Let's check if there are any existing documentation warnings:
#!/bin/bash # Run sphinx-build with warnings enabled and capture the output sphinx_output=$(sphinx-build -W -b html docs docs/_build 2>&1) # Check if there were any warnings if echo "$sphinx_output" | grep -q "WARNING"; then echo "Warnings detected in documentation build:" echo "$sphinx_output" | grep "WARNING" else echo "No warnings detected in documentation build." fi
21-21
: Correct update to use pyproject.toml for caching.The change from 'setup.cfg' to 'pyproject.toml' for the cache dependency path is appropriate and aligns with modern Python packaging standards.
Let's verify that the project has indeed moved to using pyproject.toml:
✅ Verification successful
Let's re-verify the presence of
pyproject.toml
and the absence ofsetup.cfg
with a more straightforward script:
Cache dependency path correctly updated to
pyproject.toml
.The presence of
pyproject.toml
and the absence ofsetup.cfg
confirm that the project has transitioned to usingpyproject.toml
for configuration. The cache path update aligns with the current project setup.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify the existence of pyproject.toml and absence of setup.cfg if [ -f "pyproject.toml" ] && [ ! -f "setup.cfg" ]; then echo "Confirmed: pyproject.toml exists and setup.cfg is absent." else echo "Warning: Expected pyproject.toml to exist and setup.cfg to be absent." ls -l pyproject.toml setup.cfg 2>/dev/null fiLength of output: 442
Script:
#!/bin/bash # Verify the existence of pyproject.toml and absence of setup.cfg if [ -f "pyproject.toml" ]; then echo "pyproject.toml exists." else echo "pyproject.toml does not exist." fi if [ -f "setup.cfg" ]; then echo "setup.cfg exists." else echo "setup.cfg does not exist." fiLength of output: 166
.github/workflows/unit_tests.yml (3)
41-41
: Approve the transition to pytest and suggest additional considerations.The change from
tox
to directly usingpytest
for test execution simplifies the CI process. The explicit coverage command for theundate
package is also a good improvement.Consider the following enhancements:
Add pytest configuration:
Create apytest.ini
or add a[tool.pytest.ini_options]
section inpyproject.toml
to centralize pytest configurations.Consistent pytest options:
Consider adding--verbose
and--capture=no
options to both test commands for more detailed output:run: pytest --verbose --capture=norun: pytest --verbose --capture=no --cov=undateCoverage report:
Add a coverage report generation step after the coverage run:- name: Generate coverage report run: | pytest --cov=undate --cov-report=xml coverage report if: ${{ matrix.python == env.COV_PYTHON_VERSION }}These suggestions can enhance the visibility and consistency of your test results.
Also applies to: 46-46
Line range hint
1-51
: Summary of workflow changes and their impact.The updates to this workflow file represent a significant modernization of the testing process:
- Transition from
setup.cfg
topyproject.toml
for dependency management.- Focus on test-specific dependencies instead of broader development dependencies.
- Direct use of pytest instead of tox for test execution.
- Explicit coverage configuration for the
undate
package.These changes should result in a more streamlined, faster, and more maintainable CI process. The workflow now aligns better with modern Python practices and provides a more direct approach to testing.
To fully leverage these improvements, consider implementing the suggested enhancements for pytest configuration, consistent options, and coverage reporting. Also, verify the cache step to ensure smooth execution of the workflow.
Overall, these changes are a positive step towards a more efficient and effective testing process for the project.
🧰 Tools
🪛 actionlint
37-37: property "python-cache" is not defined in object type {}
(expression)
37-37
: Verify the cache step and its outputs.The static analysis tool flagged an undefined property "python-cache". This might be a false positive, as it's common to check cache hits in GitHub Actions. However, it's worth verifying that the cache step is correctly defined earlier in the workflow.
Please run the following script to check for the cache step definition:
#!/bin/bash # Description: Verify the cache step in the workflow # Test: Check for the cache step definition rg 'uses: actions/cache@v\d+' -A 5 .github/workflows/unit_tests.ymlEnsure that there's a step using
actions/cache
with an id ofpython-cache
. If it's missing, consider adding it before the "Install package with dependencies" step:- uses: actions/cache@v3 id: python-cache with: path: ${{ env.pythonLocation }} key: ${{ env.pythonLocation }}-${{ hashFiles('**/pyproject.toml') }}If the cache step is present and correctly defined, you can safely ignore the static analysis hint.
🧰 Tools
🪛 actionlint
37-37: property "python-cache" is not defined in object type {}
(expression)
.all-contributorsrc (5)
17-18
: Approved: Updated contributions for ColeDCrawfordThe addition of "test" and "ideas" to ColeDCrawford's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their efforts.
42-44
: Approved: Updated contributions for robcastThe addition of "data", "ideas", and "review" to robcast's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their diverse contributions.
56-57
: Approved: Updated contributions for jdamerowThe addition of "eventOrganizing" and "ideas" to jdamerow's contributions accurately reflects their expanded role in the project. This update ensures proper recognition of their involvement in both practical and conceptual aspects of the project.
Line range hint
1-81
: Overall: Excellent updates to contributor recognitionThe changes made to
.all-contributorsrc
are well-considered and beneficial to the project. They include:
- Moving the contributors list to a dedicated file.
- Updating and expanding contributor roles for several team members.
- Adding an All Contributors badge for visual representation.
These updates improve the project's ability to recognize and showcase its contributors, which is crucial for open-source projects. Great job on maintaining and enhancing the contributor information!
79-80
: Approved: Added All Contributors badge templateThe addition of the "badgeTemplate" property is a good enhancement. It will provide a visual representation of the number of contributors in the project's README, which is excellent for showcasing community involvement.
Please verify that the badge appears correctly in the README. You can use the following script to check:
✅ Verification successful
Verified: All Contributors badge is correctly displayed
The All Contributors badge is present in the
README.md
, correctly showcasing the number of contributors.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the All Contributors badge in the README # Test: Check if the badge is present in the README rg --type md "!\[All Contributors\]" README.md # If the badge is not found in README.md, check CONTRIBUTORS.md if [ $? -ne 0 ]; then echo "Badge not found in README.md. Checking CONTRIBUTORS.md..." rg --type md "!\[All Contributors\]" CONTRIBUTORS.md fiLength of output: 183
CONTRIBUTING.md (3)
1-1
: Title change looks good.The simplified title "Contributing to undate" is more concise and aligns well with the project name. It's a good improvement that maintains clarity while reducing redundancy.
26-26
: Good update to help-seeking guidelines.The recommendation to join the DHTech Slack for help is a positive change. It encourages community engagement and provides a platform for real-time support, which can be more efficient than GitHub issues for many types of questions or problems.
Line range hint
1-30
: Overall, excellent updates to the contribution guidelines.The changes to this file significantly improve the clarity and effectiveness of the contribution process. Key improvements include:
- Simplified title
- Updated setup instructions
- Clear guidelines for submitting changes
- New section for use cases and example data
- Improved help-seeking process
These updates should enhance the contributor experience and promote better quality contributions to the project.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~5-~5: Possible missing comma found.
Context: ...date Hey there! If you found your way here that probably means you are curious abo...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~5-~5: Consider a different adverb to strengthen your wording.
Context: ... to this project. This is great! We are always looking for new contributors. If you ca...(ALWAYS_CONSTANTLY)
[uncategorized] ~11-~11: Possible missing comma found.
Context: ...evelop** branch when you are ready. For details see [this description of the forking Wo...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~12-~12: A comma might be missing here.
Context: ...comparing-workflows/forking-workflow). Ideally contributions should include documentat...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[typographical] ~13-~13: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...(IF_PLEASE_COMMA)
[uncategorized] ~16-~16: Possible missing preposition found.
Context: ...Requests If you find a bug or can think a feature you would really like to see be...(AI_EN_LECTOR_MISSING_PREPOSITION)
[style] ~29-~29: Using many exclamation marks might seem excessive (in this case: 5 exclamation marks for a text that’s 2528 characters long)
Context: ...oining](https://dh-tech.github.io/join/)!(EN_EXCESSIVE_EXCLAMATION)
examples/use-cases/shakespeare-and-company-project/README.md (2)
1-3
: LGTM: Clear title and author information.The title and author information are well-presented, providing clear context for the document.
1-24
: Overall, well-written and informative README with minor suggestions for improvement.This README provides a comprehensive overview of the Shakespeare and Company Project dataset, its usage, and the challenges associated with partial dates. The content is informative and well-structured. The suggested improvements are minor and focus on enhancing readability, clarifying certain points, and improving Markdown formatting.
Key strengths:
- Clear explanation of the dataset and its unique features
- Discussion of data analysis challenges and potential solutions
- Proper citation information and DOI link
Areas for improvement:
- Minor language and formatting enhancements
- Clearer explanation of special cases
- More prominent presentation of important links and sections
These suggestions will further improve an already solid documentation file.
🧰 Tools
🪛 LanguageTool
[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...(EN_COMPOUNDS_MACHINE_READABLE)
[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future,undate
will make ...(MY_HOPE_IS)
🪛 Markdownlint
22-22: null
Bare URL used(MD034, no-bare-urls)
pyproject.toml (3)
79-80
: Approved: Good version management practice.Setting the version to be read from
src/undate/__init__.py
is a good practice for single-sourcing the version number.
43-69
:⚠️ Potential issueConsider refining optional dependencies.
The categorization of optional dependencies is clear and logical. However, there are a few points to consider:
- Including 'undate' as a dev dependency might lead to circular dependencies.
- Some packages are missing version constraints, which could lead to compatibility issues in the future.
Consider applying these changes:
- Remove 'undate' from the dev dependencies to avoid circular dependencies.
- Add version constraints to packages without them.
Example diff:
dev = [ "black>=22.10.0", "pre-commit>=2.20.0", - "twine", - "wheel", - "build", - "mypy", - "treon", - "undate", - "undate[docs]", - "undate[test]", + "twine>=4.0.0", + "wheel>=0.40.0", + "build>=0.10.0", + "mypy>=1.0.0", + "treon>=0.1.3", ]Let's check if 'undate' is used as a dependency elsewhere in the project:
#!/bin/bash # Check for 'undate' usage as a dependency rg "undate" --type python
71-76
:⚠️ Potential issueApproved: Comprehensive project URLs with a minor issue.
Providing these URLs is beneficial for users and contributors. However, there's a small issue with the changelog URL.
The changelog URL appears to have an incorrect path. Please update it as follows:
-Changelog = "https://github.com/dh-tech/undate/main/master/CHANGELOG.md" +Changelog = "https://github.com/dh-tech/undate-python/blob/main/CHANGELOG.md"Let's verify the existence of the CHANGELOG.md file:
#!/bin/bash # Check for the existence of CHANGELOG.md fd -t f "CHANGELOG.md"docs/conf.py (5)
20-21
: LGTM: Project information updated correctly.The project name has been updated to lowercase "undate" for consistency, and the copyright year has been correctly updated to 2024.
49-52
: LGTM: Improved file type parsing configuration.The change from a list to a dictionary for
source_suffix
is a good improvement. It provides more explicit control over how different file types are parsed, which can prevent ambiguities and improve the documentation build process.
59-59
: LGTM: HTML theme updated to Alabaster.The change from "sphinx_rtd_theme" to "alabaster" is fine. Alabaster is the default theme for Sphinx and is known for its clean, minimal design.
Please verify that the new theme meets your documentation's visual requirements. You may want to build the documentation locally to check its appearance.
66-67
: Clarify the logo configuration.The logo configuration has been commented out, but there's a logo setting in the
html_theme_options
below. This inconsistency might lead to confusion.Please clarify the intended logo configuration:
- If you want to use the logo, uncomment this line and ensure it's consistent with the
html_theme_options
.- If you don't want to use the logo, remove this commented line and the logo setting from
html_theme_options
.
69-76
: LGTM: Appropriate Alabaster theme options added.The new
html_theme_options
configuration is well-structured and includes useful settings for the Alabaster theme. The GitHub repository information and sidebar settings will enhance the documentation's usability.DEVELOPER_NOTES.md (2)
32-38
: LGTM! Clear instructions for setting up a Python virtual environment.The instructions for creating a Python virtual environment are concise and easy to follow. The recommendation to use a recent version of Python 3 is appropriate.
🧰 Tools
🪛 Markdownlint
35-35: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
40-46
: LGTM! Clear instructions for installing the package with development dependencies.The instructions for installing the local version of undate with development dependencies are clear and use the correct pip command. The use of an editable install is appropriate for development.
CONTRIBUTORS.md (2)
1-8
: LGTM: Clear introduction and well-implemented badgeThe introduction and badge section are well-structured and follow the All Contributors specification. The badge provides a visual representation of the number of contributors, which is a great addition to the project.
1-50
: Excellent addition: Comprehensive contributor recognitionThis
CONTRIBUTORS.md
file is an excellent addition to the project. It effectively recognizes all contributors and their various types of contributions, which is crucial for fostering a healthy open-source community. The use of the All Contributors specification ensures a standardized and visually appealing way to present this information.Key strengths:
- Clear introduction explaining the purpose of using All Contributors
- Visual badge showing the number of contributors
- Detailed table with contributor information and contribution types
- Section for related blog posts, providing additional context
The minor issue with the link fragment has been addressed in a previous comment. Once that's fixed, this file will be a valuable asset to the project, encouraging recognition and potentially attracting new contributors.
🧰 Tools
🪛 Markdownlint
49-49: null
Link fragments should be valid(MD051, link-fragments)
README.md (3)
22-25
: Good update to the contributors section!The all-contributors badge and the link to the CONTRIBUTORS.md file improve the visibility of contributors and make it easier for users to find detailed contribution information.
27-31
: Great addition of the Example Usage section!The introduction to the Example Usage section effectively explains the need for handling imprecise temporal information in humanities and cultural data. It provides valuable context for potential users and clearly introduces the
Undate
object and its capabilities.🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...(AI_HYDRA_LEO_MISSING_COMMA)
157-165
: Good organization of documentation and license information!The documentation section effectively directs users to the ReadTheDocs page and the separate developer notes file. The license section clearly states the project's licensing terms. This organization keeps the README concise while providing easy access to important information for users and potential contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (6)
src/undate/dateformat/iso8601.py (1)
Line range hint
1-69
: Address TODO comments and type checking issuesThere are several TODO comments and type checking issues throughout the file that need attention:
- Line 26: TODO for handling full ISO format dates with time.
- Line 27: TODO for handling invalid formats.
- Line 54: TODO for handling too many parts in date parsing.
- Line 60: TODO for error handling when year and day are known but month is missing.
- Line 76: TODO for fixing a type error with
join()
method.Additionally, there are type checking issues mentioned in comments, such as on line 54.
These TODOs and type issues should be addressed to improve the robustness and type safety of the code.
Would you like assistance in resolving any of these TODOs or type checking issues? I can help generate solutions or open GitHub issues to track these tasks.
src/undate/date.py (1)
56-58
: Evaluate the necessity of theExport
method.The
Export
method currently returnsself
without additional processing. If it does not serve a specific purpose or align with an interface requirement, consider removing it to simplify the class.src/undate/undate.py (4)
267-269
: Add return type annotation tofrom_datetime_date
static method.Adding a return type annotation enhances type checking and code clarity.
Apply this diff to specify the return type:
@staticmethod -def from_datetime_date(dt_date: datetime.date): +def from_datetime_date(dt_date: datetime.date) -> "Undate": """Initialize an :class:`Undate` object from a :class:`datetime.date`""" return Undate(dt_date.year, dt_date.month, dt_date.day)
287-287
: Specify return type annotation for theduration
method inUndate
class.Including the return type improves readability and assists static type checkers.
Apply this diff to add the return type:
-def duration(self): +def duration(self) -> np.timedelta64: """What is the duration of this date? Calculate based on earliest and latest date within range, taking into account the precision of the date even if not all parts of the date are known."""
305-309
: Remove commented-out code to improve code cleanliness.The commented code is unnecessary and can be safely removed to enhance readability.
Apply this diff to remove the commented lines:
latest = Date(self.earliest.year, self.latest.month, self.latest.day) - # latest = datetime.date( - # self.earliest.year, self.latest.month, self.latest.day - # )
396-396
: Add return type annotation toduration
method inUndateInterval
class.Specifying the return type enhances code clarity and type safety.
Apply this diff to include the return type:
-def duration(self): +def duration(self) -> np.timedelta64: """Calculate the duration between two undates.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (16)
- .pre-commit-config.yaml (2 hunks)
- CONTRIBUTORS.md (1 hunks)
- README.md (2 hunks)
- docs/conf.py (3 hunks)
- pyproject.toml (1 hunks)
- src/undate/date.py (1 hunks)
- src/undate/dateformat/base.py (1 hunks)
- src/undate/dateformat/edtf/parser.py (0 hunks)
- src/undate/dateformat/edtf/transformer.py (1 hunks)
- src/undate/dateformat/iso8601.py (2 hunks)
- src/undate/undate.py (9 hunks)
- tests/test_date.py (1 hunks)
- tests/test_dateformat/edtf/test_edtf_parser.py (0 hunks)
- tests/test_dateformat/edtf/test_edtf_transformer.py (1 hunks)
- tests/test_dateformat/test_base.py (0 hunks)
- tests/test_undate.py (3 hunks)
💤 Files not reviewed due to no reviewable changes (3)
- src/undate/dateformat/edtf/parser.py
- tests/test_dateformat/edtf/test_edtf_parser.py
- tests/test_dateformat/test_base.py
✅ Files skipped from review due to trivial changes (3)
- src/undate/dateformat/base.py
- src/undate/dateformat/edtf/transformer.py
- tests/test_dateformat/edtf/test_edtf_transformer.py
🚧 Files skipped from review as they are similar to previous changes (2)
- README.md
- docs/conf.py
🧰 Additional context used
🪛 Markdownlint
CONTRIBUTORS.md
51-51: Expected: 0; Actual: 2
Unordered list indentation(MD007, ul-indent)
50-50: null
Link fragments should be valid(MD051, link-fragments)
🪛 Ruff
tests/test_undate.py
1-1:
datetime.timedelta
imported but unusedRemove unused import:
datetime.timedelta
(F401)
🔇 Additional comments (30)
.pre-commit-config.yaml (4)
3-8
: Excellent choice in transitioning to ruff!The replacement of black with ruff is a great improvement. Ruff combines linting and formatting, which can significantly speed up your pre-commit hooks. The configuration looks well thought out:
- Using
ruff
for linting with import sorting (--select I
) and auto-fixing.- Using
ruff-format
for code formatting, which is compatible with black's style.This setup should provide faster and more comprehensive code quality checks.
17-17
: Good decision to keep the trailing-whitespace hook.Retaining this hook ensures consistent whitespace across the codebase, complementing the ruff configuration.
18-22
: Great addition of mypy for type checking!Adding mypy to the pre-commit hooks is an excellent practice for catching type-related issues early in development. The inclusion of
types-python-dateutil
is appropriate for projects using dateutil.However, regarding the numpy dependency:
As mentioned in a previous comment, consider making numpy an optional dependency. This approach would allow users to install numpy only if they require the extended year range, as mentioned in the PR summary.
You could implement this by:
- Moving the numpy dependency to a separate optional-dependencies section in your project configuration.
- Updating the mypy configuration to conditionally include numpy types when it's installed.
- Using conditional imports in your code to handle cases where numpy isn't installed.
This change would require careful testing to ensure the code works correctly both with and without numpy installed.
Line range hint
1-22
: Overall, excellent improvements to the pre-commit configuration!The changes in this file significantly enhance the project's code quality checks:
- Transitioning from black to ruff combines linting and formatting, potentially speeding up pre-commit hooks.
- Retaining the trailing-whitespace hook ensures consistent whitespace.
- Adding mypy for type checking helps catch type-related issues early.
These improvements will lead to more robust and consistent code across the project. The only consideration is to evaluate making numpy an optional dependency, as discussed in the mypy configuration comment.
tests/test_date.py (10)
1-2
: LGTM: Import statements are correct and necessary.The import statements are now correctly utilized in the test file. The
numpy
import is used for creating adatetime64
object, and bothDate
andDatePrecision
are used throughout the tests.
5-11
: LGTM: TestDatePrecision class looks good.The
TestDatePrecision
class contains two well-structured test methods:
test_str
: Correctly verifies the string representation ofDatePrecision.YEAR
.test_precision_comparison
: Properly checks the ordering of precision levels.These tests provide good coverage for the
DatePrecision
enum functionality.
15-19
: LGTM: test_init_year method is well-structured.The
test_init_year
method effectively tests the initialization of aDate
object with only a year:
- It verifies that the created object is an instance of
Date
.- It checks that the
dtype
is correctly set to "datetime64[Y]".- It ensures the string representation is correct.
These assertions provide good coverage for year-only initialization.
21-25
: LGTM: test_init_year_np_datetime64 method is well-implemented.The
test_init_year_np_datetime64
method effectively tests the initialization of aDate
object with a numpy datetime64 object:
- It correctly creates a numpy datetime64 object for the year 2024.
- It verifies that the created object is an instance of
Date
.- It checks that the
dtype
is correctly set to "datetime64[Y]".- It ensures the string representation is correct.
These assertions provide good coverage for initialization with a numpy datetime64 object.
27-31
: LGTM: test_init_year_month method is correctly implemented.The
test_init_year_month
method effectively tests the initialization of aDate
object with a year and month:
- It verifies that the created object is an instance of
Date
.- It checks that the
dtype
is correctly set to "datetime64[M]".- It ensures the string representation is correct, including both year and month.
These assertions provide good coverage for year-month initialization.
33-37
: LGTM: test_init_year_month_day method is properly implemented.The
test_init_year_month_day
method effectively tests the initialization of aDate
object with a full date (year, month, and day):
- It verifies that the created object is an instance of
Date
.- It checks that the
dtype
is correctly set to "datetime64[D]".- It ensures the string representation is correct, including year, month, and day.
These assertions provide good coverage for full date initialization.
39-42
: LGTM: test_properties_year method covers all scenarios.The
test_properties_year
method effectively tests the 'year' property ofDate
objects:
- It checks the 'year' property for a year-only initialization.
- It verifies the 'year' property for a year-month initialization.
- It tests the 'year' property for a full date initialization.
These assertions ensure that the 'year' property works correctly across all initialization scenarios.
44-47
: LGTM: test_properties_month method is comprehensive.The
test_properties_month
method effectively tests the 'month' property ofDate
objects:
- It correctly checks that the 'month' property is None for a year-only initialization.
- It verifies the 'month' property for a year-month initialization.
- It tests the 'month' property for a full date initialization.
These assertions ensure that the 'month' property works correctly across all initialization scenarios.
49-52
: LGTM: test_properties_day method is well-implemented.The
test_properties_day
method effectively tests the 'day' property ofDate
objects:
- It correctly checks that the 'day' property is None for a year-only initialization.
- It verifies that the 'day' property is None for a year-month initialization.
- It tests the 'day' property for a full date initialization.
These assertions ensure that the 'day' property works correctly across all initialization scenarios. The use of 'is' for None comparisons is the correct practice.
1-52
: Overall, the test file is well-structured and provides comprehensive coverage.The
tests/test_date.py
file effectively tests theDate
andDatePrecision
classes:
- It covers all initialization scenarios for the
Date
class.- It tests all properties (year, month, day) across different precision levels.
- It verifies the behavior of the
DatePrecision
enum.The test methods are well-organized, and the assertions are thorough. This test suite should provide good confidence in the correctness of the
Date
andDatePrecision
implementations.pyproject.toml (5)
1-3
: LGTM: Build system configuration is correct and modern.The build system configuration using
hatchling
is appropriate and follows modern Python packaging practices.
5-45
: LGTM: Comprehensive project metadata.The project metadata is well-structured and provides essential information. The Python version requirement, dependencies, and classifiers are appropriate and informative.
19-19
: Clarify the commented DHTech author line.There's a commented line regarding a DHTech author. Please clarify if this should be included or if the comment can be removed.
#!/bin/bash # Check if DHTech is mentioned elsewhere in the project rg "DHTech" --type-not toml
72-76
: LGTM: Correct hatch configuration.The hatch configuration for versioning and source distribution building is correct and follows best practices.
78-86
: LGTM: Appropriate test and type checking configurations.The pytest and mypy configurations are well-defined:
- The pytest setup includes custom markers and sets the correct Python path.
- The mypy configuration properly includes the numpy typing plugin.
These settings will enhance the project's testing and type checking capabilities.
src/undate/dateformat/iso8601.py (1)
65-68
: Improved ISO8601 date formattingThe changes enhance the
to_string
method by explicitly handling "month" and "day" portions with zero-padding. This improvement ensures consistent two-digit representation for month and day, which aligns perfectly with the ISO8601 standard for date formatting.These modifications resolve the concern raised in the previous review about reduced flexibility. The new implementation maintains the method's ability to handle various date formats while improving consistency and readability.
CONTRIBUTORS.md (3)
1-7
: LGTM: Clear introduction and proper badge implementation.The introduction clearly explains the use of All Contributors, and the badge is correctly implemented using the recommended HTML comment structure.
9-45
: LGTM: Well-structured contributors table.The contributors table is well-implemented, following the All Contributors specification. It clearly displays each contributor's information and contributions, and includes a footer with instructions for adding new contributions.
1-51
: Overall: Well-structured and informative contributors file.This
CONTRIBUTORS.md
file effectively acknowledges project contributors using the All Contributors specification. It includes a clear introduction, a properly implemented badge, a comprehensive contributors table, and a section for related blog posts. With the suggested fixes for the blog posts section, this file will serve as an excellent resource for recognizing and encouraging contributions to the project.🧰 Tools
🪛 Markdownlint
51-51: Expected: 0; Actual: 2
Unordered list indentation(MD007, ul-indent)
50-50: null
Link fragments should be valid(MD051, link-fragments)
tests/test_undate.py (4)
288-302
: Acknowledge duration return type change and request documentationThe changes from
timedelta
assertions toastype("int")
checks are consistent with the project's objective to usenumpy.datetime64
. However, as mentioned in a previous review, it would be beneficial to add comments or documentation in theUndate
class to clarify why theduration
method's return type was changed and how it should be used.Would you like assistance in drafting the necessary documentation or comments for the
Undate
class?
306-316
: LGTM: Consistent updates to partially known duration testsThe changes in this section are consistent with the modifications made in the
test_duration
method. The use ofastype("int")
for assertions maintains consistency across the test file.
396-418
: Approve consistent changes and request clarification on commented assertionThe updates to use
astype("int")
for assertions are consistent with the changes made throughout the file. However, there's a commented-out assertion on line 418 that suggests a change in behavior:# assert month_noyear_duration.astype("int") == 32 # this seems wrong, but we currently count both start and datesThis comment indicates a potential issue or change in how durations are calculated. Could you please clarify:
- Why was this assertion commented out?
- Is the current behavior of counting both start and end dates intentional?
- If this is a known issue, should it be documented or addressed in a future update?
426-426
: LGTM: Consistent update to duration assertionThis change is consistent with the modifications made throughout the file, updating the assertion to use
astype("int")
.src/undate/date.py (2)
70-74
: Incomplete return inmonth
property method.The
month
property does not have a return statement ifself.dtype == "datetime64[Y]"
. This could lead to returningNone
implicitly, which may not be the intended behavior. Consider raising an exception or returning a default value instead.
75-79
: Incomplete return inday
property method.The
day
property does not have a return statement ifself.dtype != "datetime64[D]"
. This could lead to returningNone
implicitly, which may not be the intended behavior. Consider raising an exception or returning a default value instead.src/undate/undate.py (1)
420-421
: Handle potential issues with negative durations.When calculating the duration, the check
duration.astype("int") < 0
assumes that negative durations indicate the need to adjust the end date by addingONE_YEAR
. This logic may not account for all edge cases, such as when dates span multiple years or the precise year is unknown.Consider adding additional validation or handling to ensure that duration calculations are accurate across all scenarios.
Run the following script to identify any occurrences where the
duration
could be negative and assess whether the current logic handles them properly:✅ Verification successful
Negative Duration Handling Verified Successfully.
No instances of
UndateInterval
withearliest
afterlatest
were found, ensuring that negative durations are effectively handled by the existing logic.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find cases where duration calculation may result in negative values. # Test: Search for usages of `UndateInterval` where `earliest` is after `latest`. # Expect: No such cases should exist. rg --type python 'UndateInterval\(\s*earliest\s*=\s*(.+?),\s*latest\s*=\s*(.+?)\)' -A 5Length of output: 120
Script:
#!/bin/bash # Description: Find cases where duration calculation may result in negative values. # Test: Search for usages of `UndateInterval` where `earliest` is after `latest`. # Expect: No such cases should exist. rg --type py 'UndateInterval\(\s*earliest\s*=\s*(.+?),\s*latest\s*=\s*(.+?)\)' -A 5Length of output: 85
return all( | ||
[ | ||
self.earliest <= other.earliest, | ||
self.latest >= other.latest, | ||
# is precision sufficient for comparing partially known dates? | ||
# checking based on less precise /less granular time unit, | ||
# e.g. a day or month could be contained in a year | ||
# but not the reverse | ||
self.precision < other.precision, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review precision comparison logic in the __contains__
method.
The comparison self.precision < other.precision
may not reflect the intended logic. In the DatePrecision
enumeration, higher values represent more precise dates (e.g., DAY
is more precise than MONTH
). Therefore, comparing self.precision < other.precision
means that a more precise date is considered to contain a less precise one, which may be counterintuitive.
Consider revising the comparison to self.precision > other.precision
if the intention is that a less precise date (e.g., a year) contains a more precise date (e.g., a specific day within that year).
Apply this diff to adjust the comparison:
return all(
[
self.earliest <= other.earliest,
self.latest >= other.latest,
- self.precision < other.precision,
+ self.precision > other.precision,
]
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return all( | |
[ | |
self.earliest <= other.earliest, | |
self.latest >= other.latest, | |
# is precision sufficient for comparing partially known dates? | |
# checking based on less precise /less granular time unit, | |
# e.g. a day or month could be contained in a year | |
# but not the reverse | |
self.precision < other.precision, | |
] | |
return all( | |
[ | |
self.earliest <= other.earliest, | |
self.latest >= other.latest, | |
# is precision sufficient for comparing partially known dates? | |
# checking based on less precise /less granular time unit, | |
# e.g. a day or month could be contained in a year | |
# but not the reverse | |
self.precision > other.precision, | |
] | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is wrong, but it's causing me to think about it quite a bit which makes me wonder if there's a simpler or more intuitive way to implement this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rlskoeser, I understand. Let me know if I can assist further.
(_/)
( •_•)
('')('')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
src/undate/date.py (4)
19-20
: Consider removing the Export method if unused.The Export method simply returns self, which doesn't provide any additional functionality. Unless this method is used elsewhere in the codebase for consistency with other classes, consider removing it to simplify the class.
32-37
: LGTM: Well-defined constants for common time intervals.The constants ONE_DAY, ONE_YEAR, and ONE_MONTH_MAX are well-named and their purposes are clear. They provide convenient access to commonly used time intervals using the Timedelta class.
Consider adding a comment for ONE_YEAR to clarify that it represents a non-leap year:
#: timedelta for a single year (365 days, non-leap year) ONE_YEAR = Timedelta(365)
80-81
: Consider removing the Export method if unused.The Export method simply returns self, which doesn't provide any additional functionality. Unless this method is used elsewhere in the codebase for consistency with other classes, consider removing it to simplify the class.
120-140
: LGTM: DatePrecision enum is well-implemented and documented.The DatePrecision enum effectively represents different levels of date precision. The ordering of enum members is clearly explained in the comments, which is excellent for maintainability. The custom str method provides a convenient string representation.
Regarding the note about harmonizing with numpy date units:
Consider creating a separate issue or TODO item to track the potential harmonization with numpy date units. This would ensure that the idea is not lost and can be addressed in future updates.tests/test_undate.py (2)
117-120
: Create an issue to track leap day handling in unknown years.The TODO comment highlights an important edge case that needs to be addressed. To ensure this doesn't get overlooked, it would be beneficial to create a GitHub issue to track this improvement.
Would you like me to create a GitHub issue to track the implementation of leap day handling for unknown years in the
Undate
class?
428-441
: Approve new test cases and suggest future improvement.The new test cases for
UndateInterval
durations with partially known dates are excellent additions:
- They cover important edge cases, such as intervals spanning a year and intervals crossing February in unknown years.
- The comment explaining the non-leap year assumption for unknown years is crucial for understanding the expected behavior.
These additions significantly improve the test coverage and documentation of the
UndateInterval
class behavior.Consider adding a TODO comment to explore handling leap years in unknown year durations in the future. Would you like me to draft a TODO comment or create a GitHub issue for this potential improvement?
src/undate/undate.py (1)
129-131
: Offer assistance for handling Feb 29 with unknown year.There's a TODO comment indicating the need to handle cases where the date is February 29 with an unknown year, requiring the
min_year
andmax_year
to be adjusted to known leap years.Would you like assistance in implementing this special case? I can help by providing code to ensure that when the day is February 29 and the year is unknown, the
min_year
andmax_year
are set to known leap years.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- examples/notebooks/shxco_partial_date_durations.ipynb (112 hunks)
- src/undate/date.py (1 hunks)
- src/undate/undate.py (10 hunks)
- tests/test_date.py (1 hunks)
- tests/test_undate.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/notebooks/shxco_partial_date_durations.ipynb
- tests/test_date.py
🧰 Additional context used
🪛 Ruff
tests/test_undate.py
2-2:
datetime.timedelta
imported but unusedRemove unused import:
datetime.timedelta
(F401)
4-4:
numpy
imported but unusedRemove unused import:
numpy
(F401)
🔇 Additional comments (11)
src/undate/date.py (4)
1-6
: LGTM: Import statements are appropriate and well-documented.The import statements are correct and necessary for the implemented functionality. The comment about Union usage for Python versions before 3.10 is informative and helps maintainers understand the code's compatibility considerations.
9-29
: LGTM: Timedelta class implementation is correct and follows numpy subclassing practices.The Timedelta class effectively extends np.ndarray to provide a more datetime.timedelta-like interface. The new method correctly handles both int and np.timedelta64 inputs, and the days property provides a convenient way to access the number of days.
40-115
: LGTM: Date class implementation is correct and follows numpy subclassing practices.The Date class effectively extends np.ndarray to provide a more datetime.date-like interface. The new method handles various input formats, and the year/month/day properties correctly handle different date precisions. The sub method appropriately returns a Timedelta object when subtracting dates.
1-140
: Overall, the implementation is solid with room for minor improvements.The custom Date and Timedelta classes effectively extend numpy's functionality to provide a more familiar datetime-like interface. The code is well-structured and follows numpy subclassing practices. Here are the main points for improvement:
- Add input validation for month and day in the Date.new method.
- Handle potential None returns in the month and day properties of the Date class.
- Consider removing the Export methods if they're not used elsewhere in the codebase.
- Create a separate issue to track the potential harmonization of DatePrecision with numpy date units.
Addressing these points will further enhance the robustness and usability of the classes.
tests/test_undate.py (5)
295-302
: Approve changes and commend improved test coverage.The updates to the
test_duration
method are well-implemented:
- The assertion now correctly checks for the
Timedelta
type, aligning with the updatedUndate
implementation.- The addition of a leap year test case for February improves the coverage of edge cases.
These changes enhance the robustness of the test suite.
399-401
: Approve addition of minimum year leap year check.The new
test_min_year_non_leapyear
method is a valuable addition to the test suite. It ensures that the minimum allowable year in theUndate
class is not a leap year, which is crucial for consistent date calculations, especially when dealing with partially known dates.This test helps prevent potential issues with leap year calculations at the lower bound of the allowable date range.
406-406
: Approve Timedelta type check in UndateInterval duration test.The update to check if
week_duration
is an instance ofTimedelta
is consistent with the earlier changes in theUndate
class tests. This change ensures that theUndateInterval
class is correctly using the customTimedelta
implementation for duration calculations.
Line range hint
1-441
: Summary of changes and overall approval.The updates to the test suite in
test_undate.py
are well-implemented and align with the project objectives:
- The transition to using the custom
Timedelta
class is consistent throughout the file.- New test cases have been added to cover edge cases, particularly for partially known dates and leap year scenarios.
- The test coverage for both
Undate
andUndateInterval
classes has been improved.- Comments have been added to explain important assumptions and limitations.
These changes enhance the robustness of the test suite and provide better documentation of the expected behavior of the
Undate
andUndateInterval
classes.There are a few minor issues to address, such as removing an unused import and a duplicate assertion, but overall, these updates significantly improve the quality of the test suite.
313-313
:⚠️ Potential issueRemove duplicate assertion.
There appears to be a duplicate assertion in the
test_partiallyknown_duration
method. The same assertion is written twice, which is unnecessary.Please remove one of the duplicate assertions:
- # assert Undate(day=5).duration().days == 1 assert Undate(day=5).duration().days == 1
Likely invalid or redundant comment.
src/undate/undate.py (2)
324-325
: Verify the use ofastype(int)
ondelta
.In the condition
if delta.astype(int) > 31:
, theastype
method is used ondelta
. Ensure thatdelta
is anumpy.timedelta64
object that supports theastype
method and that this operation works as intended.Please confirm that
delta.astype(int)
correctly returns the duration in days as an integer. Ifdelta
is not anumpy.timedelta64
, consider using an alternative method to obtain the integer value of the duration.
263-272
:⚠️ Potential issueCorrect the precision comparison logic in the
__contains__
method.The comparison
self.precision < other.precision
may not reflect the intended logic. Since higherDatePrecision
values represent more precise dates (e.g.,DatePrecision.DAY
is more precise thanDatePrecision.MONTH
), using<
means a more precise date is considered to contain a less precise one, which is counterintuitive.Consider changing the comparison to
self.precision > other.precision
to indicate that a less precise date (e.g., a year) contains a more precise date (e.g., a specific day within that year).Apply this diff to adjust the comparison:
return all( [ self.earliest <= other.earliest, self.latest >= other.latest, - self.precision < other.precision, + self.precision > other.precision, ] )Likely invalid or redundant comment.
@ColeDCrawford I was hoping I could just clean this PR up and merge without further review, but an error in the S&co example notebook caused me to find some problems with how I was handling duration logic in the new numpy-based implementation, which resulted in a bit more substantial change (and effort!) than I'd anticipated. I'm going to tag you for review on the updates. I've resolved all the coderabbitai comments that I've either addressed or don't care about; one interesting/tricky one left open for now. It flagged a lot of things, which I have mixed feelings about - glad to have things cleaner and in better shape, but it felt like it resulted in more work for me. FWIW I also went ahead and switched from black to ruff in the pre-commit hook. It doesn't seem to be correcting unused imports automatically, so if you have a set of options you prefer maybe you could open a PR to add them after this is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again and sorry for leaving this one hanging for ages!
DEVELOPER_NOTES.md
Outdated
python3 -m venv undate | ||
source undate/bin/activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default it looks like undate
isn't gitignored. Do we want to change this to .venv
, or add undate
to .gitignore
? uv
uses .venv
by default and that's in most .gitignore
files (including ours).
It might also be helpful to either specify a Python version here (e.g. python3.12 -m venv undate
) or include a .pythonversion
file so folks using pyenv
would install a virtual environment with our preferred version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for flagging this. I typically put my virtualenvs elsewhere and I know other tools have specific preferences.
I like the suggestions to switch to .venv
and create a pythonversion file as a default for development.
DEVELOPER_NOTES.md
Outdated
|
||
### Check python types | ||
|
||
Python typing is currently enforced on pull requests as part of a GitHub Actions Continuous Integration check using `mypy`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also during pre-commit
hooks if those are installed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, probably added after this was documented - I'll update, thanks for flagging
README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting out the development config into a separate file makes this a lot cleaner and more approachable for end users 👍
_static/undate_logo.png
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, documenting the dataset is definitely helpful for any researchers trying to understand why they might use the package
setup.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good riddance
src/undate/date.py
Outdated
# custom properties to access year, month, day | ||
|
||
@property | ||
def year(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the properties to have return types?
src/undate/date.py
Outdated
# NOTE: values MUST be ordered based on the relative size or | ||
# precison of the time unit. That is, the smaller the unit, the more precise | ||
# it is: a day is more precise than a month, a month is more precise than a year, | ||
# (DatePrecision.year < DatePrecision.month) | ||
|
||
#: year | ||
YEAR = 1 | ||
#: month | ||
MONTH = 2 | ||
#: day | ||
DAY = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential issue with ordering it this way is that it's going to be harder to extend. I guess we could do DECADE = 0
? My understanding is that we can't get precise than DAY
(since we're not recording times) but there are potentially future cases where we might want to be less precise than YEAR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the numbers would change as long as they are sequenced properly. The numbers only need to be consistent within a given installation of undate, right? (we're not storing them in a db or anything).
We could add decade and century now if you think it would be helpful, but was waiting until we had use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mostly thinking of the conversation earlier today where we heard that even decade precision is still very useful and even century is much better than nothing.
If we were to update this kind of core config it should probably be in at least a minor release? Also - is this "the PR" for v1.0, or at least v0.3? Looks like the last release was April?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add century and decade to try to future proof a little bit, but I don't think the actual numbers should matter - they should never be referenced as numbers, only as constants, and they are only defined within the context of an installed version of undate. Am I thinking about that wrong?
src/undate/dateformat/base.py
Outdated
@@ -15,9 +15,8 @@ | |||
import importlib | |||
import logging | |||
import pkgutil | |||
from typing import Dict, Type | |||
from functools import lru_cache # functools.cache not available until 3.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use this now, right? We dropped 3.8 support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching - although not sure why this was in the change set, sorry for such an enormous PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (11)
src/undate/dateformat/base.py (2)
Line range hint
42-65
: Consider a more robust way to handle base module exclusionThe current implementation uses string matching (
not modname.endswith(".base")
) to exclude the base module. This approach could be fragile if the module structure changes.Consider using a more explicit approach:
- if not modname.endswith(".base"): + # Get the module name without the package prefix + module_name = modname[len(formatter_prefix):] + # Explicitly exclude the current module + if module_name != "base": importlib.import_module(modname) import_count += 1
Line range hint
29-35
: Document numpy datetime64 support in method docstringsSince this PR introduces numpy datetime64 support, it would be helpful to document the expected types in the method docstrings, even if type hints can't be added due to circular imports.
Add documentation like this:
def parse(self, value: str): - # can't add type hint here because of circular import - # should return an undate or undate interval + """Parse a string value into an Undate or UndateInterval. + + The implementation should support both datetime.date and numpy.datetime64 + based dates to maintain compatibility. + + Args: + value: String representation of the date + + Returns: + Undate or UndateInterval object + """ raise NotImplementedErrorpyproject.toml (3)
31-44
: Add numpy-related classifiers.Since numpy is now a core dependency, consider adding relevant classifiers:
classifiers = [ "Development Status :: 2 - Pre-Alpha", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Utilities", "Typing :: Typed", + "Topic :: Scientific/Engineering", + "Intended Audience :: Science/Research", ]
12-12
: Consider making numpy an optional dependency.As discussed in the PR objectives, consider moving numpy to optional dependencies to allow users who don't need the extended year range functionality to avoid the dependency. This would make the package lighter for web-based projects.
-dependencies = ["python-dateutil", "lark", "numpy"] +dependencies = ["python-dateutil", "lark"] [project.optional-dependencies] +extended-dates = ["numpy"] docs = ["sphinx>=7.0.0", "alabaster", "myst-parser", "myst-parser[linkify]"]
79-82
: Fix typo in pytest marker description.There's a typo in the "first" marker description:
markers = [ "last : run marked tests after all others", - "first : run marked tests efore all others", + "first : run marked tests before all others", ]DEVELOPER_NOTES.md (4)
7-11
: Fix grammar: "checkout" should be "check out"The word "checkout" is being used as a verb here, which should be written as two words: "check out".
-### Use git to checkout a copy of the repository +### Use git to check out a copy of the repository🧰 Tools
🪛 LanguageTool
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...(NOUN_VERB_CONFUSION)
28-31
: Fix numbering in manual git flow instructionsThe step numbering jumps from 1 to 3, which appears to be a typo.
If you cannot or prefer not to install git flow, you can do the equivalent manually. 1. Check out the develop branch: `git checkout develop` -3. Create new feature manually from the develop branch: `git checkout -b feature/xxx-name` +2. Create new feature manually from the develop branch: `git checkout -b feature/xxx-name`
37-40
: Add language specification to code blockFor better syntax highlighting and consistency, specify the shell language for this code block.
-``` +```sh python3 -m venv .venv source .venv/bin/activate<details> <summary>🧰 Tools</summary> <details> <summary>🪛 Markdownlint</summary> 37-37: null Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> --- `60-67`: **Consider adding code coverage information** The testing instructions are clear, but it might be helpful to add information about viewing code coverage reports, especially since the PR objectives mention benchmarking and testing requirements. Consider adding: ```diff To test cases by method name, use `-k`: `pytest -k test_str` + +To run tests with coverage reporting: +```sh +pytest --cov=undate --cov-report=html +``` +The coverage report will be generated in the `htmlcov` directory.
src/undate/date.py (2)
10-11
: Typo in Docstring: Capitalize 'Python'In the docstring, 'python' should be capitalized to 'Python' to adhere to proper noun conventions.
127-127
: Typo in Comment: Correct 'precison' to 'precision'There's a typographical error in the comment; 'precison' should be corrected to 'precision'.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
.github/workflows/unit_tests.yml
(2 hunks).pythonversion
(1 hunks)DEVELOPER_NOTES.md
(1 hunks)pyproject.toml
(1 hunks)src/undate/date.py
(1 hunks)src/undate/dateformat/base.py
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- .pythonversion
🧰 Additional context used
🪛 actionlint
.github/workflows/unit_tests.yml
37-37: property "python-cache" is not defined in object type {}
(expression)
🪛 LanguageTool
DEVELOPER_NOTES.md
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...
(NOUN_VERB_CONFUSION)
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]"
### Install pre-commit hooks We use [pre-commit](h...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Markdownlint
DEVELOPER_NOTES.md
37-37: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (8)
.github/workflows/unit_tests.yml (5)
14-14
: LGTM! Python version update aligns with the check workflow.
The update to Python 3.12 for coverage reporting is consistent with the suggestion to use 3.12 as the default version.
33-33
: LGTM! Cache configuration updated correctly.
The cache dependency path has been properly updated to reflect the move to pyproject.toml.
41-41
: LGTM! Test execution simplified and coverage properly scoped.
The changes from tox to direct pytest commands simplify the workflow, and the coverage is correctly scoped to the 'undate' package.
Also applies to: 46-46
36-36
: Verify [test] extra dependencies.
The installation command now uses .[test]
instead of .[dev]
. Let's verify the test dependencies are properly defined.
#!/bin/bash
# Description: Verify test dependencies configuration
# Test 1: Check if [test] extra exists in pyproject.toml
rg '\[project\.optional-dependencies\]' -A 10 pyproject.toml
# Test 2: Verify pytest and pytest-cov are included
rg '(pytest|pytest-cov)' pyproject.toml
21-21
: Verify numpy compatibility with Python 3.13.
Since this PR introduces numpy as a dependency, we should verify its compatibility with Python 3.13 (currently in alpha).
src/undate/dateformat/base.py (1)
18-18
: LGTM: Good modernization using cache
The switch from lru_cache
to cache
is a good modernization step, as cache
is a simpler alternative introduced in Python 3.9 that's equivalent to lru_cache(maxsize=None)
.
pyproject.toml (1)
84-85
: Verify numpy type hints usage.
The mypy numpy plugin is correctly configured. Let's verify that numpy type hints are being used in the codebase:
✅ Verification successful
Let me gather more information about numpy type hints usage by checking for additional numpy type annotation patterns.
NumPy type hints are properly used in the codebase
The codebase demonstrates appropriate usage of NumPy type hints:
- Custom classes
Date
andTimedelta
properly extendnp.ndarray
- Type annotations use
np.datetime64
andnp.timedelta64
correctly - Function parameters and return types are properly annotated with NumPy types combined with Python's typing module (Union)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for numpy type hint usage in the codebase
rg "numpy\.(datetime64|ndarray|typing)" src/
Length of output: 128
Script:
#!/bin/bash
# Check for more numpy type hint patterns
rg "np\.(ndarray|datetime64|typing)" src/
# Check for type annotations with numpy
rg "(:|->)\s*(np\.|numpy\.)" src/
# Look at the content of date.py to understand the context
cat src/undate/date.py
Length of output: 4974
DEVELOPER_NOTES.md (1)
42-56
: LGTM! Clear installation instructions
The installation instructions are well-structured and include both package installation and pre-commit hooks setup as discussed in the previous review.
🧰 Tools
🪛 LanguageTool
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]"
### Install pre-commit hooks We use [pre-commit](h...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
.github/workflows/unit_tests.yml
Outdated
- name: Install package with dependencies | ||
run: | | ||
pip install -e ".[dev]" | ||
python -m pip install tox tox-gh-actions | ||
pip install -e ".[test]" | ||
if: steps.python-cache.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined step output reference.
The condition steps.python-cache.outputs.cache-hit
references an undefined step. The step for caching is using actions/setup-python@v4 which doesn't create this output.
Remove the condition since the cache is handled by actions/setup-python:
- if: steps.python-cache.outputs.cache-hit != 'true'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if: steps.python-cache.outputs.cache-hit != 'true' |
🧰 Tools
🪛 actionlint
37-37: property "python-cache" is not defined in object type {}
(expression)
ref #77
datetime.date
previous notes
I implemented this as spike to see how hard it would be to use numpy's datetime64 instead of python's builtin datetime.date. Using
np.datetime64
gets us way past the year limitation ofdatetime.date
([2.5e16 BC, 2.5e16 AD] for day-precision dates).We can make this an optional dependency so you would only install numpy if you need the expanded year range. Making it optional makes the code and the testing more complicated. Do you think numpy as a dependency would be a concern for folks doing e.g web-based projects?
(type check is failing because I haven't bothered updating the types yet)
Summary by CodeRabbit
New Features
DEVELOPER_NOTES.md
file providing comprehensive contribution guidelines and setup instructions.Timedelta
andDate
classes for enhanced date and time handling.Documentation
DEVELOPER_NOTES.md
with detailed setup and testing instructions..pythonversion
file for specifying the Python version.Chores
hatchling
as the build system and updated dependency management.