-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize space usage of ExplorationReport before saving #279
Conversation
Signed-off-by: zjgemi <[email protected]>
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughThe pull request introduces modifications to several exploration report classes across the Changes
Sequence DiagramsequenceDiagram
participant Report as ExplorationReport
participant SubReport as SubclassReport
Report->>Report: Define abstract no_candidate()
SubReport->>SubReport: Implement no_candidate()
SubReport->>SubReport: Calculate candidate state
SubReport-->>Report: Return boolean indicating candidates
This sequence diagram illustrates the new 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
dpgen2/exploration/report/report_trust_levels_random.py (1)
49-49
: Good memory optimization with clear parameter!The addition of the
clear
parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.Consider moving the comment to the method's docstring:
def get_candidate_ids( self, max_nframes: Optional[int] = None, clear: bool = True, ) -> List[List[int]]: + """Get indexes of candidate configurations + + Note: This method should only be called once as it may clear memory when clear=True. + + Parameters + ---------- + max_nframes : Optional[int], optional + The maximal number of frames of candidates. + clear : bool, optional + Whether to free memory after getting candidates, by default True + + Returns + ------- + List[List[int]] + The frame indices of candidate configurations. + """ ntraj = len(self.traj_nframes) id_cand = self._get_candidates(max_nframes) id_cand_list = [[] for ii in range(ntraj)] for ii in id_cand: id_cand_list[ii[0]].append(ii[1]) - # free the memory, this method should only be called once if clear: self.clear() return id_cand_listAlso applies to: 56-58
dpgen2/exploration/report/report_trust_levels_max.py (1)
49-49
: Good memory optimization with clear parameter!The addition of the
clear
parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.Consider moving the comment to the method's docstring, similar to the suggestion for
report_trust_levels_random.py
.Also applies to: 56-58
dpgen2/exploration/report/report_adaptive_lower.py (1)
377-386
: Add parameter documentation for better clarity.The
clear
parameter's purpose and behavior should be documented in the method's docstring.def get_candidate_ids( self, max_nframes: Optional[int] = None, clear: bool = True, ) -> List[List[int]]: + """ + Parameters + ---------- + max_nframes : Optional[int] + The maximal number of frames to return + clear : bool + If True, frees memory by clearing internal state after retrieving candidates + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
dpgen2/exploration/report/report.py
(1 hunks)dpgen2/exploration/report/report_adaptive_lower.py
(3 hunks)dpgen2/exploration/report/report_trust_levels_base.py
(3 hunks)dpgen2/exploration/report/report_trust_levels_max.py
(1 hunks)dpgen2/exploration/report/report_trust_levels_random.py
(1 hunks)tests/exploration/test_report_adaptive_lower.py
(2 hunks)
🔇 Additional comments (9)
dpgen2/exploration/report/report.py (1)
63-66
: Well-structured abstract method addition!The new
no_candidate()
method is properly defined with correct type annotations and documentation.dpgen2/exploration/report/report_trust_levels_base.py (3)
67-70
: Good optimization: Caching computed values!Adding cached attributes for frequently accessed values is a good optimization that prevents redundant calculations.
140-149
: Well-structured ratio calculations!The ratio calculations are properly moved to the
record()
method where the data is first processed. This is more efficient than calculating on-demand.
222-234
: Efficient use of cached values!The ratio methods now return cached values instead of recalculating, which is more efficient.
dpgen2/exploration/report/report_adaptive_lower.py (3)
130-133
: LGTM! Good optimization for caching computed values.The addition of these instance variables helps avoid redundant calculations by caching the computed values.
281-284
: LGTM! Efficient caching of computed values.The calculations are performed at the right time, after all processing is complete, and the results are properly cached.
357-372
: LGTM! Efficient use of cached values.The ratio methods now return pre-calculated values instead of computing them on each call, and the new
no_candidate
method provides a clean interface for checking candidate availability.tests/exploration/test_report_adaptive_lower.py (2)
91-91
: LGTM! Test properly verifies the new clear parameter.The test correctly sets
clear=False
to ensure state is preserved for subsequent assertions.
224-225
: LGTM! Improved test reliability.Moving the mock.patch to encompass the get_candidate_ids call ensures reliable mocking behavior.
Signed-off-by: zjgemi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
dpgen2/exploration/report/report_trust_levels_random.py (1)
44-46
: LGTM! Consider adding a more descriptive assertion message.The type assertion is a good addition for runtime type safety.
Consider making the assertion message more descriptive:
- assert isinstance(accurate_ratio, float) + assert isinstance(accurate_ratio, float), f"accurate_ratio must be float, got {type(accurate_ratio)}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dpgen2/exploration/report/report_trust_levels_random.py
(1 hunks)
🔇 Additional comments (1)
dpgen2/exploration/report/report_trust_levels_random.py (1)
51-51
: 🛠️ Refactor suggestionConsider improving memory management design and documentation.
While the memory optimization is valuable, there are several concerns to address:
- The method is documented as "should only be called once" but there's no mechanism to enforce this.
- The memory clearing operation is tightly coupled with candidate retrieval.
- The new
clear
parameter lacks documentation.Consider these improvements:
def get_candidate_ids( self, max_nframes: Optional[int] = None, - clear: bool = True, + clear: bool = True, ) -> List[List[int]]: + """Get candidate IDs for each trajectory. + + Parameters + ---------- + max_nframes : Optional[int] + Maximum number of frames to return + clear : bool, default=True + If True, clears internal state after retrieval. + Note: This method is designed to be called only once. + + Returns + ------- + List[List[int]] + List of candidate IDs for each trajectory + """ + if hasattr(self, '_candidates_retrieved'): + raise RuntimeError("get_candidate_ids should only be called once") ntraj = len(self.traj_nframes) id_cand = self._get_candidates(max_nframes) id_cand_list = [[] for ii in range(ntraj)] for ii in id_cand: id_cand_list[ii[0]].append(ii[1]) - # free the memory, this method should only be called once if clear: self.clear() + setattr(self, '_candidates_retrieved', True) return id_cand_listLet's verify if this method is called multiple times in the codebase:
Also applies to: 58-60
Signed-off-by: zjgemi <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #279 +/- ##
=======================================
Coverage 84.32% 84.33%
=======================================
Files 104 104
Lines 6003 6031 +28
=======================================
+ Hits 5062 5086 +24
- Misses 941 945 +4 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
New Features
no_candidate()
method across multiple exploration report classes to check candidate availability.get_candidate_ids()
method with optionalclear
parameter for memory management.Improvements
Technical Updates