Optimize space usage of ExplorationReport before saving #279

zjgemi · 2025-01-07T07:14:21Z

Summary by CodeRabbit

New Features
- Added no_candidate() method across multiple exploration report classes to check candidate availability.
- Enhanced get_candidate_ids() method with optional clear parameter for memory management.
Improvements
- Optimized ratio calculations in exploration report classes.
- Introduced more efficient state tracking for candidate configurations.
Technical Updates
- Updated method signatures in exploration report classes to include new parameters.
- Refined candidate selection and reporting mechanisms.

Signed-off-by: zjgemi <[email protected]>

for more information, see https://pre-commit.ci

coderabbitai · 2025-01-07T07:15:29Z

📝 Walkthrough

Walkthrough

The pull request introduces modifications to several exploration report classes across the dpgen2/exploration/report/ directory. Key changes include the addition of a new abstract method no_candidate() to the ExplorationReport class, the introduction of new attributes for tracking candidate states in the ExplorationReportAdaptiveLower and ExplorationReportTrustLevels classes, and updates to the get_candidate_ids() method signatures to include an optional clear parameter for memory management. These changes enhance the flexibility and efficiency of exploration reporting mechanisms in the codebase.

Changes

File	Change Summary
`dpgen2/exploration/report/report.py`	Added abstract method `no_candidate()` to `ExplorationReport` class
`dpgen2/exploration/report/report_adaptive_lower.py`	- Added new attributes: `_no_candidate`, `_failed_ratio`, `_accurate_ratio`, `_candidate_ratio` - Updated `record()` method to calculate new attributes - Added `no_candidate()` method - Modified `get_candidate_ids()` to include `clear` parameter
`dpgen2/exploration/report/report_trust_levels_base.py`	- Added new private attributes: `_no_candidate`, `_failed_ratio`, `_accurate_ratio`, `_candidate_ratio` - Updated `record()` method to compute new attributes - Added `no_candidate()` method
`dpgen2/exploration/report/report_trust_levels_max.py`	- Updated `get_candidate_ids()` method to include `clear` parameter - Added type assertion in `converged()` method
`dpgen2/exploration/report/report_trust_levels_random.py`	- Updated `get_candidate_ids()` method to include `clear` parameter - Added type assertion in `converged()` method
`tests/exploration/test_report_adaptive_lower.py`	Updated test cases to accommodate new `get_candidate_ids()` method signature

Sequence Diagram

sequenceDiagram
    participant Report as ExplorationReport
    participant SubReport as SubclassReport
    
    Report->>Report: Define abstract no_candidate()
    SubReport->>SubReport: Implement no_candidate()
    SubReport->>SubReport: Calculate candidate state
    SubReport-->>Report: Return boolean indicating candidates

This sequence diagram illustrates the new no_candidate() method's role in the exploration report hierarchy, showing how subclasses implement the method to determine their candidate state.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af8ee2f and 48a5c47.

📒 Files selected for processing (1)

dpgen2/exploration/report/report_trust_levels_max.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

dpgen2/exploration/report/report_trust_levels_max.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

dpgen2/exploration/report/report_trust_levels_random.py (1)
49-49: Good memory optimization with clear parameter!

The addition of the clear parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.

Consider moving the comment to the method's docstring:
    def get_candidate_ids(
        self,
        max_nframes: Optional[int] = None,
        clear: bool = True,
    ) -> List[List[int]]:
+        """Get indexes of candidate configurations
+
+        Note: This method should only be called once as it may clear memory when clear=True.
+
+        Parameters
+        ----------
+        max_nframes : Optional[int], optional
+            The maximal number of frames of candidates.
+        clear : bool, optional
+            Whether to free memory after getting candidates, by default True
+
+        Returns
+        -------
+        List[List[int]]
+            The frame indices of candidate configurations.
+        """
        ntraj = len(self.traj_nframes)
        id_cand = self._get_candidates(max_nframes)
        id_cand_list = [[] for ii in range(ntraj)]
        for ii in id_cand:
            id_cand_list[ii[0]].append(ii[1])
-        # free the memory, this method should only be called once
        if clear:
            self.clear()
        return id_cand_list
Also applies to: 56-58
dpgen2/exploration/report/report_trust_levels_max.py (1)

49-49: Good memory optimization with clear parameter!

The addition of the clear parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.

Consider moving the comment to the method's docstring, similar to the suggestion for report_trust_levels_random.py.

Also applies to: 56-58
dpgen2/exploration/report/report_adaptive_lower.py (1)
377-386: Add parameter documentation for better clarity.

The clear parameter's purpose and behavior should be documented in the method's docstring.
     def get_candidate_ids(
         self,
         max_nframes: Optional[int] = None,
         clear: bool = True,
     ) -> List[List[int]]:
+        """
+        Parameters
+        ----------
+        max_nframes : Optional[int]
+            The maximal number of frames to return
+        clear : bool
+            If True, frees memory by clearing internal state after retrieving candidates
+        """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc72c85 and 2bcb683.

📒 Files selected for processing (6)

dpgen2/exploration/report/report.py (1 hunks)
dpgen2/exploration/report/report_adaptive_lower.py (3 hunks)
dpgen2/exploration/report/report_trust_levels_base.py (3 hunks)
dpgen2/exploration/report/report_trust_levels_max.py (1 hunks)
dpgen2/exploration/report/report_trust_levels_random.py (1 hunks)
tests/exploration/test_report_adaptive_lower.py (2 hunks)

🔇 Additional comments (9)

dpgen2/exploration/report/report.py (1)

63-66: Well-structured abstract method addition!

The new no_candidate() method is properly defined with correct type annotations and documentation.

dpgen2/exploration/report/report_trust_levels_base.py (3)

67-70: Good optimization: Caching computed values!

Adding cached attributes for frequently accessed values is a good optimization that prevents redundant calculations.

140-149: Well-structured ratio calculations!

The ratio calculations are properly moved to the record() method where the data is first processed. This is more efficient than calculating on-demand.

222-234: Efficient use of cached values!

The ratio methods now return cached values instead of recalculating, which is more efficient.

dpgen2/exploration/report/report_adaptive_lower.py (3)

130-133: LGTM! Good optimization for caching computed values.

The addition of these instance variables helps avoid redundant calculations by caching the computed values.

281-284: LGTM! Efficient caching of computed values.

The calculations are performed at the right time, after all processing is complete, and the results are properly cached.

357-372: LGTM! Efficient use of cached values.

The ratio methods now return pre-calculated values instead of computing them on each call, and the new no_candidate method provides a clean interface for checking candidate availability.

tests/exploration/test_report_adaptive_lower.py (2)

91-91: LGTM! Test properly verifies the new clear parameter.

The test correctly sets clear=False to ensure state is preserved for subsequent assertions.

224-225: LGTM! Improved test reliability.

Moving the mock.patch to encompass the get_candidate_ids call ensures reliable mocking behavior.

Signed-off-by: zjgemi <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

dpgen2/exploration/report/report_trust_levels_random.py (1)
44-46: LGTM! Consider adding a more descriptive assertion message.

The type assertion is a good addition for runtime type safety.

Consider making the assertion message more descriptive:
-        assert isinstance(accurate_ratio, float)
+        assert isinstance(accurate_ratio, float), f"accurate_ratio must be float, got {type(accurate_ratio)}"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bcb683 and af8ee2f.

📒 Files selected for processing (1)

dpgen2/exploration/report/report_trust_levels_random.py (1 hunks)

🔇 Additional comments (1)

dpgen2/exploration/report/report_trust_levels_random.py (1)

51-51: 🛠️ Refactor suggestion

Consider improving memory management design and documentation.

While the memory optimization is valuable, there are several concerns to address:

The method is documented as "should only be called once" but there's no mechanism to enforce this.
The memory clearing operation is tightly coupled with candidate retrieval.
The new clear parameter lacks documentation.

Consider these improvements:

     def get_candidate_ids(
         self,
         max_nframes: Optional[int] = None,
-        clear: bool = True,
+        clear: bool = True,
     ) -> List[List[int]]:
+        """Get candidate IDs for each trajectory.
+
+        Parameters
+        ----------
+        max_nframes : Optional[int]
+            Maximum number of frames to return
+        clear : bool, default=True
+            If True, clears internal state after retrieval.
+            Note: This method is designed to be called only once.
+
+        Returns
+        -------
+        List[List[int]]
+            List of candidate IDs for each trajectory
+        """
+        if hasattr(self, '_candidates_retrieved'):
+            raise RuntimeError("get_candidate_ids should only be called once")
         ntraj = len(self.traj_nframes)
         id_cand = self._get_candidates(max_nframes)
         id_cand_list = [[] for ii in range(ntraj)]
         for ii in id_cand:
             id_cand_list[ii[0]].append(ii[1])
-        # free the memory, this method should only be called once
         if clear:
             self.clear()
+        setattr(self, '_candidates_retrieved', True)
         return id_cand_list

Let's verify if this method is called multiple times in the codebase:

Also applies to: 58-60

Signed-off-by: zjgemi <[email protected]>

codecov · 2025-01-07T08:51:28Z

Codecov Report

Attention: Patch coverage is 87.50000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.33%. Comparing base (fc72c85) to head (48a5c47).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
...gen2/exploration/report/report_trust_levels_max.py	40.00%	3 Missing ⚠️
dpgen2/exploration/report/report.py	50.00%	1 Missing ⚠️
dpgen2/exploration/report/report_adaptive_lower.py	93.33%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #279   +/-   ##
=======================================
  Coverage   84.32%   84.33%           
=======================================
  Files         104      104           
  Lines        6003     6031   +28     
=======================================
+ Hits         5062     5086   +24     
- Misses        941      945    +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

zjgemi and others added 2 commits January 7, 2025 15:13

Optimize space usage of ExplorationReport before saving

2ef45ab

Signed-off-by: zjgemi <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

2bcb683

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Jan 7, 2025

View reviewed changes

zjgemi added 2 commits January 7, 2025 15:43

fix pyright

c411a40

Signed-off-by: zjgemi <[email protected]>

Merge branch 'opt-report' of github.com:zjgemi/dpgen2 into opt-report

af8ee2f

coderabbitai bot reviewed Jan 7, 2025

View reviewed changes

fix pyright

48a5c47

Signed-off-by: zjgemi <[email protected]>

wanghan-iapcm approved these changes Jan 8, 2025

View reviewed changes

wanghan-iapcm merged commit d60b4e9 into deepmodeling:master Jan 8, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize space usage of ExplorationReport before saving #279

Optimize space usage of ExplorationReport before saving #279

zjgemi commented Jan 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 7, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

codecov bot commented Jan 7, 2025 •

edited

Loading

Optimize space usage of ExplorationReport before saving #279

Optimize space usage of ExplorationReport before saving #279

Conversation

zjgemi commented Jan 7, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jan 7, 2025 • edited Loading

Walkthrough

Changes

Sequence Diagram

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

codecov bot commented Jan 7, 2025 • edited Loading

Codecov Report

zjgemi commented Jan 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 7, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Jan 7, 2025 •

edited

Loading