Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequence plot #30

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Sequence plot #30

wants to merge 5 commits into from

Conversation

jcharkow
Copy link
Collaborator

@jcharkow jcharkow commented Sep 23, 2024

User description

This is a continuation from #29 because I was having difficulty pushing to that branch.

This is the first attempt at integrating the sequence plot into the pyopenms_viz framework. Currently, only matplotlib backend is supported.
The original script can be found in sequenceplot.py

Example usage:

import pandas as pd
df = pd.read_csv("../test/test_data/TestSpectrumDf.tsv", sep="\t")

df.loc[0, "ion_annotation"] = "a3+"
df.loc[2, "ion_annotation"] = "c2+"
df.plot(kind='sequence', backend='ms_matplotlib')


# old code
# fig, ax = plt.subplots()
# sp.plot(ax, df)

PR Type

enhancement, documentation


Description

  • Introduced a new SequencePlot class for plotting peptide sequences with matched fragments.
  • Added support for sequence plots in Bokeh, Matplotlib, and Plotly backends.
  • Implemented the MATPLOTLIBSequencePlot class with detailed plotting logic.
  • Provided a placeholder implementation for sequence plots in Bokeh and Plotly.
  • Updated documentation to reflect new parameter names and usage.

Changes walkthrough 📝

Relevant files
Enhancement
__init__.py
Add Bokeh support for sequence plot type                                 

pyopenms_viz/_bokeh/init.py

  • Added BOKEHSequencePlot to the imports.
  • Updated PLOT_CLASSES to include sequence plot type.
  • +2/-0     
    core.py
    Implement Bokeh sequence plot class                                           

    pyopenms_viz/_bokeh/core.py

  • Added SequencePlot to the imports.
  • Implemented BOKEHSequencePlot class with a placeholder for plot
    method.
  • +27/-13 
    _core.py
    Introduce SequencePlot class for peptide plotting               

    pyopenms_viz/_core.py

  • Added sequence to _common_kinds.
  • Implemented SequencePlot class for peptide sequence plotting.
  • +117/-32
    __init__.py
    Add Matplotlib support for sequence plot type                       

    pyopenms_viz/_matplotlib/init.py

  • Added MATPLOTLIBSequencePlot to the imports.
  • Updated PLOT_CLASSES to include sequence plot type.
  • +2/-0     
    core.py
    Implement Matplotlib sequence plot class                                 

    pyopenms_viz/_matplotlib/core.py

  • Added SequencePlot to the imports.
  • Implemented MATPLOTLIBSequencePlot class with plotting logic.
  • +152/-43
    __init__.py
    Add Plotly support for sequence plot type                               

    pyopenms_viz/_plotly/init.py

  • Added PLOTLYSequencePlot to the imports.
  • Updated PLOT_CLASSES to include sequence plot type.
  • +2/-0     
    core.py
    Implement Plotly sequence plot class                                         

    pyopenms_viz/_plotly/core.py

  • Added SequencePlot to the imports.
  • Implemented PLOTLYSequencePlot class with a placeholder for plot
    method.
  • +103/-52
    sequenceplot.py
    Add sequence plotting function using Matplotlib                   

    sequenceplot.py

  • Added function to plot peptide sequences with fragment annotations.
  • Utilizes Matplotlib for visualization.
  • +83/-0   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    bittremieux and others added 4 commits September 23, 2024 09:25
    Create SequencePlot object, a basic plot for plotting sequences
    
    Add implementation for SequencePlot in matplotlib
    
    rename x and y to x_pos and y_pos to avoid naming conflict
    
    add variables for column names so they are not hard-coded
    
    replace fragment fontsize with annotation_fontsize
    add documentation with new parameter names
    This is probably not the best place to put the documentation but will do
    for now. Also have to reformat as google docstring instead of numpy
    @codiumai-pr-agent-pro codiumai-pr-agent-pro bot added documentation Improvements or additions to documentation enhancement New feature or request labels Sep 23, 2024
    Copy link
    Contributor

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Code Duplication
    The SequencePlot class in _core.py and the plot function in sequenceplot.py have very similar logic. Consider consolidating this code to avoid duplication.

    Error Handling
    The MATPLOTLIBSequencePlot class does not include any error handling for cases where the required columns are missing from the input data.

    Unimplemented Feature
    The PLOTLYSequencePlot class is not implemented and raises a NotImplementedError. This should be implemented to provide consistent functionality across backends.

    Copy link
    Contributor

    codiumai-pr-agent-pro bot commented Sep 23, 2024

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Maintainability
    Extract bin assignment logic into a separate function

    In the _bin_peaks method, consider extracting the bin assignment logic into a
    separate function for better readability and maintainability.

    pyopenms_viz/_core.py [806-813]

    -def assign_bin(value):
    -    for low, high in self.num_x_bins:
    +def assign_bin(value, bins):
    +    for low, high in bins:
             if low <= value <= high:
                 return f"{low:.4f}-{high:.4f}"
         return nan  # For values that don't fall into any bin
     
     # Apply the binning
    -data[x] = data[x].apply(assign_bin)
    +data[x] = data[x].apply(lambda v: assign_bin(v, self.num_x_bins))
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Extracting the bin assignment logic into a separate function significantly improves code readability and maintainability, making it easier to understand and modify.

    9
    Enhancement
    Use a dictionary to map ion types to y-axis positions for improved maintainability and extensibility

    Consider using a dictionary to map ion types to their corresponding y-axis positions
    instead of multiple if-elif statements. This would make the code more maintainable
    and easier to extend.

    pyopenms_viz/_matplotlib/core.py [468-476]

    -if ion_type in "ax":
    -    y_i = 2 * self.frag_len
    -elif ion_type in "by":
    -    y_i = self.frag_len
    -elif ion_type in "cz":
    -    y_i = 3 * self.frag_len
    -else:
    +ion_type_to_y = {
    +    "a": 2 * self.frag_len,
    +    "x": 2 * self.frag_len,
    +    "b": self.frag_len,
    +    "y": self.frag_len,
    +    "c": 3 * self.frag_len,
    +    "z": 3 * self.frag_len
    +}
    +y_i = ion_type_to_y.get(ion_type)
    +if y_i is None:
         # Ignore unknown ion types.
         continue
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion improves code maintainability and extensibility by replacing multiple if-elif statements with a dictionary, making it easier to add or modify ion types and their corresponding values.

    8
    Rename the tuple to better describe its contents

    Consider using a more descriptive name for the _common_kinds tuple. For example,
    _common_plot_types would be more explicit about what these strings represent.

    pyopenms_viz/_core.py [26]

    -_common_kinds = ("line", "vline", "scatter", "sequence")
    +_common_plot_types = ("line", "vline", "scatter", "sequence")
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion to rename _common_kinds to _common_plot_types improves code readability by making the purpose of the tuple more explicit, although it is not crucial for functionality.

    7
    Implement a basic plot method for the PLOTLYSequencePlot class

    In the PLOTLYSequencePlot class, consider implementing a basic plot method instead
    of raising a NotImplementedError. This would provide at least some functionality for
    sequence plotting.

    pyopenms_viz/_plotly/core.py [530-532]

     class PLOTLYSequencePlot(PLOTLYPlot, SequencePlot):
         def plot(self):
    -        raise NotImplementedError("Sequence plot is not implemented for Plotly")
    +        fig = go.Figure()
    +        # Add basic sequence plotting logic here
    +        # For example, you could plot the sequence as text
    +        fig.add_trace(go.Scatter(
    +            x=list(range(len(self.data[self.seq_col].iloc[0]))),
    +            y=[0] * len(self.data[self.seq_col].iloc[0]),
    +            text=list(self.data[self.seq_col].iloc[0]),
    +            mode="text",
    +        ))
    +        self.fig = fig
    +        return fig
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: While implementing a basic plot method provides some functionality, the suggestion is not critical as the class is intended to be a placeholder, and the implementation may not meet all use cases.

    6
    Use a list comprehension for creating the ys list in C-terminal fragmentation case

    Consider using a list comprehension instead of a for loop to create the xs and ys
    lists. This can make the code more concise and potentially more efficient.

    pyopenms_viz/_matplotlib/core.py [479-490]

     if ion_type in "abc":
         xs = [x_i, x_i, x_i - self.spacing / 2]
         ys = [self.y_pos, self.y_pos + y_i, self.y_pos + y_i]
         nterm = True
    -# C-terminal fragmentation.
     elif ion_type in "xyz":
         xs = [x_i + self.spacing / 2, x_i, x_i]
    -    ys = [self.y_pos - y_i, self.y_pos - y_i, self.y_pos]
    +    ys = [self.y_pos - y_i] * 2 + [self.y_pos]
         nterm = False
     else:
         # Ignore unknown ion types.
         continue
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: The suggestion makes the code slightly more concise and potentially more efficient by using a list comprehension, although the improvement is minor.

    6
    Use a ternary operator for setting the vertical alignment in the text annotation

    Consider using a ternary operator for setting the va (vertical alignment) parameter
    in the self.fig.text() call. This can make the code more concise and easier to read.

    pyopenms_viz/_matplotlib/core.py [497-505]

     self.fig.text(
         x_i,
         self.y_pos + (1.05 if nterm else -1.05) * y_i,
         annot,
         color=color,
         fontsize=self.annotation_font_size,
         ha="right" if nterm else "left",
         transform=self.fig.transAxes,
    -    va="top" if not nterm else "bottom",
    +    va="bottom" if nterm else "top",
     )
     
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    Why: The suggestion slightly improves code readability by using a ternary operator for setting the vertical alignment, but the impact is minimal as the original code is already clear.

    5
    Best practice
    Use type hinting and a default value for the seq_col parameter

    Consider using a default value for seq_col in the SequencePlot class initialization.
    This would make it clearer that "sequence" is the expected default column name for
    the sequence data.

    pyopenms_viz/_core.py [471]

    -seq_col="sequence",
    +seq_col: str = "sequence",
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Adding a type hint for seq_col enhances code clarity and maintainability, aligning with best practices for modern Python code.

    8

    💡 Need additional feedback ? start a PR chat

    @jcharkow jcharkow removed the documentation Improvements or additions to documentation label Sep 23, 2024
    Copy link
    Collaborator

    @singjc singjc left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Looks good. I just made a few comments of questions/thoughts I had, but would be okay with merging as is.

    Another overall thought I had, was how these sequence plots would be used? Would they be used an individual standalone figures, or would we want to extend, for example, the Spectrum plots, to allow insetting of the sequence plot inside the spectrum plot?

    Similarly with the molecule compound plots.

    pyopenms_viz/_core.py Show resolved Hide resolved
    pyopenms_viz/_core.py Show resolved Hide resolved
    pyopenms_viz/_matplotlib/core.py Show resolved Hide resolved
    pyopenms_viz/_matplotlib/core.py Show resolved Hide resolved
    @jcharkow jcharkow marked this pull request as draft September 23, 2024 20:54
    @jcharkow
    Copy link
    Collaborator Author

    Looks good. I just made a few comments of questions/thoughts I had, but would be okay with merging as is.

    Another overall thought I had, was how these sequence plots would be used? Would they be used an individual standalone figures, or would we want to extend, for example, the Spectrum plots, to allow insetting of the sequence plot inside the spectrum plot?

    Similarly with the molecule compound plots.

    I was thinking about this as well and I am not sure what is best. I think realistically one would not plot the sequence on its own and it would usually be coupled with a spectrum.

    I decided on the current implementation which allows for placing as its own plot because this would allow for inserting it into SpectrumPlot or ChromatogramPlot so it would not be specific to the spectrum backend.

    However this is kind of messy because it does inherit from basePlot and a lot of the methods and parameters there do not really make sense as this is more of an annotation. Also, the usage of dataframe in does not make a whole lot of sense either.
    Currently, I placed it as its own plot because I was not sure if it would only be useful for spectra or if it is useful for inserting in other plot types as well such as chromatograms.

    There are 2 options which I can think of now for refactoring this:

    1. Create a method in SpectrumPlot for adding the sequence annotation
      Pro: I think this would be a cleaner implementation and also more intuitive for the user.
      Con: Displaying sequence annotations would be limited to spectrum plots.

    2. Create a method in MSPlot() to add the sequence annotation
      Pro: Would allow for adding in Spectrum, Chromatograms, PeakMaps, etc.
      Con: May have to define new parameters for Chromatogram plots
      Con: Unclear where would place in PeakMap plots? Would this be an option?

      I think I am currently in favour of option 2 however not sure what would be best.

    @singjc
    Copy link
    Collaborator

    singjc commented Sep 24, 2024

    I was thinking about this as well and I am not sure what is best. I think realistically one would not plot the sequence on its own and it would usually be coupled with a spectrum.

    I decided on the current implementation which allows for placing as its own plot because this would allow for inserting it into SpectrumPlot or ChromatogramPlot so it would not be specific to the spectrum backend.

    However this is kind of messy because it does inherit from basePlot and a lot of the methods and parameters there do not really make sense as this is more of an annotation. Also, the usage of dataframe in does not make a whole lot of sense either. Currently, I placed it as its own plot because I was not sure if it would only be useful for spectra or if it is useful for inserting in other plot types as well such as chromatograms.

    There are 2 options which I can think of now for refactoring this:

    1. Create a method in SpectrumPlot for adding the sequence annotation
       Pro: I think this would be a cleaner implementation and also more intuitive for the user.
       Con: Displaying sequence annotations would be limited to spectrum plots.
    
    2. Create a method in MSPlot() to add the sequence annotation
       Pro: Would allow for adding in Spectrum, Chromatograms, PeakMaps, etc.
       Con: May have to define new parameters for Chromatogram plots
       Con: Unclear where would place in PeakMap plots? Would this be an option?
       I think I am currently in favour of option 2 however not sure what would be best.
    

    mm, I think the SequencePlot would mostly only be useful for the SpectrumPlot? But I do like option 2, to have it available across the other MSPlots if wanted.

    @jcharkow
    Copy link
    Collaborator Author

    I was thinking about this as well and I am not sure what is best. I think realistically one would not plot the sequence on its own and it would usually be coupled with a spectrum.
    I decided on the current implementation which allows for placing as its own plot because this would allow for inserting it into SpectrumPlot or ChromatogramPlot so it would not be specific to the spectrum backend.
    However this is kind of messy because it does inherit from basePlot and a lot of the methods and parameters there do not really make sense as this is more of an annotation. Also, the usage of dataframe in does not make a whole lot of sense either. Currently, I placed it as its own plot because I was not sure if it would only be useful for spectra or if it is useful for inserting in other plot types as well such as chromatograms.
    There are 2 options which I can think of now for refactoring this:

    1. Create a method in SpectrumPlot for adding the sequence annotation
       Pro: I think this would be a cleaner implementation and also more intuitive for the user.
       Con: Displaying sequence annotations would be limited to spectrum plots.
    
    2. Create a method in MSPlot() to add the sequence annotation
       Pro: Would allow for adding in Spectrum, Chromatograms, PeakMaps, etc.
       Con: May have to define new parameters for Chromatogram plots
       Con: Unclear where would place in PeakMap plots? Would this be an option?
       I think I am currently in favour of option 2 however not sure what would be best.
    

    mm, I think the SequencePlot would mostly only be useful for the SpectrumPlot? But I do like option 2, to have it available across the other MSPlots if wanted.

    Personally I do not visualize sequence plots with chromatograms either so maybe then for maintainability we go with option 1?

    Would be curious to know other's thoughts though

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    3 participants