Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add annotations to FeatureExtractor #761

Merged
merged 3 commits into from
May 23, 2019
Merged

Add annotations to FeatureExtractor #761

merged 3 commits into from
May 23, 2019

Conversation

zurk
Copy link
Contributor

@zurk zurk commented Apr 22, 2019

This PR adds Annotation tool to use instead of VirtualNode class (issue #521).
PR contains:

  • All Annotation classes necessary for format analyzer.
  • AnnotationManager class which will be a replacement for the sequence of VirtualNode-s
  • Most of the methods in the FeatureExtractor rewritten using AnnotationManager.
  • Functions for backward compatibility. They are used to transform AnnotationManager back to vnode sequence. They are going to be deleted after we finish refactoring.

I also run a regression test to see if there any difference in old vnodes and new (restored) vnodes for quality report.
I did not include the test to the PR. It can be found here: zurk@3aed7d1
I see some improvements like there were a couple of bugs related to column position or mixed intention and the rest is the same. Also, the quality was almost the same for the first 15 repos from the report.

The comparison with v0.2.0 report

# Train Report comparison

|               repo |      precision |         recall |    full_recall |             f1 |        full_f1 |           ppcr |          support |     full_support |   Rules Number |   Average Rule Len |
|-------------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|-----------------:|-----------------:|---------------:|-------------------:|
|          telescope | 1.000 (+0.000) | 1.000 (+0.000) | 0.210 (+0.000) | 1.000 (+0.000) | 0.348 (+0.000) | 0.210 (+0.000) |    203 (     +0) |    965 (     +0) |       2 (  +0) |         2.0 (+0.0) |
|              carlo | 0.934 (-0.043) | 0.934 (-0.043) | 0.623 (+0.053) | 0.934 (-0.043) | 0.747 (+0.027) | 0.667 (+0.083) |   5398 (   +670) |   8095 (     -3) |      34 ( +25) |         5.9 (+0.2) |
|          reveal.js | 0.965 (-0.010) | 0.965 (-0.010) | 0.655 (-0.014) | 0.965 (-0.010) | 0.781 (-0.012) | 0.679 (-0.006) |  10068 (    -97) |  14829 (     +0) |      19 (  -2) |         9.1 (-0.7) |
|   create-react-app | 0.977 (+0.000) | 0.977 (+0.000) | 0.474 (-0.001) | 0.977 (+0.000) | 0.639 (+0.000) | 0.485 (-0.002) |  11230 (    -28) |  23140 (     +0) |      14 (  +0) |         6.4 (+0.0) |
|              axios | 0.980 (+0.000) | 0.980 (+0.000) | 0.760 (+0.000) | 0.980 (+0.000) | 0.856 (+0.000) | 0.776 (+0.000) |  21335 (     +0) |  27482 (     +0) |      14 (  +0) |         7.4 (+0.0) |
|              redux | 0.972 (+0.000) | 0.972 (+0.000) | 0.571 (-0.003) | 0.972 (+0.000) | 0.719 (-0.003) | 0.588 (-0.002) |  23450 (   -104) |  39893 (     +0) |      17 (  +0) |         6.6 (-0.1) |
|              citgm | 0.962 (+0.000) | 0.962 (+0.000) | 0.825 (+0.000) | 0.962 (+0.000) | 0.888 (+0.000) | 0.857 (+0.000) |  20523 (     +0) |  23944 (     +0) |      59 (  +0) |         7.0 (+0.0) |
|          evergreen | 0.974 (+0.001) | 0.974 (+0.001) | 0.827 (+0.000) | 0.974 (+0.001) | 0.894 (+0.000) | 0.849 (-0.001) |  44342 (    -53) |  52233 (     +0) |     197 ( -20) |        11.0 (+0.0) |
| 30-seconds-of-code | 0.948 (+0.000) | 0.948 (+0.000) | 0.889 (+0.000) | 0.948 (+0.000) | 0.918 (+0.000) | 0.938 (+0.000) |  64537 (    -23) |  68832 (     +1) |      48 (  -3) |         7.9 (-0.2) |
|            express | 0.977 (+0.000) | 0.977 (+0.000) | 0.854 (+0.000) | 0.977 (+0.000) | 0.911 (+0.000) | 0.874 (+0.000) |  75137 (     +0) |  85947 (     +0) |      27 (  +0) |         7.8 (+0.0) |
|       freeCodeCamp | 0.960 (+0.000) | 0.960 (+0.000) | 0.828 (+0.000) | 0.960 (+0.000) | 0.889 (+0.000) | 0.863 (+0.000) | 105318 (     +0) | 122044 (     +0) |     153 (  -3) |        12.5 (+0.0) |
|          storybook | 0.973 (+0.000) | 0.973 (+0.000) | 0.832 (+0.000) | 0.973 (+0.000) | 0.897 (+0.000) | 0.855 (+0.000) | 163452 (     +4) | 191183 (     +0) |      57 (  +0) |         9.5 (-0.1) |
|             jquery | 0.977 (-0.002) | 0.977 (-0.002) | 0.885 (+0.006) | 0.977 (-0.002) | 0.929 (+0.003) | 0.907 (+0.009) | 202535 (  +1987) | 223414 (    +30) |     175 (+142) |        12.2 (+2.5) |
|            core-js | 0.985 (+0.000) | 0.985 (+0.000) | 0.959 (-0.001) | 0.985 (+0.000) | 0.972 (+0.000) | 0.974 (-0.001) | 302419 (   -305) | 310633 (     +0) |      66 (  -1) |         8.8 (+0.0) |
|               atom | 0.988 (+0.017) | 0.988 (+0.017) | 0.765 (-0.149) | 0.988 (+0.017) | 0.863 (-0.079) | 0.775 (-0.166) | 460543 ( +34428) | 594570 (+141858) |      62 (-326) |         9.2 (-1.7) | 

# Test Report comparison
|               repo |      precision |         recall |    full_recall |             f1 |        full_f1 |           ppcr |         support |    full_support |   Rules Number |   Average Rule Len |
|-------------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|----------------:|---------------:|-------------------:|
|          telescope | 0.826 (+0.000) | 0.826 (+0.000) | 0.457 (+0.000) | 0.826 (+0.000) | 0.588 (+0.000) | 0.553 (+0.000) |   172 (     +0) |   311 (     +0) |       2 (  +0) |         2.0 (+0.0) |
|              carlo | 0.883 (-0.064) | 0.883 (-0.064) | 0.756 (-0.005) | 0.883 (-0.064) | 0.814 (-0.030) | 0.856 (+0.052) |  1705 (   +104) |  1992 (     +0) |      34 ( +25) |         5.9 (+0.2) |
|          reveal.js | 0.949 (-0.005) | 0.949 (-0.005) | 0.774 (-0.004) | 0.949 (-0.005) | 0.853 (-0.004) | 0.816 (+0.001) |  8813 (    +10) | 10799 (     +0) |      19 (  -2) |         9.1 (-0.7) |
|   create-react-app | 0.912 (-0.002) | 0.912 (-0.002) | 0.746 (+0.002) | 0.912 (-0.002) | 0.821 (+0.001) | 0.818 (+0.003) |  3519 (    +14) |  4303 (     +0) |      14 (  +0) |         6.4 (+0.0) |
|              axios | 0.968 (+0.000) | 0.968 (+0.000) | 0.876 (+0.000) | 0.968 (+0.000) | 0.920 (+0.000) | 0.905 (+0.000) |  5166 (     +0) |  5707 (     +0) |      14 (  +0) |         7.4 (+0.0) |
|              redux | 0.924 (-0.001) | 0.924 (-0.001) | 0.828 (-0.012) | 0.924 (-0.001) | 0.874 (-0.006) | 0.896 (-0.012) |  4892 (    -64) |  5461 (     +0) |      17 (  +0) |         6.6 (-0.1) |
|              citgm | 0.921 (+0.000) | 0.921 (+0.000) | 0.910 (+0.000) | 0.921 (+0.000) | 0.915 (+0.000) | 0.988 (+0.000) |  5047 (     +0) |  5107 (     +0) |      59 (  +0) |         7.0 (+0.0) |
|          evergreen | 0.926 (+0.001) | 0.926 (+0.001) | 0.864 (+0.000) | 0.926 (+0.001) | 0.894 (+0.001) | 0.932 (-0.002) | 19934 (    -44) | 21381 (     +0) |     197 ( -20) |        11.0 (+0.0) |
| 30-seconds-of-code | 0.973 (+0.000) | 0.973 (+0.000) | 0.973 (+0.000) | 0.973 (+0.000) | 0.973 (+0.000) | 1.000 (+0.000) | 11493 (     +0) | 11493 (     +0) |      48 (  -3) |         7.9 (-0.2) |
|            express | 0.958 (+0.000) | 0.958 (+0.000) | 0.890 (+0.000) | 0.958 (+0.000) | 0.922 (+0.000) | 0.929 (+0.000) | 14453 (     +0) | 15557 (     +0) |      27 (  +0) |         7.8 (+0.0) |
|       freeCodeCamp | 0.929 (+0.000) | 0.929 (+0.000) | 0.894 (+0.000) | 0.929 (+0.000) | 0.911 (+0.000) | 0.962 (+0.000) | 23738 (     +0) | 24670 (     +0) |     153 (  -3) |        12.5 (+0.0) |
|          storybook | 0.957 (+0.000) | 0.957 (+0.000) | 0.899 (+0.000) | 0.957 (+0.000) | 0.927 (+0.000) | 0.939 (+0.000) | 33969 (     -1) | 36185 (     +0) |      57 (  +0) |         9.5 (-0.1) |
|             jquery | 0.969 (-0.008) | 0.969 (-0.008) | 0.933 (+0.002) | 0.969 (-0.008) | 0.950 (-0.003) | 0.963 (+0.010) | 46390 (   +478) | 48173 (     +0) |     175 (+142) |        12.2 (+2.5) |
|            core-js | 0.982 (+0.000) | 0.982 (+0.000) | 0.974 (-0.001) | 0.982 (+0.000) | 0.978 (-0.001) | 0.992 (-0.001) | 65767 (    -69) | 66293 (     +0) |      66 (  -1) |         8.8 (+0.0) |
|               atom | 0.961 (+0.012) | 0.961 (+0.012) | 0.937 (-0.007) | 0.961 (+0.012) | 0.949 (+0.002) | 0.975 (-0.019) | 88392 ( +70634) | 90636 ( +72776) |      62 (-326) |         9.2 (-1.7) |

I am going to run a full report for the final version after we merge.

@zurk zurk requested a review from vmarkovtsev April 23, 2019 09:24
@zurk zurk force-pushed the annotations branch 2 times, most recently from cda9f04 to 6538220 Compare April 24, 2019 12:53
@zurk
Copy link
Contributor Author

zurk commented May 14, 2019

@vmarkovtsev ping

lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
@zurk
Copy link
Contributor Author

zurk commented May 20, 2019

Travis failed because of modelforge issues that should be fixed in src-d/modelforge#98

lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotations.py Outdated Show resolved Hide resolved
@zurk
Copy link
Contributor Author

zurk commented May 22, 2019

@vmarkovtsev all good now?

lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
format.

The function is created for backwards capability and should be removed after refactoring is \
finished.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should explain how it is different from file_to_old_parse_file_format: the input and output types are exactly the same.

lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/feature_extractor.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
lookout/style/format/annotations/annotated_data.py Outdated Show resolved Hide resolved
@zurk zurk force-pushed the annotations branch 2 times, most recently from 78822d7 to d8793b4 Compare May 22, 2019 15:22
Copy link
Collaborator

@vmarkovtsev vmarkovtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inspect=lambda span_start, span_stop: self._check_spans_overlap(
start, stop, span_start, span_stop) if start == stop else
span_start <= start and stop <= span_stop)
if start == stop:
Copy link
Collaborator

@vmarkovtsev vmarkovtsev May 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is performant ofc, but the following is simpler:

def check_cover(span_start, span_stop):
    if start == stop:
        return self._check_spans_overlap(start, stop, span_start, span_stop)
    return span_start <= start and stop <= span_stop

@zurk
Copy link
Contributor Author

zurk commented May 22, 2019

About #761 (comment) I add a comment. if you read both docstrings you should see the difference.

d8793b4#diff-39b3b19d4276f053126096271b399f7fR862

@vmarkovtsev
Copy link
Collaborator

@zurk

The docstring should explain how it is different from file_to_old_parse_file_format: the input and output types are exactly the same.

I do not see the magic keywords: "This function is different from file_to_old_parse_file_format because...". I do not want to diff two docstrings, that's too hard for me :)

@zurk
Copy link
Contributor Author

zurk commented May 23, 2019

ok, I add it.
tell me if there is something more.

If not, let me squash all review commits and rebase on current master before the merge.

zurk added 3 commits May 23, 2019 11:57
Signed-off-by: Konstantin Slavnov <[email protected]>
Signed-off-by: Konstantin Slavnov <[email protected]>
Signed-off-by: Konstantin Slavnov <[email protected]>
@zurk zurk merged commit 5e063e6 into src-d:master May 23, 2019
@zurk
Copy link
Contributor Author

zurk commented May 23, 2019

Finally! 🎉

@zurk
Copy link
Contributor Author

zurk commented May 23, 2019

Now I need to build a new report for the current master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants