
Measure quality metrics on file triplets #520

Open
vmarkovtsev opened this issue Jan 10, 2019 · 3 comments
Labels: enhancement (New feature or request), large (Large size), refactor

vmarkovtsev (Collaborator) commented Jan 10, 2019

Given a modified file and a ground-truth fixed file, we apply the model to generate a predicted file, throw away everything except those three files, and measure quality metrics.
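The evaluation described above can be pictured as a small function over the three files. This is a hypothetical sketch, not the project's actual metric code; it assumes pure style fixes keep the line count unchanged, so the files can be compared line by line:

```python
def triplet_metrics(modified, predicted, ground_truth):
    # Hypothetical line-level metrics over a file triplet; assumes the
    # three files have the same number of lines (true for style-only fixes).
    m, p, g = (s.splitlines() for s in (modified, predicted, ground_truth))
    assert len(m) == len(p) == len(g)
    tp = fp = fn = 0
    for ml, pl, gl in zip(m, p, g):
        should_change = ml != gl
        did_change = ml != pl
        if did_change and pl == gl:
            tp += 1  # detected and correctly fixed
        elif did_change:
            fp += 1  # changed, but to the wrong contents
        elif should_change:
            fn += 1  # a needed fix was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Only the triplet is needed as input, which is the point of the issue: the model itself can be thrown away once the predicted file exists.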

@vmarkovtsev vmarkovtsev added enhancement New feature or request refactor labels Jan 10, 2019
@vmarkovtsev vmarkovtsev added this to the Refactoring January 2019 milestone Jan 10, 2019
@vmarkovtsev vmarkovtsev added the large Large size label Jan 10, 2019
EgorBu commented Jan 11, 2019

Some discussion is required.

A bit of context:
We have 3 types of reports:

  • quality report
    • measures how well the analyzer can reconstruct the original source code (a drawback of this approach: if the analyzer does nothing, the reconstruction is perfect). It could be used as a rough measurement of source code consistency.
  • quality report noisy
    • evaluates how well a given model fixes style mistakes randomly added to a repository; the noise is introduced by hand so far
    • counts how many errors were corrected by the analyzer
    • precision-recall curve with logic that should help to estimate the confidence cut-off for rules
  • quality report smoke - SmokeEvalFormatAnalyzer consists of 2 parts
    • generate_smoke.py - a module that generates several types of mutations in JS files using regexps
    • evaluate_smoke.py - classifies each fix as misdetection, undetected, detected_wrong_fix, or detected_correct_fix

Quality report noisy requires using the model itself (to check quality with a different number of rules and so on), so it looks like it can't be used with triplets.
The opposite holds for quality report smoke and quality report: they need the model only to generate new contents, so they can be refactored to use file triplets (with FileFix).

TODO list may look like this:

  1. refactor quality report smoke to utilize ReportAnalyzer(FormatAnalyzerSpy)
    1.1) OPTIONAL: utilize the tokenizer and a check for UAST-breaking changes to generate new mutants (example) instead of regexps - the variety of violations will be much bigger, and the number of mutants will be n1 * n2 * ... * nk (where ni is the number of mutations at place i that don't change the UAST - every Noop and every space|newline|tab|quote) - and that is huge.
  2. add a noise introduction step, or use [ptr_ground_truth, ptr_mutant] instead of the single ptr used right now:
# noise could be introduced before providing data_service
for file_fix in self.generate_file_fixes(data_service, changes):
    filepath = file_fix.head_file.path
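The noise introduction step in item 2 could be prototyped as a generator that rewrites the head files before the analyzer sees them. The File/Change types below are simplified stand-ins for lookout's actual data structures, and add_noise is a toy placeholder:

```python
from dataclasses import dataclass

# Simplified stand-ins for lookout's file/change types (names are assumptions).
@dataclass
class File:
    path: str
    content: str

@dataclass
class Change:
    head: File

def add_noise(content: str) -> str:
    # Toy noise function: drop spaces around "=", a typical style mutation.
    return content.replace(" = ", "=")

def with_noise(changes, noise_fn=add_noise):
    # Mutate each head file before it is handed to generate_file_fixes(),
    # so the (ground truth, mutant, prediction) triplet can be scored later.
    for change in changes:
        change.head = File(change.head.path, noise_fn(change.head.content))
        yield change
```

Keeping the original contents around as ptr_ground_truth alongside the mutant would give exactly the [ptr_ground_truth, ptr_mutant] pair mentioned above.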

WDYT @vmarkovtsev , @zurk ?


zurk commented Jan 17, 2019

> Quality report noisy requires usage of model ... so it looks like that it can't be used with triplets.

I think the noisy report should not require running the model, because all we need are the rule confidences, and those are stored in the model. To build the curves we should just use many file triplets instead of one. It is the easiest solution; it can be time-consuming, but since we do not run it frequently, it should be fine, I think.
And in case some information is missing from the model, we can consider adding it.
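Building the curve from stored rule confidences could look like the sketch below. The input format is an assumption: one (confidence, is_correct) pair per suggested fix, collected over many triplets:

```python
def precision_recall_curve(fixes):
    """fixes: (rule_confidence, is_correct) pairs for every suggested fix.

    Sweeping a confidence threshold over the sorted fixes yields one
    (threshold, precision, recall) point per fix, without re-running
    the model.
    """
    fixes = sorted(fixes, key=lambda f: -f[0])
    total_correct = sum(ok for _, ok in fixes)
    curve, tp = [], 0
    for rank, (confidence, ok) in enumerate(fixes, start=1):
        tp += ok
        precision = tp / rank
        recall = tp / total_correct if total_correct else 0.0
        curve.append((confidence, precision, recall))
    return curve
```

The threshold at the knee of this curve would then serve as the confidence cut-off for rules mentioned in the noisy report description.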

> refactor quality report smoke to utilize ReportAnalyzer(FormatAnalyzerSpy).

Yeah, I think this should be done. One issue with ReportAnalyzer that I discovered is that if you want an arbitrary number of reports, you have to modify a lot of code. So, first, I'd prefer to polish ReportAnalyzer itself and make it more usable.

> 1.1) OPTIONAL: utilize tokenizer and checking for UAST breaking changes to generate new mutants

The idea of a smoke dataset is to have a dataset of styles in the first place. If you do random style mutations, you lose that overall idea. Random style mutations are closer to the noisy dataset (even by name), or they can be used as an independent one. I have seen big potential in this idea from the beginning, but let's not mix it with the smoke dataset.

I suggest separating this issue (about creating a uniform tool/class to measure quality metrics on file triplets) from the noise introduction step, because they are about different things. The noise introduction step is a new feature, and we should avoid adding more functionality during the refactoring stage.
Let's create another issue for it and focus on the main problem here.


vmarkovtsev commented Jan 17, 2019

(no time to read for now)

If this grows into a huge task, we should do it after resolving the other, smaller issues which were planned.
