Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging #68

NateEag · 2023-06-14T15:50:03Z

dtrt-indent (version 20230302.2151) guesses the following file has an offset of 4, when 2 is what I'd expect:

declare namespace NodeJS {
  interface ProcessEnv {
    NODE_ENV: string;

    FOOBAR: string;
    BAR: string;
    BAZ: string;

    TEST?: string;
    TEST2?: string;
    TEST3?: string;

    OPTION: string;
    ANOTHER_OPTION: string;
    MORE_OPTIONS: string;
  }
}

Output of dtrt-indent-diagnosis:

Guessing offset for /Users/nathaneagleson/schedule-engine/servicechannel-integration/test.ts

Elapsed time for analysis: 0.000 seconds

Total relevant lines: 12 out of 18 (limit: 5000)

Histogram:

     2x   2 spaces
    10x   4 spaces

Analysis:

  offset 2 works for 100.00% of relevant lines, matching 2 distinct offsets - merged with offset 4 (16.67% deviation, limit 20.00%)
  offset 4 works for  83.33% of relevant lines, matching 1 distinct offsets - CONSIDERED
  offset 3 works for   0.00% of relevant lines, matching 0 distinct offsets - rejected: too few distinct matching offsets (1 required)
  offset 5 works for   0.00% of relevant lines, matching 0 distinct offsets - rejected: too few distinct matching offsets (1 required)
  offset 6 works for   0.00% of relevant lines, matching 0 distinct offsets - rejected: too few distinct matching offsets (1 required)
  offset 7 works for   0.00% of relevant lines, matching 0 distinct offsets - rejected: too few distinct matching offsets (1 required)
  offset 8 works for   0.00% of relevant lines, matching 0 distinct offsets - rejected: too few distinct matching offsets (1 required)

Summary:

  Best guess is offset 4 with 83.33% matching lines (80.00% required)
  Hard tab percentage: 0.00% (0 lines), -100.00% superior to soft tabs (threshold 300.00%)
  Soft tab percentage: 100.00% (12 lines), inf% superior to hard tabs (threshold 300.00%)

Conclusion:

  Guessed offset 4 with 83% confidence.
  Change indent-tab-setting: yes, to nil

The text was updated successfully, but these errors were encountered:

rrthomas · 2023-06-15T09:26:27Z

This is dtrt-indent working as intended (and the diagnosis tells you what's going on). It can't distinguish between the case where there are a few odd 2-indents somewhere in the middle of a long file (where an indent of 4 is what you really want), and the case where there are a few symmetric ones at each end (as here).

I thought this issue seemed familiar, and indeed, see #29. I'd rather not make dtrt-indent cleverer. On the other hand, decreasing the value of dtrt-indent-max-merge-deviation only works up to a certain point: for arbitrarily long JSON files you'd have to set it to zero.

I can't think of a simple analytical solution to this problem in general that doesn't involve understanding the syntax of the file.

One way to do it by trial and error: run indent-buffer with the first-guessed setting. If the buffer doesn't change, then keep the setting. This could be done in dtrt-indent-analyze when considering whether to merge two settings.

NateEag · 2023-06-15T17:53:05Z

Thanks for the explanation, @rrthomas . That makes sense, and it seemed like a maybe-expected result of the design, but it also surprised me, hence my filing the issue.

I didn't find dtrt-indent-max-merge-deviation when I skimmed for something I could tweak that might help, so thanks, that looks useful.

I don't see a robust, correct analytical answer to the problem either.

Maybe an (opt-in?) dumb heuristic could be good enough. Before merging a lower indent level into a compatible higher indent level, check whether the file's first and last indents match the lower level?

If they do, on average, I bet the lower indent level would be correct.

Your trial-and-error suggestion also makes sense.

rrthomas changed the title ~~dtrt-indent-mode guesses offset of 4 for file where offset of 2 is better~~ Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging #68

Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging #68

NateEag commented Jun 14, 2023

rrthomas commented Jun 15, 2023

NateEag commented Jun 15, 2023

Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging #68

Improve analysis of files with consistent indentation where offset is guessed wrongly due to merging #68

Comments

NateEag commented Jun 14, 2023

rrthomas commented Jun 15, 2023

NateEag commented Jun 15, 2023