Make --ignore-dir operate the entire path, not just relative #329

Open
jwdevel opened this issue Dec 30, 2020 · 18 comments

@jwdevel

jwdevel commented Dec 30, 2020

I'm not sure of the best place to put my comments, so I'm making this issue.
Feel free to close if you feel it's all well enough represented in other places.

I was tracking the trail of ack2#330#7 → split into #88 and #89, and didn't find an explicit reference to the notion of matching full directory paths for --ignore-dir in the still-open issues.

I did find the IGNORE.md document, which gives me the impression the path forward is not entirely clear.

Anyway, just wanted to make sure something tracks that feature request, since it's one I'd certainly like (:

@petdance
Collaborator

petdance commented Jan 1, 2021

I'm not following. What exactly is it you're wanting ack to do?

@jwdevel
Author

jwdevel commented Jan 1, 2021

Sorry, the feature I'm referring to is: using --ignore-dir (or --ignore-path in ack3?) to match against the full path, not just the last trailing component of the path.

For instance, with a dir tree like this:

foo/
storage/
    project_1/
        foo/ ...
        config/ ...
    project_2/
        foo/ ...
        config/ ...
    ...

I'd like to be able to write: ack --ignore-dir=match:!storage/[^/]*/foo! and in this way exclude the foo/ dir for each project, while still finding hits inside each of the config/ dirs as well as the top-level foo/ dir.
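
To make the requested difference concrete, here is a minimal Python sketch. It is not ack's code, and the function names are made up for illustration. It contrasts offering only the last directory component to the regex (the behavior described above) with offering the whole relative path (the behavior being requested).

```python
import re

# Directories from the example tree, relative to the invocation dir.
DIRS = [
    "foo",
    "storage/project_1/foo",
    "storage/project_1/config",
    "storage/project_2/foo",
    "storage/project_2/config",
]

PATTERN = re.compile(r"storage/[^/]*/foo")

def ignored_by_last_component(path: str) -> bool:
    # Only the trailing directory name is offered to the regex,
    # so a pattern containing "/" can never match.
    return PATTERN.search(path.rsplit("/", 1)[-1]) is not None

def ignored_by_full_path(path: str) -> bool:
    # Hypothetical requested behavior: the whole path is offered.
    return PATTERN.search(path) is not None

last_component_hits = [d for d in DIRS if ignored_by_last_component(d)]
full_path_hits = [d for d in DIRS if ignored_by_full_path(d)]
print(last_component_hits)
print(full_path_hits)
```

With full-path matching, only the two per-project foo/ dirs are selected for ignoring; the top-level foo/ and both config/ dirs are left alone.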

@petdance
Collaborator

petdance commented Jan 1, 2021

Please note that you don't need the ! delimiters when specifying the pattern.

@petdance changed the title from "commentary on --ignore-dir matching paths" to "Make --ignore-dir operate the entire path, not just relative" on Jan 1, 2021
@petdance
Collaborator

petdance commented Jan 1, 2021

So if I'm understanding correctly, given this setup:

$ mkdir -p foo storage/project_{1,2}/{foo,config}

$ find . -type f

$ find . -type d
.
./storage
./storage/project_2
./storage/project_2/config
./storage/project_2/foo
./storage/project_1
./storage/project_1/config
./storage/project_1/foo
./foo

$ touch foo/thing.txt storage/project_{1,2}/{foo,config}/thing.txt

$ ack -f
foo/thing.txt
storage/project_1/config/thing.txt
storage/project_1/foo/thing.txt
storage/project_2/config/thing.txt
storage/project_2/foo/thing.txt

You want:

$ ack -f --ignore-dir=match:/foo/
foo/thing.txt
storage/project_1/config/thing.txt
storage/project_2/config/thing.txt

@jwdevel
Author

jwdevel commented Jan 2, 2021

That's correct, as long as I read the /foo/ to mean / is a path separator (you mentioned regex delimiter no longer needed, so I assume so).

In truth, I'm more interested in excluding via elaborate regex matches rather than including, but I imagine the syntax would be largely the same.

There is also the issue of relative paths and putting such expressions in .ackrc, which is discussed in the IGNORE.md file I mentioned. Personally, it makes sense to me that such paths would be relative to that .ackrc file, but that's just my 2¢.

@Jimbolino

I've had the same problem.
I cannot find a way to exclude files or directories based on the full path:

--ignore-dir=match:/.*\/vendor\/aws\/aws-sdk-php\/src\/data/
--ignore-file=match:/.*\/vendor\/aws\/aws-sdk-php\/src\/data\/.*/

Maybe something like --ignore-path would be possible without breaking things?
Same as the find command

@ittayd

ittayd commented Jun 19, 2022

+1 from me. In my use case, the top folder of the source tree is 'build', which I want to ignore (and everything under it), but 'build' is such a common name that I don't want to risk someone creating it in the source folders.

@petdance
Collaborator

I'm open to suggestions on how this feature would work without breaking existing usage. I'm talking about discussion of how the feature would work, not actual code yet.

@ittayd

ittayd commented Jun 20, 2022

If you introduce a new switch, --ignore-path, then it won't affect existing usage

@petdance
Collaborator

So let's talk this out in specifics. You have the tree you showed above:

foo/
storage/
    project_1/
        foo/ ...
        config/ ...
    project_2/
        foo/ ...
        config/ ...
    ...

So then you'd call ack foo --ignore-path=storage/project_1/config/ ? And it would be relative to... what? The directory you're calling from? The directory that you specify?

Because what if you want to search storage/ and what's below it and ignore project_1/config, how would you call that? ack foo storage/ --ignore-path=project_1/config? Or ack foo storage/ --ignore-path=storage/project_1/config? Is the ignoring relative to the directory you search, or the one that you invoke ack from? And what if you specify multiple paths to start at? I think you would have to make it relative to the current directory, and NOT the target directory you're searching, because there could be multiples.
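
The two interpretations can be compared with a small Python sketch. This is only a model of the question, not of ack's implementation; the second starting directory "backup" and the helper name are hypothetical.

```python
import re

# (start_dir, path relative to that start dir) for a hypothetical
# invocation such as: ack foo storage/ backup/
FOUND = [
    ("storage", "project_1/config/thing.txt"),
    ("storage", "project_1/foo/thing.txt"),
    ("backup", "project_1/config/thing.txt"),
]

def kept(pattern: str, relative_to_cwd: bool) -> list[str]:
    pat = re.compile(pattern)
    result = []
    for start, rel in FOUND:
        full = f"{start}/{rel}"
        # Choose which spelling of the path the pattern is applied to.
        candidate = full if relative_to_cwd else rel
        if not pat.search(candidate):
            result.append(full)
    return result

# Relative to the current directory: the pattern can name one start dir.
cwd_relative = kept(r"^storage/project_1/config/", relative_to_cwd=True)
# Relative to each searched directory: the pattern applies to all of them.
start_relative = kept(r"^project_1/config/", relative_to_cwd=False)
print(cwd_relative)
print(start_relative)
```

In this model, matching relative to the searched directory cannot tell storage/project_1/config from backup/project_1/config, which is the problem with multiple starting points.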

@ittayd

ittayd commented Jun 20, 2022

--ignore-path will be a pattern that would match against the paths that -f would return (including if a folder is specified).

One approach can be: if it starts with '/', then it is absolute (same as adding a '^' in a regular expression), otherwise it matches a subpath. So --ignore-path=/storage/project_1/config will ignore that particular folder and --ignore-path=project_1/config will ignore any folder config whose parent is project_1. Then support for * and ** would be good.

Another approach is to support regex. Then the user decides if they want to use '^' or not
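
The first approach could be sketched in Python as a translation from the glob-like syntax to a regex. This is an assumption-laden illustration: the exact rules (leading '/' anchors at the start of the listed path, '*' stays within one component, '**' may cross components, matches end at a component boundary) are one possible reading, and glob_to_regex and ignored are made-up names.

```python
import re

def glob_to_regex(glob: str) -> str:
    """Translate the glob-like syntax into a regex (illustrative rules only)."""
    anchored = glob.startswith("/")
    body = glob.lstrip("/")
    out = []
    i = 0
    while i < len(body):
        if body.startswith("**", i):
            out.append(".*")        # may span several path components
            i += 2
        elif body[i] == "*":
            out.append("[^/]*")     # stays inside one path component
            i += 1
        else:
            out.append(re.escape(body[i]))
            i += 1
    # Anchored patterns start at the beginning of the path; others may
    # start at any component boundary. Both end at a component boundary.
    prefix = "^" if anchored else "(?:^|/)"
    return prefix + "".join(out) + "(?:/|$)"

def ignored(path: str, glob: str) -> bool:
    return re.search(glob_to_regex(glob), path) is not None

print(ignored("storage/project_1/config/c.txt", "/storage/project_1/config"))
print(ignored("other/storage/project_1/config/c.txt", "project_1/config"))
```

Under these assumed rules, the anchored form ignores only the one folder at the top of the listing, while the unanchored form ignores any config folder whose parent is project_1.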

@jwdevel
Author

jwdevel commented Jun 20, 2022

> I think you would have to make it relative to the current directory, and NOT the target directory you're searching, because there could be multiples.

I agree with this, and it's consistent with other unix tools, such as find.

As mentioned earlier, it might make sense for paths in .ackrc to be relative to the .ackrc file itself, since presumably they are meant to be "project-wide" options, and not dependent on the current working directory, unlike commandline args.

Regarding absolute paths: it might be worthwhile to include support for these, if reasonable. None of my personal use cases for this feature require absolute paths, but I could see it mattering to some people.

One small nuance, here: There are at least 3 different "start of path" anchor points for pattern matching that are worth considering:

  • start of absolute path, as mentioned
  • anywhere in the path (start of any dir)
  • the current working directory - this is akin to rsync's "root of transfer" concept. See the manpage section "INCLUDE/EXCLUDE PATTERN RULES". Personally, I find this concept more important than real absolute paths. Anyone wanting absolute paths could just invoke ack from the root dir (/) to achieve that effect, but the reverse is not true — if abs. paths were supported, but "relative to working dir" were not, then some use cases are lost.
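
The three anchor points can be sketched in Python as follows. This is a rough model, not a proposal for ack's internals; anchored_match, the mode names, and the CWD value are all hypothetical.

```python
import posixpath
import re

def anchored_match(pattern: str, rel_path: str, cwd: str, mode: str) -> bool:
    """Apply `pattern` at one of the three anchor points (illustrative only)."""
    if mode == "absolute":
        # Anchor at the filesystem root: match the absolute path from its start.
        return re.match(pattern, posixpath.join(cwd, rel_path)) is not None
    if mode == "cwd":
        # Anchor at the invocation directory (like rsync's "root of transfer").
        return re.match(pattern, rel_path) is not None
    if mode == "anywhere":
        # Anchor at the start of any directory component.
        return re.search(r"(?:^|/)(?:" + pattern + ")", rel_path) is not None
    raise ValueError(f"unknown mode: {mode}")

CWD = "/home/me/proj"  # hypothetical invocation directory
top = "foo/storage/a.txt"
nested = "storage/project_1/foo/a.txt"

print(anchored_match(r"foo/", top, CWD, "cwd"))
print(anchored_match(r"foo/", nested, CWD, "cwd"))
print(anchored_match(r"foo/", nested, CWD, "anywhere"))
```

Invoking from / makes the "cwd" mode behave like the "absolute" mode, e.g. anchored_match(r"home/me/proj/foo/", "home/me/proj/foo/storage/a.txt", "/", "cwd"), which is why the working-directory anchor seems like the more important one to support.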

So, to be concrete, in the example hierarchy (with some additions), here are some test cases:

# initial state

$ find .
.
./foo
./foo/storage
./foo/storage/a.txt
./foo/storage/project_foobar
./foo/storage/project_foobar/x.txt
./foo/storage/project_1
./foo/storage/project_1/foo
./foo/storage/project_1/foo/a.txt
./foo/storage/project_1/b.txt
./foo/storage/project_1/foobar
./foo/storage/project_1/foobar/y.txt
./foo/storage/project_1/config
./foo/storage/project_1/config/c.txt
./foo/storage/project_2
./foo/storage/project_2/foo
./foo/storage/project_2/foo/e.txt
./foo/storage/project_2/d.txt
./foo/storage/project_2/config
./foo/storage/project_2/config/f.txt
# Exclude 'foo' dirs that are under 'storage' - but NOT the top-level
# 'foo' dir, nor the 'project_foobar' dir, nor the 'foobar' dir.
# (assuming a regex syntax)
# Note: similar to: find . | grep -v 'storage/[^/]*/foo/'

$ ack -f --ignore-path 'storage/[^/]*/foo/'
./foo/storage/a.txt
./foo/storage/project_foobar/x.txt
./foo/storage/project_1/b.txt
./foo/storage/project_1/foobar/y.txt
./foo/storage/project_1/config/c.txt
./foo/storage/project_2/d.txt
./foo/storage/project_2/config/f.txt
# Exclude any dir that has 'foo' in it, but not the toplevel one
# Note: similar to: find foo | grep -v '/[^/]*foo[^/]*/'

# Note: we search 'foo' not '.', so paths are "foo/..." not "./foo/..."
$ ack -f --ignore-path '/[^/]*foo[^/]*/' foo
foo/storage/a.txt
foo/storage/project_1/b.txt
foo/storage/project_1/config/c.txt
foo/storage/project_2/d.txt
foo/storage/project_2/config/f.txt
# Same as the above example, but we give the starting path '.' instead of 'foo':
$ ack -f --ignore-path '/[^/]*foo[^/]*/' .

# <empty results>
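
The test cases above can be reproduced with a short Python model of "list the files, then drop the ones whose printed path matches". It assumes an unanchored regex applied to the path exactly as it would be printed; list_files is a made-up helper, not an ack function.

```python
import re

# Files from the example hierarchy, relative to the invocation dir.
FILES = [
    "foo/storage/a.txt",
    "foo/storage/project_foobar/x.txt",
    "foo/storage/project_1/foo/a.txt",
    "foo/storage/project_1/b.txt",
    "foo/storage/project_1/foobar/y.txt",
    "foo/storage/project_1/config/c.txt",
    "foo/storage/project_2/foo/e.txt",
    "foo/storage/project_2/d.txt",
    "foo/storage/project_2/config/f.txt",
]

def list_files(start: str, ignore_path: str) -> list[str]:
    # Searching "." prints "./foo/..."; searching "foo" prints "foo/...".
    prefix = "./" if start == "." else ""
    pat = re.compile(ignore_path)
    return [prefix + f for f in FILES if not pat.search(prefix + f)]

case1 = list_files(".", r"storage/[^/]*/foo/")
case2 = list_files("foo", r"/[^/]*foo[^/]*/")
case3 = list_files(".", r"/[^/]*foo[^/]*/")
print(case1, case2, case3, sep="\n")
```

The third case comes out empty because every printed path starts with "./foo/", so the top-level dir itself matches the pattern.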

For @ittayd's example of a build dir, I think the options would be:

  1. Put in .ackrc the option: --ignore-path "^build/", and that would be relative to the .ackrc file.
  2. Use that same option when invoking ack manually from the top-level dir
  3. If invoking from a lower dir, use something like ack ../.. --ignore-path '^\.\./\.\./build'. The downside there is that you need to change the pattern to match the number of .. of the target dir, which is annoying, and is part of the reason I like the .ackrc option.
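
A small Python sketch of why option 3 is awkward: the same file is spelled differently depending on where ack is invoked, so an anchored pattern has to change with it. The project layout under /work/proj and the printed_path helper are hypothetical, and the sketch assumes paths are printed as the start dir joined with the path below it.

```python
import posixpath
import re

TARGET = "/work/proj/build/out/app.o"  # hypothetical file inside build/

def printed_path(cwd: str, start: str) -> str:
    """Spell TARGET the way a search of `start` from `cwd` would print it."""
    abs_start = posixpath.normpath(posixpath.join(cwd, start))
    rel = posixpath.relpath(TARGET, abs_start)
    return rel if start == "." else posixpath.join(start, rel)

from_root = printed_path("/work/proj", ".")
from_sub = printed_path("/work/proj/src/lib", "../..")
print(from_root)
print(from_sub)

# The top-level pattern only works from the top-level dir.
print(bool(re.search(r"^build/", from_root)))
print(bool(re.search(r"^build/", from_sub)))
print(bool(re.search(r"^\.\./\.\./build", from_sub)))
```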

Note: None of the above examples use true "absolute path". They all treat the invocation directory as the base dir (akin to find's output). Again, there might be some real use case for true absolute paths, so might be worth some extra syntax to support it, but as a mitigation, people can still just invoke from / to achieve that effect.

Some open questions, in my mind:

  • Support a glob-style syntax (eg: * and **)? Makes typing easier for common cases, but not as powerful as regex.
  • "anchored" regex by default? I.e. do people need to include ^ and $? I kind of like Mercurial's approach here, where paths in .hgignore are not anchored, but paths on the commandline are anchored to the invocation dir.

@jwdevel
Author

jwdevel commented Jun 20, 2022

Oh, and to specifically answer this bit, with the above in mind:

> Because what if you want to search storage/ and what's below it and ignore project_1/config, how would you call that? ack foo storage/ --ignore-path=project_1/config? Or ack foo storage/ --ignore-path=storage/project_1/config? Is the ignoring relative to the directory you search, or the one that you invoke ack from? And what if you specify multiple paths to start at? I think you would have to make it relative to the current directory, and NOT the target directory you're searching, because there could be multiples.

I would imagine, from the top dir: ack storage --ignore-path=storage/project_1/config

Or if you wanted to ignore all config dirs: ack storage --ignore-path=/config/ (exact syntax TBD, of course - this assumes a "non-anchored" regex)

So, going back to a high-level description, my leanings are:

  • patterns match paths which are relative to the invocation dir by default (akin to the output of find).
    • Maybe some syntax for true abs dir?
    • .ackrc patterns are relative to the dir containing the .ackrc file
  • patterns are non-anchored by default; user can add ^ or $ if they want
    • maybe patterns inside .ackrc files are anchored? No strong opinion, here.
  • Support regex, but maybe also support simpler glob-style * and **?

@ittayd

ittayd commented Jun 20, 2022

I'm not sure why this needs to be so complicated. Ack already has an established way of discovering paths. '-f' will output them. All --ignore-path needs to do is to match against that list. Having it be a regex seems inconvenient, as users would need to pepper it with '[^/]'. Seems to me that either globbing, or a set of regexes separated by '/', will be OK. (For the latter, it means the pattern 'f.*o/b.*r' will be split into 'f.*o' and 'b.*r', which will be matched on consecutive path elements. That of course makes the whole thing more complex/expensive, unless there's a nice way to create an FSM that treats only verbatim slashes.)
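
The "regexes separated by '/'" option could look roughly like this in Python. It is a naive sketch under the assumption that each piece must match one whole path component; segment_match is a made-up name, and the plain split on '/' would mishandle a slash inside a character class such as [^/].

```python
import re

def segment_match(pattern: str, path: str) -> bool:
    """Match '/'-separated regexes against consecutive path components."""
    regexes = [re.compile(piece) for piece in pattern.split("/")]
    parts = path.split("/")
    n = len(regexes)
    # Slide a window of n components along the path.
    for i in range(len(parts) - n + 1):
        window = parts[i:i + n]
        if all(r.fullmatch(part) for r, part in zip(regexes, window)):
            return True
    return False

print(segment_match("f.*o/b.*r", "src/foo/bar"))
print(segment_match("f.*o/b.*r", "src/foo/x/o/bar"))
# A single regex over the whole path lets '.*' run across slashes:
print(re.search("f.*o/b.*r", "src/foo/x/o/bar") is not None)
```

The last two lines show the difference: the per-component version rejects src/foo/x/o/bar, while the single whole-path regex accepts it because '.*' crosses directory boundaries.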

@petdance
Collaborator

> I'm not sure why this needs to be so complicated.

I hope it's not, but past experience has shown that what seems simple is rarely so.

Matching --ignore-path against what shows up in -f makes sense, if we're OK with --ignore-path being applied to everything that ack finds, even if there are multiple starting points. It would certainly be simplest.

ack does not do globbing at all in its matching.

@jwdevel
Author

jwdevel commented Jun 20, 2022

> I'm not sure why this needs to be so complicated.

Sorry, I didn't mean my long discussion to make it seem hairier than it is (:

I think the essence of the examples I wrote is "make it behave much like find . | grep -v ...". Which is, I think, what you are saying, too (replace find with ack -f). But there are a handful of edge cases to consider.

FWIW, I am fine with no globbing (regex is strictly more powerful anyway). I do think the "relative to the .ackrc file" concept is useful, when invoking ack from various subdirs of an over-arching project. But not strictly necessary, either.

@petdance
Collaborator

Thing is that there can be multiple .ackrc files. And even if there's only one, there's no rule that says it has to be in the root of the project.

"make it behave much like find . | grep -v ..." is a good principle, since that's basically how ack -g works. It's ack -f | ack pattern.

@jwdevel
Author

jwdevel commented Jun 20, 2022

> there can be multiple .ackrc files. And even if there's only one, there's no rule that says it has to be in the root of the project.

Yes, I understand. But I don't think it is necessarily a problem? Conceptually, I imagine each "--ignore-path=..." option found in an .ackrc file would get "normalized" to be relative to that file, when the final matching happens.

For instance, when ack finally does its matching work, I am imagining it has gathered a list of (pattern, basePath). Then, to decide if a dir matches, it would (1) resolve that path relative to basePath, then (2) try to apply the pattern. In the default case (not coming from an .ackrc file), basePath is just the current working directory. Maybe this is too inefficient and/or ack's code is not set up to deal with this — fine by me if the answer is "not worth the complexity"; I'm just trying to describe the case clearly, is all (:
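
As a rough Python sketch of that (pattern, basePath) idea: each rule carries the directory it should be resolved against, and a candidate path is re-expressed relative to that directory before the regex is tried. IgnoreRule, is_ignored, and the /work/proj layout are all hypothetical; this says nothing about how ack is actually structured.

```python
import posixpath
import re
from typing import NamedTuple

class IgnoreRule(NamedTuple):
    pattern: str    # regex, as written by the user
    base_path: str  # absolute dir the pattern is relative to

def is_ignored(abs_path: str, rules: list[IgnoreRule]) -> bool:
    for rule in rules:
        # (1) resolve the path relative to the rule's base dir ...
        rel = posixpath.relpath(abs_path, rule.base_path)
        # (2) ... then try to apply the pattern.
        if re.search(rule.pattern, rel):
            return True
    return False

RULES = [
    # From a hypothetical /work/proj/.ackrc: relative to that file's dir.
    IgnoreRule(r"^build/", "/work/proj"),
    # From the command line: relative to the cwd, here /work/proj/src.
    IgnoreRule(r"^tmp/", "/work/proj/src"),
]

print(is_ignored("/work/proj/build/out/app.o", RULES))
print(is_ignored("/work/proj/src/build/helper.c", RULES))
print(is_ignored("/work/proj/src/tmp/x.c", RULES))
```

The cost is one relative-path computation per rule per candidate, which is the inefficiency mentioned above.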

And also, just as I'm thinking about it, I could imagine cases where people do want to put paths in their .ackrc that "follow you around" as you change dirs, rather than being relative to the .ackrc file. Then perhaps it is getting into "too complicated" territory, where there would need to be different syntax to support both cases...
