Title as first element causes shell problems due to leading -- #396

Open
mankoff opened this issue Jul 18, 2024 · 17 comments
Labels: discussion, documentation

Comments

@mankoff

mankoff commented Jul 18, 2024

I'd like to use this (setq denote-file-name-components-order '(title signature keywords)) as my file name scheme. I note that if identifier is not included or not first, it adds @@ as a field separator, but does not include this if it is the first field.

I suggest when title is the first field, it should also drop the -- separator. Currently, files are named, for example,
--foo@@20240717T222108.org, which is a problem for a lot of bash shell commands.

Is it possible to remove the leading -- ?

@mankoff
Author

mankoff commented Jul 18, 2024

I note a comment here #361 (comment) says

> I don't see a way around this. If you have a file "some-file@@20240505T111111.org", is "some-file" a title, a keyword or a signature? We could allow one of the components to drop its delimiter if it is in the first position, but this cannot be done for all components at the same time.

But it seems like you could determine what 'some-file' is from the variable denote-file-name-components-order. But even without that introspection, I think I'm asking if the feature suggested above, drop delimiter from first position, exists.

@mankoff
Author

mankoff commented Jul 18, 2024

And one more comment. In #332, all of the title-first examples have no leading --. It makes me think I'm doing something wrong, but I have not found any mention of this in the manual.

@protesilaos
Owner

protesilaos commented Jul 18, 2024 via email

@mankoff
Author

mankoff commented Jul 18, 2024

> Does it work if you quote the file names?

No. Both "bar" and 'bar' produce --bar.

> we cannot know what the previous preference was and if any files were created using that one.

You stress the importance of never changing it strongly enough that one option would be to leave it to the user to adjust files if they change the order, or to provide a convenience function to assist in renaming. If the old and new orders are provided, I think this is trivial. If I'm missing something about the complexity here, I vote for adding support for only TITLE. Seems fairly elegant to me.

> For shell scripts, this gets trickier because the file name alone does not tell us what the component is and then we need some other heuristic.

I'm not even trying to do anything that complicated at the shell. Just grep fails.

@protesilaos
Owner

protesilaos commented Jul 18, 2024 via email

@mankoff
Author

mankoff commented Jul 18, 2024 via email

@MirkoHernandez

I'm not familiar with all the technicalities of the naming convention, but here is a suggestion that could help solve this (or related) issues. The approach is to build the more complex regular expressions with rx from basic patterns like denote-id-regexp.

(setq file "20240802T184947--example-file-name__keyword.org")
(setq file2 "--example-file-name__keyword@@20240802T184947.org")
(setq file3 "example-file-name__keyword@@20240802T184947.org")

;; denote-title-text-regexp recreated in rx. 
(setq test-denote-title-regexp
      (rx  (seq (literal "--")
		(group (regexp "[^.]*?"))
		(or (regexp "==.*")
		    (regexp "__.*")
		    (seq (literal "@@")
			 (regexp denote-id-regexp))))))

;; version that captures the title
(setq test-denote-title-regexp2
      (rx  (or (seq (zero-or-one (literal "--"))
		    (group (regexp "[^.]*?"))
		    (zero-or-one (regexp "==.*"))
		    (zero-or-one (regexp "__.*"))
		    (seq (literal "@@")
			 (regexp denote-id-regexp)))
	       (seq (literal "--")
		    (group-n 1 (regexp "[^.]*?"))
		    (or (regexp "==.*")
			(regexp "__.*")
			(seq (literal "@@")
			     (regexp denote-id-regexp)))))))

(and (string-match test-denote-title-regexp2 file3)
 (match-string-no-properties 1 file3))

(and (string-match test-denote-title-regexp2 file2)
 (match-string-no-properties 1 file2))

(and (string-match test-denote-title-regexp2 file)
 (match-string-no-properties 1 file))


;; Same performance between denote-title-regexp and the rx version.
;; Note: `benchmark' is a function, so its FORM argument is evaluated
;; before timing starts; `benchmark-run' is a macro and times the form itself.
(benchmark-run 10000
  (and (string-match test-denote-title-regexp file)
       (match-string-no-properties 1 file)))

(benchmark-run 10000
  (and (string-match denote-title-regexp file)
       (match-string-no-properties 1 file)))

;; No noticeable performance difference for the regexp that captures the leading title.
(benchmark-run 10000
  (and (string-match test-denote-title-regexp2 file3)
       (match-string-no-properties 1 file3)))

@protesilaos
Owner

protesilaos commented Aug 5, 2024 via email

@MirkoHernandez

> From what I understand, 'rx' is a way to write regular expressions in a
> more Lispy way than some long string. But the end result should always
> be the same, right?

Yes, exactly.

> I have not tried this yet. Can you tell me what difference does it make?

It allows the composition of regular expressions from basic patterns. Since the new Denote file name convention allows many combinations of valid file names, I thought it would be useful to specify these using rx.

A secondary benefit is that many different regular expressions can be benchmarked programmatically.

@protesilaos
Owner

protesilaos commented Aug 6, 2024 via email

protesilaos added the documentation and discussion labels Aug 6, 2024
@MirkoHernandez

MirkoHernandez commented Aug 6, 2024

> On the point of this issue though, 'rx' will not change the status quo,
> meaning that users will still need to escape a leading "-" in file names.

A clarification on the rx example: it makes it easy to specify "conditions" in regular expressions. The following example matches three possible positions for the title (a leading title, a leading '--' followed by the title, and '--' plus the title after some other component). This would have to be repeated for the other components, although basic patterns could be written as variables ("==.*", "__.*").

I'm not understanding why the regular expression approach is not enough. Let's say there is a leading signature, then the title will have a leading '--'; if there is a leading title, the signature will have a leading '=='. It seems to me that a complex regex can match all these examples.

(setq test-denote-title-regexp2
      (rx  (or (seq (zero-or-one (literal "--"))
		    (group (regexp "[^.]*?"))
		    (zero-or-one (regexp "==.*"))
		    (zero-or-one (regexp "__.*"))
		    (seq (literal "@@")
			 (regexp denote-id-regexp)))
	       (seq (literal "--")
		    (group-n 1 (regexp "[^.]*?"))
		    (or (regexp "==.*")
			(regexp "__.*")
			(seq (literal "@@")
			     (regexp denote-id-regexp)))))))

@mentalisttraceur

Side tip: most shell commands accept a -- argument as an "end of options" indicator, for example: ls -- --my-file-name.md.
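
A minimal sketch of that tip, using the file name from the original report. It assumes a POSIX shell with `grep` and `mktemp` available; the scratch directory and the variable names are only for illustration. The `./` prefix shown at the end is a second common workaround and was not proposed in this thread.

```shell
# Work in a scratch directory so no real notes are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Create a file whose name starts with "--" (name from the report above).
printf 'hello denote\n' > ./--foo@@20240717T222108.org

# Without the "--" separator, grep would try to parse the file name as a
# long option. With it, everything that follows is treated as an operand.
count_with_separator=$(grep -c -- 'denote' --foo@@20240717T222108.org)

# A "./" prefix also works, because the argument no longer starts with "-".
count_with_prefix=$(grep -c 'denote' ./--foo@@20240717T222108.org)

# Clean up the scratch directory.
cd - >/dev/null
rm -rf -- "$tmpdir"
```

Both variables hold the number of matching lines in the file, so either form lets `grep` read the file despite its leading `--`.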

@mentalisttraceur

By the way, I think this is a good example towards what @jeanphilippegg said in an earlier discussion:

> I don't think it would be hard to allow titles to drop their delimiters. I had some code to support this, but I removed it before the final pull request. We can see how this is used in practice and let things stabilize.

@mentalisttraceur

@MirkoHernandez re:

> I'm not understanding why the regular expression approach is not enough. Let's say there is a leading signature, then the title will have a leading '--',

Titles can contain nested -- (you can blame me for that one: #271 ).

So if signatures are allowed to drop their leading delimiter, there's an ambiguity: foo--qux.md is either titled "foo--qux", or titled "qux" with signature "foo".

(We could look inside the file's frontmatter to disambiguate, but some of us use Denoted naming for things that don't have frontmatter - PDFs, pictures, media, directories, ...; and one of the big benefits of denoted naming so far is that code doesn't need to open the file to know which part of the name is what.)

After surmounting that problem, we still have ambiguities: given foo.md, is "foo" a title, a signature, or even an ID (if the ID is allowed to be anything other than an ISO 8601 datetime; I don't remember if that's already implemented, but it has been discussed favorably)?
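
To make the first ambiguity concrete, here is a small sketch in POSIX shell. It only illustrates the two readings of `foo--qux` described above; the variable names are made up for the example and this is not how Denote itself parses file names.

```shell
# Base name (extension removed) of the ambiguous example.
name='foo--qux'

# Reading 1: no signature; the whole string is a title with a nested "--".
title_only=$name

# Reading 2: a leading signature that dropped its "==" delimiter,
# followed by "--" and the title.
signature=${name%%--*}   # text before the first "--"
title=${name#*--}        # text after the first "--"
```

Both readings are consistent with the same file name, so nothing in the name alone tells a parser which one was intended.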

@protesilaos
Owner

protesilaos commented Sep 9, 2024 via email

@mentalisttraceur

💡 idea!

  1. We expose a user-configurable regexp (or list of regexps), which defaults to just the current ISO8601 identifier format, to check if an ambiguous first component is an identifier or a title.

  2. Then we can allow dropping the leading -- when title is first (a leading -- could still be supported, providing an escape hatch for the rare cases where the title matches the ID regexp).

  3. Users who want to use Luhmann-style naming can have their Luhmann identifier as their Denote identifier by adding a regexp such as [a-z]([0-9][a-z])*[0-9]?. (This is probably more helpful to those users than having it in the signature - they can use all of Denote's linking machinery for Luhmann-style IDs.)

  4. Ditto for users who have any other ID pattern that can be matched by a regexp.
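
A rough sketch of how points 1 to 3 could behave, written in POSIX shell rather than Emacs Lisp so it can be tried outside Emacs. Everything here is an assumption for illustration: `id_regexps` and `classify_first_component` are hypothetical names, not Denote options, and the default pattern only approximates the current identifier format.

```shell
# Hypothetical user option: one extended regexp (ERE) per line.
# The default approximates the current identifier, e.g. 20240717T222108.
id_regexps='[0-9]{8}T[0-9]{6}'

# Print "identifier" if the undelimited first component fully matches any
# configured regexp, otherwise print "title".
classify_first_component() {
  component=$1
  while IFS= read -r re; do
    # -x requires the whole component to match; -e guards unusual patterns.
    if printf '%s\n' "$component" | grep -Eqx -e "$re"; then
      echo identifier
      return 0
    fi
  done <<EOF
$id_regexps
EOF
  echo title
}

first=$(classify_first_component 20240717T222108)
second=$(classify_first_component foo)

# Opting in to the Luhmann-style pattern from point 3.
id_regexps="$id_regexps
[a-z]([0-9][a-z])*[0-9]?"
third=$(classify_first_component a1b2)
```

With only the default pattern, `foo` is classified as a title. Once the Luhmann-style pattern is added, `a1b2` is classified as an identifier, which is also why point 2 keeps the explicit leading `--` as an escape hatch for titles that happen to match.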

@jeanphilippegg
Contributor

To keep things simple, this is the rule I would implement:

All file components always appear with their delimiters, except in two cases:
- A title as first component
- An identifier **matching the current id format** as first component.

Titles can drop their delimiter as first component, identifiers can become any string, there is no ambiguity and we remain backward-compatible.

But, as you said, once it is done, we cannot allow signatures/keywords to drop their delimiter, as "foo.org" would be ambiguous.

(I did not mention the case of someone who wants to make a title that looks like "20240505T050505". I would ignore this case. It may not even be worth mentioning as a limitation...)
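
The rule above can be sketched as a single dispatch on how the file name starts. This is a POSIX shell illustration under stated assumptions: `first_component_kind` is a hypothetical helper, the delimiters are the ones used in this thread (`--` title, `==` signature, `__` keywords, `@@` identifier), and the glob stands in for "the current id format".

```shell
# Print the kind of the first component of a file name under the proposed rule.
first_component_kind() {
  case $1 in
    --*) echo title ;;        # delimited title
    ==*) echo signature ;;    # delimited signature
    __*) echo keywords ;;     # delimited keywords
    @@*) echo identifier ;;   # delimited identifier (could be any string)
    # Undelimited and shaped like the current identifier format.
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]*)
      echo identifier ;;
    # Anything else without a delimiter is a title.
    *) echo title ;;
  esac
}

kind_id_first=$(first_component_kind '20240802T184947--example-file-name__keyword.org')
kind_delimited_title=$(first_component_kind '--example-file-name__keyword@@20240802T184947.org')
kind_bare_title=$(first_component_kind 'example-file-name__keyword@@20240802T184947.org')
kind_signature=$(first_component_kind '==sig--some-title.org')
```

The three file names from the rx example earlier in the thread resolve without ambiguity. The sketch also shows the limitation noted above: an undelimited title that looks like "20240505T050505" would be read as an identifier.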

5 participants