
[dcl.init.string] Clarify syntactic forms of initializers that are considered as string literals #661

Open
frederick-vs-ja opened this issue Dec 27, 2024 · 7 comments

Comments

@frederick-vs-ja

frederick-vs-ja commented Dec 27, 2024

Full name of submitter (unless configured in github; will be published with the issue): Jiang An

Reference (section label): [dcl.init.general], [dcl.init.string]

[dcl.init.general] p16.3:

If the destination type is an array of characters, an array of char8_t, an array of char16_t, an array of char32_t, or an array of wchar_t, and the initializer is a string-literal, see [dcl.init.string].

[dcl.init.string] p1

An array of ordinary character type ([basic.fundamental]), char8_t array, char16_t array, char32_t array, or wchar_t array may be initialized by an ordinary string literal, UTF-8 string literal, UTF-16 string literal, UTF-32 string literal, or wide string literal, respectively, or by an appropriately-typed string-literal enclosed in braces ([lex.string]). [...]
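The pairing of array element types with string-literal kinds described in the quoted paragraph can be sketched roughly as follows. This is only an illustration: `element_count` is a hypothetical helper introduced here, and the `char8_t` case is left out because it needs C++20.

```cpp
#include <cassert>
#include <cstddef>

// Each array element type pairs with its own kind of string literal.
char     narrow[] = "foo";     // ordinary string literal
wchar_t  wide[]   = L"foo";    // wide string literal
char16_t utf16[]  = u"foo";    // UTF-16 string literal
char32_t utf32[]  = {U"foo"};  // UTF-32 string literal enclosed in braces

// Hypothetical helper (not from the issue): number of elements in an array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// The deduced bound includes the terminating null character.
static_assert(element_count(narrow) == 4, "3 characters plus the terminator");
static_assert(element_count(wide) == 4, "3 characters plus the terminator");
static_assert(element_count(utf16) == 4, "3 characters plus the terminator");
static_assert(element_count(utf32) == 4, "3 characters plus the terminator");
```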

The syntactic forms of such initializers are not very clear and there's implementation divergence.

"Good" forms:

  • char a[] = "foo"; - accepted by all known implementations, and clarified to be valid in notes;
  • char a[]("foo"); - accepted by all known implementations;
  • char a[]{"foo"}; - accepted by all known (C++11 and later) implementations;
  • char a[] = {"foo"}; - accepted by all known (C++11 and later) implementations.
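As a minimal sketch, the four "good" forms listed above can be put side by side. The variable names and the `all_hold_foo` helper are assumptions made for illustration, and the snippet assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstring>

char a1[] = "foo";    // = followed by a string literal
char a2[]("foo");     // string literal in parentheses
char a3[]{"foo"};     // string literal in braces (C++11 and later)
char a4[] = {"foo"};  // = followed by a braced string literal (C++11 and later)

// Every form deduces the same bound: three characters plus the terminator.
static_assert(sizeof(a1) == 4 && sizeof(a2) == 4 &&
              sizeof(a3) == 4 && sizeof(a4) == 4,
              "all four forms produce char[4]");

// Hypothetical helper (not from the issue): checks the stored contents.
bool all_hold_foo() {
    return std::strcmp(a1, "foo") == 0 && std::strcmp(a2, "foo") == 0 &&
           std::strcmp(a3, "foo") == 0 && std::strcmp(a4, "foo") == 0;
}
```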

Controversial forms with additional parentheses:

Weird forms with additional braces:

  • char arr[]{{"foo"}}; - accepted by MSVC and rejected by others;
  • char arr[] = {{"foo"}}; - same as above.

It seems that we should clarify the permitted syntactic forms, which should include "good" and controversial forms and exclude weird forms.

Link to reflector thread (if any):

Issue description:

Suggested resolution:

@jensmaurer
Member

The rules seem pretty clear to me; [dcl.init.general] p16.3 refers to string-literal (grammar non-terminal), which leaves no room for parentheses. Similarly, [dcl.init.list] p3.3 handles the brace-enclosed case, also referring to the grammar non-terminal string-literal. I'm not seeing how this could be read as allowing parentheses.

@shafik

shafik commented Dec 27, 2024

@jensmaurer I do agree, it looks like it is not allowed, but it does seem like most of the implementations accept the "controversial" forms as an extension. It does indeed look like it is used in the wild:

https://sourcegraph.com/search?q=context:global+lang:c%2B%2B+/%5C%5B%5C%5D%5Cs%2B%5C%3D%5Cs%2B%5C%28%22.*%22%5C%29/+count:11000&patternType=keyword&sm=0

I am not coming up with much background on where this extension came from though.

@jensmaurer
Member

jensmaurer commented Dec 27, 2024

@shafik, approximately half of those hits appear to be the same (cloned) code from the Clang test suite. Also, C23 doesn't mention optional parentheses, either.

It would be good if we could align the C++ answer with the C answer, I think.

@Halalaluyafail3

> The rules seem pretty clear to me; [dcl.init.general] p16.3 refers to string-literal (grammar non-terminal), which leaves no room for parentheses. Similarly, [dcl.init.list] p3.3 handles the brace-enclosed case, also referring to the grammar non-terminal string-literal. I'm not seeing how this could be read as allowing parentheses.

Shouldn't the wording in [expr.prim.paren] apply here?

A parenthesized expression (E) is a primary expression whose type, result, and value category are identical to those of E. The parenthesized expression can be used in exactly the same contexts as those where E can be used, and with the same meaning, except as otherwise indicated.

Here is an example of this being applied elsewhere:

A null pointer constant is an integer literal ([lex.icon]) with value zero or a prvalue of type std​::​nullptr_t.

All compilers I know of accept code like void*p=(0); even though this paragraph references the term "integer literal" and (0) is not a single integer literal token.

C23 also has similar text which states that a parenthesized expression has the same semantics as the expression which it parenthesizes (see 6.5.2p6). Also as I discovered earlier while making a bug report to GCC about this, GCC accepts using _Generic in this context even though it doesn't accept parentheses.
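The null pointer constant example from this comment can be sketched as below. The variable names and the `both_null` helper are hypothetical; the snippet only shows the parenthesized form that compilers are reported above to accept, and it takes no position on what the wording strictly requires.

```cpp
#include <cassert>

void* p1 = 0;    // integer literal with value zero
void* p2 = (0);  // same literal in parentheses, the form discussed above

// Hypothetical helper (not from the issue): both pointers should be null.
bool both_null() {
    return p1 == nullptr && p2 == nullptr;
}
```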

@frederick-vs-ja
Author

> The rules seem pretty clear to me; [dcl.init.general] p16.3 refers to string-literal (grammar non-terminal), which leaves no room for parentheses. Similarly, [dcl.init.list] p3.3 handles the brace-enclosed case, also referring to the grammar non-terminal string-literal. I'm not seeing how this could be read as allowing parentheses.

Thanks. But it seems that when the grammar non-terminal initializer corresponds to ("foo") or = "foo", it's not clear that only "foo" is considered to be the initializer in [dcl.init.general] p16.3. Other bullets in [dcl.init.general] p16 seemingly consider outermost enclosing ( ) (for direct-initialization) and leading = (for copy-initialization) as parts of the initializer.

@jensmaurer
Member

jensmaurer commented Dec 28, 2024

Right. We had wording issues in that area in the past. So, p16.3 has a genuine bug insofar as an initializer can never be a string-literal, grammatically.

This argument does not apply for a brace-enclosed string-literal, though; I think those are properly handled by dcl.init.list p3.3.

@jensmaurer
Member
