This proposal specifies lexical rules for constant strings in Carbon.
We wish to provide a syntax for writing literals containing human-readable text.
Note that "human-readable text" here should be understood broadly: such text may be subject to further processing, and may in some cases be intended to be interpreted by a computer rather than by a human (such as a regular expression, program source code, or a C++ mangled name), but broadly represents a sequence of characters rather than arbitrary binary data.
Such text is typically represented in an encoding, which is a bidirectional mapping between a sequence of characters in text and a sequence of bounded integer values known as code units, suitable for storage, transmission, and processing. For example, the Russian word углерод (carbon) is encoded in the UTF-8 encoding as D1168316 D016B316 D016BB16 D016B516 D1168016 D016BE16 D016B416.
See Comparison of programming languages (strings) on Wikipedia.
Simple string literals are specified in most programming languages as text
delimited by double-quote characters, "like this"
. Such string literals
usually are restricted to begin and end on the same source line. Three
additional features are commonly seen:
-
Escape sequences, which permit string literals to include characters that are difficult to type, are ambiguous for the reader, or that would be problematic in some way (such as whitespace characters, characters that are invisible, and characters that change how other characters are rendered), and also to include arbitrary code units. One common convention is to use
\
to introduce an escape sequence, where, for example:\n
represents a newline character,\u1234
represents the Unicode character U+1234,\xAB
represents the code unit AB16,\"
represents a single"
character and does not terminate the string literal,\\
represents a single\
character,
and so on.
The set of single-letter escape sequences has a lot of commonality between languages, with some variation between older and newer languages:
- C++ and Python allow
\a
,\b
,\f
,\n
,\r
,\t
,\v
for bell, backspace, form feed, new line, carriage return, tab, and vertical tab, respectively. - JavaScript drops support for
\a
. - Java additionally drops support for
\v
. - Rust and Swift additionally drop support for
\b
and\f
, leaving only\n
,\r
, and\t
.
The rules for numeric escape sequences differ between kinds of escape sequence and between languages. The rules in C++, JavaScript, Python, Rust, and Swift are as follows:
\123
is interpreted as a octal code unit value, and up to three octal digits are consumed. In JavaScript and C++, values greater than 3778 (25510) are invalid (assuming an 8-bit character type). In Python, values greater than 3778 are interpreted modulo 256. Rust and Swift do not allow octal escapes in general, but do allow\0
as a special case.\xAB
is interpreted as a hexadecimal code unit value. In C++, any nonzero number of hexadecimal digits can follow as part of the escape sequence. In Python, JavaScript, and Rust, exactly two digits are required. In Rust, the value must be less than or equal to 7F16 except forb
-prefixed strings (byte array literals). Swift does not support this form of escape sequence.\uABCD
is interpreted as a hexadecimal code point value. In C++, Python, and JavaScript, exactly four hexadecimal digits can follow, but JavaScript allows any nonzero number of digits to be specified using\u{ABCDE}
notation. Rust and Swift support only the\u{ABCDE}
notation.\U0010FFFD
is interpreted as a hexadecimal code point value in C++ and Python, but not in JavaScript, Rust, or Swift, and permits exactly eight hexadecimal digits.
-
Raw string literals, in which escape sequences are not recognized. These are often used in situations where escape sequences are undesirable, but in which the escape character or regular string terminator is used frequently. Such literals are useful when embedding one machine-readable language in another, when those languages share some escaping conventions. Such functionality may also provide a way to customize the string delimiters.
- In Python, raw string literals have an
r
prefix:r"li\ngo"
is a six character string whose third character is\
. - In C++, raw string literals have an
r
prefix, along with a custom delimiter (which may be empty):r"DELIM(li\ngo)DELIM"
is a six character string (plus a nul terminator). - In Rust, raw string literals have an
r
prefix and any matching number of#
characters enclose the string contents: the third character ofr"lo\ng"
is\
, and the second character ofr#" " "#
is"
. - In Swift, raw string literals are a generalization of regular string
literals: literals are introduced by any number, N, of
#
characters followed by a"
, terminated by"
followed by N#
characters, and escape sequences are introduced by a\
followed by N#
characters:#" " # \n \#n \#\# "#
has the same contents as the C++ string literal" \" # \\n \n \\# "
. - In Java, raw string literals are delimited by a sequence of one or more
backticks instead of double quotes: the fourth character of
``foo`\bar``
is a backtick and the fifth is a backslash.
- In Python, raw string literals have an
-
Multiline string literals provide a mechanism for a string to easily span more than one line of source text.
- In C++, Rust, and Java, raw string literals are used to represent multiline string literals.
- In Python, different delimiters (
"""
or, in Python,'''
instead of"
or'
) are used to represent multiline string literals, and plain"
and""
can thereby appear in the string contents, but these literals otherwise behave the same as regular string literals. - In Swift, different delimiters are used (
"""
instead of"
), but unlike in Python, the string content cannot be on the same line as the delimiters, and the resulting mandatory leading and trailing newlines are not included in the string. Internal newlines can be removed by preceding them with backslashes. - In JavaScript, backtick-delimited strings can contain newlines; this syntax also allows string interpolation as described below.
In addition, some languages, primarily scripting languages, also provide a
mechanism for string interpolation, wherein a string value is formed by
including the formatted values of some variables in a given format string. For
example, "Hello, $planet."
might produce a string value including the
formatted value of the variable named planet
. Such interpolation facilities
are outside the scope of this proposal.
-
Single-line string literals are delimited by
"
s:"hello"
-
Multi-line string literals are introduced by a
"""
followed by a newline and terminated by a line beginning with a"""
. The indentation of the terminating line is removed from all preceding lines:var String: henry_vi = """ The winds grow high; so do your stomachs, lords. How irksome is this music to my heart! When such strings jar, what hope of harmony? I pray, my lords, let me compound this strife. -- History of Henry VI, Part II, Act II, Scene 1, W. Shakespeare """;
Only the final line of this string literal begins with whitespace. The opening newline is not part of the string's contents, but the trailing newline is; the first character of this example string is
T
and the last character is a newline. -
The opening
"""
of a multi-line string literal can be followed by a file type indicator, to assist tooling in understanding the intent of the string. This indicator has no effect on the meaning of the program.var String: cpp_snippet = """c++ #include <iostream> int main() { std::cout << "hello world" << std::endl; } """;
-
Escape sequences are introduced with a
\
character; the most common C and C++ escape sequences are supported:"hello\nworld"
. Octal escapes (\177
),\a
,\b
,\f
and\v
are removed.\uNNNN
and\U00NNNNNN
are replaced by\u{NNNNNN}
. An escape sequence\<newline>
is permitted in multi-line string literals, and results in no string contents. -
Raw string literals are supported, for both the single and multi-line case, following the Swift convention: they are introduced by prefixing the opening delimiter with one or more
#
s, and suffixing the closing delimiter with a matching number of#
s:#"foo\s*bar"#
or#"foo"bar"#
. Escape sequences can be introduced in a raw string literal by inserting a matching number of#
s after the\
character:#"foo\#nbar"#
contains a newline character. -
Unlike in C and C++, adjacent string literals are not implicitly concatenated.
A simple string literal is formed of a sequence of
- characters other than backslashes, double quotation marks, and vertical whitespace
- escape sequences
enclosed in double quotation marks ("
). Each escape sequence is replaced with
the corresponding character sequence or code unit sequence.
var String: lucius = "The strings, my lord, are false.";
A block string literal starts with three double quotation marks, followed by
an optional file type indicator, followed by a newline, and ends at the next
instance of three double quotation marks whose first "
is not part of a \"
escape sequence. The closing """
shall be the first non-whitespace characters
on that line. The lines between the opening line and the closing line
(exclusive) are content lines. The content lines shall not contain \
characters that do not form part of an escape sequence.
The indentation of a block string literal is the sequence of horizontal
whitespace preceding the closing """
. Each non-empty content line shall begin
with the indentation of the string literal. The content of the literal is formed
as follows:
- The indentation of the closing line is removed from each non-empty content line.
- All trailing whitespace on each line, including the line terminator, is replaced with a single line feed (U+000A) character.
- The resulting lines are concatenated.
- Each escape sequence is replaced with the corresponding character sequence or code unit sequence.
A content line is considered empty if it contains only whitespace characters.
var String: w = """
This is a string literal. Its first character is 'T' and its last character is
a newline character. It contains another newline between 'is' and 'a'.
""";
// This string literal is invalid because the """ after 'closing' terminates
// the literal, but is not at the start of the line.
var String: invalid = """
error: closing """ is not on its own line.
""";
A file type indicator is any sequence of non-whitespace characters other than
"
or #
. The file type indicator has no semantic meaning to the Carbon
compiler, but some file type indicators are understood by the language tooling
(for example, syntax highlighter, code formatter) as indicating the structure of
the string literal's content.
// This is a block string literal. Its first two characters are spaces, and its
// last character is a line feed. It has a file type of 'c++'.
var String: starts_with_whitespace = """c++
int x = 1; // This line starts with two spaces.
int y = 2; // This line starts with two spaces.
""";
The file type indicator might contain semantic information beyond the file type itself, such as instructions to the code formatter to disable formatting for the code block.
Open question: This proposal does not suggest any concrete set of recognized file type indicators. It would be useful to informally specify a set of well-known indicators, so that tools have a common understanding of what those indicators mean, perhaps in a best practices guide.
Within a string literal, the following escape sequences are recognized:
Escape | Meaning |
---|---|
\t |
U+0009 CHARACTER TABULATION |
\n |
U+000A LINE FEED |
\r |
U+000D CARRIAGE RETURN |
\" |
U+0022 QUOTATION MARK (" ) |
\' |
U+0027 APOSTROPHE (' ) |
\\ |
U+005C REVERSE SOLIDUS (\ ) |
\0 |
Code unit with value 0 |
\xHH |
Code unit with value HH16 |
\u{HHHH...} |
Unicode code point U+HHHH... |
\<newline> |
No string literal content produced (block literals only) |
This includes all C++ escape sequences except:
\?
, which was historically used to escape trigraphs in string literals, and no longer serves any purpose.\ooo
octal escapes, which are removed because Carbon does not support octal literals;\0
is retained as a special case, which is expected to be important for C interoperability.\uABCD
, which is replaced by\u{ABCD}
.\U0010FFFF
, which is replaced by\u{10FFFF}
.\a
(bell),\b
(backspace),\v
(vertical tab), and\f
(form feed).\a
and\b
are obsolescent, and\f
and\v
are largely obsolete. These characters can be expressed with\x07
,\x08
,\x0B
, and\x0C
respectively if needed.
Note that this is the same set of escape sequences supported by
Swift
and Rust, except that, unlike
in Swift, support for \xHH
is provided.
While this proposal takes a firm stance on not permitting octal escape
sequences, the decision to not allow \1
..\7
, and more generally to not treat
\DDDD
as a decimal escape sequence, is experimental.
In the above table, H
represents an arbitrary hexadecimal character, 0
-9
or A
-F
(case-sensitive). Unlike in C++, but like in Python, \x
expects
exactly two hexadecimal digits. As in JavaScript, Rust, and Swift, Unicode code
points can be expressed by number using \u{10FFFF}
notation, which accepts any
number of hexadecimal characters. Any numeric code point in the ranges
016-D7FF16 or E00016-10FFFF16 can be
expressed this way.
Open question: Some programming languages (notably Python) support a
\N{unicode character name}
syntax. We could add such an escape sequence, but
this proposal does not include one. Future proposals considering adding such
support should pay attention to work done by C++'s Unicode study group in this
area.
The escape sequence \0
shall not be followed by a decimal digit. In cases
where a null byte should be followed by a decimal digit, \x00
can be used
instead: "foo\x00123"
. The intent is to preserve the possibility of permitting
decimal escape sequences in the future.
A backslash followed by a line feed character is an escape sequence that
produces no string contents. This escape sequence is experimental, and can
only appear in multi-line string literals. This escape sequence is processed
after trailing whitespace is replaced by a line feed character, so a \
followed by horizontal whitespace followed by a line terminator removes the
whitespace up to and including the line terminator. Unlike in Rust, but like in
Swift, leading whitespace on the line after an escaped newline is not removed,
other than whitespace that matches the indentation of the terminating """
.
A character sequence starting with a backslash that doesn't match any known
escape sequence is invalid. Whitespace characters other than space and, for
block string literals, new line optionally preceded by carriage return are
disallowed. All other characters (including non-printable characters) are
preserved verbatim. Because all Carbon source files are required to be valid
sequences of Unicode characters, code unit sequences that are not valid UTF-8
can only be produced by \x
escape sequences.
The choice to disallow raw tab characters in string literals is experimental.
var String: fret = "I would 'twere something that would fret the string,\n" +
"The master-cord on's \u{2764}\u{FE0F}!";
// This string contains two characters (prior to encoding in UTF-8):
// U+1F3F9 (BOW AND ARROW) followed by U+0032 (DIGIT TWO)
var String: password = "\u{1F3F9}2";
// This string contains no newline characters.
var String: type_mismatch = """
Shall I compare thee to a summer's day? Thou art \
more lovely and more temperate.\
""";
var String: trailing_whitespace = """
This line ends in a space followed by a newline. \n\
This line starts with four spaces.
""";
In order to allow strings whose contents include backslashes and double quotes,
the delimiters of string literals can be customized by prefixing the opening
delimiter with N #
characters. A closing delimiter for such a string is only
recognized if it is followed by N #
characters, and similarly, escape
sequences in such string literals are recognized only if the \
is also
followed by N #
characters. A \
, "
, or """
not followed by N #
characters has no special meaning.
Opening delimiter | Escape sequence introducer | Closing delimiter |
---|---|---|
" / """ |
\ (for example, \n ) |
" / """ |
#" / #""" |
\# (for example, \#n ) |
"# / """# |
##" / ##""" |
\## (for example, \##n ) |
"## / """## |
###" / ###""" |
\### (for example, \###n ) |
"### / """### |
... | ... | ... |
For example:
var String: x = #"""
This is the content of the string. The 'T' is the first character
of the string.
""" <-- This is not the end of the string.
"""#;
// But the preceding line does end the string.
// OK, final character is \
var String: y = #"Hello\"#;
var String: z = ##"Raw strings #"nesting"#"##;
var String: w = #"Tab is expressed as \t. Example: '\#t'"#;
Note that both a single-line raw string literal and a multi-line raw string
literal can begin with #"""
. These cases can be distinguished by the presence
or absence of additional "
s later in the same line:
- In a single-line raw string literal, there must be a
"
and one or more#
s later in the same line terminating the string. - In a multi-line raw string literal, the rest of the line is a file type
indicator, which can contain neither
"
nor#
.
// This string is a single-line raw string literal.
// The contents of this string start and end with exactly two "s.
var String: ambig1 = #"""This is a raw string literal starting with """#;
// This string is a block raw string literal with file-type 'This',
// whose contents start with "is a ".
var String: ambig2 = #"""This
is a block string literal with file type 'This', first character 'i',
and last character 'X': X\#
"""#;
// This is a single-line raw string literal, equivalent to "\"".
var String: ambig3 = #"""#;
A string literal results in a sequence of 8-bit bytes. Like Carbon source files,
string literals are encoded in UTF-8. This proposal includes no mechanism to
request that any other encoding is used. The expectation is that if another
encoding is needed, the string literal can be transcoded from UTF-8 during
compilation. There is no guarantee that the string is valid UTF-8, however,
because arbitrary byte sequences can be inserted by way of \xHH
escape
sequences.
This decision is experimental, and should be revisited if we find sufficient
motivation for directly expressing string literals in other encodings.
Similarly, as library support for a string type evolves, we should consider
including string literal syntax (perhaps as the default) that guarantees the
string content is a valid UTF-8 encoding, so that valid UTF-8 can be
distinguished from an arbitrary string in the type system. In such string
literals, we should consider rejecting \xHH
escapes in which HH is greater
than 7F16, as in Rust.
We could avoid including a block string literal in general, and instead construct multi-line strings by string concatenation, with either C-style juxtaposition or with an explicit concatenation operator. But doing so would be more verbose and would make the expression of the source code be further from the programmer's intent.
We could use raw string literals to provide block string literal syntax, as C++ does. However, this couples two orthogonal choices: whether escape sequences should be recognized and whether the string is intended to span multiple lines. In C++ code, the inability to use escape sequences in multi-line string literals sometimes awkward. For example:
std::string make_rule = "%s: %s\n\t$(CC) -c -o $@ $< $(CFLAGS)\n\n"
"main:\n\t$(CC) %s -o %s\n";
can be written under this proposal as
var String: make_rule = """make
%s: %s
\t$(CC) -c -o $@ $< $(CFLAGS)
main:
\t$(CC) %s -o %s
""";
improving readability while still making the semantically-meaningful presence of tabs visible even in editors / code browsers that do not distinguish tabs from spaces.
Block string literals could use explicit characters in the body to indicate the amount of leading whitespace to be removed:
var String: x = """
| starts with two spaces.
""";
This would allow the correct indentation to be determined as soon as the first
line after the opening """
is seen. However, this adds lexical complexity, and
harms the ability to copy-paste string contents into other contexts.
We could choose to exclude the trailing newline (like in Swift). Informal surveys suggest that expectations for whether to include or exclude the trailing newline vary.
The intended use case for block string literals is to represent multi-line strings. When forming a single larger string from concatenation of multi-line string literals, including the trailing newline but not the leading newline -- or, more generally, including a newline at the end of each line in the string -- provides the best alignment between the source-level syntax and the result. For example:
fn Run() {
print("""c++
class X {
public:
X() {}
""");
for (var String: decl in GetMemberDecls()) {
print decl;
}
print("""c++
private:
""");
print(GetFields());
print("""c++
};
""");
}
As is, the output printed by this example can be visualized by ignoring all
lines other than those between the """
s. If we excluded the trailing newline,
additional blank lines would be required at the end of each string literal,
harming the readability of the example.
We could support octal escape sequences, as many C family languages do. However,
they are considered antiquated in C++ code, and supporting them would be
inconsistent with our decision to not support octal numeric literals. A quick
informal poll suggests that many C++ programmers do not realize that \123
is
an octal escape sequence, not a decimal one.
We could support \123
as a decimal escape sequence. However, doing so may lead
to surprise when migrating C++ code to Carbon. This possibility should be
revisited once Carbon matures and we have a better idea of how the migration
process is expected to proceed.
We could allow arbitrary-length \x
escape sequences, as C++ does, and include
some explicit mechanism to terminate such a sequence. For example, we could
treat \<whitespace>
for an arbitrary whitespace character in the same way we
treat \<newline>
, and use "\xAB\ C"
to terminate an escape sequence
prematurely. However, this is unnecessarily inventive, and the Python approach
of requiring exactly two hexadecimal characters after \x
is adequate, assuming
we do not intend to support string literal element types other than 8-bit bytes.
If we do find we want to support wider element types in future (for example, if
we want to add a UTF-16 or UTF-32 string literal), \x{ABCD}
can be used.
We could permit \<newline>
even in non-block string literals to allow them to
be line-wrapped, but there seems to be little benefit to doing so, as a block
string literal can always be used instead.
We could adopt Python's \N{unicode character name}
syntax. But there is no
pressing need to add such a syntax imminently, and concerns have been raised
both over the exact ways in which characters are named and over compatibility
with upcoming C++ language extensions in this area, so this syntax is not being
proposed at this time.
We could allow an \e
escape sequence for the U+001C ESCAPE character. This
character is a common extension in C and C++ compilers, and appears to primarily
be used to hardcode ANSI terminal escape sequences, such as
"\e[32mgreen text\e[0m"
. This proposal doesn't explicitly reject this idea.
However, if we consider adopting such an extension in the future, we should
consider whether a library facility for simple terminal operations would be a
more valuable addition than this escape sequence.
We could retain the \uNNNN
escape sequence to give a terser notation for the
common case of a Unicode code point that is most naturally written as four
hexadecimal digits -- that is, all code points in the Basic Multilingual Plane.
This is the approach taken by JavaScript. However, following Swift and Rust in
permitting only \u{NNNN}
is simpler and avoids redundancy. We expect explicit
\u
escapes to be rare: we expect regular, printable Unicode characters that
are normalized in NFC to be written directly in the source file, rather than
spelled with \u
escapes. Explicit \u
escapes would be useful where the code
point value is important -- for example, in test data -- or where special
characters such as directionality markers or non-normalized characters are
desired, but for such uses, the longer \u{NNNN}
form seems adequate.
The approach to raw string literals in this proposal is based on Swift's raw strings facility.
We could use a different mechanism other than a sequence of #
s to support
nesting raw string literals. For example, we could adopt something like C++'s
semi-arbitrary delimiters R"foo(string contents)foo"
. However, this level of
customizability seems unwarranted: raw string literals are unlikely to nest more
than one or two levels deep, so using #"..."#
, ##"..."##
, ###"..."###
for
successive nesting levels seems unproblematic, and removes the need for the
programmer to make an arbitrary choice.
We could use a delimiter other than #
to demarcate raw string literals. At
this stage in Carbon's development, we don't know exactly which characters will
be useful in operators, but it seems reasonable to assume a mostly C++-like
operator set, which gives us a variety of characters that cannot appear
immediately before a string literal: at least @
#
$
)
]
}
\
.
all appear likely to be available. Of these, \
is likely to be problematic due
to its use in escape sequences, closing brackets followed by string literals
might one day be useful in some grammar constructs, and .
seems a little too
close to resembling designator. That leaves @
, #
, and $
, and multiple
existing languages have used #
for this purpose.
We could disallow use of N #
s as a delimiter if a lower value of N would
work. However, this would make the language brittle under maintenance: removing
the last nested string literal from a quoted block of code would require
changing the delimiters.
We could use Rust-style raw strings, which add a leading r
, permit zero #
s
to be used in raw strings, and do not provide a facility for escape sequences in
raw strings. There are several reasons to prefer Swift-style raw strings:
- Swift raw strings are not a distinct language feature; rather, they are a
generalization of non-raw strings -- non-raw strings are simply the case
where the number of
#
characters in the delimiters and escape sequence introducer is zero. - Permitting escape sequences even in raw strings means that there is no loss of functionality when using a raw string, and code changes under maintenance that would require use of a facility only available by way of an escape sequence (such as inclusion of trailing whitespace in a block string literal) do not force a reversion to a non-raw string literal.
There are also several reasons to prefer the Rust-style approach:
- In the most common case, a Rust raw string will be one character shorter.
- Less lexical space is used: Swift-style raw strings remove the possibility
of using
#
as a prefix operator, whereas the leadingr
in Rust-style raw strings does not. - Certain character sequences are hard to express in Swift-style raw strings.
Specifically, string literals such as
"\\################"
cannot readily be expressed as a raw string. Such string literals are extremely rare, but not nonexistent, in one large sample C++ corpus.
If the final issue is concerning, we have a path to address it, by specifying
that \
followed by N+1 or more #
s is left alone, just like \
followed by N-1 or fewer #
s is left alone. Formally, this can be
accomplished by defining \#
as an escape sequence that expands to itself, that
is, to a backslash followed by N+1 #
characters.
We could preserve trailing whitespace in at least raw block string literals, and perhaps in all block string literals. However, this would mean that visually identical programs could have different meanings, and even that transformations performed automatically on save by some editors (removing trailing whitespace) could change the meaning of a program. It might also mean that raw string literals are no longer a generalization of non-raw string literals.
Under this proposal, trailing whitespace can be included in a block string
literal by following it with \n\
:
var String: authors = """markdown
*Authors*: \n\
Me <[email protected]> \n\
Someone Else <[email protected]>
""";
In a single-#
raw string literal, the same can be accomplished with the
more-verbose terminator \#n\#
, and so on.
Raw block string literals could preserve the form of vertical whitespace used to terminate each line. This would allow uncommon forms of vertical whitespace (for example, vertical tab and form feed) to be included in raw string literals, but would create a risk that the meaning of a program would be different when the source code is checked out on an operating system that uses line feed as a line terminator versus when the source code is checked out on an operating system that uses a different line terminator (such as carriage return followed by line feed). This would also mean that raw string literals are no longer a generalization of non-raw string literals.
We could allow raw tab characters in string literals. However, raw tab
characters harm the readability of the program, and we would like to encourage
the use of \t
escapes instead in situations where they are available, even if
this means that the more verbose form \#t
needs to be used in raw string
literals.
This proposal supports the goal of making Carbon code easy to read, understand, and write, by ensuring that essentially every kind of string content can be represented in a Carbon string literal, in a way that is natural, toolable, and easy to read:
- Multi-line strings are supported by multi-line string literals, and the rules for stripping leading indentation enhance readability by allowing those literals to avoid visually disrupting the indentation structure of the code.
- Strings that make extensive use of
\
and"
are supported by raw string literals. - Treating raw versus ordinary and single-line versus multi-line as orthogonal allows Carbon to support all 4 combinations while keeping the language simple.
- The handling of
\#
within raw string literals makes it possible to use escape sequences within raw string literals when necessary, for example to embed arbitrary byte values or Unicode data. This ensures that the programmer is never prevented from using a raw string literal, or forced to assemble a single logical string by concatenating ordinary and raw literals (with the negligible and fixable exception of strings like"\\################"
, as noted in the proposal). - "File type indicators" make it easier for tooling to understand the contents of literals, in order to provide features such as syntax highlighting, automated formatting, and potentially even certain kinds of static analysis, for code that's embedded in string literals.
- Support for non-Unicode strings by way of
\x
ensures "support for software outside the primary use case". - Avoids unnecessary invention, following Rust and particularly Swift.