-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add feature flag for emitting string literals into data section as UTF8 #76036
base: main
Are you sure you want to change the base?
Conversation
16f5293
to
43aa343
Compare
43aa343
to
e9e701d
Compare
error CS8103: Combined length of user strings used by the program exceeds allowed limit. Try to decrease use of string literals. | ||
``` | ||
|
||
By turning on the feature flag `utf8-string-encoding`, string literals (where possible) are instead emitted as UTF-8 data into a different section of the PE file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.CodeAnalysis.CodeGen; | ||
|
||
internal sealed class DataStringHolder : DefaultTypeDef, Cci.INamespaceTypeDefinition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public DataStringHolder( | ||
CommonPEModuleBuilder moduleBuilder, | ||
string dataHash, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// </param> | ||
/// <returns>The field. This may have been newly created or may be an existing field previously created for the same data and alignment.</returns> | ||
internal Cci.IFieldReference CreateDataField(ImmutableArray<byte> data, ushort alignment) | ||
internal Cci.ITypeReference GetOrAddProxyType(int length, ushort alignment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming _proxyTypes
as well
|
||
Debug.Assert(_mappedFields.IsEmpty, "We need to generate unique field names if we are synthesizing more than one."); | ||
|
||
(_, StringField stringField) = _mappedFields.GetOrAdd((data, Alignment: 1), key => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjonescz Please add details how do we achieve the goal: what artifacts are getting generated, what APIs are needed and how they are used, what compilation phases/layers are involved and how, etc. Basically, I think we need something like an implementation speclet for the compiler feature. |
|
||
fieldsBuilder.Sort(PrivateImplementationDetails.FieldComparer.Instance); | ||
|
||
_orderedSynthesizedFields = fieldsBuilder.ToImmutableAndFree(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Call `<PrivateImplementationDetails>.BytesToString(byte*, int)`. | ||
bytesToStringHelper ??= GetOrSynthesizeBytesToStringHelper(diagnostics); | ||
ilBuilder.EmitOpCode(ILOpCode.Call, -1); | ||
ilBuilder.EmitToken(bytesToStringHelper, null, diagnostics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
namespace Microsoft.CodeAnalysis.CodeGen; | ||
|
||
internal sealed class DataStringHolder : DefaultTypeDef, Cci.INamespaceTypeDefinition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stopped reviewing the PR because I think first we need to agree on the implementation strategy and have a speclet to review against. |
public override IEnumerable<DataStringHolder> GetFrozenDataStringHolders() | ||
{ | ||
Debug.Assert(_lazyDataStringHolders is null || _dataStringHoldersFrozen == 1); | ||
return _lazyDataStringHolders?.Values ?? SpecializedCollections.EmptyEnumerable<DataStringHolder>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this opt-in feature flag is enabled by users, string literals with length over the configured threshold are emitted into data section of the PE file, hence allowing users to overcome the "error CS8103: Combined length of user strings used by the program exceeds allowed limit."
VB support is not implemented in this PR, could be added in the future if there's demand.
I've manually tried this on our unit tests (e.g., Emit2 compiled standalone resulted in significantly smaller file size due to the use of UTF8 instead of UTF16; run time was the same; combining Emit2 and Emit3 into one assembly worked without emit errors). It was also tested on aspnetcore benchmarks and that didn't reveal any significant runtime/build perf changes.
Main motivation for this feature are Razor projects which can accumulate lots of string literals over time (due to how
.razor
file compilation works) and then customers are one day hit with this error without any prior warnings (imagine they are fixing a small bug and suddenly they cannot compile anymore without refactoring their codebase into multiple projects).