Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature flag for emitting string literals into data section as UTF8 #76036

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jjonescz
Copy link
Member

@jjonescz jjonescz commented Nov 22, 2024

When this opt-in feature flag is enabled by users, string literals with length over the configured threshold are emitted into data section of the PE file, hence allowing users to overcome the "error CS8103: Combined length of user strings used by the program exceeds allowed limit."

VB support is not implemented in this PR, could be added in the future if there's demand.

I've manually tried this on our unit tests (e.g., Emit2 compiled standalone resulted in significantly smaller file size due to the use of UTF8 instead of UTF16; run time was the same; combining Emit2 and Emit3 into one assembly worked without emit errors). It was also tested on aspnetcore benchmarks and that didn't reveal any significant runtime/build perf changes.

Main motivation for this feature are Razor projects which can accumulate lots of string literals over time (due to how .razor file compilation works) and then customers are one day hit with this error without any prior warnings (imagine they are fixing a small bug and suddenly they cannot compile anymore without refactoring their codebase into multiple projects).

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 22, 2024
@jjonescz jjonescz marked this pull request as ready for review November 25, 2024 15:25
@jjonescz jjonescz requested a review from a team as a code owner November 25, 2024 15:25
error CS8103: Combined length of user strings used by the program exceeds allowed limit. Try to decrease use of string literals.
```

By turning on the feature flag `utf8-string-encoding`, string literals (where possible) are instead emitted as UTF-8 data into a different section of the PE file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf8-string-encoding

"utf8-string-literal-encoding"?


namespace Microsoft.CodeAnalysis.CodeGen;

internal sealed class DataStringHolder : DefaultTypeDef, Cci.INamespaceTypeDefinition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataStringHolder

I think this type deserves a comment


public DataStringHolder(
CommonPEModuleBuilder moduleBuilder,
string dataHash,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataHash

Consider adding a comment

/// </param>
/// <returns>The field. This may have been newly created or may be an existing field previously created for the same data and alignment.</returns>
internal Cci.IFieldReference CreateDataField(ImmutableArray<byte> data, ushort alignment)
internal Cci.ITypeReference GetOrAddProxyType(int length, ushort alignment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetOrAddProxyType

The name looks meaningless to me. "GetOrAddDataFieldType"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming _proxyTypes as well


Debug.Assert(_mappedFields.IsEmpty, "We need to generate unique field names if we are synthesizing more than one.");

(_, StringField stringField) = _mappedFields.GetOrAdd((data, Alignment: 1), key =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mappedFields

Why do we need a ConcurrentDictionary if we create at most one field?

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 27, 2024

@jjonescz Please add details how do we achieve the goal: what artifacts are getting generated, what APIs are needed and how they are used, what compilation phases/layers are involved and how, etc. Basically, I think we need something like an implementation speclet for the compiler feature.


fieldsBuilder.Sort(PrivateImplementationDetails.FieldComparer.Instance);

_orderedSynthesizedFields = fieldsBuilder.ToImmutableAndFree();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_orderedSynthesizedFields = fieldsBuilder.ToImmutableAndFree();

It looks like we always synthesize two fields. It feels like all the code around fieldsBuilder is unnecessary complicated.

// Call `<PrivateImplementationDetails>.BytesToString(byte*, int)`.
bytesToStringHelper ??= GetOrSynthesizeBytesToStringHelper(diagnostics);
ilBuilder.EmitOpCode(ILOpCode.Call, -1);
ilBuilder.EmitToken(bytesToStringHelper, null, diagnostics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytesToStringHelper

Given that we only ever have one field to initialize, it doesn't feel like we need this helper method. We can simply inline what it is doing


namespace Microsoft.CodeAnalysis.CodeGen;

internal sealed class DataStringHolder : DefaultTypeDef, Cci.INamespaceTypeDefinition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataStringHolder

I could be missing something, but it feels like we don't really need this type. Everything what it is doing can be done by PrivateImplementationDetail

@AlekseyTs
Copy link
Contributor

I stopped reviewing the PR because I think first we need to agree on the implementation strategy and have a speclet to review against.

public override IEnumerable<DataStringHolder> GetFrozenDataStringHolders()
{
Debug.Assert(_lazyDataStringHolders is null || _dataStringHoldersFrozen == 1);
return _lazyDataStringHolders?.Values ?? SpecializedCollections.EmptyEnumerable<DataStringHolder>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_lazyDataStringHolders?.Values

Is this going to produce items in a deterministic order?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants