perldelta for utf8_to_uv() family

Perl · Dec 2, 2024 · cffb5af
1 parent ae865e7
commit cffb5af

Showing 2 changed files with 39 additions and 3 deletions.
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
@@ -406,6 +406,42 @@ well.
 
 =item *
 
+New API functions are introduced to convert strings encoded in UTF-8 to
+their ordinal code point equivalent.  These are safe to use by default,
+and generally more convenient to use than the existing ones.
+
+L<perlapi/C<utf8_to_uv>> replaces L<perlapi/C<utf8_to_uvchr>> (which is
+retained for backwards compatibility).  You should convert to the new
+form, as you likely aren't using the old one safely.
+
+There are also two new functions, L<perlapi/C<strict_utf8_to_uv>> and
+L<perlapi/C<c9strict_utf8_to_uv>>, which do the same thing except when
+the input string represents a code point that Unicode doesn't accept as
+legal for interchange, using either the strict original definition
+(C<strict_utf8_to_uv>), or the looser one given by
+L<Unicode Corrigendum #9|https://www.unicode.org/versions/corrigendum9.html>
+(C<c9strict_utf8_to_uv>).  When the input string represents one of the
+restricted code points, these functions return the Unicode
+C<REPLACEMENT CHARACTER> instead.
+
+Also, L<perlapi/C<extended_utf8_to_uv>> is a synonym for C<utf8_to_uv>, for use
+when you want to emphasize that the entire range of Perl extended UTF-8
+is acceptable.
+
+There are also replacement functions for the three more specialized
+conversion functions that you are unlikely to need to use.  Again, the
+old forms are kept for backwards compatibility, but you should convert
+to use the new forms.
+
+L<perlapi/C<utf8_to_uv_flags>> replaces L<perlapi/C<utf8n_to_uvchr>>.
+
+L<perlapi/C<utf8_to_uv_errors>> replaces L<perlapi/C<utf8n_to_uvchr_error>>.
+
+L<perlapi/C<utf8_to_uv_msgs>> replaces
+L<perlapi/C<utf8n_to_uvchr_msgs>>.
+
+=item *
+
 Three new API functions are introduced to convert strings encoded in
 UTF-8 to native bytes format (if possible).  These are easier to use
 than the existing ones, and they avoid unnecessary memory allocations.

diff --git a/utf8.c b/utf8.c
@@ -1065,20 +1065,20 @@ syntactically invalid UTF-8.
 
 =over 4
 
-=item C<strict_utf8_to_uv>
+=item * C<strict_utf8_to_uv>
 
 additionally rejects any UTF-8 that translates into a code point that isn't
 specified by Unicode to be freely exchangeable, namely the surrogate characters
 and non-character code points (besides non-Unicode code points, any above
 0x10FFFF).  It does not raise a warning when rejecting.
 
-=item C<c9strict_utf8_to_uv>
+=item * C<c9strict_utf8_to_uv>
 
 instead uses the exchangeable definition given by Unicode's Corrigendum #9,
 which accepts non-character code points while still rejecting surrogates.  It
 does not raise a warning when rejecting.
 
-=item C<extended_utf8_to_uv>
+=item * C<extended_utf8_to_uv>
 
 accepts all syntactically valid UTF-8, as extended by Perl to allow 64-bit code
 points to be encoded.
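
The three acceptance levels described above can be sketched in standalone C. This is only an illustration of the classification rules, not the Perl implementation: the names `accept_policy`, `apply_policy` and the helper predicates are hypothetical and are not part of the Perl API. It assumes the code point has already been decoded, and that the Corrigendum #9 level also rejects code points above 0x10FFFF, which the text above states explicitly only for the strict level.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; none of this is Perl API. */
typedef enum {
    ACCEPT_EXTENDED,   /* accept everything, as extended_utf8_to_uv does */
    ACCEPT_C9STRICT,   /* Corrigendum #9: non-characters are acceptable  */
    ACCEPT_STRICT      /* original strict definition                     */
} accept_policy;

#define REPLACEMENT_CHARACTER UINT64_C(0xFFFD)

/* Surrogates occupy U+D800..U+DFFF. */
static bool is_surrogate(uint64_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Anything above U+10FFFF is outside Unicode (Perl's extension). */
static bool is_above_unicode(uint64_t cp)
{
    return cp > 0x10FFFF;
}

/* Non-characters: U+FDD0..U+FDEF, plus the last two code points of
 * every plane (U+xxFFFE and U+xxFFFF). */
static bool is_noncharacter(uint64_t cp)
{
    if (is_above_unicode(cp))
        return false;
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Return cp if the policy accepts it, else REPLACEMENT CHARACTER. */
uint64_t apply_policy(uint64_t cp, accept_policy policy)
{
    bool reject = false;

    if (policy != ACCEPT_EXTENDED)
        reject = is_surrogate(cp) || is_above_unicode(cp);  /* assumed for C9 */
    if (policy == ACCEPT_STRICT)
        reject = reject || is_noncharacter(cp);

    return reject ? REPLACEMENT_CHARACTER : cp;
}
```

Under these assumptions, `apply_policy(0xFFFE, ACCEPT_C9STRICT)` keeps the non-character, while `apply_policy(0xFFFE, ACCEPT_STRICT)` and `apply_policy(0xD800, ACCEPT_C9STRICT)` both yield 0xFFFD.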