IEEE 754 hexadecimal floating-point string conversions #14021

HertzDevil · 2023-11-27T16:25:55Z

Crystal already exposes some functionality for hexadecimal floating-point strings:

"0xc.ap+5".to_f64 # => 404.0
"%a" % 404.0      # => "0x1.9400000000000p+8"

Under the hood, this is due to LibC.strtod and LibC.snprintf. If we remove those funs as a result of #11952 or #12396, we might want to keep the same functionality around; but if we are doing it anyway, it seems odd that hexfloat functionality is hidden behind String#to_f64 and String#%. So I think there should be a more straightforward API for these:

struct Float64
  def self.parse_hexfloat?(str : String) : self?
  end

  def self.parse_hexfloat(str : String) : self
    parse_hexfloat?(str) || raise ...
  end

  def to_hexfloat(io : IO) : Nil
  end

  def to_hexfloat : String
    String.build(...) { |io| to_hexfloat(io) }
  end
end

# ditto for `Float32` and `BigFloat`

Hexfloats are defined in IEEE 754-2008, section 5.12.3; they round-trip to and from binary floating-point values with relative ease and stability compared to decimal strings (whose output shifted, for example, when we switched from Grisu3 to Dragonbox). Some time ago I made a reference shard for this, and the standard library specs already make use of hexfloats.
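To make the round-trip claim concrete, here is a minimal sketch using the method names proposed above. It assumes a Crystal release that already ships them (they were added by #14027, which closed this issue); the sample values are arbitrary.

```crystal
# Assumes a Crystal release that includes the methods added in #14027.
values = [404.0, 0.1, Float64::MAX, Float64::MIN_POSITIVE]

values.each do |x|
  hex = x.to_hexfloat                # a "0x...p..." string
  back = Float64.parse_hexfloat(hex) # raises on malformed input
  puts "#{hex} round-trips: #{back == x}"
end

# The nilable variant returns nil instead of raising.
puts Float64.parse_hexfloat?("not a hexfloat").inspect # => nil
puts Float64.parse_hexfloat("0xc.ap+5") == 404.0       # => true
```

Because every hexadecimal digit maps to exactly four bits of the significand, neither direction needs a shortest-digits search or any rounding for values that are already representable.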

Note that this isn't about supporting hexfloat literals; 0xc.ap+5 parses to 12.ap.+(5), and there is no way to support hexfloats in the language without huge breaking changes.


straight-shoota · 2023-11-27T18:28:59Z

Have you considered using the same API as for Int stringification, i.e. .new and #to_s with a base parameter? I suppose it would not allow many values (i.e. only 10 and 16), so it wouldn't be as universal as its cousin. But still, using the same API has some benefits.

HertzDevil · 2023-11-27T18:51:15Z

Simply adding an optional base parameter to String#to_f would technically be a breaking change because the existing behavior is to accept both bases; a single base value can never cover both. So you'd have to do:

class String
  # unchanged
  def to_f64?(whitespace : Bool = true, strict : Bool = true) : Float64?
  end

  # required parameter
  def to_f64?(base : Int, whitespace : Bool = true, strict : Bool = true) : Float64?
    case base
    when 10; to_f64?(whitespace, strict)
    when 16; ...
    else     raise ...
    end
  end
end

Additionally, the 0x and the p are mandatory for IEEE hexfloats. There is a natural interpretation of hexadecimal fractions such that "1000000.000".to_f64(base: 16) == 16777216_f64, the same way "1000000".to_i(base: 16) == 16777216_i32, but such a string is not an IEEE hexfloat, so I wouldn't say Int and Float are similar in this regard. Only IEEE 754 is the focus here, and I feel like an over-generic API would very soon lead to questions of how much non-IEEE functionality we need to incorporate.
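The difference between the two interpretations can be sketched as follows. `natural_value` is a hypothetical helper written only for illustration; it is not part of the standard library and handles neither signs nor exponents. The last line assumes a Crystal release that includes `Float64.parse_hexfloat?` from #14027.

```crystal
# Hypothetical helper, not a stdlib API: evaluates a plain "digits.digits"
# string in the given base, with no sign, prefix, or exponent.
def natural_value(str : String, base : Int32) : Float64
  int_part, _, frac_part = str.partition('.')
  value = 0.0
  # Integer digits: Horner's scheme, most significant digit first.
  int_part.each_char { |c| value = value * base + c.to_i(base) }
  # Fractional digits: each one is weighted by a further power of 1/base.
  scale = 1.0
  frac_part.each_char do |c|
    scale /= base
    value += c.to_i(base) * scale
  end
  value
end

puts natural_value("1000000.000", 16) == 16777216.0 # => true
puts natural_value("c.a", 16) * 2.0**5 == 404.0     # => true, same value as "0xc.ap+5"

# The same digits are not an IEEE hexfloat: the "0x" prefix and the "p"
# exponent are missing, so the strict parser is expected to reject them.
puts Float64.parse_hexfloat?("1000000.000").inspect
```

Keeping the two notions in separate methods means the strict parser never has to decide how lenient to be about the missing prefix or exponent.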

straight-shoota · 2023-11-27T18:59:05Z

Just for the record, I don't think String#to_f currently supports hexadecimal values (if that's what you're saying in the first paragraph?).

Following the IEEE hexfloat standard and naming makes total sense, though 👍

HertzDevil · 2023-11-27T19:14:36Z

What I mean is:

diff --git a/src/string.cr b/src/string.cr
index 3c378bd1d..e754cf034 100644
--- a/src/string.cr
+++ b/src/string.cr
@@ -709 +709 @@ class String
-  def to_f64(whitespace : Bool = true, strict : Bool = true) : Float64
+  def to_f64(whitespace : Bool = true, strict : Bool = true, base : Int = 10) : Float64

or the existing string constructor in Float64, which forwards to the above:

diff --git a/src/float.cr b/src/float.cr
index a4abcf5ab..e4847da4c 100644
--- a/src/float.cr
+++ b/src/float.cr
@@ -264,4 +264,4 @@ struct Float64
   # ```
-  def self.new(value : String, whitespace : Bool = true, strict : Bool = true) : self
-    value.to_f64 whitespace: whitespace, strict: strict
+  def self.new(value : String, whitespace : Bool = true, strict : Bool = true, base : Int = 10) : self
+    value.to_f64 whitespace: whitespace, strict: strict, base: base
   end

Then calls that previously relied on LibC.strtod to handle hexfloats would now fail.

Speaking of which, GMP does support the "natural interpretation" for bases between 2 and 62:

struct BigFloat
  def initialize(str : String, *, base : Int)
    if LibGMP.mpf_init_set_str(out @mpf, str, base) == -1
      raise ArgumentError.new("Invalid BigFloat: #{str.inspect}")
    end
  end

  def to_s(*, base : Int)
    String.build do |io|
      cstr = LibGMP.mpf_get_str(nil, out decimal_exponent, base, 0, self)
      # ...
    end
  end
end

BigFloat.new("1000000.000", base: 16)           # => 16777216.0
BigFloat.new("1000000.000", base: 2)            # => 64.0
BigFloat.new("123.456", base: 25).to_s(base: 5) # => "10203.041011"

HertzDevil added kind:feature topic:stdlib:numeric labels Nov 27, 2023

HertzDevil changed the title ~~Hexadecimal floating-point string conversions~~ IEEE 754 hexadecimal floating-point string conversions Nov 27, 2023

HertzDevil mentioned this issue Nov 29, 2023

Add Float::Primitive.parse_hexfloat, .parse_hexfloat?, #to_hexfloat #14027

Merged

straight-shoota closed this as completed in #14027 Dec 7, 2023
