IEEE 754 hexadecimal floating-point string conversions #14021

Closed
HertzDevil opened this issue Nov 27, 2023 · 4 comments · Fixed by #14027

Comments

@HertzDevil
Contributor

Crystal already exposes some functionality for hexadecimal floating-point strings:

"0xc.ap+5".to_f64 # => 404.0
"%a" % 404.0      # => "0x1.9400000000000p+8"
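For readers new to the notation, the arithmetic behind these two strings can be checked by hand: the digits after `0x` are a hexadecimal significand and the number after `p` is a power of two. The snippet below is only an illustrative calculation, not how the conversion is implemented.

```crystal
# 0xc.ap+5: hexadecimal significand c.a, binary exponent +5.
significand = 0xc + 0xa / 16 # 12 + 10/16; `/` is float division in Crystal
value = significand * 2 ** 5
puts value # 404.0

# 0x1.94p+8 is the same number written with a normalized significand.
normalized = (1 + 0x94 / 256) * 2 ** 8
puts normalized == value # true
```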

Under the hood, this is due to LibC.strtod and LibC.snprintf. If we remove those funs as a result of #11952 or #12396, we might want to keep the same functionality around; but if we are doing it anyway, it seems odd that hexfloat functionality is hidden behind String#to_f64 and String#%. So I think there should be a more straightforward API for these:

struct Float64
  def self.parse_hexfloat?(str : String) : self?
  end

  def self.parse_hexfloat(str : String) : self
    parse_hexfloat?(str) || raise ...
  end

  def to_hexfloat(io : IO) : Nil
  end

  def to_hexfloat : String
    String.build(...) { |io| to_hexfloat(io) }
  end
end

# ditto for `Float32` and `BigFloat`

Hexfloats are defined in IEEE 754-2008, section 5.12.3; they can round-trip to and from binary floating-point values with relative ease and stability, compared to decimal strings (such as when we switched from Grisu3 to Dragonbox). Some time ago I made a reference shard for this, and the standard library specs already make use of hexfloats.
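To illustrate why the round trip is simple, here is a minimal sketch that prints a `Float64` as a hexfloat by reading its bit pattern directly. `hexfloat_sketch` is a hypothetical name, not the proposed API or the standard library implementation, and it assumes a finite, normal value (zeros, subnormals, infinities, and NaN are rejected). Unlike the `"%a"` output above, it trims trailing zeros from the fraction.

```crystal
# Hypothetical helper: formats a finite, normal Float64 as a hexfloat.
# Each hex digit maps to exactly four mantissa bits, so no rounding occurs.
def hexfloat_sketch(value : Float64) : String
  bits = value.unsafe_as(UInt64)
  negative = (bits >> 63) == 1
  biased_exponent = ((bits >> 52) & 0x7ff).to_i32
  mantissa = bits & 0x000f_ffff_ffff_ffff_u64

  if biased_exponent == 0 || biased_exponent == 0x7ff
    raise ArgumentError.new("only finite, normal values are handled in this sketch")
  end

  # 52 mantissa bits are exactly 13 hex digits; drop trailing zeros.
  digits = mantissa.to_s(16).rjust(13, '0').rstrip('0')
  exponent = biased_exponent - 1023

  String.build do |io|
    io << '-' if negative
    io << "0x1"
    io << '.' << digits unless digits.empty?
    io << 'p' << (exponent >= 0 ? "+" : "") << exponent
  end
end

puts hexfloat_sketch(404.0) # 0x1.94p+8
puts hexfloat_sketch(-0.5)  # -0x1p-1
```

Because the output is just a re-spelling of the stored bits, parsing it back needs no decimal rounding algorithm at all.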

Note that this isn't about supporting hexfloat literals; 0xc.ap+5 parses to 12.ap.+(5), and there is no way to support hexfloats in the language without huge breaking changes.

@straight-shoota
Member

Have you considered using the same API as for Int stringification, i.e. .new and #to_s with a base parameter? I suppose it would not allow many values (i.e. only 10 and 16), so it wouldn't be as universal as its cousin. But still, using the same API has some benefits.

@HertzDevil HertzDevil changed the title Hexadecimal floating-point string conversions IEEE 754 hexadecimal floating-point string conversions Nov 27, 2023
@HertzDevil
Contributor Author

HertzDevil commented Nov 27, 2023

Simply adding an optional base parameter to String#to_f would technically be a breaking change because the existing behavior is to accept both bases; a single base value can never cover both. So you'd have to do:

class String
  # unchanged
  def to_f64?(whitespace : Bool = true, strict : Bool = true) : Float64?
  end

  # required parameter
  def to_f64?(base : Int, whitespace : Bool = true, strict : Bool = true) : Float64?
    case base
    when 10; to_f64?(whitespace, strict)
    when 16; ...
    else     raise ...
    end
  end
end

Additionally, the 0x and the p are mandatory for IEEE hexfloats. There is a natural interpretation of hexadecimal fractions such that "1000000.000".to_f64(base: 16) == 16777216_f64, the same way "1000000".to_i(base: 16) == 16777216_i32, but such a string is not an IEEE hexfloat, so I wouldn't say Int and Float are similar in this regard. Only IEEE 754 is the focus here, and I feel that an over-generic API would very soon lead to questions of how much non-IEEE functionality we need to incorporate.
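The distinction can be sketched as a shape check. The pattern below is one reading of the hexfloat grammar (optional sign, mandatory `0x` indicator, hexadecimal significand, mandatory `p` exponent in decimal); `HEXFLOAT_SHAPE` and `ieee_hexfloat_shape?` are hypothetical names, and the check only validates the form of the string, it does not compute a value.

```crystal
# Hypothetical pattern: sign? "0x" significand "p" sign? decimal-exponent.
# The significand needs at least one hex digit on either side of the point.
HEXFLOAT_SHAPE = /\A[+-]?0[xX](?:[0-9a-fA-F]+\.?[0-9a-fA-F]*|\.[0-9a-fA-F]+)[pP][+-]?[0-9]+\z/

def ieee_hexfloat_shape?(str : String) : Bool
  HEXFLOAT_SHAPE.matches?(str)
end

puts ieee_hexfloat_shape?("0xc.ap+5")    # true
puts ieee_hexfloat_shape?("0xc.a")       # false: the exponent is missing
puts ieee_hexfloat_shape?("1000000.000") # false: no 0x indicator, no exponent
```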

@straight-shoota
Member

straight-shoota commented Nov 27, 2023

Just for the record, I don't think String#to_f currently supports hexadecimal values (if that's what you're saying in the first paragraph?).

Following the IEEE hexfloat standard and naming makes total sense, though 👍

@HertzDevil
Contributor Author

What I mean is:

diff --git a/src/string.cr b/src/string.cr
index 3c378bd1d..e754cf034 100644
--- a/src/string.cr
+++ b/src/string.cr
@@ -709 +709 @@ class String
-  def to_f64(whitespace : Bool = true, strict : Bool = true) : Float64
+  def to_f64(whitespace : Bool = true, strict : Bool = true, base : Int = 10) : Float64

or the existing string constructor in Float64, which forwards to the above:

diff --git a/src/float.cr b/src/float.cr
index a4abcf5ab..e4847da4c 100644
--- a/src/float.cr
+++ b/src/float.cr
@@ -264,4 +264,4 @@ struct Float64
   # ```
-  def self.new(value : String, whitespace : Bool = true, strict : Bool = true) : self
-    value.to_f64 whitespace: whitespace, strict: strict
+  def self.new(value : String, whitespace : Bool = true, strict : Bool = true, base : Int = 10) : self
+    value.to_f64 whitespace: whitespace, strict: strict, base: base
   end

Then calls that previously relied on LibC.strtod to handle hexfloats would now fail.

Speaking of which, GMP does support the "natural interpretation" for bases between 2 and 62:

struct BigFloat
  def initialize(str : String, *, base : Int)
    if LibGMP.mpf_init_set_str(out @mpf, str, base) == -1
      raise ArgumentError.new("Invalid BigFloat: #{str.inspect}")
    end
  end

  def to_s(*, base : Int)
    String.build do |io|
      cstr = LibGMP.mpf_get_str(nil, out decimal_exponent, base, 0, self)
      # ...
    end
  end
end

BigFloat.new("1000000.000", base: 16)           # => 16777216.0
BigFloat.new("1000000.000", base: 2)            # => 64.0
BigFloat.new("123.456", base: 25).to_s(base: 5) # => "10203.041011"
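For comparison, the "natural interpretation" is easy to sketch without GMP for `Float64`. `parse_fraction_in_base` is a hypothetical helper, not part of the standard library or of GMP; it assumes an unsigned string with at most one `.`, and it accumulates in floating point, so results are exact only when every intermediate value is representable (it makes no attempt at correct rounding).

```crystal
# Hypothetical helper: positional interpretation of digits in any base
# that Char#to_i accepts. No 0x prefix, no exponent, no sign handling.
def parse_fraction_in_base(str : String, base : Int32) : Float64
  integer_part, _, fraction_part = str.partition('.')

  value = 0.0
  integer_part.each_char { |c| value = value * base + c.to_i(base) }

  scale = 1.0
  fraction_part.each_char do |c|
    scale /= base
    value += c.to_i(base) * scale
  end
  value
end

puts parse_fraction_in_base("1000000.000", 16) # 16777216.0
puts parse_fraction_in_base("1000000.000", 2)  # 64.0
puts parse_fraction_in_base("c.a", 16)         # 12.625
```

Note how `"c.a"` here is a plain base-16 fraction, whereas the IEEE form of the same value needs the indicator and exponent, e.g. `0xc.ap+0`.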
