BOOL should accept undef #12

happy-barney · 2023-06-02T09:50:08Z

Unlike other primitive BOOL should accept undef as well.

There are tons of existing code returning undef as false.

IMHO it's not good idea forcing users to change their existing codebase in order to use newer syntax to add some features.

The text was updated successfully, but these errors were encountered:

Ovid · 2023-06-02T09:59:36Z

I think this sounds like a very reasonable thing. (Edit: Obviously, I needed to think about this more)

duncand · 2023-07-18T00:02:56Z

I believe in the principle that booleans should be treated the same as numbers/text/the-rest as far as undef goes, either they all include undef or none of them do, and for that matter there should be a version of every one of them that does NOT include undef, whether or not there is a version that does include it.

tobyink · 2023-07-21T14:34:01Z

@duncand, there's a good reason to consider booleans as separate from numbers and strings with regards to accepting undef.

use warnings;

my $thing = undef;

my $text  = "Hello $thing";  # warns
my $sum   = 42 + $thing;     # warns
do_something() if $thing;    # no warning

Perl itself is happy to treat undef as false in boolean contexts with no warning. So from that perspective, undef is "valid" as a boolean, while it's more questionable as a string or number.

Ovid · 2023-07-21T15:36:08Z

The problem with booleans as numbers/text/undef, is that there's no semantic information with them. Internally, that kinda works: was that undef supposed to be false or was it an uninitialized variable? However, externally it's a mess. I tire of all the hacks I see to get proper true and false values pass in serialized data such as JSON.

druud · 2023-07-21T20:28:14Z

On 2023-07-21 16:34, Toby Inkster wrote: @duncand <https://github.com/duncand>, there's a good reason to consider booleans as separate from numbers and strings with regards to accepting undef. use warnings; my $thing =undef; my $text ="Hello $thing";# warns my $sum = 42 +$thing;# warns do_something()if $thing;# no warning Perl itself is happy to treat undef as false in boolean contexts with no warning. So from that perspective, undef is "valid" as a boolean, while it's more questionable as a string or number.

There is value, and there is state. Those are IMO best seen as different dimensions. <undef> could be both a state (represented by an undef-flag) and a value (similar to the float value NaN). Perl has many falsy values: undef, 0, "", "0". A boolean data type adds another falsy value, that is distinguishable from an integer zero. <if> tests the state (or boolean casting) of a value. Any non-zero integer will test as true, so would "from that perspective" be "valid" as a boolean? Does that mean that "BOOL should accept" 42? IMO the BOOL data check should not accept undef.

…

-- Ruud Just to be funny: perl -E'local $| = 42; say $|' 1

duncand · 2023-07-21T21:53:56Z

@tobyink Just because Perl doesn't warn when given an undef to a boolean context doesn't mean that undef should be considered a valid boolean when it isn't a valid number or string. An undef logically means unknown or unspecified information in the general sense, so treating it the same as known-to-be-false can produce wrong answers, just as wrong as treating undef as zero or the empty string. While it is reasonable for some programs to treat undef as being equivalent to those default/empty/etc values for a type, users should have the choice to have a bool that explicitly rejects undef, same as they have numbers or strings that reject undef.

zmughal · 2023-07-21T22:21:29Z

Perhaps something like a LaxBool or Maybe[BOOL] could be done so that the default is stricter? I see there is a parameterised Maybe[ ] in the spec, but only as an example.

happy-barney · 2023-07-22T05:10:25Z

it's common practice to do empty return and use that function's value in boolean context.
I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

duncand · 2023-07-22T05:53:55Z

it's common practice to do empty return and use that function's value in boolean context.
I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

This is a non-issue if the course I recommend is followed that 2 versions of the BOOL are provided, one including undef and one without. Given that Oshun has never been in production, there is no existing code using any existing definition of BOOL, so no existing Perl code would have to be refactored one way or the other.

Also when it comes to multiple variants of anything including or not including undef, I believe the version that excludes undef should always have the shorter plainer name, eg plain BOOL/INT/etc should always be the version that excludes undef, as the name fits, what you see is what you get, and longer things like Maybe[BOOL] or BOOL? etc would be the versions that include undef.

Ovid · 2023-07-22T07:02:56Z

Currently I'm more concerned about the fact that Perl now has native true and false (thank goodness)!. This is currently not handled by BOOL, but I see that Data::Checks::Parser already uses $] to check the Perl version, so this shouldn't be too hard to add.

it's common practice to do empty return and use that function's value in boolean context.

Either you're producing or consuming that value. If you're consuming:

my BOOL | UNDEF  $foo = get_value();

If you're producing:

sub get_value () returns BOOL | UNDEF {...}

You could also argue that the code should explicitly return undef instead of using a bare return, but if it's existing code,that's a behavioral change.

I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

I am very sympathetic towards this idea, but since Perl cannot distinguish between undefined and uninitialized, we're kind of stuck because we don't know which is which for Perl. So let the consumer specify BOOL | UNDEF if they want to take that (admittedly small) risk.

So the concerns seem trivial to resolve with the existing spec. Have I missed something?

Principle of Parsimony: from there, if we decide to allow undef in BOOL, the above still works. If we realize undef in BOOL was a mistake, we can't walk that back without breaking code.

demerphq · 2023-07-22T08:23:22Z

FWIW, I am with @happy-barney on the question of whether undef is a valid bool. We only introduce "true" booleans to perl recently, and for most of the existence of Perl Larry defined "false" to be "undef", "0", 0 and the empty string, and everthing else to be true (modulo overload). So I think it would be quite counter productive to not have a way to respect that design intent.

It seems to me that the proposal we should have two keywords for booleans, perhaps distinguished by spelling. For instance I could imagine "Bool" and "BOOL". Or I could imagine "Bool" and "Truthy" or whatever, or maybe "PerlBool" and "JsonBool". (I cant remember the spelling convention we chose) I don't think it is helpful to say that only one model of boolean is supported. I could pass in an obect with boolean overloads, and it should pass a bool test.

I understand the invocation of the principle of parsimony here, but i think it is misguided. Perl supports multiple notions of bool, and our checks should as well. I would be very surprised and consider it counter productive to not be able to use "undef" as a "false" value in a place expecting a bool.

I also note that people seem to be saying that bools should be a number or a string, but i consider every unoverloaded ref to be a valid true value, and any overloaded reference could act as a boolean as well, so I think there is a conceptual clash there as well. Related I think the arguments that undef has a specific meaning closer to an SQL "NULL" tend to fall down under the semantics that Larry chose to provide. undef is == to undef and 0, and undef is eq to undef and "" (albeit with warnings), the mutator operators consider "undef" to be a valid initial value silently mapping to 0 or the empty string, and it is considered a valid boolean value in all internal contexts. So the weight of history does not support that "undef" means "no value" in the same sense that SQL NULL means that. We need to accomodate that history, especially as it is particularly useful in many contexts.

A great deal of code depends on

if ($hash{bool_predicate}) { ... }

working the same regardless if $hash{bool_predicate} is 0 or the empty space, /or missing/ and thus undef. Making BOOL not respect this would be problematic IMO and would inevitably need to be changed in the future.

happy-barney · 2023-07-22T08:56:19Z

@demerphq oh, thanks mentioning overload, raising questions:

is in fact valid bool value value with overloaded bool conversion ? (in fact, any value with provided overload)
as far as this are not types (though word check is close to it), maybe implicit coercion can be applied on assignment as well ?
- if it is blessed and overloaded, accept it
- if it is not, coerce it

Ovid · 2023-07-22T09:09:26Z

I agree that undef cannot behave like NULL. I wish it did, but that ship has sailed.

Otherwise, to answer @happy-barney's question, in Data::Checks::Parser, we have this:

    BOOL   => q(( «NONREF» || overload::Method(§,'bool') )),

So a BOOL is anything that is not a reference, or has boolean overloading. NONREF is defined as this:

    NONREF => q(( «DEF» && !Data::Checks::Parser::reftype(§) )),

So yeah, currently it requires it being defined.

You can read the tests here.

(I'm not saying this is how things have to be, I'm saying "that's what the code currently does")

Also, we don't allow implicit coercion anywhere (that I recall) because of this:

my INT $max_size = \@foo;

Currently, it's trivial to do things like this: say $this + $that and if either $this or $that is a reference, it will be quietly (no warnings!) coerced to an integer. Oshun doesn't stop that directly because it applies to assignment, not operators, but we can at least prevent some simple coercion errors.

Any thoughts about applying checks outside of assignment should be post-MVP.

druud · 2023-07-22T10:08:34Z

On 2023-07-22 10:23, Yves Orton wrote: FWIW, I am with @happy-barney <https://github.com/happy-barney> on the question of whether undef is a valid bool. We only introduce "true" booleans to perl recently, and for most of the existence of Perl Larry defined "false" to be "undef", "0", 0 and the empty string, and everthing else to be true (modulo overload). So I think it would be quite counter productive to not have a way to respect that design intent.

For me this (again) conflates states ("test results") and BOOL values. I see no need for a BOOL data element to support undef. A laxer BOOL-variant can auto-cast undef to false, either always or only at initialization. But I prefer BOOL itself to be strict. What <if> tests, is truthiness, which for a number means "not zero" etc. so that does not depend on BOOL. Similar for ternary, etc. P.S. I also like to see (optional) default values, like in C++, where "int x;" sets x to 0.

Ovid · 2023-07-22T10:18:10Z

@druud We vetoed default values early on because they tend to be arbitrary. Just because something defaults to false doesn't mean it's false, for example. If we default my UINT $count; to zero, what if there's actually a count, but due to poor code, it's not assigned? What would be defaults on GLOB or HANDLE?

happy-barney · 2023-07-22T10:42:21Z

better example is with where

my  UINT where { $_ > 1 && $_ % 2 && $_ % 3 && $_ % 4 } $x ...

what should be default ?

happy-barney · 2023-07-22T10:45:07Z

oh, test case for parser implementation (should where syntax by like this):

declare check sub;
my sub where { ... } $x;
my sub where { ... };

druud · 2023-07-22T14:35:34Z

On 2023-07-22 12:18, Ovid wrote: @druud <https://github.com/druud> We vetoed default values early on because they tend to be arbitrary. Just because something defaults to false doesn't mean it's false, for example. If we default |my UINT $count;| to zero, what if there's actually a count, but due to poor code, it's not assigned? What would be defaults on |GLOB| or |HANDLE|?

I think it is fine to not have a way to define a default value right now, but that at some point it will come up again as really useful. Whether it will then be opt-in or opt-out, is harder to predict. For example: Java has no native 64-bit unsigned integer. (in 64-bit context) For defaults, I stick to the 'empty' approach: zero for numeric values, empty string for textual values, false for boolean values, empty for arrays and for hashes, etc. Not all data types need to support a default. And maybe not call it :default, but call it :empty.

…

-- Ruud

druud · 2023-07-22T14:53:19Z

On 2023-07-22 12:42, Branislav Zahradník wrote: better example is with |where| |my UINT where { $_ > 1 && $_ % 2 && $_ % 3 && $_ % 4 } $x ... | what should be default ?

If the default of UINT (or rather its "empty value") is defined as 0, and there is no initialization value, then this declaration would error. If UINT supports undef, then its "empty value" is (always) undef, and it can't have a (different) "empty value" configured. If UINT doesn't support undef, then it can have an "empty value". Without a configured "empty value", it then must be initialized at declaration. So it depends on what is in the '...'. And on how UINT is configured.

…

-- Ruud

happy-barney · 2023-07-22T14:56:29Z

@druud I'd say this discussion is irrelevant for MVP. In later stages (that's why I created Roadmap discussion ...) there should be warning / error when reading uninitialized variable which had specified contract.

druud · 2023-07-22T17:05:09Z

On 2023-07-22 16:56, Branislav Zahradník wrote: @druud <https://github.com/druud> I'd say this discussion is irrelevant for MVP. In later stages (that's why I created Roadmap discussion ...) there should be warning / error when reading uninitialized variable which had specified contract.

Yes, all good.

…

-- Ruud

duncand · 2023-07-22T20:09:59Z

Not all data types need to support a default. And maybe not call it :default, but call it :empty.

Maybe call it :unassigned or some synonym instead which is more accurate on what it actually is talking about.

Some types don't have a concept of the "empty" value either. What would be the "empty" value for a day-of-week type, or an odd-integer type?

No matter what we call it, given that in Perl like in many languages an object isn't always just representing a plain old value that can be cleanly serialized and deserialized, like a tree whose endpoints are all regular scalars, instead sometimes objects represent other resources or things, such as an open file handle or something, which are practically impossible to serialize/deserialize and it only makes sense to create them with explicit constructor arguments, and so you can't have a default/empty/whatever value.

I feel the simplest solution that takes all of the options into account is, if the type includes undef then that is its default value when not explicitly initialized, and otherwise an explicit initializing assignment is required for any variable having a CHECK. I seem to recall something like this, or explicit-assignment-always, was proposed and favoured before.

Ovid · 2023-07-23T13:03:57Z

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.

my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT

Not saying it's a blocker, but it's a tiny thing we might want to consider.

duncand · 2023-07-23T23:33:09Z

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.
my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT
Not saying it's a blocker, but it's a tiny thing we might want to consider.

I wouldn't call that minor, I would call that a MAJOR inconsistency, that shouldn't happen. All of the plain-named types/checks like BOOL/INT/etc should be mutually consistent, either they all include undef, or they all exclude it, not some of one and some of the other. Having them different in this respect violates the principle of least surprise, that things which look similar should behave similar etc. So I strongly disagree with the original proposal that BOOL is treated differently than the others like INT/etc.

druud · 2023-07-24T08:23:54Z

That is my position too. And don't conflate a BOOL data value with 'boolean context'.

druud · 2023-07-24T08:37:01Z

Coming back to 'empty': your examples of weekday and odd-int, just show why I propose to call it 'empty' and not 'undefined'.

Some types are good to get set when "left empty", and others aren't. I don't think we need this 'empty' feature from the start, though I think it will help making things easier to accept, as it would lead to less boilerplate: no need to initialise (almost) every INT to 0, TEXT to "", etc.

Such an 'empty' mechanism is best strictly limited to only use values like 0, "", \0, false and such. Such an 'empty' mechanism is much smaller than an 'undefined' mechanism would be, and leads to much less 'action at a distance'. So IMO it shouldn't support a <BOOL $active> with a default of true, but a <BOOL $obsoleted> with a default of false would be OK.

For example have an UINT32Z, that is mostly like UINT32, but also auto-initialises to 0.

happy-barney · 2023-07-24T08:48:42Z

@druud FYI: https://www.nntp.perl.org/group/perl.perl5.porters/2022/09/msg264841.html

druud · 2023-07-24T10:14:43Z

Important is that such a mechanism can not be used to auto-initialise values to any truthy, non-empty value. So the \0 that I mentioned earlier, must be blessed and have proper overload logic.

happy-barney · 2023-07-24T10:16:10Z

@druud milestone XYZ:

declare check FOO as UINT where { ... }  default 1_204;

Ovid · 2023-07-24T10:38:44Z

So what I'm seeing here:

Current behavior allows us to do BOOL | UNDEF to trivially satisfy this constraint.
Defaults are post-MVP
builtin::true and builtin::false complicate things a bit
Implicit coercions must not be allowed because we can't disable checks safely if they rely on coercive behavior
We're not sure what the rest of the world wants.

Given the first point, I think trying to figure out what people are going to do with this is maybe a bad thing. Let's look at his from a "lean project management" standpoint: we build what we need and nothing more. When we get feedback on real-world use, we can incorporate that.

Does that sound good, or have I missed something?

druud · 2023-07-24T10:45:02Z

Sounds good to me.

duncand · 2023-07-25T05:26:31Z

Sounds good to me too.

druud · 2023-07-25T11:54:04Z

On 2023-07-25 07:36, Darren Duncan wrote: @druud <https://github.com/druud> On a tangent, I might be naive on a point, but what does your |\0| example refer to? A data type representing exactly one octet or character code point? I didn't recall Perl having such a type, and that in practice either an integer or a string of a single element would satisfy all the use cases one may want to represent this. And in that case the most reasonable empty values are covered by the integer zero and the empty string respectively. A string with a single element like zero isn't "empty" like the other cases are. If the data type is "non-empty string" that might make sense as an "empty" version of that, but it would be rather contrived much like having a default value for "positive integer" that equals 1, and I can see a stronger case for the latter, but I think we decided it wasn't desired.

Sorry for the confusion. I had already tried to explain it a bit later, by mentioning that such a ref-case would need to be blessed and overloaded. A blessed \0 value, is a "Perl person's practice" to express some base (or "empty") value of a "type", like false for BOOL. IIRC, some JSON modules do it that way, and likely some Math modules as well. - - - - We should indeed counter the urge to have a mechanism to make 1 the default value for POSINT. To me, POSINT is a "data checkpoint" that should not be allowed to apply a default. Even though its value 1 can be seen as "empty" in multiplications, so as "false" in some sense ... ;)

…

-- Ruud

duncand · 2023-07-25T19:34:23Z

@druud With regards to your comment about the value 1 and multiplications, I see this as being with respect to a completely different feature set which is about being able to annotate particular properties of typed functions so that higher-level functions can do more intelligent things with them. For example, annotate that multiplication for particular types is commutative and associative operation with an identity value of 1, and similar for addition but identity value of zero, so that higher order functions such as reduce/fold know that if they are given the multiplication operator plus an empty list then the result is defined to be the identity value rather than the result being undefined, or the optimizer knows what flexibility it has for what order elements are combined, etc. Identity is very distinct from empty in concept.

druud · 2023-07-26T07:27:37Z

Yes, and please don't conflate them on my account; it was merely a funny way to show why non-empty defaults should not be supported. Still OK to notice the relation between empty and identity. :)

Here at work I block non-empty defaults for (non-dynamic) columns in (MySQL) table definitions. We call that "no business logic on the db-side". The only exceptions are dynamic timestamps, for which we use names like mysql_row_created_at and mysql_row_updated_at, which are not allowed to be touched from the application-side, so are to be treated as row-attributes rather than columns. For application-side timestamps, use a signed bigint and store an application-side time-value. IOW, keep the time worlds of app and db completely separated. The profit from all of this is that the application doesn't always need to read back a row right after its creation, to find out what the database made of it, which saves us millions of roundtrips per minute.

BOOL should accept undef #12

BOOL should accept undef #12

Comments

happy-barney commented Jun 2, 2023

Ovid commented Jun 2, 2023 • edited Loading

duncand commented Jul 18, 2023

tobyink commented Jul 21, 2023

Ovid commented Jul 21, 2023

druud commented Jul 21, 2023 via email

duncand commented Jul 21, 2023

zmughal commented Jul 21, 2023

happy-barney commented Jul 22, 2023

duncand commented Jul 22, 2023

Ovid commented Jul 22, 2023

demerphq commented Jul 22, 2023

happy-barney commented Jul 22, 2023

Ovid commented Jul 22, 2023

druud commented Jul 22, 2023 via email

Ovid commented Jul 22, 2023

happy-barney commented Jul 22, 2023

happy-barney commented Jul 22, 2023

druud commented Jul 22, 2023 via email

druud commented Jul 22, 2023 via email

happy-barney commented Jul 22, 2023

druud commented Jul 22, 2023 via email

duncand commented Jul 22, 2023

Ovid commented Jul 23, 2023

duncand commented Jul 23, 2023 • edited Loading

druud commented Jul 24, 2023

druud commented Jul 24, 2023 • edited Loading

happy-barney commented Jul 24, 2023

druud commented Jul 24, 2023

happy-barney commented Jul 24, 2023

Ovid commented Jul 24, 2023 • edited Loading

druud commented Jul 24, 2023

duncand commented Jul 25, 2023

druud commented Jul 25, 2023 via email

duncand commented Jul 25, 2023

druud commented Jul 26, 2023

Ovid commented Jun 2, 2023 •

edited

Loading

duncand commented Jul 23, 2023 •

edited

Loading

druud commented Jul 24, 2023 •

edited

Loading

Ovid commented Jul 24, 2023 •

edited

Loading