Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BOOL should accept undef #12

Open
happy-barney opened this issue Jun 2, 2023 · 35 comments
Open

BOOL should accept undef #12

happy-barney opened this issue Jun 2, 2023 · 35 comments

Comments

@happy-barney
Copy link
Collaborator

Unlike other primitive BOOL should accept undef as well.

There are tons of existing code returning undef as false.

IMHO it's not good idea forcing users to change their existing codebase in order to use newer syntax to add some features.

@Ovid
Copy link
Collaborator

Ovid commented Jun 2, 2023

I think this sounds like a very reasonable thing. (Edit: Obviously, I needed to think about this more)

@duncand
Copy link

duncand commented Jul 18, 2023

I believe in the principle that booleans should be treated the same as numbers/text/the-rest as far as undef goes, either they all include undef or none of them do, and for that matter there should be a version of every one of them that does NOT include undef, whether or not there is a version that does include it.

@tobyink
Copy link
Member

tobyink commented Jul 21, 2023

@duncand, there's a good reason to consider booleans as separate from numbers and strings with regards to accepting undef.

use warnings;

my $thing = undef;

my $text  = "Hello $thing";  # warns
my $sum   = 42 + $thing;     # warns
do_something() if $thing;    # no warning

Perl itself is happy to treat undef as false in boolean contexts with no warning. So from that perspective, undef is "valid" as a boolean, while it's more questionable as a string or number.

@Ovid
Copy link
Collaborator

Ovid commented Jul 21, 2023

The problem with booleans as numbers/text/undef, is that there's no semantic information with them. Internally, that kinda works: was that undef supposed to be false or was it an uninitialized variable? However, externally it's a mess. I tire of all the hacks I see to get proper true and false values pass in serialized data such as JSON.

@druud
Copy link
Collaborator

druud commented Jul 21, 2023 via email

@duncand
Copy link

duncand commented Jul 21, 2023

@tobyink Just because Perl doesn't warn when given an undef to a boolean context doesn't mean that undef should be considered a valid boolean when it isn't a valid number or string. An undef logically means unknown or unspecified information in the general sense, so treating it the same as known-to-be-false can produce wrong answers, just as wrong as treating undef as zero or the empty string. While it is reasonable for some programs to treat undef as being equivalent to those default/empty/etc values for a type, users should have the choice to have a bool that explicitly rejects undef, same as they have numbers or strings that reject undef.

@zmughal
Copy link
Contributor

zmughal commented Jul 21, 2023

Perhaps something like a LaxBool or Maybe[BOOL] could be done so that the default is stricter? I see there is a parameterised Maybe[ ] in the spec, but only as an example.

@happy-barney
Copy link
Collaborator Author

it's common practice to do empty return and use that function's value in boolean context.
I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

@duncand
Copy link

duncand commented Jul 22, 2023

it's common practice to do empty return and use that function's value in boolean context.
I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

This is a non-issue if the course I recommend is followed that 2 versions of the BOOL are provided, one including undef and one without. Given that Oshun has never been in production, there is no existing code using any existing definition of BOOL, so no existing Perl code would have to be refactored one way or the other.

Also when it comes to multiple variants of anything including or not including undef, I believe the version that excludes undef should always have the shorter plainer name, eg plain BOOL/INT/etc should always be the version that excludes undef, as the name fits, what you see is what you get, and longer things like Maybe[BOOL] or BOOL? etc would be the versions that include undef.

@Ovid
Copy link
Collaborator

Ovid commented Jul 22, 2023

Currently I'm more concerned about the fact that Perl now has native true and false (thank goodness)!. This is currently not handled by BOOL, but I see that Data::Checks::Parser already uses $] to check the Perl version, so this shouldn't be too hard to add.

it's common practice to do empty return and use that function's value in boolean context.

Either you're producing or consuming that value. If you're consuming:

my BOOL | UNDEF  $foo = get_value();

If you're producing:

sub get_value () returns BOOL | UNDEF {...}

You could also argue that the code should explicitly return undef instead of using a bare return, but if it's existing code,that's a behavioral change.

I think that excluding undef from bool will force unnecessary refactoring on potential users, and that we should avoid.

I am very sympathetic towards this idea, but since Perl cannot distinguish between undefined and uninitialized, we're kind of stuck because we don't know which is which for Perl. So let the consumer specify BOOL | UNDEF if they want to take that (admittedly small) risk.

So the concerns seem trivial to resolve with the existing spec. Have I missed something?

Principle of Parsimony: from there, if we decide to allow undef in BOOL, the above still works. If we realize undef in BOOL was a mistake, we can't walk that back without breaking code.

@demerphq
Copy link
Collaborator

FWIW, I am with @happy-barney on the question of whether undef is a valid bool. We only introduce "true" booleans to perl recently, and for most of the existence of Perl Larry defined "false" to be "undef", "0", 0 and the empty string, and everthing else to be true (modulo overload). So I think it would be quite counter productive to not have a way to respect that design intent.

It seems to me that the proposal we should have two keywords for booleans, perhaps distinguished by spelling. For instance I could imagine "Bool" and "BOOL". Or I could imagine "Bool" and "Truthy" or whatever, or maybe "PerlBool" and "JsonBool". (I cant remember the spelling convention we chose) I don't think it is helpful to say that only one model of boolean is supported. I could pass in an obect with boolean overloads, and it should pass a bool test.

I understand the invocation of the principle of parsimony here, but i think it is misguided. Perl supports multiple notions of bool, and our checks should as well. I would be very surprised and consider it counter productive to not be able to use "undef" as a "false" value in a place expecting a bool.

I also note that people seem to be saying that bools should be a number or a string, but i consider every unoverloaded ref to be a valid true value, and any overloaded reference could act as a boolean as well, so I think there is a conceptual clash there as well. Related I think the arguments that undef has a specific meaning closer to an SQL "NULL" tend to fall down under the semantics that Larry chose to provide. undef is == to undef and 0, and undef is eq to undef and "" (albeit with warnings), the mutator operators consider "undef" to be a valid initial value silently mapping to 0 or the empty string, and it is considered a valid boolean value in all internal contexts. So the weight of history does not support that "undef" means "no value" in the same sense that SQL NULL means that. We need to accomodate that history, especially as it is particularly useful in many contexts.

A great deal of code depends on

if ($hash{bool_predicate}) { ... }

working the same regardless if $hash{bool_predicate} is 0 or the empty space, /or missing/ and thus undef. Making BOOL not respect this would be problematic IMO and would inevitably need to be changed in the future.

@happy-barney
Copy link
Collaborator Author

@demerphq oh, thanks mentioning overload, raising questions:

  • is in fact valid bool value value with overloaded bool conversion ? (in fact, any value with provided overload)
  • as far as this are not types (though word check is close to it), maybe implicit coercion can be applied on assignment as well ?
    • if it is blessed and overloaded, accept it
    • if it is not, coerce it

@Ovid
Copy link
Collaborator

Ovid commented Jul 22, 2023

I agree that undef cannot behave like NULL. I wish it did, but that ship has sailed.

Otherwise, to answer @happy-barney's question, in Data::Checks::Parser, we have this:

    BOOL   => q(( «NONREF» || overload::Method(§,'bool') )),

So a BOOL is anything that is not a reference, or has boolean overloading. NONREF is defined as this:

    NONREF => q(( «DEF» && !Data::Checks::Parser::reftype(§) )),

So yeah, currently it requires it being defined.

You can read the tests here.

(I'm not saying this is how things have to be, I'm saying "that's what the code currently does")

Also, we don't allow implicit coercion anywhere (that I recall) because of this:

my INT $max_size = \@foo;

Currently, it's trivial to do things like this: say $this + $that and if either $this or $that is a reference, it will be quietly (no warnings!) coerced to an integer. Oshun doesn't stop that directly because it applies to assignment, not operators, but we can at least prevent some simple coercion errors.

Any thoughts about applying checks outside of assignment should be post-MVP.

@druud
Copy link
Collaborator

druud commented Jul 22, 2023 via email

@Ovid
Copy link
Collaborator

Ovid commented Jul 22, 2023

@druud We vetoed default values early on because they tend to be arbitrary. Just because something defaults to false doesn't mean it's false, for example. If we default my UINT $count; to zero, what if there's actually a count, but due to poor code, it's not assigned? What would be defaults on GLOB or HANDLE?

@happy-barney
Copy link
Collaborator Author

better example is with where

my  UINT where { $_ > 1 && $_ % 2 && $_ % 3 && $_ % 4 } $x ...

what should be default ?

@happy-barney
Copy link
Collaborator Author

oh, test case for parser implementation (should where syntax by like this):

declare check sub;
my sub where { ... } $x;
my sub where { ... };

@druud
Copy link
Collaborator

druud commented Jul 22, 2023 via email

@druud
Copy link
Collaborator

druud commented Jul 22, 2023 via email

@happy-barney
Copy link
Collaborator Author

@druud I'd say this discussion is irrelevant for MVP. In later stages (that's why I created Roadmap discussion ...) there should be warning / error when reading uninitialized variable which had specified contract.

@druud
Copy link
Collaborator

druud commented Jul 22, 2023 via email

@duncand
Copy link

duncand commented Jul 22, 2023

Not all data types need to support a default. And maybe not call it :default, but call it :empty.

Maybe call it :unassigned or some synonym instead which is more accurate on what it actually is talking about.

Some types don't have a concept of the "empty" value either. What would be the "empty" value for a day-of-week type, or an odd-integer type?

No matter what we call it, given that in Perl like in many languages an object isn't always just representing a plain old value that can be cleanly serialized and deserialized, like a tree whose endpoints are all regular scalars, instead sometimes objects represent other resources or things, such as an open file handle or something, which are practically impossible to serialize/deserialize and it only makes sense to create them with explicit constructor arguments, and so you can't have a default/empty/whatever value.

I feel the simplest solution that takes all of the options into account is, if the type includes undef then that is its default value when not explicitly initialized, and otherwise an explicit initializing assignment is required for any variable having a CHECK. I seem to recall something like this, or explicit-assignment-always, was proposed and favoured before.

@Ovid
Copy link
Collaborator

Ovid commented Jul 23, 2023

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.

my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT

Not saying it's a blocker, but it's a tiny thing we might want to consider.

@duncand
Copy link

duncand commented Jul 23, 2023

Another odd consideration: If we allow undef for BOOL, we introduce the potential for a slight inconsistency.

my BOOL $success; # succeeds because `undef` satisfies BOOL
my INT  $count;   # fails because `undef` doesn't satisfy INT

Not saying it's a blocker, but it's a tiny thing we might want to consider.

I wouldn't call that minor, I would call that a MAJOR inconsistency, that shouldn't happen. All of the plain-named types/checks like BOOL/INT/etc should be mutually consistent, either they all include undef, or they all exclude it, not some of one and some of the other. Having them different in this respect violates the principle of least surprise, that things which look similar should behave similar etc. So I strongly disagree with the original proposal that BOOL is treated differently than the others like INT/etc.

@druud
Copy link
Collaborator

druud commented Jul 24, 2023

That is my position too. And don't conflate a BOOL data value with 'boolean context'.

@druud
Copy link
Collaborator

druud commented Jul 24, 2023

Coming back to 'empty': your examples of weekday and odd-int, just show why I propose to call it 'empty' and not 'undefined'.

Some types are good to get set when "left empty", and others aren't. I don't think we need this 'empty' feature from the start, though I think it will help making things easier to accept, as it would lead to less boilerplate: no need to initialise (almost) every INT to 0, TEXT to "", etc.

Such an 'empty' mechanism is best strictly limited to only use values like 0, "", \0, false and such. Such an 'empty' mechanism is much smaller than an 'undefined' mechanism would be, and leads to much less 'action at a distance'. So IMO it shouldn't support a <BOOL $active> with a default of true, but a <BOOL $obsoleted> with a default of false would be OK.

For example have an UINT32Z, that is mostly like UINT32, but also auto-initialises to 0.

@happy-barney
Copy link
Collaborator Author

@druud
Copy link
Collaborator

druud commented Jul 24, 2023

Important is that such a mechanism can not be used to auto-initialise values to any truthy, non-empty value. So the \0 that I mentioned earlier, must be blessed and have proper overload logic.

@happy-barney
Copy link
Collaborator Author

@druud milestone XYZ:

declare check FOO as UINT where { ... }  default 1_204;

@Ovid
Copy link
Collaborator

Ovid commented Jul 24, 2023

So what I'm seeing here:

  1. Current behavior allows us to do BOOL | UNDEF to trivially satisfy this constraint.
  2. Defaults are post-MVP
  3. builtin::true and builtin::false complicate things a bit
  4. Implicit coercions must not be allowed because we can't disable checks safely if they rely on coercive behavior
  5. We're not sure what the rest of the world wants.

Given the first point, I think trying to figure out what people are going to do with this is maybe a bad thing. Let's look at his from a "lean project management" standpoint: we build what we need and nothing more. When we get feedback on real-world use, we can incorporate that.

Does that sound good, or have I missed something?

@druud
Copy link
Collaborator

druud commented Jul 24, 2023

Sounds good to me.

@duncand
Copy link

duncand commented Jul 25, 2023

Sounds good to me too.

@druud
Copy link
Collaborator

druud commented Jul 25, 2023 via email

@duncand
Copy link

duncand commented Jul 25, 2023

@druud With regards to your comment about the value 1 and multiplications, I see this as being with respect to a completely different feature set which is about being able to annotate particular properties of typed functions so that higher-level functions can do more intelligent things with them. For example, annotate that multiplication for particular types is commutative and associative operation with an identity value of 1, and similar for addition but identity value of zero, so that higher order functions such as reduce/fold know that if they are given the multiplication operator plus an empty list then the result is defined to be the identity value rather than the result being undefined, or the optimizer knows what flexibility it has for what order elements are combined, etc. Identity is very distinct from empty in concept.

@druud
Copy link
Collaborator

druud commented Jul 26, 2023

Yes, and please don't conflate them on my account; it was merely a funny way to show why non-empty defaults should not be supported. Still OK to notice the relation between empty and identity. :)

Here at work I block non-empty defaults for (non-dynamic) columns in (MySQL) table definitions. We call that "no business logic on the db-side". The only exceptions are dynamic timestamps, for which we use names like mysql_row_created_at and mysql_row_updated_at, which are not allowed to be touched from the application-side, so are to be treated as row-attributes rather than columns. For application-side timestamps, use a signed bigint and store an application-side time-value. IOW, keep the time worlds of app and db completely separated. The profit from all of this is that the application doesn't always need to read back a row right after its creation, to find out what the database made of it, which saves us millions of roundtrips per minute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants