Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow orphan instances #1247

Closed
garyb opened this issue Jul 9, 2015 · 34 comments
Closed

Disallow orphan instances #1247

garyb opened this issue Jul 9, 2015 · 34 comments

Comments

@garyb
Copy link
Member

garyb commented Jul 9, 2015

We keep talking about this but don't seem to have an issue for it. I'm quite an enthusiastic proponent of disallowing them and not providing an escape hatch either.

Worth discussing too perhaps, what counts as an orphan in the presence of multi-parameter typeclasses? I guess for an instance to not be an orphan it must satisfy either:

  1. The instance is declared in the same module as its class.
  2. At least one of the types appearing in the instance head is declared in the current module.
@paf31
Copy link
Contributor

paf31 commented Jul 9, 2015

👍 for no escape hatch.

@puffnfresh
Copy link
Contributor

Hugely important to me 👍

And I'm pretty certain those are the correct rules for enforcing coherency with mutli-parameters 👍

@paf31 paf31 modified the milestones: 0.9.0, 0.8.0 Aug 8, 2015
@bbarker
Copy link

bbarker commented Jul 12, 2019

Sorry for reviving an old thread, but I'd like to understand this issue a bit more. I've been thinking about Scala implicits as a model for clashing instances. In my few years of programming Scala, it has never been much of an issue; Scala has manual import/export ability for implicit values as well (iirc, getting a bit rusty with Scala lately).

Existing (non-orphan) instances could continue to be imported implicitly, but orphan instances would require manual import, or give a compile-time failure otherwise. Which seems like a satisfactory outcome, as it is for conflicting implicit values in the same scope in Scala.

I know @puffnfresh has a lot more experience than me in both Scala and PureScript, though, and I haven't thought a lot about this, so I'm sure I'll learn something! But it would be cool to have that power, without sacrificing safety beyond compilation failure at use-sites, which could be avoided by altering imports.

@hdgarrood
Copy link
Contributor

This gets asked occasionally, and I'm not aware of a solid writeup, so it would probably be good to have one. Quite a lot has already been written about this in the Haskell world too, so that's probably a good place to look. See e.g. https://www.reddit.com/r/purescript/comments/4y0qwf/why_are_orphan_instances_strictly_disallowed/ and https://stackoverflow.com/questions/3079537/orphaned-instances-in-haskell. I also recommend looking at the GHC users' guide, which gives a flavour of how involved these issues can become: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#instance-overlap

I think it's basically to do with coherence: the property that any given type can only have one instance of a given class. This is such a natural property that I suspect we often make use of it without even thinking about it. For instance, Map relies pretty heavily on the assumption that a given type can only have one Ord instance: if you were to somehow construct a Map K V in one module with some Ord K instance in scope, and then try to lookup values in that Map in another module where you had a different Ord K instance in scope, you will end up with lookup returning Nothing when it should be returning Just something, because Map depends on coherence of the Ord instance in use.

Another way of looking at it is this: if we allow orphan instances, we have to work out what to do when we have two different candidates for a particular instance in scope.This is a huge can of worms (this can be seen by googling for the numerous threads, blog posts, and stack overflow questions on the subject). One option is to reject the program entirely, but this isn't great: an upstream library adding an instance can now cause your program to stop compiling. Another is to have the compiler pick the most specific instance, but this is difficult to describe and it still suffers from a similar problem: an upstream library might add a more specific instance than the one you were using with slightly different behaviour, so that your program silently starts using that instance as opposed to the more general one which was being used previously.

@shmish111
Copy link

@garyb @hdgarrood instances in purescript have names, why not just use the non-orphan by default and give an option to use a named instance (that's what Idris does for example). I can say that not having an escape hatch for orphan instances is the single biggest waste of my time and damage to the codebase when using purescript. As soon as you want to encode and decode things (which is pretty often given purescript is a browser language) you have to add massive amounts of boiler plate, lots of newtypes, wrapping and unwrapping, etc etc etc. I can see all the arguments against orphan instances but it is really, really, really impractical. On a reasonably large codebase, once every month or so I end up wasting a whole day doing this thankless task and I am forced to make my code less readable and more complicated. An extra newtype and the odd unwrap is not a problem but these add up, our codebase is significantly worse because of it. I know I'm not the only person who has had the same experience.

P.S. I'm not as angry as I sound, purescript is an amazing and highly practical tool 😄

@garyb
Copy link
Member Author

garyb commented Aug 14, 2020

I wonder what it is in our respective codebases/styles of working that makes this a problem for you/others, but something I almost never encounter. 🤔

I think Arbitrary / Coarbitrary instances for quickcheck is the only time it's occurred for me, outside of situations where a missing instance was just an oversight and could be PR'd to the library. I guess you're using classes and datatypes from libraries that don't have an existing relationship with each other, but I'm not sure what those would be?

@puffnfresh
Copy link
Contributor

Lack of orphan instances is one of the best things about PureScript. It's not a problem at all.

@shmish111
Copy link

It is intriguing. One problem that hits me is that foreign-generic has no instances for list, set, map and tuple and that makes sense from a library perspective I guess since different encodings make sense in different systems (map and tuple particularly). Newtyping every map and tuple in my domain so that I don't have to write very difficult encode/decode instances means that I then have lots of wrap/unwrap in places that I don't want it. I've just had to do this and there was really a lot of boiler plate, way too much so that it made parts of the code very difficult to read. If my project could have it's own instances for these basic data types and I could specify the instance names then all this would be gone. Since we already have to name instances, from a user perspective having some sort of using syntax would be very comfortable and explicit about the fact that you are overriding the default behaviour.

@garyb what do you do about sending deeply nested maps and tuples over JSON?

@garyb
Copy link
Member Author

garyb commented Aug 15, 2020

I don't use any kind of "automated"/derived mechanisms for serializing, we write our codecs explicitly as values using https://github.com/garyb/purescript-codec-argonaut (some reasoning here).

I would suggest instead of using newtypes in your domain, having a serialization model separate from the domain model might be slightly simpler? That way any newtyping you need to do to massage the serialization won't get in the way of program logic at least. This is like a half-way step to the full codec approach too, it at least provides some protection from accidentally breaking the format and such.

@shmish111
Copy link

@garyb very often our communication between the front end and back end is developed in sync (even using purescript bridge sometimes). Writing encoding by hand is 100% waste of time in these cases, a large amount of effort for no real-life benefit. Deriving can be a safer often because you don't forget to change it when you change a data type. If you use a specific deriving mechanism then it can certainly be safer than writing a codec yourself (of course not always) that is assuming the deriving mechanism works as expected (maybe that's more likely to work correctly than humans writing codecs though). I think your article presents things as a bit black and white (although maybe that's no problem because that side of the problem probably isn't presented often enough), however as we know, everything has trade-offs and there are many situations where deriving everything is certainly the better (and safer) option. The newtyping I was doing doesn't need to get in the way if I have a way to escape the no orphan instances rule. Or, as is the case now, I decide to fork the library where the classes are defined. I'm doing this now and on balance it is creating less work, higher quality, less to go wrong and safer code. In this case it also happens that foreign-generic is not a large library and doesn't need to change very often so forking it is not a big deal however clearly that's not always the case.

None of this would be a problem if I could use an Idris like using syntax.

@garyb
Copy link
Member Author

garyb commented Aug 16, 2020

I think your article presents things as a bit black and white

Yeah, very true! I wrote it from my experience, and I've just always had trouble with derived codecs in the long run (we even have trouble in the compiler with them inadvertently breaking things from time to time). The case of having a frontend/backend pair where they both speak the same types, and where there's no chance of the domain model being persisted to disk, is perhaps the only situation where there are no drawbacks, but that's not a situation I've encountered personally.

@hdgarrood
Copy link
Contributor

The problem with allowing you to choose the instance you’re using in a specific part of your program is that assuming instances are globally unique is really very natural; I suspect we are all doing it more often than we realise. The classic example is if a map is used in two places where different Ord instances for its key type are in scope, it totally breaks. I think maintaining your own serialization type classes is really not a bad option at all, especially if you’re relying on Generic deriving.

@natefaubion
Copy link
Contributor

I also agree that maintaining your own serialization class is the way to go. I just don't think classes like From/ToJSON currently make for a good library abstraction. There are just going to be too many special cases, assuming that anything is a desired globally unique implementation is going to lead to someone being upset. At work, we codegen (much more efficient dictionaries and implementations) but if I was using Generic or needed to maintain certain representations against a backend API, I would most certainly just use an application internal typeclass. Maybe something like deriving via will make special cases like this easier to override in the future. I don't see us ever relaxing the orphan rule, personally.

@shmish111
Copy link

Ah yes, I could argue (to myself) that maintaining a fork of foreign-generic is actually just maintaining a serialization package for my project as a whole. Thanks for allowing me to think about this.

It's certainly a real shame if the orphan rule is never relaxed, it seems from my perspective to go against the difference between Elm and Purescript. If I want my language to be simple and restrictive (often a good choice) then I would go with Elm however if I want more power I'll go with PS. The orphan instances thing is one place where it seems PS is disallowing something "for my own good" but in fact is making perfectly valid use cases that would make my code better, impossible. I can say with certainty that it's the only thing that keeps coming up after almost 5 years of using purescript.

@natefaubion
Copy link
Contributor

natefaubion commented Aug 16, 2020

I'll note that the orphan rule does not have to necessarily be restricted to modules. That is just our compilation unit. Rust for example also has an orphan rule, but because it uses entire crates as compilation units, it is able to enforce uniqueness across the entire crate, and not just to a single module. Maybe in the future with a package-aware compiler, it would be able to provide the same guarantees over a wider scope, but I don't see the current rule being relaxed to the point that you can provide implementations for both types and classes from other packages.

@bbarker
Copy link

bbarker commented Aug 16, 2020

Just another possible use case where this came up for me: purescript/purescript-enums#39 ; I also felt that taking another approach was just too unwieldy here. This resulted me in having to fork purescript-enum, which also meant none of my resulting libraries could benefit from being on pursuit, which is too bad as that is a really wonderful site.

@paf31
Copy link
Contributor

paf31 commented Aug 17, 2020

If newtyping or separating types for different domains are really not options, then there are ways to solve this in libraries:

import Prelude
import Data.Array as A
import Data.List
import Data.Generic.Rep
import Foreign.Class
import Foreign.Generic
import Type.Proxy
import Unsafe.Coerce

data Foo = Foo (List Int)

derive instance genericFoo :: Generic Foo _

instance x :: Encode Foo where
  encode = unsafeWithListInstance (Proxy :: _ Int) genericEncode defaultOptions
  
unsafeWithListInstance :: forall a r. Encode a => Proxy a -> (Encode (List a) => r) -> r
unsafeWithListInstance _ f = (coerce f :: ({ encode :: List a -> Foreign } -> r)) { encode: encode <<< A.fromFoldable } where
  coerce :: forall a. (Encode a => r) -> ({ encode :: a -> Foreign } -> r)
  coerce = unsafeCoerce

But this definitely doesn't have to be a compiler change.

@bbarker
Copy link

bbarker commented Sep 20, 2020

Not to attempt an argument from authority, however there is an interesting (historical!) discussion on the topic from 12 years ago, in which SPJ claims there is no implementation-independent reason to stay away from orphan instances: https://mail.haskell.org/pipermail/libraries/2008-August/010399.html As near as I can tell, a lot of the same pros and cons were discussed then as are being discussed in this thread.

The problem with allowing you to choose the instance you’re using in a specific part of your program is that assuming instances are globally unique is really very natural; I suspect we are all doing it more often than we realise. The classic example is if a map is used in two places where different Ord instances for its key type are in scope, it totally breaks. I think maintaining your own serialization type classes is really not a bad option at all, especially if you’re relying on Generic deriving.

This is a great point and a great example. I imagine there could be solutions to this, and I'm probably not the best person to think them up. But I'll give it a few tries:

  1. Course grained approach: Global choice of instance for the entire compilation. Easy to implement, probably, but not very flexible. Still, useful for many cases I imagine..
  2. instance implementation as a hidden parameter of the types that it is used for in the type checker. These could possibly be erased early on if only one instance is in scope in a project (not sure how the compiler works here), but the idea is that in e.g. the Map/Ord case, different Ord impls for a Map would actually be different Map types. Most of the time this wouldn't matter unless we try to use two Map values of the same time together, in which case the type checker would catch this.
  3. Explicit scoping. This probably isn't mutually exclusive with (2) -- or maybe it is -- but the idea would be that a default instance ceases being in scope if an orphan instance is explicitly brought into scope. I'm actually not sure if this is hard to implement - I could see it going either way from the little I know.

I think between 2 and 3, and maybe just 3, this would solve the problems with orphan instances described here. I'd really like to know if I've missed understanding some subtlety here.

Again this comes back to Scala implicits. There's not a lot about Scala I miss, but I actually rather enjoyed that feature of implicits. Of course, if we could pass around typeclass instances as first-class values, it would be similar, as we could thread them through a Reader monad.

@bbarker
Copy link

bbarker commented Sep 20, 2020

I've been a bit curious about Idris2 lately and was wondering how it handled it, and in this thread it appears that both Idris(2) and Agda essentially support orphan instances. As near as I can tell without knowing either of those languages much beyond the hello-world stage, it seems like they do something similar to Scala implicits, or the explicit importing I mentioned as option (3) above, though what they have sounds closer to implicits as it is more granular than file-level, I think.

@bbarker
Copy link

bbarker commented Jan 6, 2021

Here's another example I thought of that is slightly problematic for the current approach. Let's say you want to make a typeclass that others use that supports multiple existing data types of varying degrees of popularity. One example might be a hypothetical ConcatMap typeclass. Do I then depend on every other datatype that impelements concatMap, or make PRs and hope that they will depend on my ConcatMap library?

Neither of these options seems good.

In the first case, you are adding a lot of maintenance overhead and potential bloat to your library. I see this discussed frequently as a concern, e.g.: purescript/purescript-either#38

In the second case ... it simply won't happen, between maintainers that are either MIA or disagree about including your typeclass for whatever reason.

@garyb
Copy link
Member Author

garyb commented Jan 6, 2021

I'm not sure about the other maintainers, but just to be upfront about it: there is no argument that can be made on this issue that will make me come around to agreeing that allowing orphan instances again is a good thing.

It's just how PureScript works now and forever more as far as I'm concerned. If there is ever a PureScript 2, maybe it'd be something up for discussion then, as the only way orphans would be agreeable for me is in a system that is fundamentally different from how typeclasses work just now.

@bbarker
Copy link

bbarker commented Jan 6, 2021

Fair enough, I suspected as much but I was primarily posting here as this seems to be the best place to include information on it at the moment, though a discourse thread would probably be better.

Also side point: i realize you probably wouldn't actually want a ConcatMap class - that was just an example.

@garyb
Copy link
Member Author

garyb commented Jan 6, 2021

Sure! That's fine if you want to keep posting ideas and things here in case of some possible future version of PS or something. 🙂

I just thought I should make my position on it clear so you don't feel like you're wasting time if nothing ever comes of it again.

@bbarker
Copy link

bbarker commented Jan 6, 2021

Thanks! - and sorry if it sounds like I kept badgering people on this - that's not the intent.

@JordanMartinez
Copy link
Contributor

I'm not sure about the other maintainers, but just to be upfront about it: there is no argument that can be made on this issue that will make me come around to agreeing that allowing orphan instances again is a good thing.

I'm in agreement with Gary here, too.

@natefaubion
Copy link
Contributor

I agree with that as well. I really don't think there will be a convincing argument for "allow orphans everywhere".

@rhendric
Copy link
Member

rhendric commented Jan 6, 2021

I think there are probably other solutions to these problems worth considering besides fully reverting the orphan ban. Collecting issues caused in part by PureScript's position on orphans seems to me to offer more potential value than just accumulating ideas for a different language.

For example, Coercible definitely gets rid of a lot of the problems I see with using newtypes to get around the orphan ban.

Similarly, the scenario @bbarker just raised about ConcatMap makes me think that if we ever get around to reforming PureScript's module/package system, there might be some opportunity there to reduce some of the costs of, or safely weaken, the orphan ban. Two ideas that come to mind (I have not considered these deeply, I don't recall if anyone else has, and this is far from a proposal for either):

  • The ConcatMap module could explicitly declare ‘friend’ modules ConcatMap.List, ConcatMap.Array, etc., which the compiler would not require to exist, but if they exist, instances of ConcatMap would be allowed in them. This would significantly narrow the scope of breaking-downstream-libraries-by-adding-orphan-instances problems relative to the 2015 status quo (module name collisions are already a caveat emptor situation), while still allowing the core ConcatMap package to be small.
  • The compiler and the packager could become more deeply integrated such that a dependency on a kitchen-sink concatmap package would only pull in the dependencies that actually get used by the entry point. This has broader benefits and drawbacks, of course, but it would lessen how bad the concatmap maintainer might feel about adding support for every last esoteric collection type.

I assume/hope that the maintainers' position on the whole class of ideas similar to the above is weaker than can't-imagine-ever-supporting?

@natefaubion
Copy link
Contributor

natefaubion commented Jan 6, 2021

I assume/hope that the maintainers' position on the whole class of ideas similar to the above is weaker than can't-imagine-ever-supporting?

Orphans are a bit of a red-herring. The issue is preserving global coherence, while allowing the solver to reason locally about available instances, in the presence of non-trivial type-level features and separate/incremental compilation. I think there are certainly other avenues that can be discussed, but it has to be in terms of preserving the advantages and correctness of the existing orphan rule. There won't be convincing arguments if it requires us to compromise on that.

@bbarker
Copy link

bbarker commented Jan 6, 2021

I have seen a few mentions of the Idris way of doing things in this thread, but I don't believe I've seen any feedback on that from those opposed to orphan instances. While it would be a major change in the language, I'm not quite sure it would necessitate a "PureScript 2", for instance - but I certainly could be wrong.

Edit: And as per @natefaubion 's recent comment, I think everyone wants to have coherence - so I shouldn't really say "those opposed to orphan instances" but "those championing the merits of global coherence" :-)

@natefaubion
Copy link
Contributor

I have seen a few mentions of the Idris way of doing things in this thread, but I don't believe I've seen any feedback on that from those opposed to orphan instances.

I have not seen anyone attempt to phrase the tradeoffs in the terms I just enumerated. I'm not an expert on the mentioned systems, but I believe they just forgo global coherence. This is less of an issue for dependently-typed languages because types can be indexed by values. See Kmett's "Typeclasses vs the World" talk.

@shmish111
Copy link

I'm not sure about the other maintainers, but just to be upfront about it: there is no argument that can be made on this issue that will make me come around to agreeing that allowing orphan instances again is a good thing.

It's quite disheartening to hear anyone say this about anything 😞 While I can understand that in the case of "just allow orphan instances everywhere", this thread is providing serious motivating examples and possible solutions where orphan instances could be allowed with certain constraints. It doesn't seem to make much sense to completely shut down the possibility of a mechanism that would solve people's problems in the future in the way that this comment suggests.

I have no authority to comment on this really since I have not contributed to the purescript language and @garyb has done an amazing, massive job however I just found it very sad to see this and had to comment 😢

I hope that the scenarios that are not possible are considered as good challenges in the future and that "just newtype everything" is recognized as a highly imperfect workaround and not a complete solution to the issues.

@natefaubion
Copy link
Contributor

natefaubion commented Jan 6, 2021

I could imagine a backpack-like layer that lets packages depend on a signature module with a class and potentially default implementations, where an application/project can instantiate it while adding any additional implementations they want. This essentially lets you move the definition of this class into your own project without doing it manually, while reusing existing instances defined with the types. There is no downside to this currently, since we don't ship built artifacts as part of package install, but in the future this would just mean modules that depend on a signature module cannot share build products. They incur a dependency on the instantiated module. This does not run afoul the orphan rule, because you are essentially just (automatically) copying the entire source tree of these downstream modules into your project.

Note, this can very likely be prototyped without compiler support as a meta/build layer.

@JordanMartinez
Copy link
Contributor

It's quite disheartening to hear anyone say this about anything 😞 .... It doesn't seem to make much sense to completely shut down the possibility of a mechanism that would solve people's problems in the future in the way that this comment suggests.

Good point. Let me clarify. I so strongly doubt that someone would invest the time and energy to think through, design, and implement an "it actually works!" solution to this problem in such a way that the tradeoffs are actually worth the change that my gut reaction is to just say, "No." By all means, prove me wrong! Let logic and truth win! However, I wouldn't see the point in talking about it until you've implemented code that actually works and has been tested by time.

@natefaubion
Copy link
Contributor

Further required reading: https://github.com/AndrasKovacs/pny1-assignment/blob/master/DepTC.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants