-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow orphan instances #1247
Comments
👍 for no escape hatch. |
Hugely important to me 👍 And I'm pretty certain those are the correct rules for enforcing coherency with mutli-parameters 👍 |
Sorry for reviving an old thread, but I'd like to understand this issue a bit more. I've been thinking about Scala implicits as a model for clashing instances. In my few years of programming Scala, it has never been much of an issue; Scala has manual import/export ability for implicit values as well (iirc, getting a bit rusty with Scala lately). Existing (non-orphan) instances could continue to be imported implicitly, but orphan instances would require manual import, or give a compile-time failure otherwise. Which seems like a satisfactory outcome, as it is for conflicting implicit values in the same scope in Scala. I know @puffnfresh has a lot more experience than me in both Scala and PureScript, though, and I haven't thought a lot about this, so I'm sure I'll learn something! But it would be cool to have that power, without sacrificing safety beyond compilation failure at use-sites, which could be avoided by altering imports. |
This gets asked occasionally, and I'm not aware of a solid writeup, so it would probably be good to have one. Quite a lot has already been written about this in the Haskell world too, so that's probably a good place to look. See e.g. https://www.reddit.com/r/purescript/comments/4y0qwf/why_are_orphan_instances_strictly_disallowed/ and https://stackoverflow.com/questions/3079537/orphaned-instances-in-haskell. I also recommend looking at the GHC users' guide, which gives a flavour of how involved these issues can become: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#instance-overlap I think it's basically to do with coherence: the property that any given type can only have one instance of a given class. This is such a natural property that I suspect we often make use of it without even thinking about it. For instance, Another way of looking at it is this: if we allow orphan instances, we have to work out what to do when we have two different candidates for a particular instance in scope.This is a huge can of worms (this can be seen by googling for the numerous threads, blog posts, and stack overflow questions on the subject). One option is to reject the program entirely, but this isn't great: an upstream library adding an instance can now cause your program to stop compiling. Another is to have the compiler pick the most specific instance, but this is difficult to describe and it still suffers from a similar problem: an upstream library might add a more specific instance than the one you were using with slightly different behaviour, so that your program silently starts using that instance as opposed to the more general one which was being used previously. |
@garyb @hdgarrood instances in purescript have names, why not just use the non-orphan by default and give an option to use a named instance (that's what Idris does for example). I can say that not having an escape hatch for orphan instances is the single biggest waste of my time and damage to the codebase when using purescript. As soon as you want to encode and decode things (which is pretty often given purescript is a browser language) you have to add massive amounts of boiler plate, lots of newtypes, wrapping and unwrapping, etc etc etc. I can see all the arguments against orphan instances but it is really, really, really impractical. On a reasonably large codebase, once every month or so I end up wasting a whole day doing this thankless task and I am forced to make my code less readable and more complicated. An extra newtype and the odd unwrap is not a problem but these add up, our codebase is significantly worse because of it. I know I'm not the only person who has had the same experience. P.S. I'm not as angry as I sound, purescript is an amazing and highly practical tool 😄 |
I wonder what it is in our respective codebases/styles of working that makes this a problem for you/others, but something I almost never encounter. 🤔 I think |
Lack of orphan instances is one of the best things about PureScript. It's not a problem at all. |
It is intriguing. One problem that hits me is that foreign-generic has no instances for list, set, map and tuple and that makes sense from a library perspective I guess since different encodings make sense in different systems (map and tuple particularly). Newtyping every map and tuple in my domain so that I don't have to write very difficult encode/decode instances means that I then have lots of wrap/unwrap in places that I don't want it. I've just had to do this and there was really a lot of boiler plate, way too much so that it made parts of the code very difficult to read. If my project could have it's own instances for these basic data types and I could specify the instance names then all this would be gone. Since we already have to name instances, from a user perspective having some sort of @garyb what do you do about sending deeply nested maps and tuples over JSON? |
I don't use any kind of "automated"/derived mechanisms for serializing, we write our codecs explicitly as values using https://github.com/garyb/purescript-codec-argonaut (some reasoning here). I would suggest instead of using newtypes in your domain, having a serialization model separate from the domain model might be slightly simpler? That way any newtyping you need to do to massage the serialization won't get in the way of program logic at least. This is like a half-way step to the full codec approach too, it at least provides some protection from accidentally breaking the format and such. |
@garyb very often our communication between the front end and back end is developed in sync (even using purescript bridge sometimes). Writing encoding by hand is 100% waste of time in these cases, a large amount of effort for no real-life benefit. Deriving can be a safer often because you don't forget to change it when you change a data type. If you use a specific deriving mechanism then it can certainly be safer than writing a codec yourself (of course not always) that is assuming the deriving mechanism works as expected (maybe that's more likely to work correctly than humans writing codecs though). I think your article presents things as a bit black and white (although maybe that's no problem because that side of the problem probably isn't presented often enough), however as we know, everything has trade-offs and there are many situations where deriving everything is certainly the better (and safer) option. The newtyping I was doing doesn't need to get in the way if I have a way to escape the no orphan instances rule. Or, as is the case now, I decide to fork the library where the classes are defined. I'm doing this now and on balance it is creating less work, higher quality, less to go wrong and safer code. In this case it also happens that foreign-generic is not a large library and doesn't need to change very often so forking it is not a big deal however clearly that's not always the case. None of this would be a problem if I could use an Idris like |
Yeah, very true! I wrote it from my experience, and I've just always had trouble with derived codecs in the long run (we even have trouble in the compiler with them inadvertently breaking things from time to time). The case of having a frontend/backend pair where they both speak the same types, and where there's no chance of the domain model being persisted to disk, is perhaps the only situation where there are no drawbacks, but that's not a situation I've encountered personally. |
The problem with allowing you to choose the instance you’re using in a specific part of your program is that assuming instances are globally unique is really very natural; I suspect we are all doing it more often than we realise. The classic example is if a map is used in two places where different Ord instances for its key type are in scope, it totally breaks. I think maintaining your own serialization type classes is really not a bad option at all, especially if you’re relying on Generic deriving. |
I also agree that maintaining your own serialization class is the way to go. I just don't think classes like From/ToJSON currently make for a good library abstraction. There are just going to be too many special cases, assuming that anything is a desired globally unique implementation is going to lead to someone being upset. At work, we codegen (much more efficient dictionaries and implementations) but if I was using Generic or needed to maintain certain representations against a backend API, I would most certainly just use an application internal typeclass. Maybe something like |
Ah yes, I could argue (to myself) that maintaining a fork of foreign-generic is actually just maintaining a serialization package for my project as a whole. Thanks for allowing me to think about this. It's certainly a real shame if the orphan rule is never relaxed, it seems from my perspective to go against the difference between Elm and Purescript. If I want my language to be simple and restrictive (often a good choice) then I would go with Elm however if I want more power I'll go with PS. The orphan instances thing is one place where it seems PS is disallowing something "for my own good" but in fact is making perfectly valid use cases that would make my code better, impossible. I can say with certainty that it's the only thing that keeps coming up after almost 5 years of using purescript. |
I'll note that the orphan rule does not have to necessarily be restricted to modules. That is just our compilation unit. Rust for example also has an orphan rule, but because it uses entire crates as compilation units, it is able to enforce uniqueness across the entire crate, and not just to a single module. Maybe in the future with a package-aware compiler, it would be able to provide the same guarantees over a wider scope, but I don't see the current rule being relaxed to the point that you can provide implementations for both types and classes from other packages. |
Just another possible use case where this came up for me: purescript/purescript-enums#39 ; I also felt that taking another approach was just too unwieldy here. This resulted me in having to fork purescript-enum, which also meant none of my resulting libraries could benefit from being on pursuit, which is too bad as that is a really wonderful site. |
If newtyping or separating types for different domains are really not options, then there are ways to solve this in libraries: import Prelude
import Data.Array as A
import Data.List
import Data.Generic.Rep
import Foreign.Class
import Foreign.Generic
import Type.Proxy
import Unsafe.Coerce
data Foo = Foo (List Int)
derive instance genericFoo :: Generic Foo _
instance x :: Encode Foo where
encode = unsafeWithListInstance (Proxy :: _ Int) genericEncode defaultOptions
unsafeWithListInstance :: forall a r. Encode a => Proxy a -> (Encode (List a) => r) -> r
unsafeWithListInstance _ f = (coerce f :: ({ encode :: List a -> Foreign } -> r)) { encode: encode <<< A.fromFoldable } where
coerce :: forall a. (Encode a => r) -> ({ encode :: a -> Foreign } -> r)
coerce = unsafeCoerce But this definitely doesn't have to be a compiler change. |
Not to attempt an argument from authority, however there is an interesting (historical!) discussion on the topic from 12 years ago, in which SPJ claims there is no implementation-independent reason to stay away from orphan instances: https://mail.haskell.org/pipermail/libraries/2008-August/010399.html As near as I can tell, a lot of the same pros and cons were discussed then as are being discussed in this thread.
This is a great point and a great example. I imagine there could be solutions to this, and I'm probably not the best person to think them up. But I'll give it a few tries:
I think between 2 and 3, and maybe just 3, this would solve the problems with orphan instances described here. I'd really like to know if I've missed understanding some subtlety here. Again this comes back to Scala implicits. There's not a lot about Scala I miss, but I actually rather enjoyed that feature of implicits. Of course, if we could pass around typeclass instances as first-class values, it would be similar, as we could thread them through a Reader monad. |
I've been a bit curious about Idris2 lately and was wondering how it handled it, and in this thread it appears that both Idris(2) and Agda essentially support orphan instances. As near as I can tell without knowing either of those languages much beyond the hello-world stage, it seems like they do something similar to Scala implicits, or the explicit importing I mentioned as option (3) above, though what they have sounds closer to implicits as it is more granular than file-level, I think. |
Here's another example I thought of that is slightly problematic for the current approach. Let's say you want to make a typeclass that others use that supports multiple existing data types of varying degrees of popularity. One example might be a hypothetical Neither of these options seems good. In the first case, you are adding a lot of maintenance overhead and potential bloat to your library. I see this discussed frequently as a concern, e.g.: purescript/purescript-either#38 In the second case ... it simply won't happen, between maintainers that are either MIA or disagree about including your typeclass for whatever reason. |
I'm not sure about the other maintainers, but just to be upfront about it: there is no argument that can be made on this issue that will make me come around to agreeing that allowing orphan instances again is a good thing. It's just how PureScript works now and forever more as far as I'm concerned. If there is ever a PureScript 2, maybe it'd be something up for discussion then, as the only way orphans would be agreeable for me is in a system that is fundamentally different from how typeclasses work just now. |
Fair enough, I suspected as much but I was primarily posting here as this seems to be the best place to include information on it at the moment, though a discourse thread would probably be better. Also side point: i realize you probably wouldn't actually want a ConcatMap class - that was just an example. |
Sure! That's fine if you want to keep posting ideas and things here in case of some possible future version of PS or something. 🙂 I just thought I should make my position on it clear so you don't feel like you're wasting time if nothing ever comes of it again. |
Thanks! - and sorry if it sounds like I kept badgering people on this - that's not the intent. |
I'm in agreement with Gary here, too. |
I agree with that as well. I really don't think there will be a convincing argument for "allow orphans everywhere". |
I think there are probably other solutions to these problems worth considering besides fully reverting the orphan ban. Collecting issues caused in part by PureScript's position on orphans seems to me to offer more potential value than just accumulating ideas for a different language. For example, Similarly, the scenario @bbarker just raised about
I assume/hope that the maintainers' position on the whole class of ideas similar to the above is weaker than can't-imagine-ever-supporting? |
Orphans are a bit of a red-herring. The issue is preserving global coherence, while allowing the solver to reason locally about available instances, in the presence of non-trivial type-level features and separate/incremental compilation. I think there are certainly other avenues that can be discussed, but it has to be in terms of preserving the advantages and correctness of the existing orphan rule. There won't be convincing arguments if it requires us to compromise on that. |
I have seen a few mentions of the Idris way of doing things in this thread, but I don't believe I've seen any feedback on that from those opposed to orphan instances. While it would be a major change in the language, I'm not quite sure it would necessitate a "PureScript 2", for instance - but I certainly could be wrong. Edit: And as per @natefaubion 's recent comment, I think everyone wants to have coherence - so I shouldn't really say "those opposed to orphan instances" but "those championing the merits of global coherence" :-) |
I have not seen anyone attempt to phrase the tradeoffs in the terms I just enumerated. I'm not an expert on the mentioned systems, but I believe they just forgo global coherence. This is less of an issue for dependently-typed languages because types can be indexed by values. See Kmett's "Typeclasses vs the World" talk. |
It's quite disheartening to hear anyone say this about anything 😞 While I can understand that in the case of "just allow orphan instances everywhere", this thread is providing serious motivating examples and possible solutions where orphan instances could be allowed with certain constraints. It doesn't seem to make much sense to completely shut down the possibility of a mechanism that would solve people's problems in the future in the way that this comment suggests. I have no authority to comment on this really since I have not contributed to the purescript language and @garyb has done an amazing, massive job however I just found it very sad to see this and had to comment 😢 I hope that the scenarios that are not possible are considered as good challenges in the future and that "just newtype everything" is recognized as a highly imperfect workaround and not a complete solution to the issues. |
I could imagine a backpack-like layer that lets packages depend on a signature module with a class and potentially default implementations, where an application/project can instantiate it while adding any additional implementations they want. This essentially lets you move the definition of this class into your own project without doing it manually, while reusing existing instances defined with the types. There is no downside to this currently, since we don't ship built artifacts as part of package install, but in the future this would just mean modules that depend on a signature module cannot share build products. They incur a dependency on the instantiated module. This does not run afoul the orphan rule, because you are essentially just (automatically) copying the entire source tree of these downstream modules into your project. Note, this can very likely be prototyped without compiler support as a meta/build layer. |
Good point. Let me clarify. I so strongly doubt that someone would invest the time and energy to think through, design, and implement an "it actually works!" solution to this problem in such a way that the tradeoffs are actually worth the change that my gut reaction is to just say, "No." By all means, prove me wrong! Let logic and truth win! However, I wouldn't see the point in talking about it until you've implemented code that actually works and has been tested by time. |
Further required reading: https://github.com/AndrasKovacs/pny1-assignment/blob/master/DepTC.md |
We keep talking about this but don't seem to have an issue for it. I'm quite an enthusiastic proponent of disallowing them and not providing an escape hatch either.
Worth discussing too perhaps, what counts as an orphan in the presence of multi-parameter typeclasses? I guess for an instance to not be an orphan it must satisfy either:
The text was updated successfully, but these errors were encountered: