Fresh start: Next steps #49
A somewhat lengthy issue here with some thoughts on how to proceed.
One idea is to use the package name ONNX.jl for things which are ML-framework agnostic and do Flux-to-ONNX import/export in e.g. an OnnxFlux.jl package. As discussed in the lengthy issue above, there are some other parts which could be put in separate packages (for instance NNlib <-> ONNX primitives). One obvious drawback is the overhead of many packages, including potential confusion for end users, but the reuse could pay off when it comes to adding and testing ops. This issue has a small, concrete potential next step which could fit in a framework-agnostic ONNX package: opus111/BaseOnnx.jl#2, but I think it needs some other eyes than mine to judge whether it is the right choice. |
I think I have a grasp on what you're proposing in that issue, but it would be great to have a couple of the motivating examples mentioned to play with :) |
Let's revive this issue. If I understand correctly, we are stuck on a Julia-friendly representation of ONNX graphs, either in the form of a computational graph or in the form of high-level serialization/deserialization primitives. I don't know the status of NaiveNASlib / CompGraph, but one thing I've been working on recently is the Ghost.jl package, which implements a linearized tape representation of traced Julia code. I'll be happy to hear your current thoughts on the representation of ONNX graphs in Julia! |
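For readers who haven't seen Ghost before, here is a minimal tracing example, roughly following the Ghost README (exact names and signatures should be double-checked against the Ghost docs):

```julia
using Ghost: trace, play!

f(w, x) = sum(w * x) + 1.0

val, tape = trace(f, rand(2, 3), rand(3))   # records a linearized Tape of primitive calls
play!(tape, rand(2, 3), rand(3))            # re-runs the recorded tape on new inputs
```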
Sorry for being inactive here. I have had a bit of a dip w.r.t. energy and motivation lately but I'm working on it :) Ghost looks awesome and I can absolutely see how I would love to throw out the ad-hoc tracing mechanism in ONNXmutable for it when it comes to exporting. Would it be fair to say it does roughly what Mjolnir would do when it comes to exporting ONNX? How do you see the relation to import? To me, the import requires normal (but domain-specific) Julia code to just implement what the natural language and type definitions of the ONNX spec say, for instance to "trace" through a […].

**Some verbose thoughts on representation of ONNX graphs in Julia**

I would say that the CompGraph in NaiveNASlib does not require any extra work w.r.t. ONNX. It can be used as is, but it might not be the ideal choice in every situation (or rather, it is probably very rarely the ideal choice). I think that the Flux crowd seems to prefer that Flux import of ONNX models returns a Flux.Chain, even if this means failing to import models which can't be expressed with Chain. Making some stand-alone […]. This is probably stating the obvious, but both […].

All in all, I think that the choice of how to represent the model needs to be up to each individual package, and it should be possible to do so without having to duplicate the lion's share of the work, which is to define the ONNX primitives. Therefore it is not a central thing which needs agreement. At least to me, there is value in being able to import an ONNX file as a model built from the building blocks of one's favourite framework of choice, i.e. that a Node with the operator type set to […]. One needs to be aware that getting full support for ONNX this way is going to be very cumbersome. For instance, a Conv operator which has both the weights and the input feature maps as trainable parameters, but where the bias is computed by another node, is a perfectly valid ONNX model.

One could also take the approach of making a package which aims at giving full support to the ONNX spec without trying to shoehorn things into structs predefined by some other framework. This is also a valid approach, but I somehow see less value in it as it would basically be a brand-new framework. The reason I bring this up is that it relates a bit to what assumptions one can make if one is to put some generic import code in this package. I suppose that if we do, we want to allow for full-blown ONNX support, not just what seems to be the norm in existing frameworks, even if it means that the code (and the use of it) will be more complicated. |
TL;DR: copy and paste over […]. I can't speak for any of the other Flux contributors, but to respond to the extended thoughts above:
Au contraire, model import should not rely on any framework's functionality. There's a reason both TF and PyTorch still lack any ONNX import functionality: the impedance mismatch is kind of unavoidable. Thankfully, the Julia ecosystem means we don't have to do this! What's required to make a model that works with Flux?
ONNXMutable's
Why not put it in this package? Honestly, the only thing I see preventing this from happening now is the JuMP dependency. If that and all the functionality around graph re-balancing were removed, it could be copied over pretty much wholesale.
This is true, but the allure of runtime function generation shouldn't stop us from getting something out the door. Most users don't care about what their model looks like as long as they can import and run it. I'm not even sure that a dynamic DAG like
IMO full support is more important than feeling idiomatic. One of the biggest frustrations with ONNX implementations is hitting unsupported operators, because there's no good way to fix it as a user. Aesthetic concerns, on the other hand, are much easier to handle. I also don't think such a package would be a brand-new framework. Rather, it would be little more than plumbing between ONNX protobufs and thin wrappers over NNlib functions (with as much metadata as possible for higher-level frameworks to use). If it turns out some op doesn't exist in NNlib, then we should add it there. Note that this and Functors.jl support would also allow frameworks like Flux to pick and choose layers to convert to more "idiomatic" representations, while still maintaining a broad level of support. ONNX.jl and […] |
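To make the "thin wrappers over NNlib functions" idea concrete, here is a minimal sketch (the struct and field names are made up for illustration, not a proposed API):

```julia
using NNlib

# A thin, framework-agnostic wrapper: just the tensors plus the raw ONNX
# attributes, kept around as metadata for higher-level frameworks.
struct ONNXConv{W,B,A}
    weight::W   # WHIO layout
    bias::B     # per-output-channel vector, or nothing
    attrs::A    # e.g. Dict(:pads => ..., :strides => ...); ONNX's pad ordering may need re-arranging for NNlib
end

function (op::ONNXConv)(x)   # x in WHCN layout
    y = NNlib.conv(x, op.weight; pad = op.attrs[:pads], stride = op.attrs[:strides])
    return op.bias === nothing ? y : y .+ reshape(op.bias, 1, 1, :, 1)
end
```

Such a wrapper stays runnable on its own, while a higher-level framework can pattern-match on it (plus the attached metadata) to build its own layer type.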
While I think this is a fully valid and perhaps somewhat ideal ONNX solution, I do fear that it will be quite a lot of code to write and maintain. It will certainly be more code per op than the layer definitions in Flux since it has more features per op (right?). More features also mean more maintenance, and looking at Flux it does not look like the layer definitions in there are low-maintenance.

**Some more minor comments and clarifications on CompGraph**
NaiveNASflux adds Functors.jl support to CompGraph for cpu<->gpu movement and datatype changes, and (barring any misconceptions I might have) it is unfortunately non-trivial; I was forced to rely on mutating the model instead of creating a new copy. The crux is that a vertex also contains its input vertices, and the same vertex might be input to several vertices. When recursing with […], NaiveNASlib has a […]. Perhaps this is a reason to consider another design, e.g. one where the structure of the graph is inside the CompGraph and not inside the vertices. I went back and forth a few times on this in the prototyping stage and settled for the current solution because it makes it possible to do the size-rebalancing stuff from the perspective of a single vertex.
Yeah, this would be very easy to do as the compute-only parts of CompGraph are in files/structs which don't have any dependencies on JuMP or the re-balancing stuff. The LightGraphs stuff is obsolete and can just be deleted, and the same goes for the logging stuff.
Fully agree. Fwiw, I have not been able to spot a significant difference between CompGraph and Chain in benchmarking. As you implied, whatever ns -> µs overhead happens due to type instability seems to disappear in the noise when the whole function is in the millisecond range. I suppose that if someone were to use e.g. NaiveGAflux for the same purpose as SymbolicRegression.jl they might find the type instability to be the bottleneck, but I don't lose any sleep over that. |
I would distinguish between the amount of code and the overhead of maintenance. One reason the per-layer maintenance burden is higher for Flux is because layers need to be ergonomic, robust and somewhat general. An under the hood representation needs none of those things as long as it's not exposed to users (which it shouldn't be). As for the repeated vertex issue, I'd totally forgotten about it!. This is something we've been trying to work out recently because it affects all weight tying. See FluxML/Flux.jl#1592, FluxML/Flux.jl#1504, FluxML/Zygote.jl#991 and the last couple of days of discussion on Slack's #autodiff. I guess tracing would be a more optimal approach until we resolve weight tying, but that brings with it the issue of playing nice with source-to-source ADs like Zygote. If that's not a concern though, I say we should go for it. Given the awkwardness of sending big paragraphs back and forth here, might I suggest @dfdx and @DrChainsaw join us this coming Tuesday for the ML Community Call? The link is at https://calendar.google.com/calendar/event?eid=X2I5N2t1ajlwNnNxMzBlOXA2b3FqMmQxbl8yMDIxMDYyMlQxNjAwMDBaIGp1bGlhbGFuZy5vcmdfa29tYXVhcWV0MTRlb2c5b2l2M3A2bzdwbWdAZw and you can copy a calendar invite into your own calendar via the links on https://julialang.org/community/. |
I'm not sure this is the same issue. I think the core of the problem with applying functors to […]
Unfortunately this is not an option for me - my open-source-working-hours are usually between 0am and 2am local time (I'm at UTC+3), and my family wouldn't really enjoy me talking aloud that late 😃 However I'll be happy to see the recording (if any). |
AIUI the main concern is with mutable structs shifting identity after fmap in a way that invalidates external references? The case of a node having multiple parents (i.e. DAG vs. tree structure) should be handled by […]
No worries! I think it would be nice to have a touch base some time this summer, but that'll likely require some more scheduling. |
I'm unsure as to whether I can join as it collides with dinner over here. I suppose one good agenda item could be whether it is preferable to shoehorn into existing frameworks or to make something fully ONNX-compliant. I guess one might want to leave both doors open, which I guess is doable if we proceed with the ecosystem route.
You're right! I should probably just give it a try now and see if it works. Perhaps I could just replace the whole copy mechanism in NaiveNASlib with Functors.jl too. FWIW, fmap works in NaiveNASflux now too (would be pretty useless if it didn't), it is just that it has the surprising side effect that it mutates the input. |
In a quickly evolving discussion I forgot to answer @DrChainsaw's questions!
Kind of, with the exception that Ghost.jl is maintained :) At some point I tried to migrate to […]
It really depends on what your domain is. If you have a simple ONNX definition and want to construct […]. Obviously, I'm more familiar with […] |
Yup, I doubt Mjolnir will ever go beyond the experiment phase or see the light of day again (we should probably archive it). The work there is now being picked up and more robustly developed as part of https://github.com/JuliaCompilerPlugins and the Symbolics.jl ecosystem. I don't have a single authoritative link, but #compiler-plugins on slack has a lot of great discussion. Unlike last time, there is both upstream/core support and many downstream users (e.g. JET) involved, so it's very much worth a look to see what infrastructure can be shared. Edit: I should add that this new effort is able to learn from the experience of IRTools/Cassette/Mjolnir development and has a much better bus factor! |
Potentially speaking as "the crowd" 😉, I want to echo @ToucheSir's comments and reassure you that this is not quite the case. I think eventually it would be nice to have an ONNX2Flux.jl package that can translate an ONNX model to a Flux model when possible, for the simple reason that it will be in the representation that Flux.jl users expect, and it will work most easily with other packages that operate on the Flux models themselves. But I do not think that our first step should be reading ONNX models into Flux layers. I think the approach outlined here is the right one. ONNX.jl should not rely on Flux.jl in any way. We want complete support for arbitrary ONNX structures, which means using […]. I am happy to help review/write code to push this effort forward. In particular, […] |
Alright, I suppose the 'impedance mismatch' problem makes the separate framework the least risky way if the goal is to eventually support the whole spec, although it does seem like a pretty gargantuan task to me. Between […]

Do we want the operators in this repo? I suppose they might be dead weight for e.g. ONNX2Flux.jl, unless one tries to first load the model in the generic framework and then translates what can be translated from that to Flux layers.

Just some things which might be good to think about before starting:

- A lot of the impedance mismatch comes from ONNX being built with row-major in mind (and even when it's not, models from the Python frameworks will most likely be row-major). Both the old ONNX and ONNXNaiveNASflux (ex ONNXmutable) try to import as column-major and the latter exports as row-major. I suppose a lot of risk can be mitigated if one does not attempt this.
- Maintaining the versioning of the opsets might become quite burdensome if one does not think about it early (i.e. before implementing a bunch of ops). I tried asking the question in the onnx forum but did not get any replies, but this post is roughly the same and has some discussion in it: onnx/onnx#2872 |
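As a concrete illustration of the row-/column-major point (a general fact about memory layout, not code from any of the packages mentioned): an ONNX tensor declared as (N, C, H, W) and stored row-major can be read into Julia column-major with the dimensions reversed, which gives the (W, H, C, N) layout NNlib expects without moving any data:

```julia
# Pretend `raw` is the flat buffer from the protobuf, written row-major
# for an (N, C, H, W) = (2, 3, 4, 5) tensor.
raw = collect(Float32, 1:2*3*4*5)
onnx_dims = (2, 3, 4, 5)                 # (N, C, H, W) as declared in the proto
x = reshape(raw, reverse(onnx_dims))     # (W, H, C, N): same bytes, column-major view
size(x)                                  # (5, 4, 3, 2)
```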
I imagine 2-step import for most frameworks. Assuming […]

We can implement (1) in this repo and delegate (2) to the framework-specific repo. Transformations like […]
If we don't permute dimensions, would it still be possible to use usual operations like […]? |
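To make the two-step idea concrete, a minimal sketch of how the flow could look from a user's perspective (all names here are illustrative, not an existing API):

```julia
using ONNX                            # step (1): ONNX protobuf -> framework-agnostic Ghost.Tape
# using ONNX2Flux                     # hypothetical step-(2) package

tape = ONNX.load("model.onnx")        # (1) build a tape of NNlib-level calls; runnable as-is
# model = ONNX2Flux.to_chain(tape)    # (2) framework-specific rewrite into Flux layers
```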
Having recently worked with |
Here's a draft of the loader based on
A few takeaways from this exercise:
|
Looks like a good starting point to me. Perhaps the tape context can store a backend selection (defaulting to |
I've added a few more operations and created the PR to make it easier to track the changes. So far adding/migrating operations is easy as long as there's a counterpart in NNlib. Unfortunately that's not always the case; a few operations that I hit are:
Adding these ops to NNlib was discussed here and there, but as far as I know nothing has actually been done. I'm ready to handle it, but since it's quite a huge piece of work it would be great to get agreement about the approach itself first. In case we go with Ghost and NNlib, I imagine this project to have the following stages:
|
Great progress! I would add testing somewhere between step 0 and 2, as this becomes quite an uphill battle once more ops are added. There is some functionality in ONNXNaiveNASflux you could borrow for this. One approach is to download the testdata from the onnx repo as artifacts and run the same testcases they do. I found that a lot of bugs still slip through, often because the test arrays are all ones and similar. The other approach, which feels safer (but less unit-y), is testing against onnxruntime, but I suppose this might be a bit cumbersome without export functionality. Is there a chance that the Tape + NNlib format is more difficult to work with compared to the ONNX format? Or can one add metadata in there to facilitate? One thing which I strived for in ONNXNaiveNASflux is to do a lot of the […]

My 2 cents on two of the points above:
One risk/consequence is that one needs to consider the rearrangement of dimensions when implementing all ops (e.g. […]).

I think that approach makes sense. One could perhaps just default to […]. Any thoughts on […]? |
I see 2 ways to implement replacement of an operation: a global dict of op implementations, or separate methods selected via dispatch.
If there are two packages A and B using ONNX.jl, and B decides to replace one of the operations in the global dict, A will be affected too. On the other hand, using separate methods we can do something like this:

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{MyOp}) = ...              # default implementation
load_node!(tape::Tape{ONNXCtx{PkgBBackend}}, nd::NodeProto, ::Val{MyOp}) = ... # implementation specific to package B
```

Does it make sense to you?
All in all, […]
Things like […]
I'm not so sure about testing; I'm considering it even after step 4 is implemented. I see a lot of risk in steps 1-4 since we may hit obstacles that will affect the whole design. Maybe it will be easier to get an MVP for all four steps and then iteratively add more operations and cover them with tests. Or maybe that will be too large a piece of work and we'd better make the result of step 1 really robust before moving further. The next step for me is to add functional forms of […] |
Functional forms of dropout and alpha dropout can probably be adapted easily from FluxML/Flux.jl#1618. Norm layers are a bit trickier, but most of the pieces are in FluxML/Flux.jl#1509 and bringing them into NNlib would be great for a number of reasons. |
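For reference, an inference-only functional batchnorm is small enough to sketch here (a generic formulation, not the code from either PR; the training-mode statistics tracking is what makes the real thing harder):

```julia
# Inference-mode batch normalization over the channel dimension (ndims(x) - 1),
# using precomputed running statistics μ and σ².
function batchnorm_inference(x, γ, β, μ, σ²; ϵ = 1f-5)
    shape = ntuple(i -> i == ndims(x) - 1 ? size(x, i) : 1, ndims(x))
    γr, βr, μr, σ²r = reshape.((γ, β, μ, σ²), Ref(shape))
    return γr .* (x .- μr) ./ sqrt.(σ²r .+ ϵ) .+ βr
end
```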
This is why I think handling multiple libraries now makes sense. These ops are in Flux but not NNlib (though functional forms of normalization layers should make their way to NNlib). I would suggest something like having a fallback

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, backend, op) = nothing
```

plus per-backend, per-op methods such as

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{:NNlib}, ::Val{:Conv}) = ...
```

The main loading function will loop over the backends in order and use the first method that returns something other than `nothing`.
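A rough sketch of what that loop could look like (not the actual implementation; it assumes the tape context stores an ordered list of backend symbols):

```julia
function load_node!(tape::Tape{ONNXCtx}, nd::NodeProto)
    for backend in tape.c.backends                            # e.g. [:NNlib, :Flux]
        res = load_node!(tape, nd, Val(backend), Val(Symbol(nd.op_type)))
        res === nothing || return res                         # first backend that knows the op wins
    end
    error("No backend implements ONNX operator $(nd.op_type)")
end
```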
My experience with a very similar project where I was translating computational graphs written in Julia to a more restricted subset of ops was that a global dict was more cumbersome to work with. We'd need to provide safe accessors to the dict to "register" ops, and the dict entries tend to grow with tuples of metadata. I found writing a dispatch rule to be a far more straightforward API that should be familiar to anyone writing Julia. It also keeps the door open to eventually have an ONNXCore that folks can import to write their own op translation rules for their library. |
Oh, sorry, I misinterpreted your idea and thought of backends as a way to replace particular operations, not to extend the list of supported ones. But now I'm confused - if we add Flux as a backend, ONNX.jl will become quite a heavy package and hardly suitable to be included in other deep learning libraries (e.g. it would be weird to have Flux as a dependency for Knet). In fact, I assumed the opposite direction - to move common stuff like dropout or batch normalization to NNlib so that they could be re-used by a wider set of high-level libraries. |
IIUC, this declaration:

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{:NNlib}, ::Val{:Conv}) = ...
```

would be in Flux or a separate package, not ONNX.jl. This repo shouldn't ever have to depend on any higher-level framework. That's another argument for a dispatch-based approach: all another library needs to do to hook into the system is define its own overloads. NNlib should still be the common denominator, but there are scenarios where it would be nicer to extract full layer structs as well. |
Yup, exactly. It will also help iterate on this package without waiting for Flux to move stuff to NNlib (which might be slower). Eventually, we'll drop the Flux dependency. |
If eventually all operations implemented in ONNX.jl will be based on NNlib, do we actually need multiple backends? If the main goal at the moment is to have at least some implementation for all operations, we can use Flux functions directly, e.g.:

```julia
function load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{:Dropout})
    ...
    val = Flux.dropout(...)
end
```

and when these functions are moved to NNlib, simply replace them:

```julia
function load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{:Dropout})
    ...
    val = NNlib.dropout(...)
end
```

Why do we need the backend parameter then?

Regarding operation overloading in specific packages, I thought about extending ONNXCtx with a type parameter instead, e.g.:

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{MyOp}) = ...        # default implementation, i.e. NNlib-based
load_node!(tape::Tape{ONNXCtx{:Flux}}, nd::NodeProto, ::Val{MyOp}) = ... # Flux implementation
```

This change will be non-breaking, so we can easily postpone it. (Not that I'm against passing the backend as an additional parameter, I just don't fully understand why we need it.) |
I guess we are suffering a bit from branching here. Posting a few hip-shot answers to some things which seem hip-shotable to me.
I suppose backends and translating a […]

As for getting stuff into some other framework, I think this is useful due to auxiliary tools which may be specialized towards one certain framework (e.g. NaiveNASflux).

About val-dispatch: […]

About testing: After one can import e.g. VGG, chances are that one will need to manually check the output from each operation in the graph, and this effort becomes kind of wasted if it is not put into automated tests. Anyways, I'm not a TDD zealot and I don't want to try to force people to work in a certain way. Your comment read as if tests might not even be needed in the end, and that is where my main complaint lies. |
@dfdx there are two overlapping goals with the ONNX effort. The first is to allow imported and exported ONNX models to run, period. That shouldn't require anything beyond NNlib and maybe Functors. The second is to open up ONNX as a storage format for higher-level libraries like Flux. This would require mostly lossless round-tripping for layers, which is not a goal of the NNlib backend. Now, it may come to pass that this approach is too finicky to be useful, so I agree with your idea to structure the dispatch such that we can postpone it. @DrChainsaw I believe the point about testing was not that tests shouldn't exist at all, but that we shouldn't try to do too much too early. Certainly testing outputs against some reference requires enough operations to be implemented to make it worth our while, otherwise they're just tests against NNlib by proxy. Testing whether a series of ops lowers to a specific tape representation seems much easier, and I imagine that is in the works here. |
This is correct, and I see two possible options here.

**Direct Translation Approach**

Right now, we seem to be going down the first one, which is to translate each node into a function in NNlib. The main issue I see here is that NNlib must be able to express the ONNX graph. For certain ops, like conv or normalization, that will probably be okay, even though it will take some time for normalization to make it in. I don't see […]

In this option, the API is to define direct translations via:

```julia
load_node!(tape::Tape{ONNXCtx}, nd::NodeProto, ::Val{MyOp}) = ...
```

**Transformation Approach**

The other option is for ONNX.jl to translate nodes into a tape of ONNX-specific structs. For example, […]

In this option, the API is to pass in a transform function. A package defines something like

```julia
my_transform!(tape::Tape, call::ONNX.Conv) = ...
```

and ONNX.jl uses

```julia
transform_tape!(ftransform!, tape::Tape) = ...
```

You can see here and here where I used a similar design approach with Ghost.

The backends idea was an attempt at straddling the two approaches. It would allow us to directly translate to NNlib function calls while still using other packages when NNlib falls short. And it would allow users to pass in a vector of backends in their preferred order when they want to. And it allows NNlib, Flux, and Knet translations to all be defined simultaneously. |
Sorry I made you think so! I definitely don't propose to release untested code, I just want to have a little bit more confidence in our approach before investing a lot of time into testing, docs, etc.
Could you please elaborate on this? Adding functional forms of these operations to NNlib was exactly what I was targeting. Maybe it won't bring much value to Flux or Knet, which already have their own versions, but it will certainly help other (existing and potential) frameworks as well as improve unification across the infrastructure. Regarding the transformation approach, as far as I understand, the API you described requires a one-to-one correspondence between […]
This is a bit harder than replacing a single operation, but in the previous incarnation of […] e.g. `Sigmoid(Conv(x)) => Conv(x, sigmoid)`. I "lost" this function due to refactoring, but sooner or later I'm going to add it again. Using it, it should be possible to not only transform […]: `act(conv(x, w) .+ b) => Conv(w, b; activation=act)(x)` |
The initial (and current) focus of NNlib has been defining fast kernels for operations that need them (GEMM, conv, and pooling). The purpose is to have a common API to C libraries, etc. We could expand the scope to provide a unified functional API. As it stands, since Flux won't benefit much, I wouldn't hold my breath that this will happen quickly (it would mainly be work towards ONNX). That's okay. I'm certainly not saying that a unified API package would be a bad thing. The main concern I have with the NNlib approach is that it cements NNlib functions as the definitions for usable ONNX ops. This poses two potential issues:
So, my current thoughts are to have ONNX ops like […]. I had not considered the M-to-N transform case. Currently, 1-to-N is easily doable. If adding the […] |
If we did continue with the current approach though, I would still have |
My point is that 1 to N during import means N to 1 during export :)
What concerns me in this approach is that graphs built from definition-only ops won't be executable by themselves. This means one would not be able to run ONNX models until e.g. ONNXFlux or ONNXRuntime is implemented, but what's even more important, it will be much harder to test the code properly. How about using NNlib when possible & convenient and implementing additional ops directly in this repo? It's quite a dirty approach, but this way we should be able to move quickly, get at least something working and then see what actually works or doesn't work for us.
I think I still don't fully understand your idea here. You say that it will be useful to have multiple implementations of […], e.g.:

```julia
# Conv implemented by both the NNlib and Flux backends
load_node!(tape::Tape, nd::NodeProto, ::Val{:NNlib}, ::Val{:Conv}) = ...
load_node!(tape::Tape, nd::NodeProto, ::Val{:Flux}, ::Val{:Conv}) = ...

# Sin implemented only by NNlib
load_node!(tape::Tape, nd::NodeProto, ::Val{:NNlib}, ::Val{:Sin}) = ...

# Softmax implemented only by Flux
load_node!(tape::Tape, nd::NodeProto, ::Val{:Flux}, ::Val{:Softmax}) = ...

# user chooses the backend
ONNX.load(filename, backends=[:Flux])  # will only use the Flux backend, won't be able to load Sin
```
|
Ah, got it, good point!
I agree, and what I'm suggesting, if we go the route where the API is "transformations," is to define all ops in this repo for consistency. The ops that map 1-1 with NNlib functions can just forward there, and other ops can be implemented in the repo as needed. So,

```julia
(layer::ONNX.Conv)(x) = NNlib.conv(...)
(layer::ONNX.Flatten)(x) = reshape(...)
(layer::ONNX.Sine)(x) = sin(...)
```

Alternatively, if we don't use the transformation API, then these suggestions are not as relevant.

The usefulness would be in the flexibility of ONNX.jl's design for rule writers. As @DrChainsaw pointed out, there is overlap between the backends approach and the transformations approach. We provide a

```julia
# this is already defined in ONNX.jl
load_node!(tape::Tape, nd::NodeProto, ::Val{:Base}, ::Val{:Sin}) = ...

# ONNX2Flux.jl overrides Conv
load_node!(tape::Tape, nd::NodeProto, ::Val{:Flux}, ::Val{:Conv}) = ...
```

And the wrapping […]. Of course, if none of the […] |
@darsnack Got it, thanks! I've pushed the version with multiple backends to the PR.
I like this idea. The only question I have is whether ONNX.Ops should behave like functions or like layers (i.e. structs), e.g. should it be

```julia
y = conv(w, x, b; pad=pad)
```

or

```julia
c = Conv(w, b; pad=pad)
y = c(x)
```

ONNX itself is purely functional with no internal state, so I'm leaning towards the first option, but I can try to use layers instead to make it more convenient for the high-level frameworks. |
That's a great question that I don't know the best answer to 😅. It will probably be a balance between staying true to the ONNX nodes vs. re-packaging them to make transformations easier. I think your suggestion to start with functions because that's what ONNX does is a good idea. |
I can now load and execute ResNet-18. It doesn't mean all operations are implemented correctly, but at least the shapes match in this limited case. I'm mostly satisfied with the API and ready to move forward. Some details:
Given the current design I don't see risks in conversion to/from Flux.Chain anymore, but I'm starting to feel the uphill battle of the missing tests. @DrChainsaw could you please elaborate on the options you mentioned above? In particular:

I understand the relevant part in ONNXNaiveNASflux is here, and I see that you download test data from here, but I don't immediately get the rest of the logic. Is it correct that you create a single-node graph, evaluate it and compare to...?
I don't know much about onnxruntime, could you please clarify how we can use it for testing? |
Yeah, and this tends to get worse with the more difficult-to-grok dimension-manipulating ops. For instance, with reshape, I just closed my eyes and prayed the testcases would pass when reversing the dimensions to reshape. There are much worse things than reshape in there though; this one looks pretty anxiety-inducing at a glance. Hopefully most of them just follow the pattern of either reversing the index array or the indices themselves (i.e. index […]
The onnx tests consist of 1) an onnx model/node, 2) input data and 3) the expected output when 2) is fed into 1). So: 1) load the model/node, 2) load the input data and feed it into the loaded model/node and 3) load the expected output and verify that it is identical (within the last decimal or so) to the output from step 2). The drawback, as I said, is that in many cases the input and maybe also the model parameters are just arrays with a single unique value (e.g. all ones), and this lets a lot of errors slip through. For example, incorrect padding reshuffling goes undetected with the onnx test data iirc. Perhaps this has been corrected in later versions, so it might be worthwhile to just use the latest onnx version.
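A sketch of what that could look like against the official onnx test-data layout (the `load_model`, `load_tensor`, and `run_model` helpers below are placeholders for whatever this package ends up exposing):

```julia
using Test

function run_onnx_testcase(dir)
    model = load_model(joinpath(dir, "model.onnx"))                   # 1) load the model/node
    for dataset in filter(startswith("test_data_set"), readdir(dir))
        files    = readdir(joinpath(dir, dataset))
        inputs   = [load_tensor(joinpath(dir, dataset, f)) for f in files if startswith(f, "input")]
        expected = [load_tensor(joinpath(dir, dataset, f)) for f in files if startswith(f, "output")]
        actual   = run_model(model, inputs...)                        # 2) feed the inputs through
        @test all(isapprox.(actual, expected; rtol = 1e-5))           # 3) compare to the expected outputs
    end
end
```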
onnxruntime just happens to be what my 5-minute search found to be the most 'canonical' onnx inference library. The way it is used in ONNXNaiveNASflux is that an op/model (e.g. Flux.Conv or some function with several ops) is exported, then some data is fed through 1) the original op/model, 2) the exported op/model using onnxruntime (through PyCall) and 3) the op/model imported by ONNXNaiveNASflux, and it is asserted that all three produce the same output (within the last decimal or so). This is for instance how I caught the incorrect padding shuffling. Without any export functionality I'm not sure how to make use of this though. One way could be to do the onnx-testdata testing with better input data. |
I assume this means testing outputs for either individual layers or groups of layers against those derived from running the same graph on onnxruntime (via PyCall). This would certainly be useful for some integration tests, but might not work as well overall because of differences in how certain operations are calculated. For example, my understanding is that |
But then the onnx model does not represent the same thing, which would be very bad. If that is the case then either onnxruntime or this package needs to change. Or perhaps you are talking about small numerical differences just from computing the same thing in a slightly different order? I'm not sure how "the usual cross-correlation" differs from "convolution". The result of convolution between two sequences can be called the cross-correlation between those two sequences (ok, there are a number of options for how to scale things), or is my signal processing too rusty? Btw, ONNXNaiveNASflux has a lot of tests against onnxruntime and it has only caught real errors in ONNXNaiveNASflux so far. |
I'll check how hard it will be to implement the export side. |
@DrChainsaw good to know, I'll defer to your expertise on this :) AIUI the only difference for my specific example is that convolution flips the kernel while cross-correlation doesn't, but how everything washes out once you start talking about row vs column major I'm not sure.
It seems like this scheme could still be useful? We could always bootstrap it by writing Python scripts which generate the test cases. That would avoid the need for export code until that functionality is implemented. |
I've removed the WIP status of the PR in #50 . Here's what changed from the last update:
Adding more features in a single PR becomes increasingly harder, so if you guys don't have hard objections to the approach I'd like to merge it and add further changes in smaller chunks. This doesn't mean the API is frozen, of course; we can still change any details in more specific PRs. |
I left a couple quick comments, but this is probably deserving of review from someone with more experience handling ONNX. |
Meanwhile I'm trying to add an exporter for Conv, but I don't quite understand how to handle dimensions, so maybe you guys can help me.
|
|
Whcn is for the colmajor behaviour. And yes, conv does a proper conv, not a cross correlation. |
I had a quick look into the motivation for flipweights as I had forgotten why I added it in the first place. It seems like all testcases in ONNXNaiveNASflux pass if flipping is removed. This is however not strange, as flipping is done both when serializing and when deserializing, and the onnx testdata uses ones for weights, so flipping or not makes no difference. I did however try loading VGG with testdata from the onnx model zoo, and both the MXNet version (vgg16) and the Caffe version require weight flipping for the testdata to give identical (within floating-point precision) results. Another thing which confuses me is that I would have expected a permutedims to be needed somewhere to translate OIHW to WHIO, but apparently this is not needed. If you look at the generic translation from […] |
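For reference, a minimal sketch of what the flip amounts to, assuming 4-D WHIO weights as they come out of the column-major read (this is not the exact ONNXNaiveNASflux code). The permutedims is unnecessary because reading the row-major OIHW buffer column-major with the dimensions reversed already yields WHIO:

```julia
# ONNX's Conv is a cross-correlation, while NNlib.conv performs a true convolution
# by default, so the spatial dims of the kernel get reversed on import/export.
flipweights(w::AbstractArray{<:Any,4}) = w[end:-1:1, end:-1:1, :, :]

# Alternatively, NNlib.conv accepts `flipped=true` to skip the kernel flip.
```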
Testing symmetric changes turns out to be pretty hard. In this commit I add a converter (used as `onnx2julia` below) which permutes the dimensions instead of just reshaping:

```julia
julia> w = Float64[1 2 3; 4 5 6]
2×3 Matrix{Float64}:
 1.0  2.0  3.0
 4.0  5.0  6.0

julia> onnx2julia(w)
3×2 Matrix{Float64}:
 1.0  4.0
 2.0  5.0
 3.0  6.0

julia> reshape(w, 3, 2)
3×2 Matrix{Float64}:
 1.0  5.0
 4.0  3.0
 2.0  6.0
```

As you can see, instead of transposing the matrix, which would be the proper reversion of dimensions, `reshape` just reinterprets the underlying column-major memory and scrambles the rows:

```julia
julia> x = rand(1, 2)
1×2 Matrix{Float64}:
 0.0829937  0.579257

# matmul in ONNX-native dimensions
julia> x * w
1×3 Matrix{Float64}:
 2.40002  3.06227  3.72452

# matmul in Julia-native dimensions
# the shape is different, but the content is the same
julia> onnx2julia(w) * onnx2julia(x)
3×1 Matrix{Float64}:
 2.400021221468818
 3.0622718363075028
 3.7245224511461874

# reshape-based matmul
# the content is different
julia> reshape(w, 3, 2) * reshape(x, 2, 1)
3×1 Matrix{Float64}:
 2.979278090345529
 2.0697455904780284
 3.641528705184214
```

Meanwhile, I've prepared the next chunk of changes, check out the diff with the current PR in dfdx#1 (I'll retarget the PR to ONNX.jl#master once #50 is merged). This new PR adds a set of converters between ONNX and Julia and tests them on the Conv operator. |
Here's the actual PR for conversions + conv: #51 |
Opening an issue here to keep track of this. @dfdx @DrChainsaw Would be great if you could add details about what needs to be done to add complete support for Flux here.
For context, the first set of changes were done in #46.