Adding open source EBSD datasets #411
Comments
Thanks for raising this issue @argerlt. As mentioned elsewhere, I agree that we should add this open datasets page to the docs. I wasn't aware of the openly available 3D dataset from Dream3D. These would be ideal to use in testing a potential 3D functionality down the line.
This is a valid point. I opened #412 to address this.
I assume you are aware of the two small test datasets orix is packaged with? If not see e.g. orix.data.sdss_ferrite_austenite(). We could add more datasets to the
I agree that easy access to such established open test datasets is important. Instead of converting these datasets to orix' HDF5 file format though, we should add readers for the format they are already stored in. I could for example easily import the first slice of the Dream3D nickel dataset:
>>> from orix import io, plot
>>> xmap = io.load("Slice_001.ang")
/home/hakon/kode/orix/orix/io/plugins/ang.py:268: UserWarning: Number of columns, 10, in the file is not equal to the expected number of columns, 14, for the
assumed vendor 'tsl'. Will therefore assume the following columns: euler1, euler2, euler3, x, y, unknown1, unknown2, phase_id, unknown3, unknown4, etc.
warnings.warn(
>>> xmap
Phase    Orientations    Name  Space group  Point group  Proper point group     Color
    0  37989 (100.0%)  Nickel         None          432                 432  tab:blue
Properties: unknown1, unknown2, unknown3, unknown4
Scan unit: nm
>>> xmap.scan_unit = "um"
>>> ipfkey = plot.IPFColorKeyTSL(xmap.phases[0].point_group)
>>> rgb_z = ipfkey.orientation2color(xmap.orientations)
>>> xmap.plot(rgb_z, overlay="unknown1", remove_padding=True)  # IQ overlay
And the first of the raw AF96 datasets (the raw FOV 1):
>>> from orix import io, plot
>>> xmap = io.load("Field of view 1_EBSD data_Raw.ang")
/home/hakon/kode/orix/orix/io/plugins/ang.py:268: UserWarning: Number of columns, 10, in the file is not equal to the expected number of columns, 14, for the
assumed vendor 'tsl'. Will therefore assume the following columns: euler1, euler2, euler3, x, y, unknown1, unknown2, phase_id, unknown3, unknown4, etc.
warnings.warn(
>>> xmap
Phase     Orientations       Name  Space group  Point group  Proper point group       Color
    0        29 (0.0%)  Austenite         None          432                 432  tab:orange
    1     14900 (0.7%)    Ferrite         None          432                 432    tab:blue
    2  2202639 (99.3%)       None         None         None                None   tab:green
Properties: unknown1, unknown2, unknown3, unknown4
Scan unit: nm
>>> xmap.scan_unit = "um"
>>> ipfkey = plot.IPFColorKeyTSL(xmap.phases[1].point_group)
>>> rgb_z = ipfkey.orientation2color(xmap.rotations) # Hack, should be xmap["Ferrite"].orientations
>>> xmap.plot(rgb_z, overlay="unknown1", remove_padding=True)
(orix does not read either of these correctly ("nm" instead of "um", and the phase IDs are incorrect for the AF96 dataset). I opened #413 to track these bugs.)
The AF96 dataset is a nice test dataset because it is large (> 2 million points, 205 MB). We could use this to test how well our algorithms perform in terms of memory, CPU load and time.
As for the Dream3D dataset, if we use this dataset in the docs, we can make it available via
I anticipate that the orix HDF5 format will change in the future as more people use it and suggest improvements. I therefore do not want to upload files in this format to any permanent source, like Zenodo.
Finally, regarding the MTEX datasets, I suggest we only link to these in the docs. Based on our discussion in #389, I think we should restrict the use of other GPL code to a minimum, ideally none, if we ever hope to make the license of orix or parts of orix more permissive (say BSD3).
I think the best way forward is to work towards better interoperability between orix and MTEX (and other similar software, like Dream3D).
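The idea of using a large dataset such as AF96 to check memory and run time could be sketched roughly as below. This is only an illustration: the helper name `profile` is hypothetical and not part of orix, and it relies on the standard library's `tracemalloc`, which only sees allocations made through Python's memory allocators, so the reported peak is best read as a lower bound.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds, peak traced bytes).

    Hypothetical helper for illustration, not an orix function.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises
        tracemalloc.stop()
    return result, elapsed, peak
```

With orix installed, one might call `profile(io.load, "Field of view 1_EBSD data_Raw.ang")` and compare the numbers between versions. CPU load is not covered by this sketch.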
Yes, functions like these were exactly what I was trying to mimic in #409. Your point on "orix data is for examples only" is a valid one though, so I'm just thinking about how to best mimic this but as a 10 line code snippet in
This is actually close to what MTEX does now. It creates a cache (similar to pooch), downloads the original files if they aren't already in the cache, then converts them into an MTEX EBSD object. It then saves the EBSD object as a .mat file with a note about the version of MTEX used, and if the same MTEX version tries to use that data again, it just loads the .mat file instead.
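A minimal Python sketch of that version-tagged conversion cache is given below. It is not MTEX's or orix's actual implementation: the function name `load_converted`, the `convert` callback, and the use of `pickle` in place of a .mat file are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_converted(raw_path, cache_dir, version, convert):
    """Return the converted object for raw_path, reusing a cached copy
    only if it was written by the same library version.

    Hypothetical helper: `convert` is any callable that turns the raw
    file into an in-memory object (e.g. a file reader).
    """
    raw_path = Path(raw_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (raw_path.stem + ".pkl")

    if cache_file.exists():
        with cache_file.open("rb") as f:
            cached = pickle.load(f)
        # Only trust the cache if the same version wrote it
        if cached["version"] == version:
            return cached["data"]

    # Cache miss or version mismatch: convert again and overwrite
    data = convert(raw_path)
    with cache_file.open("wb") as f:
        pickle.dump({"version": version, "data": data}, f)
    return data
```

Storing the version next to the data means a format change in a newer release silently triggers a fresh conversion instead of loading a stale object.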
Yup, I agree.
On this note, it's worth mentioning that Dream3D is produced by BlueQuartz, and the small_in100 datasets were collected by Mike Groeber (related paper here) while he was working there. Groeber is a co-author on everyone's favorite 2015 Rowenhorst et al. paper, and Dream3D uses the exact same rotation representation conventions as Orix for all its internal calculations (code here). Dream3D is also working on improving its Python API, and is now installable via Conda, so cross-compatibility might be very realistic in the near future. It also has a pretty excellent EbsdLib for reading various EBSD formats, but it's in C++ and GUI-centric, so maybe not useful for the ORIX team.
Alright, I think I have some ways of doing this that will make everyone happy. I will write an example "Open_Datasets.rst" file and post it here, and we can go from there.
Below is a draft of how I would suggest an open_dataset.rst file be done. I have yet to make the Dream3D Zenodo file page as I am still trying to figure out the license information, but the AF96 example is complete, so people can try out the download function and give feedback. Also, not sure how best to add pictures to .rst files, but would be nice to add the ones above as previews. orix/doc/open_datasets.rst:
This is a continuation from #406, but with a slightly expanded scope.
I would like to add an open source datasets page to ORIX, similar to Kikuchipy or Pyxem. In particular, I'm thinking of three useful datasets:
the US Air Force Research Lab AF96 datasets, six 2100 by 1000 EBSD scans of a martensitic steel which are often split up into a set of 90 overlapping 512 by 512 scans. Available through Globus, uses a CC-BY 4.0 license
the Dream3D IN100 dataset of serial-sectioned 3D EBSD scans, 189x189x117 pixels in size, stored as 117 .ang files. Available through the BlueQuartz website, has a BSD open source license
the MTEX EBSD files, used in all the MTEX examples. Available through GitHub, has a GPL license
The EASY thing would be to just add an open_databases.rst page that looks something like this, but better, ideally with a few pictures (heavily copied from pyxem):
However, I think the problem here is that new users want something they can learn with immediately, as opposed to learning ORIX's IO and having to fiddle with different import methods until they find the right way to import files into CrystalMap objects. Additionally, Globus is a massive pain to get downloads from (this is why I actually made the original PR: it was far too inconvenient for new users to get the AF96 datasets).
In this respect, as a new user, I loved how in MTEX I could just type `mtexdata ferrite` and I instantly had an MTEX EBSD object. Not an .ang or an .oim file I had to then correctly import, but an actual pre-imported object.
To that end, for at least the first two examples, I think it would be useful to have ORIX .h5 versions as files hosted on Zenodo, then include a snippet of code in the download example that can download and then import those files. Bonus if it imports _fetcher from orix.data so that duplicates of files aren't downloaded, and can just be quickly loaded from the local cache.
Thoughts? @hakonanes @pc494
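The "download once, then load from the local cache" behaviour can be sketched with the standard library alone, as a stand-in for the kind of cache pooch provides. Everything named here is an assumption for illustration: `fetch`, `sha256_of`, and the `download` parameter are hypothetical and are not the `_fetcher` object in orix.data, and no real dataset URL or hash is implied.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, known_hash, cache_dir, download=urllib.request.urlretrieve):
    """Return the local path of the file at url, downloading it only if
    the cache does not already hold a copy with the expected checksum.

    Hypothetical helper; `download` is any callable taking (url, filename).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / url.rsplit("/", 1)[-1]

    # Reuse the cached file only if its checksum matches
    if local.exists() and sha256_of(local) == known_hash:
        return local

    download(url, str(local))
    if sha256_of(local) != known_hash:
        # Do not leave a corrupt or unexpected file in the cache
        local.unlink()
        raise ValueError(f"Checksum mismatch for {url}")
    return local
```

The returned path could then be handed to a reader such as `io.load`, so a docs snippet stays at a couple of lines while repeated runs avoid re-downloading.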