Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

withMeta of dataset changes the class of Dataset #328

Open
behrica opened this issue Oct 4, 2017 · 3 comments
Open

withMeta of dataset changes the class of Dataset #328

behrica opened this issue Oct 4, 2017 · 3 comments

Comments

@behrica
Copy link

behrica commented Oct 4, 2017

I maintain some code which relied on the fact that before core.marix dataset was a "defrecord",
and the withMeta call kept the type.

See here for usage in my code:

https://github.com/behrica/rojure/blob/56f6641ab8244236ee267daa63a09864be5aad99/src/rojure/convert.clj#L193

I would like to port the code to core.matrix 0.61,
but in here withMetadata returns a vector instead of a Dataset:

(with-meta (mp/convert-to-nested-vectors m) meta))

I am not really an export on the withData function,
but the clojure docu seems to suggest, that it should return an object of the same type.

Is there anything core.matrix could do here better ?

@mikera
Copy link
Owner

mikera commented Oct 7, 2017

Well mp/convert-to-nested-vectors will convert to a vector for sure! So I see how this is happening.

Perhaps the Dataset should revert to using defrecord instead of deftype, which I believe supports metadata?

@behrica
Copy link
Author

behrica commented Oct 9, 2017

I am now wondering if there should be a change in core.matrix or in rojure.
The code in rojure is using the meta information for mainly one use case:

It wants to keep the concept of storing "Strings" as factor variables as R does it.
(So storing a numeric number for each possible value of the "String" in a column)
So it stores this information in the metadata of the core.matrix.dataset.

I kind of thinking that it is better to change this and store string instead.

For me the rojure should be "compliant" with core.matrix.dataset data types.
And the current "dataset" implementation in latest core.matrix has no concept of "levels" / categorical variables.

I believe that rojure should therefore do an internal conversion from R-factors to plain strings.
@mikera any opinon on this ?

And this would mean, that I would not need to store the "meta" information in dataset anymore.

Nevertheless I believe it would be cleaner that dataset would not implement IObj and IMeta,
at all or do it correctly (not sure how)
I think that the current behavoiur is "wrong" or at least "unexpeced", as it changes the type of daaset in the "with-meta" call.

@mikera
Copy link
Owner

mikera commented Oct 10, 2017

I think that if you want to use an internal strategy that stores strings as numbers for example then it should be a hidden implementation detail.

I would use a deftype or defrecord that stores and index data required to perform such lookups, and extend the core.matrix protocols to this type as required (which would include doing any conversion etc.). I.e. you are effectively creating your own custom version of the default core.matrix Dataset.

It would be up to you if you wanted to stores the index data in metadata, though I would suggest using an actual field as it isn't reallt "metadata", it is part of your data structure.

Regarding categorical variables in general, note that core.matrix is generally pretty ambivalent to what kind of values you use. I would say it is up to higher level libraries to interpret variables as categorical, numerical, string etc. core.matrix itself can't really assume anything, so the protocols give the flexibility to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants