Changes to allow the use of cascading.avro in Cascalog #11
Comments
Thanks - I'll take a look and see if all/some of the mods you have made can be merged into the project. |
Great. Let me know if you have any questions. On Wed, May 9, 2012 at 9:52 AM, vmagotra <
|
Hi,
I think the other changes are good to be rolled in... |
Hi Mike - we just created a 2.0 branch, and merged most of the cascading-avro code (Sven's fork/modifications) with some of the cascading.avro code. Once that gets merged into trunk, I'll need to find time to look through your changes and figure out which ones to cherry-pick. E.g. I know cascading-avro had some support for Cascalog field renaming, but I haven't looked at how they implemented that. -- Ken |
Hi Mike, If you're still interested in this can you have a look at 2.0-develop and see if it will do what you need it to? If not, can you make a pull request on that branch? |
I'm definitely still interested in this and have been reliably using my fork for a little while now. I'm actually in the process of doing a wider upgrade in my own projects, and rolling up to Cascading 2.0 in the process. This led me back to this thread, as I wanted to see where you guys were and whether you had made any progress upgrading to 2.x. I now see the 2.0-develop branch and will check it out. I'll take a look and get back to you shortly. |
I'm going to need to migrate some of my changes onto this branch, as there are things that I will need and that I don't think are supported in this (correct me if I'm wrong). I will need:
Those three changes are important for my use case. I'd be happy to provide a patch. |
Hi Mike, All those changes sound great. I can do the output codec if you don't want to worry about it but the others are probably best submitted as a patch. I think 2.0-develop will become master any day now and my guess is we'll have a separate develop branch where we can add new things. One thing I just added which might be interesting for you is the ability to get the unpacked Avro record (similar to how SequenceFile support works) and also pass a packed Avro record to write out. I'm adding this to make it easier to use with the Scalding typed API but it might be useful for Cascalog too. Regards, |
Thanks Chris. I'll take a look at the change you mentioned. Another thing I remembered adding/needing is the ability to set fields as nullable. I hacked it so most/all my fields were nullable, and can hack it for Cascalog based on naming conventions. But it would probably be best to come up with a more direct approach to specifying nullable fields in the Avro output/schema. |
Hi Mike, On Oct 26, 2012, at 9:22pm, Mike Stanley wrote:
Are you suggesting an option to automagically add that to all fields? And would this be for reading, writing, or both? Thanks, -- Ken Ken Krugler |
Hi all, I'm going to look at merging 2.0-dev into master this weekend, with whatever is in that branch. Then I'll do a 2.1.0 release to Conjars. After that we can add in Mike's changes (hopefully as a pull request). I guess those would be a 2.2 release, since it's new functionality vs. just bug fixes. Makes sense? Thanks, -- Ken On Oct 26, 2012, at 9:17pm, Chris Severs wrote:
Ken Krugler |
Sounds good to me. Chris |
On my fork, I simply made all fields nullable, but wouldn't recommend that ... In Cascalog, nullable fields are named !field instead of ?field. It be nice ... I will let you know. ... Mike |
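The naming convention mentioned above (`!field` for nullable, `?field` for non-nullable) could drive nullability in the generated Avro schema. The following is only a minimal sketch of that idea, not code from cascading.avro or from Mike's fork: the class `NullableFieldSketch` and the method `fieldJson` are hypothetical names. It relies on the fact that Avro expresses a nullable field as a union containing `"null"`, and that a `null` default requires `"null"` to be the first branch of the union.

```java
/** Hypothetical helper, not part of cascading.avro: derives an Avro field declaration from a Cascalog variable name. */
final class NullableFieldSketch {
    private NullableFieldSketch() {}

    /**
     * Builds the JSON for one Avro record field.
     * A leading '!' marks the field as nullable (union with "null", default null);
     * a leading '?' or no prefix yields a plain, non-nullable field.
     * avroType is assumed to be a primitive Avro type name such as "string" or "int".
     */
    static String fieldJson(String tupleField, String avroType) {
        boolean nullable = tupleField.startsWith("!");
        boolean prefixed = nullable || tupleField.startsWith("?");
        String name = prefixed ? tupleField.substring(1) : tupleField;
        String type = nullable
                ? "[\"null\", \"" + avroType + "\"]"
                : "\"" + avroType + "\"";
        String defaultPart = nullable ? ", \"default\": null" : "";
        return "{\"name\": \"" + name + "\", \"type\": " + type + defaultPart + "}";
    }

    public static void main(String[] args) {
        System.out.println(fieldJson("?name", "string"));
        System.out.println(fieldJson("!age", "int"));
    }
}
```

Whether this should apply to reading, writing, or both (Ken's question) is left open here; the sketch only covers producing a schema for writing.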
Sounds good to me too. I will come back around with patches, once I have a ... Mike |
Hi all,
a. I tagged master in GitHub as 1.0.
b. I merged in the 2.1-develop branch.
c. I set the version to be 2.1.0 in both the scheme and maven-plugin sub-project pom.xml files.
d. I added a section to both pom.xml files (conjars, Concurrent Conjars repository, http://conjars.org/repo). If you then add an appropriate section to your ~/.m2/settings.xml file (conjars, a registered username, the password), you too can deploy to Conjars.
e. I was able to deploy the scheme without any issues. One oddity, though, is that since we're using cascading.avro as the groupId, this means it shows up in Conjars at http://conjars.org/repo/cascading/avro/ So it's in the Cascading namespace (for the Maven repo). I assume that's OK with Chris Wensel/Concurrent, but I should double-check.
f. I had an issue with deploying the maven-plugin. "mvn deploy" kind of worked here - it uploaded the jar/pom and associated files, but I got this error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.5:deploy (default-deploy) on project avro-maven-plugin: Failed to deploy metadata: Could not transfer metadata org.apache.maven.artifact.repository.metadata.MetadataBridge@6c5bdfae from/to conjars (http://conjars.org/repo): Failed to transfer file: http://conjars.org/repo/cascading/avro/maven-metadata.xml. Return code is: 401 -> [Help 1]
I'm not sure why it's trying to write out a maven-metadata.xml file at the root of the cascading.avro package - probably something in the pom.xml would tell me, but I'm out of time today. And I'm also not sure why this was rejected, but I assume it's a config setting for Conjars, where you can only create directories and then write files out to specific release dirs.
g. I tagged this version of the code as 2.1.0.
h. I edited the pom.xml versions to be 2.2-SNAPSHOT, and pushed. So we should be ready for further development.
Take a look, and if it seems good then we can post something to the mailing list. Thanks!
-- Ken On Oct 27, 2012, at 7:45pm, Mike Stanley wrote:
http://about.me/kkrugler |
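The XML markup for the two sections described in step d did not survive in this thread, only the values did. As a rough sketch, and assuming the pom.xml section was a standard Maven `distributionManagement` block (the element names below are standard Maven ones, but their use here is an assumption and not confirmed by the thread), the configuration may have looked something like this:

```xml
<!-- pom.xml (assumed shape): tells "mvn deploy" where to upload artifacts -->
<distributionManagement>
  <repository>
    <id>conjars</id>
    <name>Concurrent Conjars repository</name>
    <url>http://conjars.org/repo</url>
  </repository>
</distributionManagement>
```

```xml
<!-- ~/.m2/settings.xml (assumed shape): credentials matched to the repository by id -->
<settings>
  <servers>
    <server>
      <id>conjars</id>
      <username>a registered username</username>
      <password>the password</password>
    </server>
  </servers>
</settings>
```

Maven pairs the `server` entry with the repository through the shared `id`, which is why both sections use `conjars`.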
This is from https://github.com/bixolabs/cascading.avro/pull/7. I'm hoping Mike Stanley can get back in sync with his modification. |
Is there any hope of getting these changes in? Right now it seems that this is not usable at all from Cascalog. |
I will take a look this week. No guarantees. It's been a long time since I needed anything further from this particular code, and it's literally just been running on autopilot. I'm probably years off the latest stuff. That said, I'm guessing the changes are still pretty relevant. I will happily look to see if I can bring it forward as a pull request. |
Thanks, Mike. I took a look at it today, but couldn't figure out where the change was needed myself. |
Could someone at least point me to where in the code the changes to support Cascalog field names would need to be made? |
Hi Dave. From what I can tell, the bulk of @mikestanley's changes are at mikestanley@330d1f0. There is a (small) change as well at mikestanley@47bc6c7. |
I made a number of changes, most notably an overloaded constructor that adds support for providing Avro field names which may differ from the tuple field names. Cascalog uses prefixes in the tuples like ? and ! which are not allowed in Avro field names. For example, someone can name the tuple field "?name" and the Avro field "name".
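To illustrate the mismatch, here is a minimal sketch of one way tuple field names could be mapped to legal Avro field names. It is not the code from this pull request: the class `CascalogAvroNames` and its methods are hypothetical, and the pull request itself lets the caller supply the Avro names explicitly through the overloaded constructor. The sketch only shows the rule involved: the Avro specification requires names to start with `[A-Za-z_]` and to contain only `[A-Za-z0-9_]` after that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of cascading.avro: maps Cascalog variable names to Avro field names. */
final class CascalogAvroNames {
    // Avro names must start with [A-Za-z_] and then contain only [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private CascalogAvroNames() {}

    /** Strips a single leading '?' or '!' and verifies the remainder is a legal Avro name. */
    static String toAvroName(String tupleField) {
        String name = tupleField;
        if (name.startsWith("?") || name.startsWith("!")) {
            name = name.substring(1);
        }
        if (!AVRO_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a valid Avro field name: " + tupleField);
        }
        return name;
    }

    /** Maps every tuple field name in order, so positions line up with the tuple fields. */
    static List<String> toAvroNames(List<String> tupleFields) {
        List<String> result = new ArrayList<>();
        for (String field : tupleFields) {
            result.add(toAvroName(field));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [name, age, city]
        System.out.println(toAvroNames(Arrays.asList("?name", "!age", "city")));
    }
}
```

An automatic mapping like this is convenient, but passing the Avro names explicitly, as the pull request does, also covers cases where the two names differ by more than a prefix.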
The README has details about other changes.
In short, this pull request provides:
I really just tried to stick with the coding style as much as possible, but feel this whole thing can be cleaned up a bit.
Pull as you please.