-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filename options for split
(iss. #1365)
#1366
Filename options for split
(iss. #1365)
#1366
Conversation
* Don't use joiner string when prefix is empty. * Add option to specify joiner string. * Add option to not URL-escape file names.
@johnkerl, I will review the development and contribution documents to make sure I've met all the requirements for contributing to the project. If you want to point out anything I should focus on, please do. Early comments on my code are welcome, too. I like code reviews. I appreciate constructive criticism. FYI: This is my first time coding with Go. |
All good, and thanks for asking! It's pretty chill TBH. If anything's wrong or missing we can chat about it. Developer notes (such as they are) are here: https://github.com/johnkerl/miller/blob/main/README-dev.md But that's just an attempt to be helpful & describe some of the issues/prereqs. Over the years Miller has benefited from a huge number of people writing in with ideas & feature requests; also from many people contributing cloud-infra things like automatically building multi-platform binaries on a release, etc etc -- but for whatever reason, there hasn't been very much contribution on development work. So this is lightly trodden ground. :) My foremost ask is, tell me if https://github.com/johnkerl/miller/blob/main/README-dev.md is a genuinely helpful getting-started guide or not, & if anything is missing/unclear/etc.
I hope you like it! I do, quite a bit, & I found it pretty intuitive to use given my background. Let me know if you've any questions though; I'm always happy to chat about such things.
Awesome! https://github.com/johnkerl/miller/pull/1366/files looks good so far 😎 |
I thought I should update the docs wherever I could. It's difficult to be sure whether I found all the references. Besides this Do you have any thoughts about putting docs inside/beside the source code so there's one set of documentation? |
I **_thought_** it'd be cool to apply URL-escaping to the file name prefix as well, just in case it included spaces or other characters. I forgot that a common use for the prefix is to specify a directory path that will contain the file. When the slashes ("`/`") of the path are URL-escaped, they become "`%2F`" and the directories will not be created. So, I moved the prefix handling code to come after the URL-escaping.
Trying to make the `return` statement cleaner, I thought it'd be good to add the file name suffix immediately after the file name is URL-escaped. I'd forgotten that the suffix will not be added if the new `-e` option is used to skip URL-escaping. So, I put the suffix back where I had it.
Feedback on dev. documents…
|
Not strictly part of this issue, but as I was checking for docs that I should update as a result of my changes, I noticed this document showed how to split data using the `put` and `tee` combination, but not about the `split` verb.
When I ran `make dev`, generating `data-diving-examples.md` failed. The two `manpage.txt` files ended up empty, but `mlr.1` seems to be correct.
split
(iss. #1365)split
(iss. #1365)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the one change requested -- thank you!!! :)
I think this is fine. If we see anything later we can always add it.
Yeah it's definitely separate ... not sure how to best to implement this ... 🤔 |
Thanks! I will fix.
✅
Thanks! Yes, I wrote that during the Go port which has been (wow!) a couple years ago. I was porting from C to Go for about a year (late 2020 and all of 2021 IIRC); and Miller 6 came out in January 2022. So, yes, time to acknowledge the status quo and update that doc. :) Thanks again for the great feedback!! |
Also @sloanlance if you rebase on current |
Actually @sloanlance I'll accept this since it's a single-letter mod, and then I'll follow up after with a separate commit. No need to hold this fine PR open for such a little thing. :) |
@johnkerl, thanks! I was willing to make the change you suggested, but I've been busy with other work today. Anyway, it was great working with you to make this change. I would be happy to work on other issues, time permitting. Have any issues you'd like to recommend? I had some other questions about the code that I thought we might address before the merge, but they can be discussed anytime, really. I was wondering: Rather than the option-handling conditionals, would it make sense to use the In a way, I felt a little uneasy about the |
@sloanlance happy to discuss task ideas! I actively avoided the |
Re task ideas there are a bunch labelled What I actually work on at any given time is a function of how complex the ask is & how much time I have -- I'm rather deeply invested in my current day job and I don't spend quite as much time on Miller as I used to. So you can see the backlog has grown a bit. :) |
I just remembered another question I had… Was it cool for me to replace the usage of a buffer with string arrays instead? Being new to Go, I didn't know whether there was a specific reason for using buffers. I went with an array because I just wanted to join the file name parts together and not throw in an extraneous one if the prefix is empty. As it turns out, I couldn't just join the prefix with the other parts and escape it all. The escaping would change slash characters (" (And as I write this, I now think of a case where the prefix is not empty, but the file could end up beginning with the joiner string anyway. 😆 Oh, well… I can open another PR later.) |
My understanding is buffers are more efficient at concatenating. But for this which is just a one-time setup of output-file names, it's not even a measurable difference. The really important thing (which I did measure a while back) is the things that get done on every single record. There's some info at the end of https://github.com/johnkerl/miller/blob/main/README-dev.md. |
@sloanlance here are a few ideas if you're interested -- happy to chat more if you like: #965 #1082 #1184 #1325 In particular, #1325 should be the quickest as there exist other verbs with |
split
(iss. #1365)split
(iss. #1365)
-j
option to specify joiner string.-e
option to not URL-escape file names.Resolves #1365.