Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdat_csv_format #7

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

benjaminhlina
Copy link
Collaborator

Hi Mike,

I'm finally out of a big writing phase and will be doing more data stuff in the coming weeks. I've made a small function that removes the top rows of the csv vdat_to_csv() creates and then assigns the correct column names given the selected event field. You can also make it return the top rows giving you the event field data frame. Let me know what you think/if this is of interest. Should note that right now I have it hardcoded for the number of rows to select (not ideal, I know)...not sure if that will be appropriate given different .vrl files depending on the receiver (I did this based of .vrl files from VR2W).

I didn't add this functionality/not sure how light you want the package but could make this function use {data.table} to increase speed and could have it wrap around fread() or read.csv() ect. so that the user doesn't have to tell either of those functions to not read in the header. Might be something to consider if detection logs are big. I played with the function using a small csv.

If wanting to keep light we could shunt this to a supplementary package but not sure if that is really necessary...it really depends on your vision with the package.

Cheers,
Ben

@mhpob
Copy link
Owner

mhpob commented Jan 20, 2024

Thanks for taking a stab at a new idea!

I think this would be a good fit for a function that vdat_to_csv could wrap around via an argument as well as standing alone. It could keep vdat_to_csv snappy for those who just want to have vdat convert files while allowing something that "just works" for those who also want to immediately work in R. I'm pretty hesitant about drifting into the analysis realm where we need to make decisions for the user, but I don't think this does that.

  • From my initial read, it seems like event_field could be inferred from event_type. Does it need to be a named argument, or could we just tack on "_DESC" to the user's event_field input to move toward something that "just works"?
  • To get away from the hard row or column indices could it use some sort of which(grepl( call to find the splitting index of choice? Or, we could switch pre-defined indices based on the receiver type and VEMCO DATA LOG version?
  • If possible, it would be useful to write some tests against the different types of receivers and files to make sure everything is behavin as expected. You can run source('tests/testthat/setup-testfiles.R') to download these to your temporary directory, locations listed in the testfiles object that is created. I'm happy to develop some of these if it's asking too much.

@mhpob mhpob self-assigned this Jan 20, 2024
@mhpob mhpob added the enhancement New feature or request label Jan 20, 2024
@mhpob
Copy link
Owner

mhpob commented Jan 20, 2024

Oh, and with regard to data.table: I'm trying to use as few dependencies as possible. At some point in the future I may refactor to drop those we're using, but cli makes the messages so pretty!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants