Migrating TickStore to ArcticDB #460
Thanks for raising @markeasec. As mentioned in the Arctic repository, we don't currently have a good way to do this that we can offer, but your thoughts as to why we should are very reasonable. I've prioritised this and we'll update this ticket as we make progress, but I can't offer a timeline right now, so I wouldn't advise waiting for this functionality to become available if you can avoid it.
Thanks. Can you clarify whether there is any danger of data in Arctic (not ArcticDB) being read twice because it sits exactly at the start/end of a window? Or is the left-hand side of a window always inclusive and the right-hand side always exclusive? If there's no danger of duplicating data with a read/write approach, I will probably just bite the bullet, spin up a huge box, and do it that way.
You will have to ask that on the https://github.com/man-group/arctic repo, since different teams maintain these two code bases. AFAIK, you can use the DateRange type to specify whether each end is open or closed.
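As a sketch of why the open/closed distinction matters: if every read window is left-closed and right-open, a tick falling exactly on a chunk boundary lands in exactly one window. The example below uses plain pandas to illustrate the boundary behaviour; the claim that arctic's DateRange supports choosing open/closed ends (e.g. a CLOSED_OPEN interval constant) comes from the comment above and should be verified against the arctic repo before relying on it.

```python
# Illustrates left-closed / right-open windowing with plain pandas.
# arctic's DateRange interval flags are an assumption to verify; this
# only demonstrates why half-open windows avoid double-reads.
import pandas as pd

ticks = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0]},
    index=pd.to_datetime([
        "2023-01-01 00:59:59",
        "2023-01-01 01:00:00",  # exactly on the window boundary
        "2023-01-01 01:00:01",
    ]),
)

# Two consecutive one-hour windows: [00:00, 01:00) and [01:00, 02:00)
w1 = ticks[(ticks.index >= "2023-01-01 00:00") & (ticks.index < "2023-01-01 01:00")]
w2 = ticks[(ticks.index >= "2023-01-01 01:00") & (ticks.index < "2023-01-01 02:00")]

# The boundary tick appears in exactly one window, so nothing duplicates.
assert len(w1) + len(w2) == len(ticks)
```

If both ends were inclusive instead, the 01:00:00 tick would be read by both windows and written twice.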
Thanks for the pointer about DateRange, I will look into that.
@markeasec Hey mate
I'm raising this issue here at the suggestion of @mehertz since the arctic repo is not actively monitored / maintained.
Arctic Version:
Arctic Store: TickStore
Platform and version: RHEL 7
Description of problem and/or code sample that reproduces the issue
Hello,
I have a collection of a few TB of tick data in an Arctic TickStore that I want to migrate to the new ArcticDB.
I believe the only publicly available way to do this is to read all the data out of TickStore and write it to ArcticDB; is that correct?
If so, is there a recommended approach? The only way I could think of is to read the data in time chunks, say one hour at a time, and then write each chunk to ArcticDB. Is there instead a way to iterate over the underlying MongoDB documents, reading one at a time and writing the resulting DataFrame to ArcticDB? I looked through tickstore.py and couldn't see any methods that would support that, but maybe I missed something, or perhaps one of the existing methods could be modified to accomplish it.
My reasons for preferring a document-based approach over time chunks:
A - deterministic data sizes in the read/write process (no risk of running out of memory during the job)
B - it seems cleaner to me; I worry about ticks at the very edge of a time window being read twice, written twice, and thus duplicated.
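The time-chunked migration loop described above can be sketched as a window generator plus a read/write step. Only the window generation below is concrete; the tickstore/ArcticDB calls in the trailing comments (tick_lib.read, adb_lib.append, a DateRange with a half-open interval) are hypothetical placeholders for whatever API the two libraries actually expose.

```python
# Sketch of a time-chunked migration: tile the overall range with
# left-closed, right-open windows so adjacent reads never overlap.
from datetime import datetime, timedelta

def hourly_windows(start: datetime, end: datetime, step=timedelta(hours=1)):
    """Yield (lo, hi) windows covering [start, end), each [lo, hi)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi

windows = list(hourly_windows(datetime(2023, 1, 1), datetime(2023, 1, 1, 3)))

# Consecutive windows share only their boundary instant; with a
# left-inclusive / right-exclusive read, no tick is fetched twice.
assert all(a[1] == b[0] for a, b in zip(windows, windows[1:]))

# Hypothetical migration step (API names are placeholders, not the
# actual arctic/ArcticDB methods -- check each library's docs):
# for lo, hi in hourly_windows(overall_start, overall_end):
#     df = tick_lib.read(symbol, date_range=DateRange(lo, hi, CLOSED_OPEN))
#     adb_lib.append(symbol, df)
```

Fixing the step size caps memory per iteration only loosely (a busy hour can still hold many ticks), which is exactly the non-determinism concern raised in point A above.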
Thanks in advance for any help you can provide.