Migrating TickStore to ArcticDB #460

Closed
markeasec opened this issue Jun 5, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@markeasec

I'm raising this issue here at the suggestion of @mehertz since the arctic repo is not actively monitored / maintained.

Arctic Version

1.80.0

Arctic Store

TickStore

Platform and version

RHEL 7

Description of problem and/or code sample that reproduces the issue

Hello,
I have a few TB of tick data in an Arctic TickStore that I want to migrate to the new ArcticDB.

I believe the only publicly available way to do this is to read all the data out of TickStore and write it to ArcticDB. Is this correct?

If so, is there a recommended approach? The only way I could think of was to read it in time chunks, say 1 hour at a time, and write each chunk to ArcticDB. Is there a way to instead iterate over the underlying MongoDB documents, read one at a time, and write the resulting DataFrame to ArcticDB? I looked through tickstore.py and couldn't see any methods that would support that, but maybe I missed something, or maybe one of the existing methods could be modified to accomplish this?

My reasons for preferring a document-based approach over a time-chunk approach are:
A - deterministic data sizes in the read/write process (no risk of running out of memory during the job)
B - it seems cleaner to me; I worry about ticks at the very edge of a time window being read twice, written twice, and thus duplicated.
Thanks in advance for any help you can provide.
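
The time-chunk approach described above can be sketched with half-open windows, which speaks to concern B: if each window includes its start and excludes its end, a tick that lands exactly on a boundary belongs to exactly one window. This is a minimal sketch, not a tested migration tool. `half_open_windows`, `migrate`, `read_chunk`, and `write_chunk` are hypothetical names; the two callables stand in for a TickStore read and an ArcticDB write, and the reader is assumed to honor the half-open bounds.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Sequence, Tuple


def half_open_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [lo, hi) windows that tile [start, end) exactly once."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window may be shorter than step
        yield lo, hi
        lo = hi  # next window starts exactly where this one ended


def migrate(
    read_chunk: Callable[[datetime, datetime], Sequence],
    write_chunk: Callable[[Sequence], None],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> int:
    """Copy data window by window; return the number of non-empty chunks written.

    read_chunk(lo, hi) is assumed to return only rows with lo <= t < hi.
    """
    written = 0
    for lo, hi in half_open_windows(start, end, step):
        chunk = read_chunk(lo, hi)
        if chunk is None or len(chunk) == 0:
            continue  # nothing traded in this window
        write_chunk(chunk)
        written += 1
    return written
```

In a real job, `read_chunk` would presumably wrap a TickStore `read` with a `DateRange` that is closed on the left and open on the right, and `write_chunk` would wrap an ArcticDB append; both should be checked against the respective documentation. The sketch does not address concern A: a fixed time step bounds the length of a window, not the number of ticks in it, so the step still has to be sized for the busiest period.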

@markeasec markeasec added the enhancement New feature or request label Jun 5, 2023
@mehertz
Collaborator

mehertz commented Jun 5, 2023

Thanks for raising @markeasec. As mentioned in the Arctic repository, we don't currently have a good way to do this that we can offer but your thoughts as to why we should are very reasonable.

I've prioritised this. We'll update this ticket when we make progress, but I can't offer a timeline right now, so I wouldn't advise waiting for this functionality if you can avoid it.

@markeasec
Author

Thanks. Can you clarify whether there is any danger of data in Arctic (not ArcticDB) being read twice because it sits exactly at the start or end of a window? Or is the left-hand side of a window always inclusive and the right-hand side always exclusive? If there's no danger of duplicating data with a read/write approach, I will probably just bite the bullet, spin up a huge box, and do it that way.

@qc00
Contributor

qc00 commented Jun 7, 2023

You will have to ask that on the https://github.com/man-group/arctic repo. Different teams maintain these two code bases.

AFAIK, you can use the DateRange type to specify whether each end is open or closed.
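
To make the open/closed distinction concrete, the sketch below counts how many adjacent windows would return a tick that sits exactly on a shared boundary, for each of the four interval kinds. It is a plain-Python model, not arctic code: the interval labels, `in_range`, `times_read`, and `utc` are hypothetical names introduced for illustration. As far as I know, arctic's `DateRange` takes a comparable interval argument (with constants such as `CLOSED_OPEN` in `arctic.date`), but that should be confirmed against the arctic source.

```python
from datetime import datetime, timezone

# Hypothetical labels for the four interval kinds; not arctic's own constants.
CLOSED_CLOSED = "[]"
CLOSED_OPEN = "[)"
OPEN_CLOSED = "(]"
OPEN_OPEN = "()"


def in_range(t, start, end, interval):
    """Return True if t falls inside the window under the given interval kind."""
    left_ok = t >= start if interval[0] == "[" else t > start
    right_ok = t <= end if interval[1] == "]" else t < end
    return left_ok and right_ok


def times_read(tick, edges, interval):
    """Count how many adjacent windows between consecutive edges contain tick."""
    return sum(in_range(tick, lo, hi, interval) for lo, hi in zip(edges, edges[1:]))


def utc(hour):
    """Build a timezone-aware timestamp on an arbitrary example day."""
    return datetime(2023, 6, 5, hour, tzinfo=timezone.utc)


edges = [utc(9), utc(10), utc(11)]  # two windows: 09:00-10:00 and 10:00-11:00
boundary_tick = utc(10)             # sits exactly on the shared edge

for kind in (CLOSED_CLOSED, CLOSED_OPEN, OPEN_CLOSED, OPEN_OPEN):
    print(kind, times_read(boundary_tick, edges, kind))
# [] 2  (returned by both windows, so it would be written twice)
# [) 1
# (] 1
# () 0  (returned by neither window, so it would be lost)
```

Only the two mixed kinds read the boundary tick exactly once, which is why a chunked copy would want one of them applied consistently to every window.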

@markeasec
Author

Thanks for the pointer about DateRange, I will look into that.
I had actually originally opened it there and was instructed to raise it here instead.

@DennyZen

DennyZen commented May 3, 2024

@markeasec Hey mate,
how did your TickStore migration go? Did it work out?
I'm thinking about migrating as well.
See:
man-group/arctic#1026
