Consider more lightweight DB system #268
Comments
Notes (just experimenting):
Related idea - host the latest version of the DB as a static+queryable source.
Possibly a Pandas dataframe saved in a Parquet file? This is a compressed and queryable format:

    import pandas as pd

    # Convert the CSV export to Parquet once
    csv_file = "..../extra/db/address_principals.csv"
    df = pd.read_csv(csv_file)
    df.to_parquet("rows.parquet")  # 484MB

    # Read back only the columns and rows that are needed
    columns = ['gnaf_pid', 'address', 'postcode', 'latitude', 'longitude']
    filters = [
        ('locality_name', '==', 'BLI BLI'),
        ('state', '==', 'QLD'),
    ]
    df2 = pd.read_parquet("rows.parquet", columns=columns, filters=filters)
    print(df2.head())
    print(len(df2))
We can't use git-lfs on GitHub, because the free tier only provides 1GB of storage and 1GB/mo of transfer. It's $5/mo for 50GB of storage + 50GB of transfer, but if downloads from GitHub Actions (GHA) count towards that, it'd be used up inside a day or two.
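A rough back-of-envelope for the transfer quota, as a sketch only. It assumes each CI run downloads one file of about 484MB (the Parquet size quoted above, used here as a stand-in for whatever would actually be stored in LFS), that 1GB is counted as 1000MB, and a purely hypothetical rate of 50 CI runs per day.

```python
# Back-of-envelope: how long would a git-lfs transfer quota last?
# All inputs are assumptions for illustration, not measured values.

FILE_MB = 484                  # stand-in file size (the rows.parquet figure)
FREE_TRANSFER_GB = 1           # free tier transfer per month
PAID_TRANSFER_GB = 50          # $5/mo tier transfer per month
ASSUMED_CI_RUNS_PER_DAY = 50   # hypothetical; substitute the real CI volume


def downloads_per_quota(quota_gb: int, file_mb: int) -> int:
    """Number of complete downloads that fit in the monthly transfer quota."""
    return (quota_gb * 1000) // file_mb


def days_until_exhausted(quota_gb: int, file_mb: int, runs_per_day: int) -> float:
    """Days until the quota is used up if every CI run downloads the file once."""
    return downloads_per_quota(quota_gb, file_mb) / runs_per_day


if __name__ == "__main__":
    for label, quota in (("free", FREE_TRANSFER_GB), ("paid", PAID_TRANSFER_GB)):
        n = downloads_per_quota(quota, FILE_MB)
        days = days_until_exhausted(quota, FILE_MB, ASSUMED_CI_RUNS_PER_DAY)
        print(f"{label}: {n} downloads/month, about {days:.2f} days of CI")
```

Whether GHA downloads actually count against the quota is still the open question; this only shows how quickly the allowance would go if they do.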
Following on from the work in #255
Ideas for reducing the size of the DB. Perhaps it's time to consider: