Prefer announced suburbs that have not been updated in 30 days #227

Merged: 2 commits, Aug 7, 2023

Conversation

lyricnz
Contributor

@lyricnz lyricnz commented Aug 7, 2023

Per #226

No preference given to closer announced dates - I think processing will be fast enough already.
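A minimal sketch of the selection rule described here, not the merged code itself. The `Suburb` dataclass, the `stale_announced` helper, and the sample records are hypothetical stand-ins; only the `announced` and `processed_date` field names come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Suburb:
    # Hypothetical stand-in for the project's suburb record.
    name: str
    announced: bool
    processed_date: datetime


def stale_announced(suburbs, now, max_age=timedelta(days=30)):
    """Return announced suburbs last processed more than max_age ago, oldest first."""
    cutoff = now - max_age
    stale = [s for s in suburbs if s.announced and s.processed_date < cutoff]
    # Oldest processed_date first; the announced date itself plays no part.
    return sorted(stale, key=lambda s: s.processed_date)


now = datetime(2023, 8, 8, 9, 40)
candidates = [
    Suburb("A", True, datetime(2023, 7, 8, 22, 28)),  # announced, older than 30 days
    Suburb("B", True, datetime(2023, 8, 1, 12, 0)),   # announced, processed recently
    Suburb("C", False, datetime(2023, 6, 16, 8, 0)),  # old, but not announced
    Suburb("D", True, datetime(2023, 7, 9, 1, 59)),   # announced, older than 30 days
]
print([s.name for s in stale_announced(candidates, now)])  # ['A', 'D']
```

Sorting by `processed_date` alone reflects the "no preference given to closer announced dates" choice above.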

@LukePrior
Owner

Looks good. Did you just want to remove the commented-out prints?

@lyricnz
Contributor Author

lyricnz commented Aug 7, 2023

Done. Some day, this deserves a unit-test.

@LukePrior LukePrior merged commit 9376e78 into LukePrior:main Aug 7, 2023
4 checks passed
@LukePrior
Owner

Cheers

@LukePrior
Owner

Will need to keep an eye on it to see if 30 days works well.

@lyricnz lyricnz deleted the feature/prioritise-listed-suburbs branch August 7, 2023 23:11
@lyricnz
Contributor Author

lyricnz commented Aug 7, 2023

I might PR a bit of logging that emits when it tries each of the preferences, so we can tell where it's up to...
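Roughly what that logging could look like, as a sketch only. `select_suburbs`, the `(label, selector)` tier pairs, and the `limit` parameter are hypothetical names for illustration; the real selection code is not shown in this thread.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(threadName)s %(message)s",
)
logger = logging.getLogger(__name__)


def select_suburbs(tiers, limit):
    """Fill up to `limit` picks by trying each (label, selector) tier in order."""
    selected = []
    for label, selector in tiers:
        if len(selected) >= limit:
            break
        # One line per tier, so the log shows how far down the preferences we got.
        logger.info("Checking for %s...", label)
        for item in selector():
            if len(selected) >= limit:
                break
            if item not in selected:
                selected.append(item)
                logger.info("%d selected %s", len(selected), item)
    return selected


# Hypothetical selectors; each returns candidate names in priority order.
tiers = [
    ("unprocessed suburbs", lambda: []),
    ("announced suburbs that haven't been updated in 30 days",
     lambda: ["URALLA, NSW", "WILBERFORCE, NSW"]),
    ("all suburbs",
     lambda: ["URALLA, NSW", "WOOLGOOLGA, NSW", "SPRINGFIELD, NSW"]),
]
picked = select_suburbs(tiers, limit=3)
```

With these inputs the second tier supplies two picks and the third tier tops up the remaining slot, skipping the duplicate.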

@lyricnz
Contributor Author

lyricnz commented Aug 7, 2023

Right now there are only 7 announced suburbs with processing dates older than 30 days:

2023-08-08 09:40:12,244 INFO MainThread Checking for unprocessed suburbs...
2023-08-08 09:40:12,244 INFO MainThread Checking for announced suburbs that haven't been updated in 30 days...
2023-08-08 09:40:12,246 INFO MainThread 1 2023-07-08 22:28:00.577343 selected SPRINGDALE HEIGHTS, NSW
2023-08-08 09:40:12,246 INFO MainThread 2 2023-07-08 22:35:47.799409 selected SPRINGFIELD, NSW
2023-08-08 09:40:12,246 INFO MainThread 3 2023-07-08 23:02:30.004437 selected STANFORD MERTHYR, NSW
2023-08-08 09:40:12,246 INFO MainThread 4 2023-07-09 01:59:33.474006 selected URALLA, NSW
2023-08-08 09:40:12,246 INFO MainThread 5 2023-07-09 03:51:16.861596 selected WILBERFORCE, NSW
2023-08-08 09:40:12,246 INFO MainThread 6 2023-07-09 04:08:57.732585 selected WINDERMERE PARK, NSW
2023-08-08 09:40:12,246 INFO MainThread 7 2023-07-09 05:06:07.695305 selected WOOLGOOLGA, NSW
2023-08-08 09:40:12,246 INFO MainThread Checking for all suburbs...

@lyricnz
Contributor Author

lyricnz commented Aug 8, 2023

GHA hasn't picked up those 7 yet because of a difference in timezone (and the processed_date field doesn't have TZ info).
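To illustrate how a naive timestamp can make two machines disagree about the 30-day cutoff, here is a small sketch. It assumes the timestamps were written as naive local time on a UTC+10 machine and that the runner's clock is UTC; both offsets and all variable names are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

max_age = timedelta(days=30)
local_tz = timezone(timedelta(hours=10))  # assumed local offset (UTC+10)

# One real instant, stored the way a naive local writer would store it.
processed_utc = datetime(2023, 7, 8, 19, 6, tzinfo=timezone.utc)
processed_naive = processed_utc.astimezone(local_tz).replace(tzinfo=None)  # 2023-07-09 05:06

# The same "now", read as a naive clock on each machine.
now_utc = datetime(2023, 8, 7, 23, 40, tzinfo=timezone.utc)
now_naive_local = now_utc.astimezone(local_tz).replace(tzinfo=None)  # 2023-08-08 09:40
now_naive_runner = now_utc.replace(tzinfo=None)                      # 2023-08-07 23:40

stale_locally = now_naive_local - processed_naive > max_age    # True
stale_on_runner = now_naive_runner - processed_naive > max_age  # False

# With timezone-aware values the answer no longer depends on the machine.
stale_aware = now_utc - processed_utc > max_age  # True

print(stale_locally, stale_on_runner, stale_aware)
```

Under these assumptions the runner lags by the offset between the two clocks, so it would catch up on its own some hours later.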

If the GHA keeps running, we should see a few 30-day suburbs popping up every day (about 90/day on average). The rest of the day will be processing the non-announced suburbs.

@lyricnz
Contributor Author

lyricnz commented Aug 8, 2023

Turns out (duh) the number of announced-suburbs that will be processed is more lumpy than that.

Edit: See comment/date below

Given that we process ~400-500/day now, our full-cycle time will be about 30 days anyway, so I don't think this PR is actually needed (hahah).

@lyricnz
Contributor Author

lyricnz commented Aug 8, 2023

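from collections import Counter

from tabulate import tabulate

import suburbs  # the project's own module; this import path is an assumption


def check_processing_rate():
    """Print a per-day tally of processed suburbs, split by announced status."""
    announced_tally = Counter()
    other_tally = Counter()
    for suburb_list in suburbs.read_all_suburbs().values():
        for suburb in suburb_list:
            tally = announced_tally if suburb.announced else other_tally
            tally[suburb.processed_date.date()] += 1

    # A day missing from one tally yields None, which tabulate renders as an empty cell.
    data = (
        (day, announced_tally.get(day), other_tally.get(day))
        for day in sorted(announced_tally.keys() | other_tally.keys())
    )
    print(tabulate(data, headers=["date", "announced", "other"], tablefmt="github"))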

emits

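| date       | announced | other |
|------------|-----------|-------|
| 2023-06-16 |           | 222   |
| 2023-06-17 |           | 174   |
| 2023-06-18 |           | 229   |
| 2023-06-19 |           | 252   |
| 2023-06-20 |           | 62    |
| 2023-06-24 |           | 189   |
| 2023-06-25 |           | 249   |
| 2023-06-26 |           | 182   |
| 2023-06-29 |           | 150   |
| 2023-06-30 |           | 5     |
| 2023-07-01 |           | 233   |
| 2023-07-02 |           | 226   |
| 2023-07-03 |           | 98    |
| 2023-07-04 |           | 171   |
| 2023-07-05 |           | 179   |
| 2023-07-06 |           | 176   |
| 2023-07-07 |           | 125   |
| 2023-07-08 |           | 172   |
| 2023-07-09 | 6         | 207   |
| 2023-07-10 | 9         | 192   |
| 2023-07-11 | 162       | 15    |
| 2023-07-12 | 100       | 240   |
| 2023-07-13 | 1         | 533   |
| 2023-07-14 |           | 516   |
| 2023-07-15 | 1         | 520   |
| 2023-07-16 |           | 517   |
| 2023-07-17 | 98        | 431   |
| 2023-07-18 | 1         | 624   |
| 2023-07-19 | 7         | 612   |
| 2023-07-20 | 59        | 547   |
| 2023-07-21 | 123       | 113   |
| 2023-07-22 | 84        | 70    |
| 2023-07-23 |           | 416   |
| 2023-07-24 |           | 468   |
| 2023-07-25 |           | 418   |
| 2023-07-26 |           | 442   |
| 2023-07-27 |           | 164   |
| 2023-07-28 | 66        | 475   |
| 2023-07-29 | 54        | 588   |
| 2023-07-30 |           | 643   |
| 2023-07-31 | 158       | 44    |
| 2023-08-01 | 267       |       |
| 2023-08-02 | 191       |       |
| 2023-08-03 | 210       |       |
| 2023-08-04 | 259       |       |
| 2023-08-05 | 257       |       |
| 2023-08-06 | 217       | 203   |
| 2023-08-07 | 376       | 243   |
| 2023-08-08 |           | 65    |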

@lyricnz
Contributor Author

lyricnz commented Aug 8, 2023

| date  | announced | other |
|-------|-----------|-------|
| TOTAL | 2706      | 12400 |
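As a quick cross-check of the figures quoted earlier in this thread ("about 90/day" and a full cycle of "about 30 days"), the totals can be plugged in directly. The helper name `cycle_days` is made up for the example; the inputs are the totals above and the ~400-500/day rate mentioned earlier.

```python
announced_total = 2706
other_total = 12400
all_suburbs = announced_total + other_total  # 15106


def cycle_days(total, per_day):
    """Days needed to process `total` suburbs at `per_day` suburbs per day."""
    return total / per_day


for rate in (400, 500):
    print(f"{rate}/day -> full cycle in {cycle_days(all_suburbs, rate):.1f} days")
# 400/day -> full cycle in 37.8 days
# 500/day -> full cycle in 30.2 days

# Announced suburbs coming due per day if spread evenly over a 30-day window.
print(f"{announced_total / 30:.1f} announced/day")  # 90.2 announced/day
```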

@LukePrior
Owner

> Given that we process ~400-500/day now, our full-cycle time will be about 30 days anyway, so I don't think this PR is actually needed (hahah).

Well, if we can reliably recheck everything in 30 days, the time period for announced suburbs could possibly be reduced to something lower, or we could maybe check against the announced date.

@lyricnz
Contributor Author

lyricnz commented Aug 8, 2023

I'm not sure I believe anything from NBNco about their announced date :)

I'll commit the code above into a PR in adhoc-tools, so we can run it again in a few weeks and see how it's going.
