refactor: improve sync reliability #67

Merged

Conversation

tyler-dot-earth
Contributor

@tyler-dot-earth tyler-dot-earth commented Jul 4, 2024

related issues

tl;dr

  • substantially improve the ability to resync deleted items
  • unify resync logic
  • fix issues where the plugin thought there was nothing to sync
  • reduce hacks/workarounds, and generally improve the codebase

sync changes

Note

some name changes since i originally posted the notes below, see this comment

  • refreshBookExport has been renamed to syncBookHighlights
  • startSync has been folded into requestArchive
  • requestArchive has been renamed to queueExport
  • downloadArchive has been renamed to downloadExport

Primarily:

  1. Make refreshBookExport async
    • There is no reason for it not to be, though that's largely because
      it is no longer debounced. (keep reading)
  2. Remove debounce from refreshBookExport
    • I suspect this only ever existed because the 'delete' listener
      fires for each file that is deleted, which in turn was firing
      the refreshBookExport function a ton of times. (it isn't called
      in the 'delete' listener anymore - keep reading)
  3. Defer refreshBookExport to sync events
    • aka don't call this when deleting files
    • ensure refreshBookExport called when syncing - both on
      scheduled and on-demand
  4. Generally ensure all syncing happens via refreshBookExport
    • ensures that the various sync methods won't get tangled up very easily (eg writing to settings via various sync operations in tandem)

Additionally:

  • async/await most saveSettings - i suspect this was causing issues where settings would get clobbered. either way, couldn't see a reason not to do this in most cases.
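
A minimal sketch of how un-awaited saves could clobber settings, assuming overlapping writes can land out of order. This only illustrates the suspicion above and is not the plugin's code: `FakeDisk`, its tick-based latency, and both helper functions are hypothetical.

```typescript
// Hypothetical illustration only: FakeDisk stands in for persisted plugin data.
interface Settings {
  lastSavedStatusID: number;
}

class FakeDisk {
  contents: Settings = { lastSavedStatusID: 0 };

  // Simulates an async write that takes `ticks` microtask turns to land.
  async write(snapshot: Settings, ticks: number): Promise<void> {
    for (let i = 0; i < ticks; i++) {
      await Promise.resolve();
    }
    this.contents = { ...snapshot };
  }
}

// Saves are started without awaiting, so the slower (older) write lands last.
async function saveWithoutAwait(disk: FakeDisk): Promise<Settings> {
  const older = disk.write({ lastSavedStatusID: 1 }, 5);
  const newer = disk.write({ lastSavedStatusID: 2 }, 1);
  await Promise.all([older, newer]); // awaited only so the result can be inspected
  return disk.contents;
}

// Each save is awaited before the next starts, so the newest value wins.
async function saveWithAwait(disk: FakeDisk): Promise<Settings> {
  await disk.write({ lastSavedStatusID: 1 }, 5);
  await disk.write({ lastSavedStatusID: 2 }, 1);
  return disk.contents;
}
```

In the first function the stale snapshot overwrites the newer one; in the second the writes land in the order they were issued.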

All of these combined should ensure a much more reliable sync operation.

Extensive testing instructions found at bottom of this post ⤵

notes dump

clean plugin install happens

default data.json (aka settings):

{
  "token": "",
  "readwiseDir": "Readwise",
  "frequency": "0",
  "triggerOnLoad": true,
  "isSyncing": false,
  "lastSyncFailed": false,
  "lastSavedStatusID": 0,
  "currentSyncStatusID": 0,
  "refreshBooks": false,
  "booksToRefresh": [],
  "booksIDsMap": {},
  "reimportShowConfirmation": true
}
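
For reference, the same defaults can be written as a typed object. This is a sketch: the field names and values are copied from the data.json above, but the interface name, the `withDefaults` helper, and the element/value types of `booksToRefresh` and `booksIDsMap` are assumptions.

```typescript
// Field names and defaults mirror the data.json above.
// The types of booksToRefresh and booksIDsMap are assumptions.
interface ReadwisePluginSettings {
  token: string;
  readwiseDir: string;
  frequency: string; // "0" means manual sync
  triggerOnLoad: boolean;
  isSyncing: boolean;
  lastSyncFailed: boolean;
  lastSavedStatusID: number;
  currentSyncStatusID: number;
  refreshBooks: boolean; // the "resync deleted files" toggle
  booksToRefresh: string[];
  booksIDsMap: Record<string, string>;
  reimportShowConfirmation: boolean;
}

const DEFAULT_SETTINGS: ReadwisePluginSettings = {
  token: "",
  readwiseDir: "Readwise",
  frequency: "0",
  triggerOnLoad: true,
  isSyncing: false,
  lastSyncFailed: false,
  lastSavedStatusID: 0,
  currentSyncStatusID: 0,
  refreshBooks: false,
  booksToRefresh: [],
  booksIDsMap: {},
  reimportShowConfirmation: true,
};

// Hypothetical helper: fills in any fields missing from loaded data.
function withDefaults(loaded: Partial<ReadwisePluginSettings>): ReadwisePluginSettings {
  return { ...DEFAULT_SETTINGS, ...loaded };
}
```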

user clicks "connect"

  • getUserAuthToken() → saves to settings.token
  • web browser opens to https://readwise.io/export/obsidian/preferences
  • obsidian plugin UI refreshes to show all options now that user is authenticated

note that sync frequency is manual/0 at this point

user initiates initial/first sync — this can happen one of several ways:

  1. click "initiate sync"
    • refreshBookExport
  2. Sync your data now command
    • refreshBookExport
  3. toggle on "resync deleted files"
    • refreshBookExport
  4. reload workspace
    • onload → refreshBookExport
  5. interval is triggered
    • configureSchedule → refreshBookExport

after the initial sync, settings.lastSavedStatusID and settings.booksIDsMap are populated

at this point, the user may also use the Delete and reimport this document command within a specific Readwise export — that also simply uses refreshBookExport

notes about refreshBookExport

  • all syncing goes through this. ==very important== because it ensures that the various sync methods won't get tangled up very easily (eg writing to settings via various sync operations in tandem)
  • refreshBookExport will always trigger a requestArchive (either directly or via startSync), regardless of if it was provided specific books to refresh, so new highlights should always appear
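
A rough sketch of the "everything goes through one entry point" idea, assuming overlapping requests are run one after another. The `SyncQueue` class, the `Trigger` names, the promise-chain serialization, and the log strings are all hypothetical; the PR itself may coordinate syncs differently.

```typescript
// Hypothetical model: every trigger funnels into one method, and runs are
// chained so two syncs never interleave their writes.
type Trigger = "settings-button" | "command" | "resync-toggle" | "onload" | "interval" | "reimport";

class SyncQueue {
  readonly log: string[] = [];
  private tail: Promise<void> = Promise.resolve();

  refreshBookExport(trigger: Trigger, bookIds: string[] = []): Promise<void> {
    const run = async (): Promise<void> => {
      this.log.push(`start:${trigger}`);
      if (bookIds.length > 0) {
        this.log.push(`refresh:${bookIds.join(",")}`);
      }
      await Promise.resolve(); // stands in for network and disk I/O
      // An archive request always follows, with or without specific books.
      this.log.push(`requestArchive:${trigger}`);
    };
    // Start this run only after the previous one settles (success or failure).
    this.tail = this.tail.then(run, run);
    return this.tail;
  }
}
```

Calling `refreshBookExport` twice in a row without awaiting still produces two complete, non-overlapping runs in the log.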

notes about "resync deleted files" (aka settings.refreshBooks):

  • if enabled OR disabled:
    • the entry will remain in booksIDsMap so we have a copy of its book ID
    • book IDs will NOT be added to booksToRefresh as they are deleted (see if enabled for rationale)
  • if enabled:
    • refreshBookExport will look for a difference in the filesystem vs what is in booksIDsMap — seemed like the only way to handle filesystem deletions (outside of Obsidian), unknown performance when there are many many readwise files. could be moved behind a toggle if it's particularly slow.
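
The filesystem-vs-map comparison could look roughly like this. It is a sketch under one assumption: `booksIDsMap` maps a vault-relative file path to a Readwise book ID. The function name and the way existing paths are collected are hypothetical.

```typescript
// Assumption: booksIDsMap maps a vault-relative file path to a book ID.
// Returns the IDs of books whose files no longer exist on disk.
function findMissingBookIds(
  booksIDsMap: Record<string, string>,
  existingPaths: ReadonlySet<string>,
): string[] {
  const missing: string[] = [];
  for (const path of Object.keys(booksIDsMap)) {
    if (!existingPaths.has(path)) {
      missing.push(booksIDsMap[path]);
    }
  }
  return missing;
}
```

With a set lookup the check is linear in the number of tracked files, which is worth keeping in mind given the performance question raised above.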

other notes:

  • auto query string seems to indicate that the sync wasn't triggered by a user (eg interval sync)
  • when settings.readwiseDir isn't there, the requestArchive will use the parentPageDeleted querystring (like /api/obsidian/init?parentPageDeleted=true) — there's no documentation on this, but i did test it and it does successfully sync the files in a clean install
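
A hedged sketch of how those query strings might be assembled. The `/api/obsidian/init` path and the `parentPageDeleted` parameter come from the note above; the helper name, the `auto=true` value format, and the `https://readwise.io` base URL are assumptions.

```typescript
// Hypothetical helper; only the path and parentPageDeleted=true are from the notes.
interface InitOptions {
  auto: boolean; // true when the sync was not triggered by the user (eg interval)
  readwiseDirExists: boolean;
}

function buildInitUrl(opts: InitOptions, baseUrl = "https://readwise.io"): string {
  const params: string[] = [];
  if (opts.auto) {
    params.push("auto=true"); // value format is an assumption
  }
  if (!opts.readwiseDirExists) {
    params.push("parentPageDeleted=true");
  }
  const query = params.length > 0 ? `?${params.join("&")}` : "";
  return `${baseUrl}/api/obsidian/init${query}`;
}
```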

install for testing

you can download and build the plugin yourself, or use BRAT to install a pre-built beta:

  1. disable the current Readwise plugin in your Obsidian
  2. install BRAT via Obsidian community plugins and enable the BRAT plugin
  3. within BRAT's plugin settings, click the "Add Beta plugin with frozen version" button
  4. fill the fields in with the beta info
    Repository: https://github.com/tyler-dot-earth/dev-obsidian-readwise
    Latest pre-built version: refactor-improve-sync-reliability-beta-5
    Should look a bit like this:
    Screenshot from 2024-07-06 10-48-27
  5. use the plugin like normal

how to test thoroughly

Warning

perform these in listed order

  • confirm clean plugin install syncs notes as expected (this isn't automatic; you must click "initiate sync" or reload the workspace after authenticating)

  • enable Resync deleted files

  • delete a synced file

  • use command palette to Readwise Official: Sync your data now

  • confirm deleted file is re-synced

  • delete a synced file

  • open plugin settings and click Initiate sync

  • confirm deleted file is re-synced

  • open a synced file

  • use command palette to Readwise Official: Delete and reimport this document

  • confirm file is deleted and is re-synced

  • delete synced file

  • use command palette to Reload app without saving (or re-open Obsidian)

  • confirm deleted file is re-synced

  • disable Resync deleted files

  • delete a synced file

  • use command palette to Readwise Official: Sync your data now

  • confirm file is NOT re-synced

  • delete a synced file

  • open plugin settings and click Initiate sync

  • confirm file is NOT re-synced

  • delete synced file

  • use command palette to Reload app without saving (or re-open Obsidian)

  • confirm deleted file isn't re-synced

  • open a synced file

  • use command palette to Readwise Official: Delete and reimport this document

  • confirm file is re-synced

  • delete synced file

  • enable Resync deleted files

  • confirm deleted file is re-synced

  • and finally, for good measure:

  • ensure adding new highlights continues to work as expected

  • ensure syncing on interval continues to work as expected

other changes

  • a few random, small bugfixes and refactors. i'm happy to split into their own PR, but felt like more friction than it was worth for both me and reviewer(s).

1. Make refreshBookExport async
    - There is no reason for it not to be, though that's largely because
      it is no longer debounced. (keep reading)
2. Remove debounce from refreshBookExport
    - I suspect this only ever existed because the 'delete' listener
      fires for *each* file that is deleted, which in turn was firing
      the refreshBookExport function a ton of times. (it isn't called
      in the 'delete' listener anymore - keep reading)
3. Defer refreshBookExport to sync events
    - (1) don't call this when deleting files
    - (2) ensure refreshBookExport called when syncing - both on
      scheduled and on-demand

All of these combined should ensure a much more reliable sync operation.
@TristanH
Member

TristanH commented Jul 4, 2024

Remove debounce from refreshBookExport
I suspect this only ever existed because the 'delete' listener
fires for each file that is deleted, which in turn was firing
the refreshBookExport function a ton of times. (it isn't called
in the 'delete' listener anymore - keep reading)

I'd definitely like us to confirm this... @tadeoos do you recollect the original rationale?

@tyler-dot-earth
Contributor Author

@TristanH

Remove debounce from refreshBookExport
I suspect this only ever existed because the 'delete' listener
fires for each file that is deleted, which in turn was firing
the refreshBookExport function a ton of times. (it isn't called
in the 'delete' listener anymore - keep reading)

I'd definitely like us to confirm this... @tadeoos do you recollect the original rationale?

Please do! I could only guess based on the code and behavior.

Given the debounce was 800ms, I can't imagine it was to compensate for some backend behavior (eg a slow sync process that would get confused by "outdated" refreshBookExport request)

The relevant changes call refreshBookExport significantly less as it is simply no longer called immediately upon deletion of every single file, instead deferring to the normal sync schedule or manual re-sync.

So beyond spamming the request, which this PR significantly reduces, i suppose the question to answer is: is there any point in keeping the debounce?

That said, I can also say:

  • this syncing behavior is working really well from my testing
  • requestSync was never similarly debounced (though perhaps that's a problem of its own)
  • rate limiting may be a more generalized solution to this problem
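
To make the request-count argument concrete, here is a toy model (hypothetical names, not the plugin's code) comparing a sync request per 'delete' event with deferring to the next sync:

```typescript
// Hypothetical model: counts how often a sync is requested.
class SyncCounter {
  refreshCalls = 0;

  refreshBookExport(): void {
    this.refreshCalls++;
  }
}

// Old pattern: every 'delete' event requests a sync.
function deleteWithPerFileRefresh(counter: SyncCounter, deletedPaths: string[]): void {
  deletedPaths.forEach(() => counter.refreshBookExport());
}

// Pattern described here: deletions request nothing; the next scheduled
// or manual sync makes a single request and picks up the deletions then.
function deleteThenSync(counter: SyncCounter, deletedPaths: string[]): void {
  void deletedPaths; // nothing happens at delete time
  counter.refreshBookExport();
}
```

Deleting 50 files requests 50 syncs in the first model and one in the second, which is why a debounce matters much less once the per-file call is gone.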

- Handles "Resync deleted files" case when deleting a file
- Fixed/adjusted various uses of refreshBookExport, startSync, and requestArchive (which were a bit muddled before)
@tyler-dot-earth tyler-dot-earth marked this pull request as draft July 5, 2024 19:07
@tyler-dot-earth tyler-dot-earth marked this pull request as ready for review July 6, 2024 02:16
@tyler-dot-earth
Contributor Author

@TristanH alright, i tested the heck out of this today — behavior seems solid, and I think I fixed a few more related bugs along the way.

added a ton of notes to the original post, and a big list of steps for testing.

enjoy!

@tyler-dot-earth
Contributor Author

added an install for testing section to OP with instructions on testing via BRAT instead of having to manually clone & install.

@tyler-dot-earth

This comment was marked as resolved.

Member

@TristanH TristanH left a comment

awesome work on this man. hopefully @tadeoos can have a thorough review next week

in the mean time, we've been testing it internally and so far so good!

also really appreciate you sharing it on the reddit and making the testing process easy 😍

@tyler-dot-earth
Contributor Author

@TristanH i appreciate the support! and very happy to hear that you guys have been testing it internally without issue.

i pushed one final change removing the on(delete) listener per my previous comment that it now serves no purpose. with that, beta 4 is now tagged.

i'll keep a look out for review comments 😃

@homostellaris

Anything I can do to help get this merged? It's making the Obsidian integration much less usable for me.

@TristanH
Member

@homostellaris in general we are pretty game to merge this

last i heard (@tyler-dot-earth correct me if i'm wrong) we didn't actually think that the changes here would solve a lot of the syncing issues linked (which are usually caused by using Obsidian Sync)

but if we think they will substantially help, definitely game to merge them

@tyler-dot-earth
Contributor Author

@TristanH While this PR won't resolve the "race condition" of Readwise sync vs Obsidian Sync which is clobbering book IDs (#68), it does substantially improve the ability to resync deleted items, unify resync logic, fix issues where the plugin thought there was nothing to sync, reduce hacks/workarounds, and generally improve the codebase.

I could have made those improvements more clear in my OP — my bad.

I've been using it this whole time with no issue aside from the (preexisting) book IDs issue that is caused by the aforementioned race condition. I have ideas about how to improve/fix that (#68), though hesitant to start on them until these fixes/restructure is approved 🙂

@TristanH
Member

@tyler-dot-earth great! @tadeoos should be able to do one final code review tomorrow, and then I think we'll be good to get it merged 👍

not your fault at all, i should have asked

Member

@tadeoos tadeoos left a comment

Dude... Really nice work!

Thanks so much for the PR and thanks a ton for your patience as I was focused on some high prio stuff (completely unrelated to Obsidian) and finally got time to review this just today. Also: thanks for making my old typescript code suck less :)

I want to move responsibly here as I haven't touched this in a couple of years so I don't remember all of the nitty-gritty details right away — I left some comments to make sure I understand your intentions and pointed out some places where you may have missed some UX corner cases...

But all in all, I think we're close to shipping this, hopefully early next week!

@tadeoos
Member

tadeoos commented Aug 30, 2024

One more thing:
(from your notes):

notes about "resync deleted files" (aka settings.refreshBooks):

  • if disabled:
    • the entry will remain in booksIDsMap so we have a copy of its book ID
    • book IDs will NOT be added to booksToRefresh
  • if enabled:
    • book IDs will be added to booksToRefresh as they are deleted
    • the entry will remain in booksIDsMap
    • refreshBookExport will look for a difference in the filesystem vs what is in booksIDsMap — seemed like the only way to handle filesystem deletions (outside of Obsidian), unknown performance when there are many many readwise files. could be moved behind a toggle if it's particularly slow.

I'm not sure I see this logic fully implemented in your current version... We now only add books to booksToRefresh in one place (downloadArchive) and there is no setting check there.

@tyler-dot-earth
Contributor Author

I'm not sure I see this logic fully implemented in your current version... We now only add books to booksToRefresh in one place (downloadArchive) and there is no setting check there.

Ah, yes - I think I wrote that before I made the changes to the deletion logic which obsolesced the need for storing the deleted IDs in the booksToRefresh.

@tyler-dot-earth
Contributor Author

I left some comments to make sure I understand your intentions and pointed out some places where you may have missed some UX corner cases...

Thank you for the review @tadeoos :)

I have given an initial look-through, good feedback - I should hopefully have some time this weekend to really dig in and make updates.

@tyler-dot-earth
Contributor Author

@tadeoos Did a round of fixes and responses to your review. Also bumped the BRAT version to refactor-improve-sync-reliability-beta-5 for some dogfooding.

Member

@tadeoos tadeoos left a comment

Thanks a ton for a quick iteration here, great changes!

@tadeoos tadeoos merged commit a208165 into readwiseio:master Sep 2, 2024
tyler-dot-earth added a commit to tyler-dot-earth/dev-obsidian-readwise that referenced this pull request Sep 12, 2024
tyler-dot-earth added a commit to tyler-dot-earth/dev-obsidian-readwise that referenced this pull request Sep 12, 2024
Development

Successfully merging this pull request may close these issues.

Resync Deleted files not working
4 participants