idxInSync is incorrect #251

KyleMaas · 2022-12-12T02:23:21Z

idxInSync is supposed to be set when indexes are not up-to-date. And that's true when processing existing data. But when new data is injected after the indexes are marked as "live", idxInSync is never set again, which means you can never do a synchronous operation like publishing and then asking the indexes for results (which at least one of the broken tests tries to do) because you have no way of knowing when the indexes are done processing their new data. We really need some method of tracking new changes to the receiveLog and when they come in s.idxInSync.Add(1) until the message has gone through all of the indexes, at which point it is cleared with s.idxInSync.Done(). This would allow the existing WaitUntilIndexesAreSynced() function to actually work properly.

The text was updated successfully, but these errors were encountered:

decentral1se · 2022-12-13T22:25:03Z

@KyleMaas Nice.

I saw in https://dev.scuttlebutt.nz/#/golang/?id=database-conceptual-overview that there are two entrypoints for message appending and I wonder is that the right place to intervene 🤔 It's not clear if the message will always end up in an index (so potentially wasteful to Add(1)) and if idxInSync will be available at that level of abstraction though.

Feels like directly inside the serveIndexFrom would be best since then the messages are constrained by the margaret.Query and are defintiely due to end up in the index. Perhaps via the https://pkg.go.dev/github.com/ssbc/go-luigi#Pump call? I'm not quite sure how to intervene there but the implementation looks fairly simple https://github.com/ssbc/go-luigi/blob/324065b9a7c6/stream.go#L54 Perhaps this can be re-worked and the calls to s.idxInSync.Add(1) threaded in?

KyleMaas · 2022-12-13T23:27:03Z

@decentral1se

The best I could work out is to put a goroutine with a Luigi pump right about here (before indexes are attached) with the sole purpose of Add(1)ing idxInSync:

go-ssb/sbot/new.go

Line 336 in 83f9097

// get(msgRef) -> rxLog sequence index

And one here after the indexes are initialized to Done() idxInSync:

go-ssb/sbot/new.go

Line 444 in 83f9097

if s.Replicator == nil {

That way, each ingested message would be processed once before and once after all of the other message processing hooks for the indexes. Plus, it's scoped within sbot where idxInSync is, rather than being handled in a plugin or somewhere further downstream which might not have access to it.

Anyway, those were my thoughts. I'm just not familiar enough yet with Go and with the codebase to know how to reliably do a Luigi pump to just a single simple callback with a Margaret query which selects all messages to be able to implement it this way myself. I figure I could stumble through it if you can't, but it's probably quicker and easier for you to build that.

KyleMaas · 2022-12-13T23:47:20Z

@decentral1se

Expanding on my earlier comment, the disadvantage I see with doing it at the serveIndexFrom layer is that, with there being multiple indexes, there would be multiple Add(1) and Done() calls per message. I tend to think it would be less wasteful to either:

Run it on every message regardless of whether it will be indexed or not, but only do it at the start and end of the pipeline so it's only done once per message, or
Do it similarly to that, but limit it to a Margaret query which somehow encompasses the selection criteria for every other index.

But, to me, I'd think that the waste of processing in the now two Margaret queries (one at the beginning and one at the end of the pipeline) would vastly surpass the processing overhead of a single increment and decrement operation per message.

decentral1se · 2022-12-14T09:33:03Z

@KyleMaas I'm just kinda getting into making code changes here and have been mostly focusing on "easy to do" maintenance tasks otherwise, so I don't think I'm any better placed to do this than you! Feel free to dive in with a PR and we see what we can do.

Thanks for these thoughts, will aim to come back on this later today. I feel like we're getting closer to a solid fix plan 🤞

KyleMaas · 2022-12-15T14:04:37Z

@decentral1se

So...that approach isn't working at all. The Luigi pumps are running in parallel instead of in a pipeline, which means that since the increments and decrements are the fastest, they're generally both executing before the index process completes. Doing it that way would require that there was a definitive blocking first and last task to do the increment and decrement operations synchronously. Unfortunately, I'm not familiar enough with Margaret and Luigi to figure that out, so I think for the moment I'm stuck. You can take a look at what doesn't work over on #255

decentral1se · 2022-12-18T13:31:44Z

@KyleMaas Hmmmm, I see 🤔 We have indexStateMu for per-index state locking, is there a way to lock more generally above that? If you're using Matrix, I'm lurking #golang-ssb-general:autonomic.zone for more back/forth random ideas. No worries if not. Also, not sure which identity you're using on SSB 🙃

KyleMaas · 2022-12-19T00:57:04Z

@decentral1se

Hmm...don't know on that one. I'll have to take a closer look.

Not on Matrix. On SSB, my accounts tend to be transitory, but I do have a semi-permanent testing account:

@MI0aKQls1HIrZlyl4qSAW4ErLqb4xVUN+diSbG0j7Fc=.ed25519

I'm not particularly active on it most of the time however because I tend to wipe it out quite frequently as I'm working on stuff. However, if you'd like to do more off-topic communication, you're welcome to e-mail me.

I can't say I'm really part of the SSB "community", per se. I'm not a very social person, so the entire concept of social media is not particularly attractive for me, bordering on distasteful. What I love about SSB is more the capability to do disconnected asynchronous communication among real-world known peers, more along the lines of @dominictarr's sailboat use case. I had actually designed up a protocol very similar to SSB before discovering SSB, and it was so similar I figured I'd try and help rather than reinvent the wheel. My available time to work on open source stuff tends to come in waves, so I'm not particularly consistent, but I do like to help when and where I can because I love the concept.

decentral1se · 2022-12-19T08:22:21Z

I'll message on SSB then 😸

KyleMaas · 2022-12-20T21:33:14Z

@decentral1se

Okay, so it looks like indexStateMu would not be helpful in this case, since it doesn't trigger during the Luigi pumps but only during:

Construction of indexes
Initial message processing
The shutdown process

It couldn't really be patched into the Luigi pump, either, since those are run concurrently so you'd still have a race condition - message could come in and if it didn't process immediately the indexes would still show as in sync. We really need something to be able to trigger any time a message is added to the receive log and then trigger something else when all of the Luigi pumps are done processing it, but I'm not sure of a good way to do that with all of the concurrency.

One way I thought of that you might be able to do this would be to have each index record the last seen message's sequence number. Then you could convert idxInSync into a function which waits until all of the indexes catch up with the latest sequence number at that point. That way, it would be entirely non-blocking - the rest of the sbot could continue to operate, and you'd know that the indexes were at least in sync up to the point where the sync function was called. The problem is that there's no way to do this, either, because not all indexes receive all messages, so if the latest message in the receive log had a sequence number of 100 and it wasn't, for example, an about message, then you'd be waiting forever (or at least until you received another about message in one of the other threads) because it would be filtered out by the Margaret query and the about index would never catch up so would never reach sequence number 100.

decentral1se · 2022-12-28T19:10:39Z

Damnit gotta get a fix out for this 😅 Got some cycles free tomorrow, gonna give it another shot.

decentral1se · 2022-12-29T16:23:30Z

Some notes from my investigation... 👀

go-ssb/sbot/null_content.go

Lines 130 to 131 in 9fe9976

    
           // TODO: currently the locking of margaret/offset doesn't allow us to get previous messages while being in an index update 
        
           // this is realized as an luigi.Broadcast and the current bases of the index update mechanism

https://pkg.go.dev/github.com/ssbc/go-luigi#Observable "Observabe wraps an interface{} value and allows tracking changes to it "
https://pkg.go.dev/github.com/ssbc/go-luigi#Broadcast "Broadcast allows subscribing to changes"
https://dev.scuttlebutt.nz/#/golang/?id=database-conceptual-overview "Seq() luigi.Observable can be used to be notified when new messages are appended."
What is luigi? "pull-stream-like abstraction to pipe between rpc and database."
Explanation of what sinks/sources are https://github.com/pull-stream/pull-stream-examples/blob/master/pull.js

decentral1se · 2023-01-02T11:13:30Z

I'm gonna try raise cryptix for a lead on a fix for this one! It might turn into a call, lemme know @KyleMaas if you wanna join.

KyleMaas · 2023-01-02T11:44:58Z

@decentral1se

Depends on the time. My schedule this week is rather chaotic. But if you want to e-mail me the info (won't have access to an SSB client with my test account on it) I'd be happy to join if I can.

KyleMaas · 2023-01-02T11:47:06Z

It's definitely a very thorny problem for the way this is architected. Concurrency's all well and good until you combine it with anything sequential.

KyleMaas · 2023-01-02T12:23:07Z

@decentral1se

Maybe this could be of use? If you look through some of the tests, it looks like there might be a way to use this to test for an end-of-stream condition where the stream isn't actually terminated:

https://github.com/ssbc/go-luigi/blob/324065b9a7c66588b6a42896de8da1b8cff6668d/stream.go#L21

decentral1se · 2023-01-02T12:51:12Z

@KyleMaas what's your email? Can fire @ helo[at]autonomic[dot]zone if you don't want to share here. Thanks for links/thinking along!

decentral1se · 2023-01-02T13:01:57Z

@cryptix if you end up having a look, I think the questions on #293 are also important and related (not to add more work but yanno 😅)

KyleMaas · 2023-01-03T21:53:01Z

I think I may have found a way to pull this off in an elegant manner. Will let you know if what I think will work does.

KyleMaas · 2023-01-15T01:18:29Z

Well, my idea didn't work.

But it led me to another really simple way to do it, which has made it so that TestNames has successfully run over 500 times with zero failures. It involves some minor changes to go-luigi. Pull requests imminent.

KyleMaas · 2023-01-15T01:26:06Z

Hm...just thought of a more betterer way to do the same thing. I think I'm going to work on it more tomorrow and do pull requests then.

KyleMaas mentioned this issue Dec 12, 2022

Attempting to fix TestNames #255

Closed

decentral1se moved this to In Progress in 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se added this to 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se moved this from In Progress to TODO in 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se moved this from TODO to In Progress in 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se moved this from In Progress to TODO in 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se moved this from TODO to In Progress in 🚧 go-ssb maintenance 🚧 Dec 13, 2022

decentral1se added testing bug Something isn't working and removed testing labels Dec 18, 2022

This was referenced Dec 20, 2022

TestNullFeed is flaky #268

Open

Indexing of nulled log entries #270

Open

This was referenced Jan 2, 2023

TestSignMessages is flaky #289

Closed

Complete TODO item in publish_test #290

Closed

KyleMaas mentioned this issue Jan 2, 2023

Possible race condition in the publish log causing a forked feed #292

Closed

KyleMaas mentioned this issue Jan 2, 2023

Publishing needs to be fully synchronous #293

Closed

This was referenced Jan 16, 2023

Add version of Pump() with status callbacks ssbc/go-luigi#4

Merged

Fix idxInSync, WaitUntilIndexesAreSynced, and TestNames #301

Merged

decentral1se closed this as completed in #301 Jan 19, 2023

github-project-automation bot moved this from In Progress to Done in 🚧 go-ssb maintenance 🚧 Jan 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

idxInSync is incorrect #251

idxInSync is incorrect #251

KyleMaas commented Dec 12, 2022

decentral1se commented Dec 13, 2022

KyleMaas commented Dec 13, 2022

KyleMaas commented Dec 13, 2022

decentral1se commented Dec 14, 2022

KyleMaas commented Dec 15, 2022

decentral1se commented Dec 18, 2022 •

edited

Loading

KyleMaas commented Dec 19, 2022

decentral1se commented Dec 19, 2022

KyleMaas commented Dec 20, 2022

decentral1se commented Dec 28, 2022

decentral1se commented Dec 29, 2022 •

edited

Loading

decentral1se commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

decentral1se commented Jan 2, 2023

decentral1se commented Jan 2, 2023 •

edited

Loading

KyleMaas commented Jan 3, 2023

KyleMaas commented Jan 15, 2023

KyleMaas commented Jan 15, 2023

idxInSync is incorrect #251

idxInSync is incorrect #251

Comments

KyleMaas commented Dec 12, 2022

decentral1se commented Dec 13, 2022

KyleMaas commented Dec 13, 2022

KyleMaas commented Dec 13, 2022

decentral1se commented Dec 14, 2022

KyleMaas commented Dec 15, 2022

decentral1se commented Dec 18, 2022 • edited Loading

KyleMaas commented Dec 19, 2022

decentral1se commented Dec 19, 2022

KyleMaas commented Dec 20, 2022

decentral1se commented Dec 28, 2022

decentral1se commented Dec 29, 2022 • edited Loading

decentral1se commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

KyleMaas commented Jan 2, 2023

decentral1se commented Jan 2, 2023

decentral1se commented Jan 2, 2023 • edited Loading

KyleMaas commented Jan 3, 2023

KyleMaas commented Jan 15, 2023

KyleMaas commented Jan 15, 2023

decentral1se commented Dec 18, 2022 •

edited

Loading

decentral1se commented Dec 29, 2022 •

edited

Loading

decentral1se commented Jan 2, 2023 •

edited

Loading