When first and last cycle are out of date, should we try refreshing them and retrying before we throw? #913

nicktindall · 2021-09-29T04:12:55Z

In StoreTailer.toEnd, we get the lastCycle which is only checked against the file listing every 60s. In the event we've got lastCycle then failed to retrieve it from the SingleChronicleQueueStore is it worth forcing a directoryListing.refresh(true) and retrying before throwing an exception?

The text was updated successfully, but these errors were encountered:

glukos · 2021-10-12T21:58:06Z

@JerryShea @nicktindall
NotReachedException is thrown only by originalToEnd(), where this trick is already implemented:

    @NotNull
    private ExcerptTailer callOriginalToEnd() {
        try {
            return originalToEnd();
        } catch (NotReachedException e) {
            queue.refreshDirectoryListing();
            // due to a race condition, where the queue rolls as we are processing toEnd()
            // we may get a NotReachedException  ( see https://github.com/OpenHFT/Chronicle-Queue/issues/702 )
            // hence are are just going to retry.
            try {
                return originalToEnd();
            } catch (Exception ex) {
                Jvm.warn().on(getClass(), "Unable to find toEnd() so winding to the start " + ex);
                return toStart();
            }
        }
    }

In normal flow, when files are not manually moved, shared variable TableDirectoryListing#maxCycleValueis incremented by the roller. Other threads / processes will be aware of the change. I don't see any issues here so I'd close this ticket.

nicktindall · 2021-10-12T22:18:07Z

I think the scenario we saw was in optimizedToEnd, where an IllegalStateException is thrown

Chronicle-Queue/src/main/java/net/openhft/chronicle/queue/impl/single/StoreTailer.java

Lines 887 to 891 in db7a1ec

    
           final SingleChronicleQueueStore wireStore = queue.storeForCycle( 
        
                   lastCycle, queue.epoch(), false, this.store); 
        
           this.setCycle(lastCycle); 
        
           if (wireStore == null) 
        
               throw new IllegalStateException("Store not found for cycle " + Long.toHexString(lastCycle) + ". Probably the files were removed? queue=" + queue.fileAbsolutePath());

It's also repeated in approximateLastCycle2:

Chronicle-Queue/src/main/java/net/openhft/chronicle/queue/impl/single/StoreTailer.java

Lines 748 to 752 in db7a1ec

    
           final SingleChronicleQueueStore wireStore = (cycle == lastCycle) ? this.store : queue.storeForCycle( 
        
                   lastCycle, queue.epoch(), false, this.store); 
        
           this.setCycle(lastCycle); 
        
           if (wireStore == null) 
        
               throw new IllegalStateException("Store not found for cycle " + Long.toHexString(lastCycle) + ". Probably the files were removed? queue=" + queue.fileAbsolutePath());

but that seems to be retried already

Chronicle-Queue/src/main/java/net/openhft/chronicle/queue/impl/single/StoreTailer.java

Lines 768 to 775 in db7a1ec

    
           while (prevCycle < lastCycle) { 
        
               lastCycle--; 
        
               try { 
        
                   return approximateLastCycle2(lastCycle); 
        
               } catch (IllegalStateException e) { 
        
                   // try again. 
        
               } 
        
           }

This is probably seperate enough to what you're working on that I could try and do a PR for this today, it doesn't seem that hard

nicktindall · 2021-10-13T06:27:27Z

I just made an attempt at this and I don't think it's a good idea.

The problem is, if we implement it, it'll only refresh the cache values if we hit a path that looks for one of these missing files, if we don't, the invalid cache will happily live on.

I think #924 is actually the correct solution. If we have a file watcher we can cheaply (I think) check the validity of the cache (by looking for events in the watcher) every time we use values from it and if it isn't valid we can trigger a refresh. I'm not going to fix this and will close if you agree @glukos @JerryShea ?

nicktindall · 2021-10-13T06:33:09Z

(i.e. maybe we can call refreshDirectoryListing prior to every use of the cached values, and in refresh we can quickly see if there's been any file-system change events, and if not we can skip the refresh)
I think it would make the "force" param redundant because we'd know if were actually necessary to do it or not.
DISCLAIMER: this is based on no real analysis, just spit-balling here

glukos · 2021-10-13T15:20:18Z

I'm ok with closing this.

In my understanding, #924 is mostly about performance. It will save us from unnecessary refreshes, but it will not fix a data race - obviously, WatchService doesn't priovide any guarantees on how fast events will arrive.
As I see it, the initial purpose of refresh was dealing with manual manipulations with queue dir. Ideally, we shouldn't rely on the refresh to resolve data races that happen in normal workflow.

Some more spit-balling - that's how tailers should find out that a new cycle is created:

                    if (!readOnly && createIfAbsent && wire.writeFirstHeader()) {
                        // implicitly reserves the wireStore for this StoreSupplier
                        wireStore = storeFactory.apply(that, wire);
                        wire.updateFirstHeader();
                        wire.usePadding(wireStore.dataVersion() > 0);

                        wireStore.initIndex(wire);
                        // do not allow tailer to see the file until it's header is written
                        directoryListing.onFileCreated(path, cycle); // increments shared variable
                        // allow directoryListing to pick up the file immediately

Firstly, we write header to a mmapped memory, then we increment shared counter.
Tailer in another thread / process may go here:

                if (!createIfAbsent &&
                        (cycle > directoryListing.getMaxCreatedCycle()
                                || cycle < directoryListing.getMinCreatedCycle()
                                || !path.exists())) {
                    return null;

and find updated getMaxCreatedCycle(), but !path.exists() way be true as per file creation hasn't been synced by OS yet.
Should we consider spinning for a while here before returning null?

…testing #913

nicktindall changed the title ~~When first and last cycle are out of date, should we try refreshing them before we throw?~~ When first and last cycle are out of date, should we try refreshing them and retrying before we throw? Sep 29, 2021

JerryShea mentioned this issue Oct 11, 2021

Reduced unwanted Object creation #922

Closed

RobAustin assigned glukos Oct 12, 2021

nicktindall added a commit that referenced this issue Oct 13, 2021

Recover from _some_ missing files scenarios more gracefully, improve …

c9b26a0

…testing #913

nicktindall mentioned this issue Oct 13, 2021

Recover from _some_ missing files scenarios more gracefully, improve testing #926

Merged

nicktindall added a commit that referenced this issue Oct 14, 2021

Recover from _some_ missing files scenarios more gracefully, improve …

d31fa0f

…testing #913

JerryShea closed this as completed Dec 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When first and last cycle are out of date, should we try refreshing them and retrying before we throw? #913

When first and last cycle are out of date, should we try refreshing them and retrying before we throw? #913

nicktindall commented Sep 29, 2021

glukos commented Oct 12, 2021

nicktindall commented Oct 12, 2021

nicktindall commented Oct 13, 2021 •

edited

Loading

nicktindall commented Oct 13, 2021

glukos commented Oct 13, 2021 •

edited

Loading

When first and last cycle are out of date, should we try refreshing them and retrying before we throw? #913

When first and last cycle are out of date, should we try refreshing them and retrying before we throw? #913

Comments

nicktindall commented Sep 29, 2021

glukos commented Oct 12, 2021

nicktindall commented Oct 12, 2021

nicktindall commented Oct 13, 2021 • edited Loading

nicktindall commented Oct 13, 2021

glukos commented Oct 13, 2021 • edited Loading

nicktindall commented Oct 13, 2021 •

edited

Loading

glukos commented Oct 13, 2021 •

edited

Loading