Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phlox Exception Error #126

Open
Vinhold opened this issue Jun 11, 2021 · 7 comments
Open

Phlox Exception Error #126

Vinhold opened this issue Jun 11, 2021 · 7 comments

Comments

@Vinhold
Copy link

Vinhold commented Jun 11, 2021

A strange situation occurred in the Enchanted Grid where something caused this exception to be thrown in Phlox:
2021-06-09 13:40:37,420 [34] ERROR InWorldz.Phlox.Engine.MasterScheduler - [Phlox]: CRITICAL ERROR. The master scheduler has caught an exception from the execution environment. This indicates a bug in the script engine. The script engine has been disabled. System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.Dictionary`2.ValueCollection.Enumerator.MoveNext()
at OpenSim.Region.Framework.Scenes.SceneObjectPartInventory.HandleChangedScripts(TaskInventoryItem newItem, ReplaceItemArgs replaceArgs)
at OpenSim.Region.Framework.Scenes.SceneObjectPartInventory._ReplaceItem(SceneObjectPart part, UUID itemID, Boolean allowedDrop, Boolean fireEvents, ReplaceItemArgs replaceArgs)
at InWorldz.Phlox.Engine.ScriptLoader.PerformUnloadRequest(LoadUnloadRequest unloadReq)
at InWorldz.Phlox.Engine.ScriptLoader.CheckAndPerformLoadUnloadRequest()
at InWorldz.Phlox.Engine.ScriptLoader.DoWork()
at InWorldz.Phlox.Engine.MasterScheduler.WorkLoop()
2021-06-09 13:42:45,933 [58] ERROR InWorldz.Phlox.Engine.EngineInterface - [Phlox]: Unable to retrieve state data for 67432bce-eba6-4830-adad-08b922d841c6 master scheduler has died.

All scripts in the region would no longer work. Some other regions have started to show the same problems and have the same source. We would like to know the cause of this problem and how it happened.

@zauberparacelsus
Copy link
Contributor

The last error line about being unable to retrieve state data for a script stands out to me.

In the Enchanted discord server, I recall some talk about script-related problems occurring when several regions are starting up at the same time, related to network/database failures. Those could potentially cause script state retrieval to fail, depending on how script states are stored and retrieved.

@Yichard
Copy link

Yichard commented Jun 11, 2021

Indeed some days ago a problem occurred when several regions were starting simultaneously: a region restart changed event could not retrieve the UUID of the script owner. The workaround was to wait for some seconds, and use llGetOwner.
I am not sure however this is the error Vinhold reported here. To understand what happened we need to know when, and where it happened.

@zauberparacelsus
Copy link
Contributor

It would be about yesterday when I saw people on the discord server talking about the script engine not working.

Can't check to confirm right now though, because Discord started having issues within the last 10-20 minutes and is unusable.

@emperorstarfinder
Copy link
Contributor

So looking at the error message,

This may be a case of multiple regions starting and loading the scriptengine caches for the dll at the same time causing a conflict in the scheduler. When starting regions unless it is a region restart, they should be started one at a time to ensure that there is no possible conflicts during the startup process and loading the script dlls for each script in a region.

I, however, would need to see the actual logs of this in each of the regions to see exactly what is really going on when this happens. The other thing I would want to know is if other worlds are experiencing this as well or is it just one world. If it is only on Enchanted and other worlds are not experiencing this then I would say it is something specific to that world and likely not a bug in the scriptengine itself.

@Vinhold
Copy link
Author

Vinhold commented Jun 11, 2021

I had taken a look in the log of the region to see when it had been restarted last. It was started up on 6/8/2021 at 1:00pm The error in Phlox occurred 6/9/2011 1:30pm and I could not clearly determine if anyone was in the region at that time, so it was not clear who may have run any scripts that would have caused the problem. This situation had not happened before in any world I have setup and is currently running. But some problems were being reported since 6/8 after that restart was done. I do not know what the exception means as for what triggered the error, so posted it here to see what can be determined from it what was the likely cause. Grated it may have its source with how the regions had been started up, that is also under investigation.

@emperorstarfinder
Copy link
Contributor

Is the world up now, and are scripts working as they should?

If so then I think it is safe to say it is just the scheduler thought there was a conflict with multiple regions starting at the same time. If it is only Enchanted having that problem then you should probably look into what else is going on in that world that may potentially create an issue, Otherwise, I don't really see an actual error in that message so far that jumps out and says that there is an issue in Phlox other then a conflict with multiple regions starting at the same time.

@kf6kjg
Copy link
Contributor

kf6kjg commented Jun 11, 2021

Based on some research into this error I think there might be rarely occurring concurrency bug. Please have Halcyon grid operators looking to see if this happens again.

Details: The underlying error indicates that a Dictionary object was modified during an enumeration operation, such as a foreach loop. The HandleChangedScripts method has exactly one such loop, used to count the number of LSL scripts in the object inventory. Thus my suspicion is that something somehow added or removed an item of any kind in an object inventory while a script was unloading. From what I can tell most calls to that data object are under a lock. However the _ReplaceItem method and HandleChangedScripts do not. This might be the rot of the issue.

Either way, this area of the code hasn't changed since the import from SVN, so it's either a .Net update changed a behavior or that it's a rare enough occurrence to have avoided observation until now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants