This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

Deadlock. #16

Open
magnusfeuer opened this issue Mar 24, 2015 · 3 comments

Comments

@magnusfeuer
Member

Under load, the RVI deadlocks in several instances when Component A calls Component B, while Component B calls Component A.

An example is service_edge_rpc's handle_remote_message(), called by protocol_rpc, which can block if service_edge_rpc is at the same time processing a handle_local_message gen_server call that indirectly calls protocol. In that event, both call chains block.

The solution is to replace synchronous calls (gen_server:call()), with asynchronous notifications (gen_server:cast()) that do not wait for a return value before continuing operations.
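
To make the difference between the two styles concrete, here is a minimal Python sketch. It is only a loose model of a gen_server, not RVI code: the `Server` class, the message names, and the timeouts are all hypothetical. Each server handles one message at a time, so when A is busy waiting on B, B's synchronous call back into A cannot be served until one of them times out. With casts, nobody waits.

```python
import queue
import threading


class Server:
    """Single-threaded message loop, loosely analogous to a gen_server (hypothetical model)."""

    def __init__(self):
        self.handle = None          # set by the caller before the first message
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def call(self, msg, timeout):
        """Synchronous request: block until a reply arrives (raises queue.Empty on timeout)."""
        reply = queue.Queue(maxsize=1)
        self.mailbox.put((msg, reply))
        return reply.get(timeout=timeout)

    def cast(self, msg):
        """Asynchronous notification: enqueue the message and return immediately."""
        self.mailbox.put((msg, None))

    def _loop(self):
        while True:
            msg, reply = self.mailbox.get()
            result = self.handle(msg)   # one message at a time
            if reply is not None:
                reply.put(result)


def demo(use_cast):
    a, b = Server(), Server()

    def handle_a(msg):
        if msg == "local":
            if use_cast:
                b.cast("forward")
                return "accepted"
            return b.call("forward", timeout=1.0)   # A is now busy until B replies
        return "ok"                                  # msg == "remote"

    def handle_b(msg):
        if use_cast:
            a.cast("remote")
            return "ok"
        try:
            return a.call("remote", timeout=0.2)    # A cannot serve this while busy
        except queue.Empty:
            return "timeout"

    a.handle, b.handle = handle_a, handle_b
    return a.call("local", timeout=2.0)


if __name__ == "__main__":
    print(demo(use_cast=False))   # the circular call only ends via the timeout
    print(demo(use_cast=True))    # the cast version returns right away
```

In the synchronous run the only thing that breaks the cycle is the timeout; in the cast run the original request is acknowledged immediately and the follow-up messages are processed when each server becomes free.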

Will be fixed in the gen_server_fix feature branch and in 0.3.2.

@magnusfeuer
Member Author

Another symptom of this bug is exhaustion of file descriptors, which eventually leads to:
14:09:08.197 [info] data_link_bert:receive_data(): Failed to send component request: {error,emfile}
...
14:09:08.197 [error] gen_server <0.4978.0> terminated with reason: emfile
...
14:09:08.200 [error] CRASH REPORT Process <0.4978.0> with 0 neighbours crashed with reason: maximum number of file descriptors exhausted, check ulimit -n

The reason is that the components are trying to make a JSON-RPC call to each other, but end up in the deadlock described above. Each waiting JSON-RPC call consumes one file descriptor out of the maximum 1024 allowed. Under load, the descriptors are all consumed.
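
One way to watch this symptom build up is to compare the number of open descriptors of a process with its limit. The following is a rough Python sketch, assuming a Linux host with /proc mounted; the function name `fd_usage` is made up for illustration, and the pid of the RVI node would have to be looked up separately.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, or (None, None) without /proc.

    Linux-only sketch: reads /proc/<pid>/fd and /proc/<pid>/limits.
    """
    try:
        # Listing the directory briefly opens one descriptor itself,
        # so the count for "self" can be one higher than at rest.
        open_fds = len(os.listdir(f"/proc/{pid}/fd"))
        soft_limit = None
        with open(f"/proc/{pid}/limits") as limits:
            for line in limits:
                if line.startswith("Max open files"):
                    # Line format: "Max open files <soft> <hard> files"
                    soft_limit = line.split()[3]
        return open_fds, soft_limit
    except OSError:
        return None, None


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open descriptors: {used}, soft limit: {limit}")
```

If the open count climbs steadily toward the soft limit while requests are hanging, that matches the emfile errors in the log above.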

@magnusfeuer
Member Author

Deadlock can be recreated through the 'tc' command provided by iproute2.

On the backend server (rvi-test1.nginfotpdx.net, 38.129.64.31), issue a tc command that adds a 500ms delay with 20ms of jitter and 25% correlation to outgoing packets:

tc qdisc add dev eth0 root netem delay 500ms 20ms 25%

Check out branch 0.3.1 on rvi-test1:

ssh -p1066 [email protected]
cd rvi
git pull origin 0.3.1
git checkout 0.3.1
make clean
rm -rf backend
make 
./scripts/setup_rvi_node.sh -d -n backend -c ~/rvi_backend_0_3_x.config
./scripts/rvi_node.sh -n backend

Start the mobile HVAC interface, and make sure it connects to rvi-test1.nginfotpdx.net:8808/websession, by checking its js/main.js file for its rvi.connect statement.

Install RVI 0.3.1 RPM on an IVI box.

Edit /opt/rvi-0.3.1/sys.config and set the static node entry to look like this:

       {static_nodes,[{"jlr.com/backend/","38.129.64.31:8807"}]},

Edit the node_service_prefix entry to look like this:

       {node_service_prefix,"jlr.com/vin/mfeuer"},

(Replace mfeuer with a suitable unique string)

Reboot the RVI box.

Launch mobile HVAC interface.

Drag the left temperature sensor on the mobile HVAC interface quickly up and down for 10 seconds.

The RVI node on rvi-test1.nginfotpdx.net will freeze with timeouts.

@amcgee7
Contributor

amcgee7 commented Mar 25, 2015

A quick fix for this would be to pace our requests. For example, the slider should send an update at most once a second.
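
As a rough illustration of that pacing idea, here is a small Python sketch of a throttle that forwards at most one value per interval and remembers only the latest suppressed value. The class name `Throttle`, the `send` callback, and the injectable `clock` are assumptions made for this example; the actual HVAC interface is JavaScript and would need its own equivalent.

```python
import time


class Throttle:
    """Forward at most one value per `interval` seconds, keeping only the latest."""

    def __init__(self, send, interval=1.0, clock=time.monotonic):
        self.send = send            # callback that performs the real request
        self.interval = interval
        self.clock = clock          # injectable so the logic can be tested
        self.last_sent = None
        self.pending = None

    def update(self, value):
        """Called on every slider movement."""
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.send(value)
            self.last_sent = now
            self.pending = None
        else:
            self.pending = value    # drop older values, remember the newest

    def flush(self):
        """Send the last suppressed value, for example when the drag ends."""
        if self.pending is not None:
            self.send(self.pending)
            self.last_sent = self.clock()
            self.pending = None


if __name__ == "__main__":
    throttle = Throttle(print, interval=1.0)
    for temperature in (20, 21, 22):
        throttle.update(temperature)   # only the first value goes out immediately
    throttle.flush()                   # the final position is still delivered
```

Calling `flush()` at the end of a drag keeps the final slider position from being lost, which plain rate limiting would otherwise drop.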

Art
