
faq 305102862


General guidelines for large scenarios

by Gregory Macfarlane on 2018-07-27 14:33:53


I'm running a scenario which has (at the moment) 5.5 million people on a 66k-link network. I successfully ran a 10% sample of this population last night on an AWS machine with 122 GB of available RAM and a heap size (-Xmx) of 64 GB. I still want to go back and figure out whether I was maxing anything out, but I know that a 100 percent sample has crashed every time I've run it, in any configuration (I bumped from a 32 GB machine to a 122 GB machine, but no dice).
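
To figure out whether I'm actually hitting the -Xmx ceiling, I've been thinking of attaching something like the sketch below: plain Java with no MATSim dependency (the class name and the 30-second interval are my own choices, not anything official). It just logs heap usage from a daemon thread while the run proceeds:

```java
// HeapLogger.java -- minimal sketch for watching JVM heap usage during a run.
// Everything here is standard java.lang; the name and interval are arbitrary.
public class HeapLogger {

    public static void start() {
        Thread t = new Thread(() -> {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long used = rt.totalMemory() - rt.freeMemory(); // heap currently in use
                long max  = rt.maxMemory();                     // ceiling set by -Xmx
                System.out.printf("heap: %,d MB used of %,d MB max%n",
                        used / (1024 * 1024), max / (1024 * 1024));
                try {
                    Thread.sleep(30_000); // log every 30 seconds
                } catch (InterruptedException e) {
                    return; // stop logging if interrupted
                }
            }
        });
        t.setDaemon(true); // don't keep the JVM alive just for logging
        t.start();
    }

    public static void main(String[] args) {
        start();
        // ... launch the actual simulation here ...
    }
}
```

If the "used" figure sits near the max right before a crash, it's the heap ceiling being hit rather than the machine's physical RAM.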

There will come a day when I would like to run this scenario with something close to a 100 percent population. I'm curious whether others have learned anything about efficiently configuring large scenarios. I'll admit that I am basically just throwing things at the wall until they stick (or my scenario runs), and that I don't fully understand the difference between the JVM's garbage-collected heap memory and system memory. Is there already a document explaining this in the context of MATSim? Are there known settings the user can configure to make things run on a smaller machine?
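
For concreteness, here is roughly how I'm setting up the sampled run. This is only a sketch against the MATSim Java API as I understand it; the config path is a placeholder, and the capacity scaling shown is the plain proportional rule:

```java
import org.matsim.api.core.v01.Scenario;
import org.matsim.core.config.Config;
import org.matsim.core.config.ConfigUtils;
import org.matsim.core.controler.Controler;
import org.matsim.core.scenario.ScenarioUtils;

public class SampledRun {
    public static void main(String[] args) {
        // Load an existing config; "config.xml" is a placeholder path.
        Config config = ConfigUtils.loadConfig("config.xml");

        // With a 10% population sample, scale the network capacities to match;
        // otherwise the full-size network swallows the sampled demand.
        double sample = 0.10;
        config.qsim().setFlowCapacityFactor(sample);
        // Storage capacity is often scaled less aggressively (e.g. sample^0.75)
        // to avoid spurious gridlock; plain proportional scaling shown here.
        config.qsim().setStorageCapacityFactor(sample);

        Scenario scenario = ScenarioUtils.loadScenario(config);
        new Controler(scenario).run();
    }
}
```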

I know that I could go and get a bigger machine; 1 TB of RAM is available, but I need to weigh the cost of searching for the right size against the higher cost of running a machine that is too large. What are the specs of the hardware used for the Germany national model? What about Singapore and Hong Kong?

I'm curious whether we could build a database of large-scale MATSim runs, including network size, population size, run time, and hardware specification, so that the community can learn what works and optimize together. I'm happy to share my notes on this with the community.


