Streaming a large result set from MySQL using Java 8 and downloading the data as a large CSV file.
Fetching and processing a large result set from the database (MySQL in my case) with the simple, commonly used approaches often results in an out-of-memory error. With ORMs or libraries like JPA, where we work on the high-level abstractions they provide, the chance of an OutOfMemoryError is very high if the result set is considerably large.
With the introduction of Streams in Java 8, we can fetch and process large result sets with a minimal memory footprint.
Git clone the project to your local machine and add it to your workspace.
To run this, you will need
- Java 1.8
- Gradle support - In the Eclipse IDE, go to Help -> Eclipse Marketplace -> search for Buildship (Buildship Gradle Integration) and install it.
I have loaded the MySQL database with half a million records (500,000) of the UserDetail model. You can download the sample data from Sample-SQL-File-500000rows.sql. I have also added the schema [create.sql] and the records file [Sample-SQL-File-500000-Rows.csv] inside the [src/main/resources] folder.
I have added two methods in SampleController: one using a Stream (generateCSVUsingStream) and the other using a List (generateCSVUsingList).
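To make the comparison concrete, here is a rough sketch of what the stream-based endpoint can look like with Spring Data JPA and Hibernate. This is not the project's exact code: the repository interface (UserDetailRepository), its JPQL query, and the toCsvRow helper are assumptions for illustration; only SampleController, UserDetail and generateCSVUsingStream come from the description above.

```java
import java.io.PrintWriter;
import java.util.stream.Stream;

import javax.persistence.QueryHint;
import javax.servlet.http.HttpServletResponse;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical repository -- the name and the JPQL query are assumptions.
interface UserDetailRepository extends JpaRepository<UserDetail, Long> {

    // The MySQL driver only streams rows when the JDBC fetch size is
    // Integer.MIN_VALUE; without this hint it buffers the whole result set.
    @QueryHints(@QueryHint(name = "org.hibernate.fetchSize",
                           value = "" + Integer.MIN_VALUE))
    @Query("select u from UserDetail u")
    Stream<UserDetail> streamAll();
}

@RestController
public class SampleController {

    private final UserDetailRepository repository;

    public SampleController(UserDetailRepository repository) {
        this.repository = repository;
    }

    // The Stream must be consumed inside an open (read-only) transaction so
    // the underlying ResultSet stays alive while rows are written out.
    @Transactional(readOnly = true)
    @GetMapping("/api/sample/userdetail/stream/csv")
    public void generateCSVUsingStream(HttpServletResponse response) throws Exception {
        response.setContentType("text/csv");
        PrintWriter writer = response.getWriter();
        try (Stream<UserDetail> rows = repository.streamAll()) {
            // Each row is written and becomes eligible for GC right away,
            // so very little survives long enough to be tenured.
            rows.forEach(row -> writer.println(toCsvRow(row)));
        }
    }

    // Hypothetical helper that turns one entity into one CSV line.
    private String toCsvRow(UserDetail row) {
        return String.valueOf(row);
    }
}
```

The key ingredients are the Stream return type, the fetch-size hint, and keeping the transaction open while the stream is consumed.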
For running the app:
- Open the application.properties file and update the "spring.datasource.*" properties as per your configuration (a minimal example is sketched after the endpoint list below).
- Once the changes are done in application.properties, open the AppStarter.java file and run/debug it.
- If the app starts successfully, hit the endpoints below. (I did some basic performance monitoring with 50 threads; perf results for both requests are attached.)
GET http://localhost:8080/api/sample/userdetail/list/csv
- This shows stats for CPU, Heap, Classes and Threads when using List
- This shows stats for Visual GC when using List
GET http://localhost:8080/api/sample/userdetail/stream/csv
- This shows stats for CPU, Heap, Classes and Threads when using Stream
- This shows stats for Visual GC when using Stream
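As referenced above, a minimal application.properties sketch for the datasource section; the URL, schema name, credentials and driver class are placeholders, not the project's actual values:

```properties
# Placeholder values -- replace with your own host, schema, credentials and driver.
spring.datasource.url=jdbc:mysql://localhost:3306/sampledb
spring.datasource.username=root
spring.datasource.password=secret
# Use com.mysql.jdbc.Driver if you are on MySQL Connector/J 5.x.
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
```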
Note: Looking at [list_gc.png], the old generation is at its maximum capacity. The heap remains almost fully occupied for the entire duration until the HTTP response is sent back to the client. If any new HTTP requests arrive during this critical period, we are very likely to get an OutOfMemoryError.
Objects that reach the old generation are stuck there until a major garbage collection cycle happens (an expensive stop-the-world operation whose pauses can run into seconds). Preventing objects from being "tenured" into the old generation is therefore an important aspect of application architecture.
Now, if we look at [stream_gc], the old generation is just a flat line (no objects are getting tenured), leaving room to handle other incoming HTTP requests. This is why Java 8 streams are the way to go for large result sets.
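For contrast, here is a rough sketch of what the List-based endpoint might look like inside the same SampleController sketched earlier (again, findAll() and the toCsvRow helper are illustrative assumptions). Because every one of the 500,000 UserDetail entities is materialized before the first byte is written, the objects survive several minor collections and get tenured, which is exactly the old-generation plateau visible in [list_gc.png].

```java
// Inside the same SampleController sketched above (also requires java.util.List).
@GetMapping("/api/sample/userdetail/list/csv")
public void generateCSVUsingList(HttpServletResponse response) throws Exception {
    response.setContentType("text/csv");
    // All 500,000 entities are loaded into memory before anything is written...
    List<UserDetail> rows = repository.findAll();
    PrintWriter writer = response.getWriter();
    // ...and the list stays reachable until the whole response is produced,
    // so the entities are promoted into the old generation.
    for (UserDetail row : rows) {
        writer.println(toCsvRow(row));
    }
}
```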
Built with:
- Spring Boot - The web framework used
- Gradle - Dependency Management