This repository contains examples to help you get started with multithreading in Java - though the concepts should be applicable across any modern programming language.
[This repository is still under development]
I made this repository to act as a personal reference guide for building scalable and high-performance multithreaded applications, and I hope it proves useful to others as well.
- Using the Thread class and an anonymous instance of Runnable.
- Setting UncaughtExceptionHandlers on your threads.
- Defining your own class that extends Thread.
- Case Study: Bank Robbery
- Two threads try to brute force through a password to a vault, while another thread simultaneously tries to catch them.
- Handling interrupts when your methods already throw or handle external interrupts.
- Handling external interrupts by checking for them periodically - ideal for methods that don't already handle them.
- Daemon threads - To allow your application to exit without being blocked by some thread running in the background.
- Joins - How to guarantee that a thread upon which we depend completes its work by the time we expect it to.
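The basics above can be combined in one short sketch - an anonymous Runnable, an UncaughtExceptionHandler, a daemon thread, and join(). The class and variable names here are illustrative, not taken from the repository:

```java
public class ThreadBasics {
    static volatile int result = 0;

    public static void main(String[] args) throws InterruptedException {
        // Thread created from an anonymous instance of Runnable.
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                result = 21 + 21; // some useful work
            }
        });
        // Invoked if run() exits with an uncaught exception.
        worker.setUncaughtExceptionHandler((t, e) ->
                System.err.println("Thread " + t.getName() + " failed: " + e));

        // A daemon thread will not keep the JVM alive on its own.
        Thread background = new Thread(() -> {
            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
        });
        background.setDaemon(true);

        worker.start();
        background.start();
        worker.join(); // guarantee the worker finished before we read result
        System.out.println("result = " + result);
    }
}
```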
- Latency - The time to completion of a task. Measured in time units.
- Throughput - The number of tasks completed in a given period. Measured in tasks/time unit.
Optimizing for Latency
- Image Processing - We run an image recolouring algorithm over an image in both a sequential and a multithreaded scenario, and demonstrate the performance gains achieved with the help of more threads.
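The latency optimization boils down to splitting one big task across N threads. A minimal sketch of that partitioning pattern, using an array sum in place of image recolouring (the class name is illustrative):

```java
import java.util.Arrays;

public class ParallelSum {
    static long parallelSum(int[] data, int nThreads) throws InterruptedException {
        long[] partial = new long[nThreads];
        Thread[] threads = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t, from = t * chunk, to = Math.min(data.length, from + chunk);
            threads[t] = new Thread(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                partial[id] = sum; // each thread writes only its own slot - no sharing
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join(); // wait for every chunk to finish
        return Arrays.stream(partial).sum();
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        System.out.println(parallelSum(data, 4)); // 1000000
    }
}
```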
Optimizing for Throughput
Thread Pooling: Creating the threads once and reusing them for future tasks, instead of recreating them from scratch every time.
- HTTP Server - We run a CPU heavy task on a thread pool inside a server and look at the throughput gains as we increase the number of worker threads in the thread pool.
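The standard way to get thread pooling in Java is the ExecutorService API. A small sketch (not the repository's server, just the pooling pattern): a fixed pool of worker threads is created once, and every submitted task reuses one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    static int sumSquares(int n, int workers) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers); // threads created once
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final int x = i;
            futures.add(pool.submit(() -> x * x)); // each task reuses a pooled thread
        }
        int total = 0;
        for (Future<Integer> f : futures) total += f.get(); // blocks until done
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumSquares(10, 4)); // 385
    }
}
```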
Monitors: You can use the synchronized API. It provides a locking mechanism that prevents multiple threads from executing a block of code or an entire method at the same time. There are two ways to use the synchronized keyword:
- You can declare one or more methods in a class using the synchronized keyword. When multiple threads try to call these methods on the same object of this class, only one thread at a time will be able to execute any of them.
- Define the block of code we consider the Critical Section and use the synchronized keyword to restrict access to just that section, without making the entire method synchronized. This gives us much more flexibility, such as having separate critical sections synchronize on different objects.
Note: The synchronized block is reentrant - if Thread A is already inside one synchronized method or block, it can enter another synchronized method or block guarded by the same lock without any problem. Basically, a thread cannot block itself out of a critical section.
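Both forms can be sketched on a small counter class (an illustrative example, not the repository's code). Note that here the synchronized block locks on the same monitor as the synchronized methods, so all increments are mutually exclusive:

```java
public class SyncCounter {
    private int count = 0;

    // Form 1: the whole method is synchronized (locks on 'this').
    public synchronized void incrementMethod() {
        count++;
    }

    // Form 2: only the critical section is synchronized.
    public void incrementBlock() {
        // non-critical work could run here without holding the lock
        synchronized (this) { // same monitor as the synchronized methods above
            count++;
        }
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread a = new Thread(() -> { for (int i = 0; i < 10_000; i++) c.incrementMethod(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 10_000; i++) c.incrementBlock(); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c.get()); // always 20000 thanks to mutual exclusion
    }
}
```

If two critical sections were truly independent, each could synchronize on its own dedicated lock object instead of `this` - that is the flexibility the block form buys.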
Atomic Operations: Sadly, most operations we perform are not atomic.
- All reference assignments are atomic (so we can change the object reference in a single operation safely - hence, all our getters and setters are atomic).
- All assignments to primitive types are atomic and safe, except long and double.
- That means we can read from, and write to the following types atomically:
- int
- short
- byte
- float
- char
- boolean
- long and double are exceptions, as they are 64 bits long - for them Java doesn't provide any guarantees (even on 64-bit machines). It's entirely possible that a write to a long or a double takes two internal operations by the CPU - one write to the lower 32 bits, and one to the upper 32 bits.
- Fortunately, we are provided with the volatile keyword, which we can use when declaring long or double variables - reads of and writes to those variables are then guaranteed to be atomic and thread-safe.
volatile double x = 1.0;
Metrics Aggregation: Frequently, when running production applications, we need to measure how long certain important operations or pieces of code take. The duration of those operations can depend on the client's input data, the environment, etc. Identifying the duration of those operations helps us find performance issues and optimize the customer's experience.
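A minimal sketch of the idea (class and method names are my own, not the repository's): each operation is timed with System.nanoTime() and folded into a running average inside a thread-safe Metrics object, which a reporter thread could read concurrently.

```java
public class Metrics {
    private long count = 0;
    private volatile double average = 0.0; // volatile: reporters read without locking

    // Synchronized so concurrent workers can't interleave the update.
    public synchronized void addSample(long nanos) {
        average = (average * count + nanos) / ++count; // incremental mean
    }

    public double getAverage() {
        return average;
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            // ... the operation we care about would run here ...
            metrics.addSample(System.nanoTime() - start);
        }
        System.out.println("average ns: " + metrics.getAverage());
    }
}
```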
Race Conditions: A race condition occurs when -
- Multiple threads are accessing a shared resource
- At least one thread is modifying the shared resource
- The timing of the threads' scheduling might cause incorrect results
- The core of the problem is non-atomic operations performed on a shared resource.
Data Race: Problem -
- Compiler and CPU may execute instructions Out of Order to optimize performance and hardware utilization.
- They will do so while maintaining the logical correctness of the code - but only as observed from a single thread, which is why other threads can see surprising results.
- Out of Order execution by the compiler and the CPU is an important feature that speeds up the code.
- The compiler re-arranges instructions for better:
- Branch Prediction (optimized loops, "if" conditions, etc.)
- Vectorization - parallel instruction execution (SIMD)
- Prefetching instructions - better cache performance
- The CPU re-arranges instructions for better hardware units utilization.
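The textbook data-race sketch below shows why this matters (illustrative names, not repository code): without volatile, the writer's two stores could be reordered or not published, so the reader might observe ready == true while data is still 0. Declaring ready volatile forbids that reordering and guarantees visibility:

```java
public class VisibilityDemo {
    static int data = 0;
    static volatile boolean ready = false; // volatile write publishes 'data' too

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the writer publishes */ }
            // Because 'ready' is volatile, this is guaranteed to see data == 42.
            System.out.println("data = " + data);
        });
        reader.start();

        data = 42;    // ordinary write...
        ready = true; // ...made visible by the volatile write (happens-before)
        reader.join();
    }
}
```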
Reentrant Lock:
- Locks that allow greater flexibility to the programmer by providing methods for querying threads in locked/wait state, or querying locks to see which threads are waiting on them. These locks also provide better functionality for handling cases where a thread can't get access to a particular lock.
- User Interface Application: Has two threads that access the same resource. The UI thread reads from the shared resource, while the worker thread writes to it.
- Read and Write Locks: Having specialized read and write locks can be much more efficient in read-intensive workflows.
- Semaphores
- Condition Variables: Powerful and general way of inter-thread communication.
- Matrix Multiplication with Back-Pressure: We use inter-thread communication techniques based on wait and notify to implement a thread-safe queue for matrix multiplication, where the consumer applies back-pressure to avoid being overloaded by the producer.
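A small sketch of the read/write-lock idea from above, using the standard ReentrantReadWriteLock (the class is illustrative): one "worker" thread writes a shared value under the exclusive write lock, while any number of "UI" readers may hold the read lock simultaneously.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedState {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Lock readLock = rw.readLock();   // shared: many readers at once
    private final Lock writeLock = rw.writeLock(); // exclusive: one writer, no readers
    private int value = 0;

    public void write(int v) {
        writeLock.lock();
        try {
            value = v;
        } finally {
            writeLock.unlock(); // always release in finally
        }
    }

    public int read() {
        readLock.lock();
        try {
            return value;
        } finally {
            readLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedState state = new SharedState();
        Thread worker = new Thread(() -> { for (int i = 1; i <= 100; i++) state.write(i); });
        worker.start();
        worker.join();
        System.out.println(state.read()); // 100
    }
}
```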
Java offers built-in support for lock-free techniques to achieve thread-safe functionality for several tasks.
- Lock-Free Stack Implementation: A lock-free stack that outperforms a blocking stack thanks to its lower overhead compared to maintaining locks.
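A hedged sketch of one way such a stack can be built (a Treiber-style stack, not necessarily the repository's version): the head is an AtomicReference, and every push/pop retries a compareAndSet until it wins the race - no locks involved.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> current = head.get();
            node.next = current;
            if (head.compareAndSet(current, node)) return; // lost the race? retry
        }
    }

    public T pop() {
        while (true) {
            Node<T> current = head.get();
            if (current == null) return null; // empty stack
            if (head.compareAndSet(current, current.next)) return current.value;
        }
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2
        System.out.println(stack.pop()); // 1
    }
}
```

This simple sketch glosses over the ABA problem; AtomicStampedReference is the usual remedy when nodes can be recycled.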