###Finding the Median in Large Sets of Numbers Split Across N Servers Using ZeroMQ and Node.js (experimental)
- Takes a data set and distributes it equally among the workers.
- When StatsCollector's `getMedian` is called, the master first sends a `SORT` message so that each worker sorts its share of the data.
- After every worker has confirmed the sort, the master sends a `GET_MEDIAN` message to collect each worker's median and stores the median of those medians. This value is likely to be close to the median of the whole data set.
- After this step, a binary search approach is applied to find the exact median.
  - As a first step, the estimate (the median of the medians gathered from the workers) is used as the mid value of the binary search. Using the counts of values above and below the estimate, the master updates the estimate so that the two counts become equal.
  - This step is applied recursively until it converges to the exact median.
  - In each recursive step, the master sends a `GET_LOWER_UPPER_COUNTS` message to get the lower and upper counts relative to the current estimate.
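
The steps above can be sketched in a single process, with plain arrays standing in for the workers and function calls standing in for the `SORT`, `GET_MEDIAN` and `GET_LOWER_UPPER_COUNTS` messages. This is a minimal sketch, not the project's code: every function name is hypothetical, no ZeroMQ is involved, the data is assumed to be integers, and the rule for updating the estimate (halving the value range) is one possible choice that the description above does not specify.

```javascript
// In-process sketch: arrays stand in for workers. All names are hypothetical
// and integer data is assumed.

// Master: deal values round-robin so partition sizes differ by at most one.
function distribute(data, workerCount) {
  const parts = Array.from({ length: workerCount }, () => []);
  data.forEach((value, i) => parts[i % workerCount].push(value));
  return parts;
}

// Worker, on SORT: sort the local partition numerically.
function sortPartition(part) {
  return part.sort((a, b) => a - b);
}

// Worker, on GET_MEDIAN: median of an already sorted, non-empty array.
function medianOfSorted(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Index of the first element for which pred is true (pred must be monotone).
function firstIndexWhere(sorted, pred) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (pred(sorted[mid])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Worker, on GET_LOWER_UPPER_COUNTS: local counts below and above the pivot.
function lowerUpperCounts(sorted, pivot) {
  const lower = firstIndexWhere(sorted, (v) => v >= pivot);
  const upper = sorted.length - firstIndexWhere(sorted, (v) => v > pivot);
  return { lower, upper };
}

// Master: k-th smallest value (0-based) by binary search over the value range,
// starting from firstGuess (the median of medians).
function kthSmallest(parts, k, firstGuess) {
  const total = parts.reduce((n, p) => n + p.length, 0);
  let lo = Math.min(...parts.map((p) => p[0]));
  let hi = Math.max(...parts.map((p) => p[p.length - 1]));
  let guess = Math.min(hi, Math.max(lo, Math.floor(firstGuess)));
  while (lo < hi) {
    let lower = 0;
    let upper = 0;
    for (const p of parts) {
      const counts = lowerUpperCounts(p, guess);
      lower += counts.lower;
      upper += counts.upper;
    }
    // Rank k falls among the values equal to guess, so guess is the answer.
    if (lower <= k && k < total - upper) return guess;
    if (k < lower) hi = guess - 1; // too many values below: answer is smaller
    else lo = guess + 1;           // too many values above: answer is larger
    guess = Math.floor((lo + hi) / 2);
  }
  return lo;
}

// Master: full flow for a non-empty integer array.
function distributedMedian(data, workerCount) {
  const parts = distribute(data, workerCount)
    .filter((p) => p.length > 0)
    .map(sortPartition);
  const estimate = medianOfSorted(parts.map(medianOfSorted).sort((a, b) => a - b));
  const n = data.length;
  const upperMid = kthSmallest(parts, Math.floor(n / 2), estimate);
  if (n % 2 === 1) return upperMid;
  return (kthSmallest(parts, n / 2 - 1, estimate) + upperMid) / 2;
}
```

For example, `distributedMedian([9, 1, 7, 3, 5, 2, 8], 3)` starts from a median-of-medians estimate of 4.5 and narrows to 5. Because the sketch searches by rank rather than by equal lower and upper counts, it also tolerates duplicate values, which the project itself lists as a known issue.
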

##Improvements
- The design could be improved by decoupling it from ZeroMQ so that other transports (e.g. MPI) can be plugged in.
- Dynamically manage the number of workers and the distribution of data among them, and support continuous data processing (streaming).
- Multi-core processing could be implemented on the worker nodes with the Node.js `cluster` module to improve performance.

##Known issues
- Needs refactoring to handle duplicate values in the data.
- The overall design needs refactoring.

##Usage
###Install Dependencies
On Windows

    npm install

On Linux

    sudo npm install

###Start App

    // Start as many workers as the size set in the config file (for example, 3)
    node main.js --role='WORKER'
    node main.js --role='WORKER'
    node main.js --role='WORKER'
    // Start the master
    node main.js --role='MASTER'

###Test

    npm test

###Coverage

    npm run test-cov

###ESLint

    npm run lint