Allow multiple CUs to write output files to a single DU. #173
Comments
The task stays in the 'Running' state until its output data, which is captured by the output DU, has been moved. These two steps should be decoupled: the task should move to the 'Done' state as soon as it finishes, to improve concurrency, and the output DU should take care of moving the data. |
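A minimal sketch of the decoupling requested above, in plain Python rather than the Pilot-Data API. The `Task` and `OutputDU` classes, the `finish_task` helper, and the `Pending` and `Transferring` state names are hypothetical; only 'Running' and 'Done' come from the comment. The point is that the task reaches 'Done' while the output DU is still moving the data.

```python
import threading
import time


class Task:
    """Hypothetical task record (not the Pilot-Data CU class)."""

    def __init__(self, name):
        self.name = name
        self.state = "Running"


class OutputDU:
    """Hypothetical output DU that moves files on its own thread."""

    def __init__(self):
        self.state = "Pending"      # assumed state name
        self.files = []
        self._worker = None

    def stage_async(self, files, transfer_seconds=0.05):
        """Start moving the files in the background and return immediately."""
        def _move():
            time.sleep(transfer_seconds)   # stand-in for the real data transfer
            self.files.extend(files)
            self.state = "Done"

        self.state = "Transferring"  # assumed state name
        self._worker = threading.Thread(target=_move)
        self._worker.start()

    def wait(self):
        """Block until the background transfer has finished."""
        self._worker.join()


def finish_task(task, du, files):
    """Hand the output over to the DU, then mark the task as done."""
    du.stage_async(files)
    task.state = "Done"   # the compute slot is free before the transfer ends
```

With this split, a consumer that needs the data waits on the DU (`du.wait()`), not on the task.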
Hi Pradeep, apart from the semantic problems this would create, why is it an optimization to have all data in one DU? |
Consider the following example. I have 1000 map tasks running, where each task generates a set of output files that later need to be grouped into 8 DUs. With the current framework, I need to create 1000 empty intermediate output DUs to store the files of each task, and later create 8 more DUs to regroup the files between DUs. Letting the 1000 tasks write directly to the 8 DUs would eliminate the creation of the 1000 intermediate DUs and the related wait time. This is useful for applications where high performance is required and even milliseconds matter. I think this is additional flexibility that can be provided, and it is up to the application whether to use it. |
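The two approaches in the example above can be compared with a small sketch. It uses plain dictionaries, not Pilot-Data objects, and assumes each task emits exactly one file per target DU (the layout clarified further down the thread). The function names and the `file<task>-<group>` naming are illustrative assumptions.

```python
from collections import defaultdict

NUM_TASKS = 1000   # map tasks, from the example above
NUM_GROUPS = 8     # target DUs, from the example above


def task_outputs(task_id, num_groups=NUM_GROUPS):
    """File names one task emits: one file per target group (assumed layout)."""
    return [f"file{task_id}-{g}" for g in range(num_groups)]


def group_via_intermediate_dus(num_tasks=NUM_TASKS):
    """Current approach: one intermediate DU per task, then regroup."""
    intermediate = {t: task_outputs(t) for t in range(num_tasks)}
    grouped = defaultdict(list)
    for files in intermediate.values():
        for g, name in enumerate(files):
            grouped[g].append(name)
    # Total DUs created: the per-task DUs plus the grouped DUs.
    return dict(grouped), len(intermediate) + len(grouped)


def group_direct(num_tasks=NUM_TASKS):
    """Proposed approach: every task writes straight into the grouped DUs."""
    grouped = defaultdict(list)
    for t in range(num_tasks):
        for g, name in enumerate(task_outputs(t)):
            grouped[g].append(name)
    return dict(grouped), len(grouped)
```

Both functions produce the same grouping; they differ only in how many DUs have to be created along the way (1008 versus 8 for the numbers above).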
Hi Pradeep, On 19 Feb 2014, at 19:19 , pradeepmantha [email protected] wrote:
Ok, clear.
Ok.
Arguably you don’t need those 8 DUs but could divide the 1000 DUs in 8 piles, right?
I hear your concern about performance, which is important of course. Having “mixed” writing to a DU opens up a whole can of worms:
Note that I’m not against your “wish”, but your proposal has far reaching implications, so it needs to be considered carefully. Gr, Mark |
Hi,
i.e. Task 1 creates {file-0, file-1, file-2, file-3, ..., file-7}. Now the 8 DUs should look like this: DU-1 = {file1-0, file2-0, file3-0, ..., file999-0}, ..., DU-8 = {file1-7, file2-7, ..., file999-7}. So each of the 8 DUs needs one file from each of the 1000 DUs, which is a lot of
Having "mixed" writing to a DU opens up a whole can of worms:
with the current PD design? Consider CUs running on an XSEDE cluster that supports
on the affinity of the target DU and the Pilot-Data affinity.
be long and need an intermediate state to avoid blocking CPU resources, until
Thanks On Wed, Feb 19, 2014 at 11:50 AM, Mark Santcroos
|
Hi Pradeep, Still had this open, gave it some more thought, triggered by the Two Tales paper. Long ago pradeepmantha [email protected] wrote:
Ok, I didn’t understand the pattern correctly, thanks for clarifying. But I’m still tempted to say that you should “just” either: If we don’t talk about performance at this stage, then from a semantic/expressiveness perspective, I believe both do what you need. Of course A and B will have different performance characteristics, but I would consider that an optimisation problem once we settle on the semantics.
Regardless of what route we go there, I’m tempted to say that the initial output DUs should be written to by just one CU.
That’s a bit too ambiguous ...
See my comment about states above. Overall, I fully agree that the current granularity of PD makes use cases like yours complex. Then the issue of having to manage 1, 8, 1000, or 8000 DUs becomes “just” a performance issue. Gr, Mark |
Use case: in MapReduce, multiple map tasks generate data related to a single reduce task.
Currently all the output files of a map task are stored together in a DU, and the MapReduce framework later segregates the files that belong to each reduce task so that the resulting DUs can be passed as inputs to the reduce tasks.
This could be optimized if we allowed a map task to write its output files directly to the reduce DUs.
I envision this could be a useful feature for other use cases too.
There could be some concurrency problems with metadata updates. I just want to log this, as we might come up with some solution to make this possible.
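One possible way to think about the metadata concern is sketched below. It is not how Pilot-Data stores DU metadata. `SharedDUMetadata` and `run_writers` are hypothetical names, and an in-process lock stands in for whatever coordination a distributed metadata store would need. The sketch serializes updates to a single DU's file list and rejects two writers claiming the same file name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedDUMetadata:
    """Hypothetical file list of one DU that many tasks write into."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}   # file name -> id of the task that wrote it

    def register(self, task_id, file_name):
        """Record a new file; the lock makes check-and-insert atomic."""
        with self._lock:
            if file_name in self._files:
                raise ValueError(f"{file_name} already written by task "
                                 f"{self._files[file_name]}")
            self._files[file_name] = task_id

    def file_count(self):
        with self._lock:
            return len(self._files)


def run_writers(metadata, num_tasks, reduce_id):
    """Simulate num_tasks map tasks registering one file each, concurrently."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [
            pool.submit(metadata.register, t, f"file{t}-{reduce_id}")
            for t in range(num_tasks)
        ]
        for f in futures:
            f.result()   # re-raises any error from a writer
```

Giving each writer a distinct file name, as in the map/reduce pattern above, keeps conflicts limited to the shared file list itself.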