Now that Scoobi has DObjects, it should be possible to implement "map-side" joins, that is, joins where one data set is small enough to fit into memory.
@raronson - please provide some requirements on the functionality of the types of map-side joins we'd like to see. Appropriate names and pointers to implementation approaches would be great.
Often the smaller table in the map-side join is some data set that is present on disk. In order to perform the map-side join, it needs to be ingested and placed inside a DObject. For example:
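As a rough illustration of this pattern, here is a minimal sketch in plain Scala. It deliberately does not use the Scoobi API: the object `ClientSideMapJoin`, its methods, and the "key,value" line format are all hypothetical, and a byte array stands in for the distributed cache. It only shows the shape of the work: build the Map on the client, serialise it, and have each task deserialise it before joining.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the client-side approach; not Scoobi code.
object ClientSideMapJoin {

  // Client side: parse "key,value" lines of the small table into a Map.
  // Assumes every line contains at least one comma.
  def buildLookup(lines: Seq[String]): Map[String, String] =
    lines.map { line =>
      val Array(k, v) = line.split(",", 2)
      k -> v
    }.toMap

  // Client side: serialise the Map. The resulting bytes stand in for
  // what would be pushed to the distributed cache.
  def serialise(lookup: Map[String, String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(lookup)
    out.close()
    bytes.toByteArray
  }

  // Task side: read the bytes back into a Map.
  def deserialise(blob: Array[Byte]): Map[String, String] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(blob))
    try in.readObject().asInstanceOf[Map[String, String]]
    finally in.close()
  }

  // Task side: inner join of the large data set's records against the Map.
  // Records whose key is absent from the small table are dropped.
  def mapperTask(blob: Array[Byte], records: Seq[(String, Int)]): Seq[(String, (String, Int))] = {
    val lookup = deserialise(blob)
    records.flatMap { case (k, v) => lookup.get(k).map(small => (k, (small, v))) }
  }
}
```

For instance, `ClientSideMapJoin.mapperTask(ClientSideMapJoin.serialise(ClientSideMapJoin.buildLookup(Seq("a,apple"))), Seq(("a", 1)))` would pair the record for key `a` with `apple`.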
The way this works is that reading the file and placing its contents into a Map are both done client side. A serialised version of the Map is then pushed to the distributed cache, from which each mapper task reads and deserialises it as part of the join with the DList.
The problem with this approach is that it may be more efficient to move all of this work (reading the file and creating the Map) into the setup phase of the mapper (or reducer) task. And I don't believe there is a way to make this happen with the APIs Scoobi currently exposes ...
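To make the alternative concrete, here is a minimal sketch of what moving the work into the task's setup phase could look like. Again this is plain Scala, not a Scoobi or Hadoop API: `SetupSideJoinTask`, its `setup` and `map` methods, and the "key,value" file format are hypothetical. The point is only that the file path, rather than a serialised Map, is what reaches the task, and the Map is built once per task before any records are processed.

```scala
import scala.io.Source

// Hypothetical stand-in for a mapper task; not a Scoobi or Hadoop class.
class SetupSideJoinTask(smallTablePath: String) {

  // Populated once per task by setup(), then only read.
  private var lookup: Map[String, String] = Map.empty

  // Runs once, before any records are processed: read the small table
  // from disk and build the in-memory Map inside the task itself.
  // Assumes every line has the form "key,value".
  def setup(): Unit = {
    val src = Source.fromFile(smallTablePath)
    try {
      lookup = src.getLines().map { line =>
        val Array(k, v) = line.split(",", 2)
        k -> v
      }.toMap
    } finally src.close()
  }

  // Called per record of the large data set: emit a joined pair if the
  // key is present in the small table, otherwise nothing.
  def map(key: String, value: Int): Option[(String, (String, Int))] =
    lookup.get(key).map(small => (key, (small, value)))
}
```

Compared with the client-side version, nothing is serialised on the client; the trade-off is that every task reads and parses the file itself.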