
HPC Case Studies for RLLVMCompile

Michael Kane edited this page Feb 27, 2015 · 2 revisions
  1. Override the memory allocator - Override R's current memory allocator so that allocations above a specified size are backed by memory-mapped files in a specified directory.
  2. Alternative character vector implementation - Character vectors with many elements can be difficult for the system to manage since each element is associated with a separate memory allocation. An alternative for big character vectors might be to collect subsets of the vector into "blocks", each of which consists of a single chunk of memory and pointer offsets to the beginning of each element.
  3. Allow externalptr types to act as "first class" objects - External objects are precluded from being "first class" objects in R because there is currently no mechanism for overriding the underlying duplicate function. It would be nice to propose and implement a mechanism for doing this.
  4. Allow any object to be treated as a reference - It would be nice if we could treat any object as a reference, including R's native types. This might be implemented through the current object systems: if an object inherits from "reference", it is treated as a reference.
  5. Provide a mechanism to allow R to marshal user-defined objects - It would be nice to present R with objects that are already marshaled, bypassing the need to import them. For example, if I have a binary R object (including the header) on disk, I might like to memory map it and pass the pointer to the memory-mapped location for R to treat as a native object.
  6. Provide copy-on-write with the memory allocator (hierarchical memory) - A common memory-use pattern in R (particularly in loops) is to malloc new memory, copy data from an existing object, make few or no changes, and then free the associated memory. If instead we were to provide a copy-on-write memory layer, we could avoid the associated overhead, which is particularly costly when objects are large.
  7. Compile model.matrix for a fixed formula object - A common method for fitting a regression where the number of samples is much larger than the number of variables is to read in "chunks" of the rows, calculate intermediate values, and aggregate them. A current computational bottleneck when doing this is converting the data (a data.frame) to a model matrix. The calls to model.matrix could be faster if we could compile them for a fixed formula object and contrasts.
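
To make the "blocks" idea in item 2 more concrete, here is a minimal C sketch of one possible layout: a single contiguous buffer holding all the strings, plus an array of offsets to the start of each element. The names `str_block`, `str_block_init`, `str_block_get`, and `str_block_free` are hypothetical and are not part of R or RLLVMCompile. The sketch also ignores encodings, `NA` values, and R's global string cache, all of which a real implementation would need to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block of strings: one chunk of memory plus offsets. */
typedef struct {
    char   *data;     /* all elements, NUL-terminated, stored back to back */
    size_t *offsets;  /* offsets[i] is where element i starts inside data */
    size_t  n;        /* number of elements in the block */
} str_block;

/* Pack n C strings into a block using two allocations in total,
 * instead of one allocation per element.
 * Returns 0 on success and -1 if memory could not be allocated. */
int str_block_init(str_block *b, const char *const *strs, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(strs[i]) + 1;   /* +1 for the terminating NUL */

    /* Request at least one byte/slot so an empty block is still valid. */
    b->data = malloc(total ? total : 1);
    b->offsets = malloc((n ? n : 1) * sizeof *b->offsets);
    if (b->data == NULL || b->offsets == NULL) {
        free(b->data);
        free(b->offsets);
        b->data = NULL;
        b->offsets = NULL;
        b->n = 0;
        return -1;
    }

    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;
        memcpy(b->data + pos, strs[i], len);  /* copy string and its NUL */
        b->offsets[i] = pos;
        pos += len;
    }
    b->n = n;
    return 0;
}

/* Return a pointer to element i; no per-element allocation is involved. */
const char *str_block_get(const str_block *b, size_t i)
{
    assert(i < b->n);
    return b->data + b->offsets[i];
}

/* Release the whole block with two calls to free. */
void str_block_free(str_block *b)
{
    free(b->data);
    free(b->offsets);
    b->data = NULL;
    b->offsets = NULL;
    b->n = 0;
}
```

With this layout, a block of a million short strings costs two allocations rather than a million, and the whole block could in principle be written to or memory mapped from disk as a unit, since the offsets are relative to the start of the buffer rather than absolute pointers. The trade-off is that modifying one element to a longer string requires rebuilding (or at least shifting) the block.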