Implement Fillable, which are append-only, non-readable arrays. (#18)
In the end, I implemented `Fillables` for all types _except_ records (which will be a little more complicated in the way they interact with unions, though not impossible). The promotion of `UnknownType` to first-class status will necessitate an `EmptyArray` that's equivalent to any type of array (on the README as a to-do item).

* Put all array classes in an 'array' directory ('include', 'src', and '_numba').

* Start Fillable implementation.

* Will need high-level types for Fillable.

* Try passing high-level types through to Python.

* High-level 'Type' classes pass through pybind11 with inheritance.

* Set up inheritance for 'Content' as well; no more 'anycontent'.

* Basic 'BoolFillable' works.

* Null-handling in fillables compiles.

* [skip ci] in progress.

* Can't use 'std::vector' because of memory ownership; defining our own GrowableBuffer.

* Fix compilation errors.

* FillableArray is usable as an array, but it's not a Content.

* Now use the UnknownFillable stuff.

* Add simdjson.

* Pass down a 'FillableOptions' object.

* Include missing 'FillableOptions' files.

* Fully replace 'Mostly*Fillables' with 'OptionFillable'.

* Start on NumberFillable.

* [skip ci] separate Int64Fillable from Float64Fillable.

* [skip ci] We'll need a UnionFillable.

* UnionType/UnionFillable compiles.

* [skip ci] checkpoint.

* OptionType, UnionType are usable in Python.

* Implemented UnionFillable.

* Implemented and tested Int64Fillable.

* [skip ci] checkpoint.

* [skip ci] checkpoint.

* Int64Fillable and Float64Fillable work together.

* Fix 32-bit issues and return value in UnionFillable::get2.

* Added skeleton for ListFillable/ListType.

* UnknownType (like a 'bottom type') is equal to all other types.

* Implemented all but the begin/end methods of ListFillable.

* Skeleton for beginlist/end.

* ListFillable::beginlist/end compiles.

* ListFillable::beginlist/endlist is working in a test.

* endlist() bubbles up to the deepest list that has begun.

* Fix last 32-bit issues and finish off PR.

* Fix compiler errors on MacOS.
jpivarski authored Oct 22, 2019
1 parent 7515064 commit 81f0e40
Showing 47 changed files with 1,995 additions and 32 deletions.
2 changes: 0 additions & 2 deletions .atom-build.yml
@@ -1,4 +1,2 @@
# cmd: ([ ! -d build ] && cmake -Bbuild) ; cmake --build build
# cmd: 'rm -rf **/*~ **/__pycache__ build dist *.egg-info awkward1/*.so **/*.pyc ; python setup.py build'
cmd: 'python setup.py build'
errorMatch: '(?<file>[/0-9a-zA-Z\._-]+):(?<line>\d+):(?<col>\d+):\s+error:\s+(?<message>.+)'
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
 [submodule "pybind11"]
 path = pybind11
 url = https://github.com/pybind/pybind11.git
+[submodule "simdjson"]
+path = simdjson
+url = https://github.com/lemire/simdjson.git
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -19,7 +19,7 @@ add_definitions(-DVERSION_INFO="${VERSION_INFO}")
 set(CMAKE_MACOSX_RPATH 1)

 file(GLOB CPU_KERNEL_SOURCES "src/cpu-kernels/*.cpp")
-file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp")
+file(GLOB LIBAWKWARD_SOURCES "src/libawkward/*.cpp" "src/libawkward/array/*.cpp" "src/libawkward/fillable/*.cpp" "src/libawkward/type/*.cpp")
 include_directories(include)

 add_subdirectory(pybind11)
4 changes: 4 additions & 0 deletions README.md
@@ -59,10 +59,14 @@ Completed items are ☑check-marked. See [closed PRs](https://github.com/scikit-
 * [X] Error messages with location-of-failure information if the array has an `Identity` (except in Numba).
 * [X] Fully implement `__getitem__` for int/slice/intarray/boolarray/tuple (placeholders for newaxis/ellipsis), with perfect agreement with [Numpy basic/advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html), to all levels of depth.
 * [ ] Appendable arrays (a distinct phase from readable arrays, when the type is still in flux) to implement `awkward.fromiter` in C++.
+   * [X] Implemented all types but records; tested all primitives and lists.
+   * [ ] Implement appendable records.
+   * [ ] Test all (requires array types for all).
 * [ ] JSON → Awkward via header-only [simdjson](https://github.com/lemire/simdjson#readme) and `awkward.fromiter`.
 * [ ] Explicit broadcasting functions for jagged and non-jagged arrays and scalars.
 * [ ] Extend `__getitem__` to take jagged arrays of integers and booleans (same behavior as old).
 * [ ] Full suite of array types:
+   * [ ] `EmptyArray`: 1-dimensional array with length 0 and unknown type (result of `UnknownFillable`, compatible with all types of arrays).
    * [X] `RawArray`: flat, 1-dimensional array type for pure C++ (header-only).
    * [X] `NumpyArray`: rectilinear, N-dimensional array type without Python/pybind11 dependencies, but intended for Numpy.
    * [X] `ListArray`: the new `JaggedArray`, based on `starts` and `stops` (i.e. fully general).
2 changes: 1 addition & 1 deletion VERSION_INFO
@@ -1 +1 @@
-0.1.14
+0.1.15
5 changes: 4 additions & 1 deletion awkward1/operations/convert.py
@@ -14,10 +14,13 @@ def tolist(array):
     elif isinstance(array, numpy.ndarray):
         return array.tolist()

+    elif isinstance(array, awkward1.layout.FillableArray):
+        return [tolist(x) for x in array]
+
     elif isinstance(array, awkward1.layout.NumpyArray):
         return numpy.asarray(array).tolist()

-    elif isinstance(array, awkward1.util.anycontent):
+    elif isinstance(array, awkward1.layout.Content):
         return [tolist(x) for x in array]

     else:
10 changes: 0 additions & 10 deletions awkward1/util.py
@@ -1,11 +1 @@
 # BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE
-
-import awkward1.layout
-
-anycontent = (
-    awkward1.layout.NumpyArray,
-    awkward1.layout.ListArray32,
-    awkward1.layout.ListArray64,
-    awkward1.layout.ListOffsetArray32,
-    awkward1.layout.ListOffsetArray64,
-)
2 changes: 2 additions & 0 deletions include/awkward/Content.h
@@ -10,6 +10,8 @@
 namespace awkward {
 class Content {
 public:
+virtual ~Content() { }
+
 virtual const std::string classname() const = 0;
 virtual const std::shared_ptr<Identity> id() const = 0;
 virtual void setid() = 0;
2 changes: 1 addition & 1 deletion include/awkward/Index.h
@@ -45,7 +45,7 @@ namespace awkward {
 const int64_t length_;
 };

-typedef IndexOf<uint8_t> Index8;
+typedef IndexOf<int8_t> Index8;
 typedef IndexOf<int32_t> Index32;
 typedef IndexOf<int64_t> Index64;
 }
2 changes: 2 additions & 0 deletions include/awkward/Slice.h
@@ -16,6 +16,8 @@ namespace awkward {
 public:
 static int64_t none() { return kSliceNone; }

+virtual ~SliceItem() { }
+
 virtual const std::shared_ptr<SliceItem> shallow_copy() const = 0;
 virtual const std::string tostring() const = 0;
 };
34 changes: 34 additions & 0 deletions include/awkward/fillable/BoolFillable.h
@@ -0,0 +1,34 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_BOOLFILLABLE_H_
#define AWKWARD_BOOLFILLABLE_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/fillable/FillableOptions.h"
#include "awkward/fillable/GrowableBuffer.h"
#include "awkward/fillable/Fillable.h"

namespace awkward {
class BoolFillable: public Fillable {
public:
BoolFillable(const FillableOptions& options): options_(options), buffer_(options) { }

virtual int64_t length() const;
virtual void clear();
virtual const std::shared_ptr<Type> type() const;
virtual const std::shared_ptr<Content> snapshot() const;

virtual Fillable* null();
virtual Fillable* boolean(bool x);
virtual Fillable* integer(int64_t x);
virtual Fillable* real(double x);
virtual Fillable* beginlist();
virtual Fillable* endlist();

private:
const FillableOptions options_;
GrowableBuffer<uint8_t> buffer_;
};
}

#endif // AWKWARD_BOOLFILLABLE_H_
29 changes: 29 additions & 0 deletions include/awkward/fillable/Fillable.h
@@ -0,0 +1,29 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_FILLABLE_H_
#define AWKWARD_FILLABLE_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/Content.h"
#include "awkward/type/Type.h"

namespace awkward {
class Fillable {
public:
virtual ~Fillable() { }

virtual int64_t length() const = 0;
virtual void clear() = 0;
virtual const std::shared_ptr<Type> type() const = 0;
virtual const std::shared_ptr<Content> snapshot() const = 0;

virtual Fillable* null() = 0;
virtual Fillable* boolean(bool x) = 0;
virtual Fillable* integer(int64_t x) = 0;
virtual Fillable* real(double x) = 0;
virtual Fillable* beginlist() = 0;
virtual Fillable* endlist() = 0;
};
}

#endif // AWKWARD_FILLABLE_H_
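
The `Fillable` interface above is event-driven: a caller reports one value or one list boundary at a time instead of handing over a finished array. The following is a minimal sketch of that idea, assuming a hypothetical `ToyListOfReals` class that is not part of this commit and handles only one level of lists of doubles. It shows how `beginlist()`, `real()`, and `endlist()` calls could accumulate into an offsets array plus flat content.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a list-of-doubles fillable. It is NOT the
// ListFillable from this commit; it only mirrors three method names.
class ToyListOfReals {
public:
  void beginlist() {
    assert(!inside_);  // nested lists are out of scope for this sketch
    inside_ = true;
  }

  void real(double x) {
    assert(inside_);   // values are only accepted between begin and end
    content_.push_back(x);
  }

  void endlist() {
    assert(inside_);
    inside_ = false;
    // Closing a list records where its content stops.
    offsets_.push_back(static_cast<int64_t>(content_.size()));
  }

  // Number of completed lists.
  int64_t length() const { return static_cast<int64_t>(offsets_.size()) - 1; }
  const std::vector<int64_t>& offsets() const { return offsets_; }
  const std::vector<double>& content() const { return content_; }

private:
  bool inside_ = false;
  std::vector<int64_t> offsets_ = {0};  // list i spans offsets[i]..offsets[i+1]
  std::vector<double> content_;
};

// Fills the equivalent of [[1.1, 2.2], [], [3.3]].
ToyListOfReals build_example() {
  ToyListOfReals out;
  out.beginlist(); out.real(1.1); out.real(2.2); out.endlist();
  out.beginlist(); out.endlist();
  out.beginlist(); out.real(3.3); out.endlist();
  return out;
}
```

Calling `build_example()` leaves three completed lists whose boundaries are stored in `offsets()` and whose values sit contiguously in `content()`.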
41 changes: 41 additions & 0 deletions include/awkward/fillable/FillableArray.h
@@ -0,0 +1,41 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_FILLABLEARRAY_H_
#define AWKWARD_FILLABLEARRAY_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/Content.h"
#include "awkward/type/Type.h"
#include "awkward/fillable/FillableOptions.h"
#include "awkward/fillable/Fillable.h"
#include "awkward/fillable/UnknownFillable.h"

namespace awkward {
class FillableArray {
public:
FillableArray(const FillableOptions& options): fillable_(new UnknownFillable(options)) { }

const std::string tostring() const;
int64_t length() const;
void clear();
const std::shared_ptr<Type> type() const;
const std::shared_ptr<Content> snapshot() const;
const std::shared_ptr<Content> getitem_at(int64_t at) const;
const std::shared_ptr<Content> getitem_range(int64_t start, int64_t stop) const;
const std::shared_ptr<Content> getitem(const Slice& where) const;

void null();
void boolean(bool x);
void integer(int64_t x);
void real(double x);
void beginlist();
void endlist();

private:
std::shared_ptr<Fillable> fillable_;

void maybeupdate(Fillable* tmp);
};
}

#endif // AWKWARD_FILLABLEARRAY_H_
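
Every fill method on `Fillable` returns a `Fillable*`, and `FillableArray` declares a private `maybeupdate(Fillable* tmp)`. The body of `maybeupdate` is not shown in this part of the diff, so the sketch below is an assumption about how such a protocol could work: a node returns `this` to stay in place, or a newly allocated node to replace itself. All `Toy*` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature of the protocol: integer() returns either `this`
// or a freshly allocated replacement node.
class ToyFillable {
public:
  virtual ~ToyFillable() { }
  virtual std::string name() const = 0;
  virtual int64_t length() const = 0;
  virtual ToyFillable* integer(int64_t x) = 0;
};

class ToyInt64 : public ToyFillable {
public:
  std::string name() const override { return "int64"; }
  int64_t length() const override { return static_cast<int64_t>(data_.size()); }
  ToyFillable* integer(int64_t x) override {
    data_.push_back(x);
    return this;  // same type as before: keep this node
  }
private:
  std::vector<int64_t> data_;
};

class ToyUnknown : public ToyFillable {
public:
  std::string name() const override { return "unknown"; }
  int64_t length() const override { return 0; }
  ToyFillable* integer(int64_t x) override {
    ToyFillable* out = new ToyInt64();  // first value decides the type
    out->integer(x);
    return out;  // caller takes ownership of the replacement
  }
};

class ToyArray {
public:
  ToyArray() : fillable_(new ToyUnknown()) { }
  std::string name() const { return fillable_->name(); }
  int64_t length() const { return fillable_->length(); }
  void integer(int64_t x) { maybeupdate(fillable_->integer(x)); }

private:
  std::shared_ptr<ToyFillable> fillable_;

  // Assumed behaviour: adopt the returned node only if it is a different
  // object, so the same raw pointer is never owned twice.
  void maybeupdate(ToyFillable* tmp) {
    if (tmp != fillable_.get()) {
      fillable_ = std::shared_ptr<ToyFillable>(tmp);
    }
  }
};
```

With this arrangement the wrapper keeps a stable identity for callers while the node behind it can change type as new data arrives.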
25 changes: 25 additions & 0 deletions include/awkward/fillable/FillableOptions.h
@@ -0,0 +1,25 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_FILLABLEOPTIONS_H_
#define AWKWARD_FILLABLEOPTIONS_H_

#include <cmath>
#include <cstring>

#include "awkward/cpu-kernels/util.h"

namespace awkward {
class FillableOptions {
public:
FillableOptions(int64_t initial, double resize): initial_(initial), resize_(resize) { }

int64_t initial() const { return initial_; }
double resize() const { return resize_; }

private:
int64_t initial_;
double resize_;
};
}

#endif // AWKWARD_FILLABLEOPTIONS_H_
46 changes: 46 additions & 0 deletions include/awkward/fillable/Float64Fillable.h
@@ -0,0 +1,46 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_FLOAT64FILLABLE_H_
#define AWKWARD_FLOAT64FILLABLE_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/fillable/FillableOptions.h"
#include "awkward/fillable/GrowableBuffer.h"
#include "awkward/fillable/Fillable.h"

namespace awkward {
class Float64Fillable: public Fillable {
public:
Float64Fillable(const FillableOptions& options): options_(options), buffer_(options) { }
Float64Fillable(const FillableOptions& options, const GrowableBuffer<double>& buffer): options_(options), buffer_(buffer) { }

static Float64Fillable* fromint64(const FillableOptions& options, GrowableBuffer<int64_t> old) {
GrowableBuffer<double> buffer = GrowableBuffer<double>::empty(options, old.reserved());
int64_t* oldraw = old.ptr().get();
double* newraw = buffer.ptr().get();
for (int64_t i = 0; i < old.length(); i++) {
newraw[i] = (double)oldraw[i];
}
buffer.set_length(old.length());
return new Float64Fillable(options, buffer);
}

virtual int64_t length() const;
virtual void clear();
virtual const std::shared_ptr<Type> type() const;
virtual const std::shared_ptr<Content> snapshot() const;

virtual Fillable* null();
virtual Fillable* boolean(bool x);
virtual Fillable* integer(int64_t x);
virtual Fillable* real(double x);
virtual Fillable* beginlist();
virtual Fillable* endlist();

private:
const FillableOptions options_;
GrowableBuffer<double> buffer_;
};
}

#endif // AWKWARD_FLOAT64FILLABLE_H_
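
`Float64Fillable::fromint64` above copies an integer buffer into a new buffer of doubles, reserving at least as much space as the old buffer had. The sketch below restates that conversion with `std::vector` instead of `GrowableBuffer`; the function names are hypothetical. It also illustrates one consequence of the cast that is a property of IEEE 754 doubles in general: integers with magnitude above 2^53 are not all exactly representable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: element-wise int64 -> double copy, keeping at least
// the old reservation, in the spirit of fromint64.
std::vector<double> promote_to_double(const std::vector<int64_t>& old) {
  std::vector<double> out;
  out.reserve(old.capacity());
  for (int64_t v : old) {
    out.push_back(static_cast<double>(v));
  }
  assert(out.size() == old.size());
  return out;
}

// True if v is unchanged by a round trip through double. Only meaningful
// for values well inside the int64 range, so the cast back is defined.
bool survives_roundtrip(int64_t v) {
  return static_cast<int64_t>(static_cast<double>(v)) == v;
}
```

For example, `survives_roundtrip(int64_t(1) << 53)` holds, while `(int64_t(1) << 53) + 1` does not survive, so mixing very large integers with floating-point values in one array can silently lose low-order bits.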
98 changes: 98 additions & 0 deletions include/awkward/fillable/GrowableBuffer.h
@@ -0,0 +1,98 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_GROWABLEBUFFER_H_
#define AWKWARD_GROWABLEBUFFER_H_

#include <cmath>
#include <cstring>
#include <cassert>

#include "awkward/cpu-kernels/util.h"
#include "awkward/fillable/FillableOptions.h"

namespace awkward {
template <typename T>
class GrowableBuffer {
public:
GrowableBuffer(const FillableOptions& options, std::shared_ptr<T> ptr, int64_t length, int64_t reserved): options_(options), ptr_(ptr), length_(length), reserved_(reserved) { }
GrowableBuffer(const FillableOptions& options): GrowableBuffer(options, std::shared_ptr<T>(new T[(size_t)options.initial()], awkward::util::array_deleter<T>()), 0, options.initial()) { }

static GrowableBuffer<T> empty(const FillableOptions& options) {
return GrowableBuffer<T>::empty(options, 0);
}

static GrowableBuffer<T> empty(const FillableOptions& options, int64_t minreserve) {
size_t actual = (size_t)options.initial();
if (actual < (size_t)minreserve) {
actual = (size_t)minreserve;
}
std::shared_ptr<T> ptr(new T[actual], awkward::util::array_deleter<T>());
return GrowableBuffer(options, ptr, 0, (int64_t)actual);
}

static GrowableBuffer<T> full(const FillableOptions& options, T value, int64_t length) {
GrowableBuffer<T> out = empty(options, length);
T* rawptr = out.ptr().get();
for (int64_t i = 0; i < length; i++) {
rawptr[i] = value;
}
return GrowableBuffer<T>(options, out.ptr(), length, out.reserved());
}

static GrowableBuffer<T> arange(const FillableOptions& options, int64_t length) {
size_t actual = (size_t)options.initial();
if (actual < (size_t)length) {
actual = (size_t)length;
}
T* rawptr = new T[(size_t)actual];
std::shared_ptr<T> ptr(rawptr, awkward::util::array_deleter<T>());
for (int64_t i = 0; i < length; i++) {
rawptr[i] = (T)i;
}
return GrowableBuffer(options, ptr, length, (int64_t)actual);
}

const std::shared_ptr<T> ptr() const { return ptr_; }

int64_t length() const { return length_; }
void set_length(int64_t newlength) {
if (newlength > reserved_) {
set_reserved(newlength);
}
length_ = newlength;
}

int64_t reserved() const { return reserved_; }
void set_reserved(int64_t minreserved) {
if (minreserved > reserved_) {
std::shared_ptr<T> ptr(new T[(size_t)minreserved], awkward::util::array_deleter<T>());
memcpy(ptr.get(), ptr_.get(), (size_t)(length_ * sizeof(T)));
ptr_ = ptr;
reserved_ = minreserved;
}
}

void clear() {
length_ = 0;
reserved_ = options_.initial();
ptr_ = std::shared_ptr<T>(new T[(size_t)options_.initial()], awkward::util::array_deleter<T>());
}

void append(T datum) {
assert(length_ <= reserved_);
if (length_ == reserved_) {
set_reserved((int64_t)ceil(reserved_ * options_.resize()));
}
ptr_.get()[length_] = datum;
length_++;
}

private:
const FillableOptions options_;
std::shared_ptr<T> ptr_;
int64_t length_;
int64_t reserved_;
};
}

#endif // AWKWARD_GROWABLEBUFFER_H_
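
`GrowableBuffer::append` grows the allocation to `ceil(reserved_ * options_.resize())` when the buffer is full, and `set_reserved` only reallocates if the request exceeds the current reservation. Reading the header alone, this suggests that an `initial` of 0 or a `resize` of 1.0 or less would leave the buffer at the same size before the write. The sketch below reproduces the arithmetic and adds a hypothetical guard that is not in the commit; the numeric values used with it are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Same arithmetic as the append() shown in the header.
int64_t next_reserved(int64_t reserved, double resize) {
  return static_cast<int64_t>(std::ceil(reserved * resize));
}

// Hypothetical guard (not in the commit): always grow by at least one slot.
int64_t next_reserved_guarded(int64_t reserved, double resize) {
  int64_t proposed = next_reserved(reserved, resize);
  return proposed > reserved ? proposed : reserved + 1;
}

// Counts how many reallocations appending n items would trigger when
// starting from `initial` reserved slots and using the guarded policy.
int64_t count_reallocations(int64_t initial, double resize, int64_t n) {
  int64_t reserved = initial;
  int64_t length = 0;
  int64_t count = 0;
  for (int64_t i = 0; i < n; i++) {
    if (length == reserved) {
      reserved = next_reserved_guarded(reserved, resize);
      count++;
    }
    length++;
  }
  assert(length <= reserved);
  return count;
}
```

With a multiplicative factor above 1, the number of reallocations grows only logarithmically in the number of appended items, which is the usual motivation for a geometric growth policy.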