Field API
The Field object is ExeTera's analogue of the Pandas Series or the Numpy ndarray.
Fields contain (often very large) arrays of a given data type, with an API that allows intuitive manipulations of the data.
In order to store very large data arrays as efficiently as possible, Fields store their data in ways that may not be intuitive to people familiar with Pandas or Numpy. Numpy makes certain design decisions that reduce the flexibility of lists in order to gain speed and memory efficiency, and ExeTera does the same to improve further on both. The IndexedStringField, for example, uses two arrays: one containing the concatenated byte values of all the strings in the field, and another containing indices that indicate where each string starts and ends. When string lengths vary widely, this is much faster and more memory efficient to iterate over than a Numpy string array. This kind of change, however, creates a great deal of complexity when exposed to the user, so Field does its best to hide that complexity and act like a single array of string values.
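The two-array layout described above can be illustrated with a small, self-contained sketch. This is not ExeTera's implementation: the helper names `encode_strings` and `decode_string` are hypothetical, plain Python `bytes` and `list` stand in for the underlying arrays, and the sketch assumes the index array holds one more entry than there are strings, so that entries `i` and `i + 1` bracket string `i`.

```python
def encode_strings(strings):
    """Pack strings into one byte buffer plus a list of offsets (hypothetical helper)."""
    values = bytearray()
    indices = [0]  # assumption: indices[i] is the start of string i, indices[i + 1] its end
    for s in strings:
        values.extend(s.encode("utf-8"))
        indices.append(len(values))
    return bytes(values), indices


def decode_string(values, indices, i):
    """Recover string i by slicing the shared byte buffer (hypothetical helper)."""
    return values[indices[i]:indices[i + 1]].decode("utf-8")


values, indices = encode_strings(["a", "", "hello", "exetera"])
print(values)                              # b'ahelloexetera'
print(indices)                             # [0, 1, 1, 6, 13]
print(decode_string(values, indices, 2))   # hello
```

Each string occupies only as many bytes as it needs, and an empty string costs nothing in the byte buffer. A fixed-width string array, by contrast, pads every entry to the length of the longest one.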
f = ...  # placeholder: get a field from somewhere
g = f.create_like() # creates an empty field
The following snippet shows
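import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})  # stand-in for a dataframe from somewhere
df['c'] = df['a'] + df['b']  # element-wise sum of two columns, stored as a new column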