
Field API

Ben Murray edited this page Apr 19, 2021 · 9 revisions

Fields

The Field object is ExeTera's analogue of the Pandas Series or the NumPy ndarray. Fields contain (often very large) arrays of a given data type, with an API that allows intuitive manipulation of the data.

Fields correspond to one (or more) arrays of data

In order to store very large data arrays as efficiently as possible, Fields store their data in ways that may not be intuitive to people familiar with Pandas or NumPy. NumPy makes certain design decisions that reduce the flexibility of lists in order to gain speed and memory efficiency, and ExeTera does the same to further improve on speed and memory. The IndexedStringField, for example, uses two arrays: one containing the concatenated byte values of all of the strings in the field, and another containing indices that indicate where each string starts and ends. When string lengths vary widely, this is much faster and more memory efficient to iterate over than a NumPy string array. This kind of change, however, creates a great deal of complexity when exposed to the user, so Field does its best to hide it and act like a single array of string values.
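The two-array layout described above can be sketched in plain NumPy. This is only an illustration of the idea, not ExeTera's actual implementation; the names `values`, `indices` and `get_string` are hypothetical and chosen for this example.

```python
import numpy as np

strings = ["a", "", "hello", "ExeTera"]

# Array 1: the byte values of every string, concatenated end to end.
encoded = [s.encode("utf-8") for s in strings]
values = np.frombuffer(b"".join(encoded), dtype=np.uint8)

# Array 2: offsets into `values`. String i occupies
# values[indices[i]:indices[i + 1]], so there is one more offset than strings.
indices = np.zeros(len(strings) + 1, dtype=np.int64)
indices[1:] = np.cumsum([len(e) for e in encoded])

def get_string(i):
    """Recover string i from the two arrays (hypothetical helper)."""
    return values[indices[i]:indices[i + 1]].tobytes().decode("utf-8")

recovered = [get_string(i) for i in range(len(strings))]

# For comparison, a fixed-width NumPy byte-string array pads every entry
# to the length of the longest string.
fixed = np.array(encoded, dtype="S")
```

Here `values` holds 13 bytes in total, whereas `fixed` reserves 7 bytes for each of the 4 entries. The gap widens as the spread of string lengths grows, which is the case the indexed layout is designed for.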

Field usage examples

Create a field from another field

f = ...  # placeholder: get a field from somewhere
g = f.create_like()  # creates an empty field

Field arithmetic

The following snippet shows two fields of a dataframe being added together and the result assigned to a new field:

df = ...  # placeholder: get a dataframe from somewhere
df['c'] = df['a'] + df['b']