Skip to content

Commit

Permalink
docs: add initial array adr
Browse files Browse the repository at this point in the history
  • Loading branch information
neilcampbell committed Dec 10, 2024
1 parent 3091dc0 commit c25f315
Showing 1 changed file with 237 additions and 0 deletions.
237 changes: 237 additions & 0 deletions docs/architecture-decisions/2024-12-10_arrays.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,237 @@
# Puya Arrays

- **Status**: Proposed
- **Owner**: Daniel McGregor (MakerX)
- **Deciders**: Neil Campbell (MakerX), Daniel McGregor (MakerX), Tristan Menzel (MakerX), Larkin Young (Algorand Foundation), David Rojas (Algorand Foundation), Brian Whippo (Algorand Foundation), Joe Polny (Algorand Foundation), Giorgio Ciotti (Algorand Foundation)
- **Date created**: 2024-12-10
- **Date decided**:
- **Date updated**:

## Context

Arrays are a fundamental primitive in a high level languages and as such broad support is desirable, even in a constrained environment like the AVM. Currently Algorand Python only supports ARC-4 encoded arrays that have been designed to be efficient for reading, letting clients interacting with smart contracts manage the additional work required to ARC-4 encode the values. Given they are not efficient for creating or manipulating values, we have room to improve and introduce a more general purpose array.

In addition to how an array is encoded, the other facet to consider is where the array is stored in memory. The AVM supports two different types of memory which can be leveraged when writing a program: the stack and scratch space. These concepts are akin to a stack and heap found in more traditional virtual machine environments like the JVM.

The AVM memory types have some constraints that need to be considered. The stack can hold up to 1000 values, but only 256 are addressable at any point. Scratch slots support up to 256 addressable values. Both support values types of either a single uint64 or a byte slice of up to 4096 bytes. The stack is addressed using relative static offsets whereas scratch slots are addressed using dynamic absolute indexes. The stack is a good fit for value based semantics as each value is a copy, whereas scratch slots are a good fit for reference based semantics as each value can be uniquely addressed.

In our target high level languages (Python and JavaScript), the builtin arrays (or lists) types have reference based semantics and mutable API's. Meaning that when an array is modified, all references to that array see the modification.

The developer experience with Puya Arrays will depend on the API design (mutable vs immutable) and where those arrays reside in memory (stack vs scratch) this ADR will outline the pros and cons of some different approaches with these facets in mind.

## Requirements

- Provide a more efficient array implementation for general purpose use
- Provide an API that communicates clearly the effect of operations and is semantically correct
- Support a broad range of types (including non ARC-4 types)

## Principles

- [AlgoKit Guiding Principles](https://github.com/algorandfoundation/algokit-cli/blob/main/docs/algokit.md#guiding-principles) - specifically Seamless onramp, Meet developers where they are, Leverage existing ecosystem
- [Algorand Python Principles](https://algorandfoundation.github.io/puya/principles.html#principles) - specifically Least surprise and Leverage existing ecosystem by maintaining semantic compatibility.
- [Algorand TypeScript Guiding Principles](https://github.com/algorandfoundation/puya-ts/blob/main/docs/README.md#guiding-principals)

## Options

### Option 1. Support mutable array and element API with a stack based implementation (e.g. ARC4 style)

This option introduces a mutable array API, which is familiar and would closely align with native collection types in all currently supported languages (arrays in TypeScript, lists in Python). A stack based implementation is relatively simple to implement in the AVM as it is a stack based virtual machine. Additionally the existing arc4 arrays are stack based and we'd likely repurpose a large portion of the existing functionality.

However a mutable API implies reference semantics, but a stack based implementation is semantically incompatible with this, so various restrictions are required to make the API semantically compatible.

**Supported Operations**

```typescript
const arr = new Array<uint64>();
// add elements
arr.push(42);
arr.push(123);
// replace element
arr[0] = 123;
// remove element
arr.pop();
// read element
const value = arr[0];
// pass to subroutine without .copy() - additional complexity in compiler to support this
someFunction(arr);

// awkward consequences
// forced copies - enforcing this is complex and has been a source of bugs
const arr2 = arr.copy();
structs.push(someStruct.copy());
```

**Unsupported Operations**

```typescript
const structs = new Array<MutableStruct>();

//unsupported - introducing additional references without a .copy()
const arr2 = arr;
structs.push(someStruct);
structs[0] = someStruct;

// passing an array multiple times
someFunction(arr, arr);
```

**Pros**

- A mutable API supports both mutable and immutable elements and arrays have an API that is similar to the native collection API
- Can repurpose the existing ARC-4 arrays and extend to support non ARC-4 values

**Cons**

- There is a risk of subtle unknown bugs due to the incompatibility of the API and the implementation
- Requires `.copy()` whenever introducing additional references to a mutable array or struct
- Requires additional checks in the compiler to detect when `.copy()` is needed
- To reduce awkwardness, we'd want to reduce the need for `.copy()` where possible i.e. implicitly updating array values when passed to a subroutine
- The total array memory usage is limited to 4096 bytes

### Option 2. Support immutable array and element API with a stack based implementation

This option introduces an immutable array API, which may be less familiar to developers, but is not uncommon (e.g. Redux), however it is a more natural fit for a stack based implementation. Arrays typically have reference semantics, however with an immutable API there isn't an observable difference between an immutable reference based array and an immutable value based array.

**Supported Operations**

```typescript
let arr = new Array<uint64>();

// add elements
arr = arr.concat(42, 123);
// replace element
arr = arr.with(0, 123);
// remove elements
arr = arr.slice(0, 1);
// read element
const value = arr[index];
// pass to subroutine, does not require additional complexity to support this
someReadonlyFunction(arr);
// for a subroutine to return a modified array, it must be an explicitly returned value
arr = someFunctionThatMutates(arr);

let arr2 = new Array<ImmutableStruct>();
arr2 = arr2.concat(someStruct);
// mutating elements requires creating a copy of the new element and the array
arr2 = arr2.with(0, someStruct.evolve((someProp = "someValue")));
```

**Unsupported Operations**

```typescript
// unsupported operations
arr.push(...)
arr.pop(...)
arr.splice(...)
arr2[0].someProp = "someValue"

// mutatating an array without returning the modified copy
someFunctionThatMutates(arr)
```

**Pros**

- The API is semantically compatible with the implementation making it easy to reason about
- It won't require `.copy()` or other "tricks" to ensure semantic compatibility

**Cons**

- Will result in more verbose code as each modification requires reassignment
- Is less familiar and more verbose for those used to mutable array and element API's
- The total array memory usage is limited to 4096 bytes

### Option 3. Support mutable array with immutable element API with a scratch based implementation

This option introduces a scratch based array, which is a better fit for a mutable API as it has reference semantics, which makes for a more familiar developer experience. Because of the constraints that exist with scratch slots, supporting mutable elements could result in exhausting available resources in unpredictable ways. As a result only immutable elements shoud be supported.

**Supported Operations**

```typescript
const arr = new Array<uint64>();
// add elements
arr.push(42);
arr.push(123);
// replace element
arr[0] = 123;
// remove element
arr.pop();
// read element
const value = arr[0];

const structs = new Array<ImmutableStruct>();
// reassigning still points to the same array
const sameStructs = structs;
// can modify array
structs.push(someStruct);
// can replace elements
structs[0] = someStruct.evolve((someProp = "someValue"));
```

**Unsupported Operations**

```typescript
//unsupported mutable types
const structs = new Array<MutableStruct>();
//modifications to elements
structs[0].someProp = "someValue";
```

**Pros**

- A familiar API
- Won't require `.copy()` or other "tricks" to ensure semantic compatibility
- Has predicable scratch slot usage

**Cons**

- Is restricted to immutable elements
- Still has potential to exhaust scratch slot allocations when a large number of arrays are used
- The total array memory usage is limited to 4096 bytes

### Option 4. Support mutable array and element API with a scratch based implementation

This option is similar to option 3, but expands support to include mutable elements. In this implementation each mutable element would reside in it's own slot so that multiple references can all address it independently.

**Supported Operations**

```typescript
const arr = new Array<uint64>()
// add elements
arr.push(42)
arr.push(123)
// replace element
arr[0] = 123
// remove element
arr.pop()
// read element
const value = arr[0]

const structs = new Array<MutableStruct>()
structs.push(someStruct)
// reassigning still points to the same array
const structs2 = structs
// modifying an element updates all referencies
structs2[0].someProp = "someValue"
assert structs[0].someProp == structs2[0].someProp
// can pass the same reference multiple times
// any mutations to either reference will be seen in calling function
someFunctionThatMutates(structs, structs2)
```

**Pros**

- A familiar and fully supported API
- An array of mutable elements is not limited to 4096 bytes, however each element would still have this limit

**Cons**

- Easily possible to exhaust scratch slots quickly when working with an array of mutable elements as each element would require it's own slot
- It's the most complex option to fully implement

## Preferred option

In an ideal world being able to fully support option 4 would be great, however due to the constrained nature of the AVM it's not really feasible. The other options on their own will work, however there is some differences to what developers would be used to. As a result we feel that a combination of both option 2 and option 3 would give developers the familiarity and control they would expect.

## Selected option

We have chosen to implement option 2 and option 3 in the form of ImmutableArray and MutableArray variants. The exact naming and API design will be fleshed out as support is built into the Algorand Python and Algorand TypeScript languages.

0 comments on commit c25f315

Please sign in to comment.