Merge pull request #665 from olynch/comptime-refactor

Refactor ACSets to use CompTime.jl
AlgebraicJulia · Sep 15, 2022 · 17b6965
2 parents 39637a1 + 63efe26
Showing 13 changed files with 1,308 additions and 824 deletions.
diff --git a/Project.toml b/Project.toml
@@ -6,6 +6,7 @@ version = "0.14.4"
 
 [deps]
 Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
+CompTime = "0fb5dd42-039a-4ca4-a1d7-89a96eae6d39"
 Compose = "a81c6b42-2e10-5240-aca2-a61377ecd94b"
 DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
 GeneralizedGenerated = "6b9d7cbe-bcb9-11e9-073f-15a7a543e2eb"
@@ -27,6 +28,7 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
 
 [compat]
 Colors = "0.12"
+CompTime = "0.1"
 Compose = "0.7, 0.8, 0.9"
 DataStructures = "0.17, 0.18"
 GeneralizedGenerated = "0.2, 0.3"

diff --git a/docs/src/apis/categorical_algebra.md b/docs/src/apis/categorical_algebra.md
@@ -58,7 +58,7 @@ An acset $$F$$ on a schema consists of...
 
 For those with a categorical background, an acset on a schema $$S$$ consists of a functor from $$S$$ to $$\mathsf{Set}$$, such that objects in $$S^{-1}(0)$$ map to finite sets, and objects in $$S^{-1}(1)$$ map to sets that represent types. For any particular functor $$K \colon S^{-1}(1) \to \mathsf{Set}$$, we can also take the category of acsets that restrict to this map on $$S^{-1}$$.
 
-We can also add relations to this presentation, but we currently do nothing with those relations in the implementation; they mostly serve as documentation.
+We can also add equations to this presentation, but we currently do nothing with those equations in the implementation; they mostly serve as documentation.
 
 We will now give an example of how this all works in practice.
 
@@ -92,22 +92,47 @@ end
 
 ### API
 
-We first give an overview of the data types used in the acset machinery.
+The mathematical abstraction of an acset can of course be implemented in many different ways. Currently, there are three implementations of acsets in Catlab, which share a great deal of code.
 
-`FreeSchema` A finite presentation of a category that will be used as the schema of a database in the *algebraic databases* conception of categorical database theory. Functors out of a schema into FinSet are combinatorial structures over the schema. Attributes in a schema allow you to encode numerical (any julia type) into the database. You can find several examples of schemas in `Catlab.Graphs` where they define categorical versions of graph theory.
+These implementations can be split into two categories.
 
-`CSet/AttributedCSet` is a struct/constructors whose values (tables, indices) are parameterized by a CatDesc/AttrDesc. These are in memory databases over the schema equiped with `ACSetTranformations` as natural transformations that encode relationships between database instances.
+The first category is **static acset types**. In this implementation, different schemas correspond to different Julia types. Methods on these Julia types are then custom-generated for the schema, using [CompTime.jl](https://github.com/AlgebraicJulia/CompTime.jl).
 
-`CSetType/AttributedCSetType`provides a function to construct a julia type for ACSet instances, parameterized by CatDesc/AttrDesc. This function constructs the new type at runtime. In order to have the interactive nature of Julia, and to dynamically construct schemas based on runtime values, we need to define new Julia types at runtime. This function converts the schema spec to the corresponding Julia type.
+Under this category, there are two classes of static acset types. The first class is acset types that are generated using the `@acset_type` macro. These acset types are custom-derived structs. The advantage of this is that the structs have names like `Graph` or `WiringDiagram` that are printed out in error messages. The disadvantage is that if you are taking in schemas at runtime, you have to `eval` code in order to use them.
 
-`CatDesc/AttrDesc` the encoding of a schema into a Julia type. These exist because Julia only allows certain kinds of data in the parameter of a dependent type. Thus, we have to serialize a schema into those primitive data types so that we can use them to parameterize the ACSet type over the schema. This is an implementation detail subject to complete overhaul.
+Here is an example of using `@acset_type`
 
+```julia
+@acset_type WeightedGraph(SchWeightedGraph, index=[:src,:tgt])
+g = WeightedGraph()
+```
+
+The second class is `AnonACSet`s. Like acset types derived from `@acset_type`, these contain the schema in their type. However, they also contain the type of their fields in their types, so the types printed out in error messages are long and ugly. The advantage of these is that they can be used in situations where the schema is passed in at runtime, and they don't require using `eval` to create a new acset type.
+
+Here is an example of using `AnonACSet`
+
+```julia
+const WeightedGraph = AnonACSetType(SchWeightedGraph, index=[:src,:tgt])
+g = WeightedGraph()
+```
+
+The second category is **dynamic acset types**. Currently, there is just one type that falls under this category: `DynamicACSet`. This type has a **field** for the schema, and no code-generation is done for operations on acsets of this type. This means that if the schema is large compared to the data, this type will often be faster than the static acsets.
+
+However, dynamic acsets are a new addition to Catlab, and much of the machinery of limits, colimits, and other high-level acset constructions assumes that the schema of an acset can be derived from the type. Thus, more work will have to be done before dynamic acsets become a drop-in replacement for static acsets.
+
+Here is an example of using a dynamic acset
+
+```julia
+g = DynamicACSet("WeightedGraph", SchWeightedGraph; index=[:src,:tgt])
+```
 
 ```@autodocs
 Modules = [
   CategoricalAlgebra.CSets,
   CategoricalAlgebra.StructuredCospans,
   CategoricalAlgebra.ACSetInterface,
+  CategoricalAlgebra.ACSetColumns,
+  CategoricalAlgebra.CSetDataStructures
 ]
 Private = false
 ```
diff --git a/src/Catlab.jl b/src/Catlab.jl
@@ -5,6 +5,7 @@ include("theories/Theories.jl")
 
 include("categorical_algebra/IndexUtils.jl")
 include("categorical_algebra/ACSetInterface.jl")
+include("categorical_algebra/ACSetColumns.jl")
 include("categorical_algebra/CSetDataStructures.jl")
 include("categorical_algebra/Permutations.jl")
 include("graphs/Graphs.jl")

diff --git a/src/categorical_algebra/ACSetColumns.jl b/src/categorical_algebra/ACSetColumns.jl
@@ -0,0 +1,276 @@
+"""
+An acset column should satisfy the following interface
+
+```julia
+Base.getindex
+Base.setindex!
+Base.values
+clear_index!
+codom_hint!
+preimage
+preimage_multi
+resize_clearing!
+```
+"""
+module ACSetColumns
+export preimage, preimage_multi, clear_index!, clear_indices!, codom_hint!, IndexedVector, resize_clearing!
+
+using ..IndexUtils
+
+
+"""
+This function takes an acset column and an element of the domain and sets the column
+value at that element to be "missing": 0 in the case of an integer, or nothing in the case
+of a type that is a supertype of Nothing. It also clears the index of the previous value
+at the element.
+
+The semantics of this function are deeply janky because we don't have proper support
+for partial acsets; this will be fixed soon.
+"""
+function clear_index! end
+
+"""
+This is called to alert the column that there are new values in its codomain; the column
+may then potentially preallocate some space for those new values.
+"""
+function codom_hint! end
+
+"""
+This gets the preimage of a single value in the codomain.
+"""
+function preimage end
+
+"""
+This gets the preimage of several values in the codomain. This is semantically equivalent
+to broadcasting preimage, but a column implementation might instead return a view of
+the index.
+"""
+function preimage_multi end
+
+"""
+This resizes a column, and if the column grows, initializes the new elements to the "missing"
+value: 0 in the case of an integer or nothing in the case of a type that is a supertype of Nothing.
+
+The semantics of this function are deeply janky because we don't have proper support for partial
+acsets; this will be fixed soon. Specifically, if we have values that aren't a supertype of Nothing
+or an integer, we get random uninitialized memory.
+"""
+function resize_clearing! end
+
+# Additionally, the type should be able to be called with no arguments to create an empty column
+
+function clear_indices!(v, idxs)
+  for i in idxs
+    clear_index!(v, i)
+  end
+end
+
+function preimage(v::AbstractVector, x)
+  findall(y -> x == y, v)
+end
+
+function preimage_multi(v::AbstractVector, xs)
+  broadcast(x -> preimage(v,x), xs)
+end
+
+function clear_index!(v::AbstractVector{Int}, i::Int)
+  v[i] = 0
+end
+
+function clear_index!(v::AbstractVector{T}, i::Int) where {T >: Nothing}
+  v[i] = nothing
+end
+
+function clear_index!(v::AbstractVector{T}, i::Int) where {T}
+end
+
+function codom_hint!(v::AbstractVector{T}, n::Int) where {T}
+end
+
+function resize_clearing!(v::AbstractVector{Int}, n::Int)
+  oldn = length(v)
+  resize!(v, n)
+  v[(oldn+1):n] .= 0
+end
+
+function resize_clearing!(v::AbstractVector{T}, n::Int) where {T}
+  resize!(v, n)
+end
+
+struct IndexedVector{T,Index} <: AbstractVector{T}
+  vals::Vector{T}
+  index::Index
+end
+
+function IndexedVector{T,Index}() where {T,Index}
+  IndexedVector{T,Index}(T[],Index())
+end
+
+Base.copy(v::IndexedVector{T,Index}) where {T,Index} =
+  IndexedVector{T,Index}(copy(v.vals), deepcopy(v.index))
+
+Base.size(v::IndexedVector) = size(v.vals)
+
+function Base.getindex(v::IndexedVector{T}, i::Int) where {T}
+  v.vals[i]
+end
+
+function Base.setindex!(v::IndexedVector{T}, x::T, i::Int) where {T}
+  if isassigned(v.vals, i)
+    oldx = v.vals[i]
+    v.vals[i] = x
+    update_index!(v.index, x, oldx, i)
+  else
+    v.vals[i] = x
+    insert_index!(v.index, x, i)
+  end
+end
+
+function insert_index!(index::Vector{Vector{Int}}, x::Int, i::Int)
+  if x != 0
+    insertsorted!(index[x], i)
+  end
+end
+
+function insert_index!(index::Dict{T,Vector{Int}}, x::T, i::Int) where {T}
+  if !isnothing(x)
+    if x ∈ keys(index)
+      insertsorted!(index[x],i)
+    else
+      index[x] = [i]
+    end
+  end
+end
+
+function insert_index!(index::Vector{Int}, x::Int, i::Int)
+  if x != 0
+    @assert index[x] == 0
+    index[x] = i
+  end
+end
+
+function insert_index!(index::Dict{T,Int}, x::T, i::Int) where {T}
+  if !isnothing(x)
+    @assert !(x ∈ keys(index))
+    index[x] = i
+  end
+end
+
+Base.values(v::IndexedVector) = v.vals
+
+function update_index!(index::Vector{Vector{Int}}, x::Int, oldx::Int, i::Int)
+  if 1 ≤ oldx ≤ length(index)
+    deletesorted!(index[oldx], i)
+  end
+  insert_index!(index,x,i)
+end
+
+function update_index!(index::Dict{T,Vector{Int}}, x::T, oldx::T, i::Int) where {T}
+  if oldx ∈ keys(index) # oldx could just be gobbledegook
+    deletesorted!(index[oldx],i)
+  end
+  insert_index!(index,x,i)
+end
+
+function update_index!(index::Vector{Int}, x::Int, oldx::Int, i::Int)
+  if oldx != 0
+    index[oldx] = 0
+  end
+  insert_index!(index,x,i)
+end
+
+function update_index!(index::Dict{T,Int}, x::T, oldx::T, i::Int) where {T}
+  if oldx ∈ keys(index)
+    delete!(index, oldx)
+  end
+  insert_index!(index,x,i)
+end
+
+function resize_clearing!(v::IndexedVector{Int}, n::Int)
+  oldn = length(v.vals)
+  resize!(v.vals, n)
+  v.vals[(oldn+1):n] .= 0
+end
+
+function resize_clearing!(v::IndexedVector{T}, n::Int) where {T}
+  resize!(v.vals, n)
+end
+
+function clear_index!(v::IndexedVector{T}, i::Int) where {T >: Nothing}
+  v[i] = nothing
+end
+
+function clear_index!(v::IndexedVector{Int}, i::Int)
+  v[i] = 0
+end
+
+# There isn't an "empty" variable in this case, but we can still unset the index
+function clear_index!(v::IndexedVector{T,Dict{T,Vector{Int}}}, i::Int) where {T}
+  if isassigned(v.vals, i)
+    oldx = v[i]
+    if oldx ∈ keys(v.index)
+      deletesorted!(v.index[oldx], i)
+    end
+  end
+end
+
+# There isn't an "empty" variable in this case, but we can still unset the index
+function clear_index!(v::IndexedVector{T,Dict{T,Int}}, i::Int) where {T}
+  if isassigned(v.vals, i)
+    oldx = v[i]
+    if oldx ∈ keys(v.index)
+      delete!(v.index, oldx)
+    end
+  end
+end
+
+function preimage(v::IndexedVector{Int, <:Vector{<:Union{Int,Vector{Int}}}}, x::Int)
+  v.index[x]
+end
+
+function preimage(v::IndexedVector{T, Dict{T, Vector{Int}}}, x) where {T}
+  if x ∈ keys(v.index)
+    v.index[x]
+  else
+    Int[]
+  end
+end
+
+function preimage(v::IndexedVector{T, Dict{T, Int}}, x) where {T}
+  if x ∈ keys(v.index)
+    v.index[x]
+  else
+    0
+  end
+end
+
+function preimage_multi(v::IndexedVector{Int, <:Vector{<:Union{Int,Vector{Int}}}},
+                  xs::Union{AbstractVector,UnitRange})
+  @view v.index[xs]
+end
+
+function preimage_multi(v::IndexedVector{T, <:Dict{T, <:Union{Int,Vector{Int}}}},
+                  xs::Union{AbstractVector,UnitRange}) where {T}
+  [preimage(v, x) for x in xs]
+end
+
+
+function codom_hint!(v::IndexedVector{T}, n::Int) where {T}
+  codom_hint_index!(v.index, n)
+end
+
+function codom_hint_index!(index::Vector{Vector{Int}}, n::Int)
+  oldn = length(index)
+  resize!(index, n)
+  for i in (oldn + 1):n
+    index[i] = Int[]
+  end
+end
+
+function codom_hint_index!(index::Vector{Int}, n::Int)
+  oldn = length(index)
+  resize!(index, n)
+  index[(oldn + 1):n] .= 0
+end
+
+end
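
The column interface in `ACSetColumns.jl` is easier to follow with a small worked example. The following is a minimal standalone sketch, not the code from this PR: `MiniIndexedColumn` and the `mini_*` functions are hypothetical names, it handles only the `Int`-valued, `Vector{Vector{Int}}`-indexed case, and it uses `searchsortedfirst` from Base instead of the `insertsorted!`/`deletesorted!` helpers from `IndexUtils`. It shows how resizing with the "missing" value 0, hinting the codomain size, and keeping the preimage index in sync on assignment fit together.

```julia
# Simplified standalone model of an indexed column (hypothetical, for illustration only).
struct MiniIndexedColumn
  vals::Vector{Int}
  index::Vector{Vector{Int}}  # index[x] holds the sorted positions i with vals[i] == x
end

MiniIndexedColumn() = MiniIndexedColumn(Int[], Vector{Int}[])

# Resize the column; if it grows, new slots get the "missing" value 0.
function mini_resize_clearing!(c::MiniIndexedColumn, n::Int)
  oldn = length(c.vals)
  resize!(c.vals, n)
  c.vals[(oldn+1):n] .= 0
  c
end

# Make room in the index for codomain values 1:n.
function mini_codom_hint!(c::MiniIndexedColumn, n::Int)
  for _ in (length(c.index)+1):n
    push!(c.index, Int[])
  end
  c
end

# Set position i to x, removing i from the old value's preimage first.
function mini_set!(c::MiniIndexedColumn, i::Int, x::Int)
  oldx = c.vals[i]
  if oldx != 0
    old = c.index[oldx]
    k = searchsortedfirst(old, i)
    if k <= length(old) && old[k] == i
      deleteat!(old, k)
    end
  end
  c.vals[i] = x
  if x != 0
    posns = c.index[x]
    insert!(posns, searchsortedfirst(posns, i), i)
  end
  c
end

# With an index, the preimage is a lookup rather than a scan.
mini_preimage(c::MiniIndexedColumn, x::Int) = c.index[x]

col = MiniIndexedColumn()
mini_resize_clearing!(col, 3)   # three domain elements, all "missing"
mini_codom_hint!(col, 2)        # codomain has two elements
mini_set!(col, 1, 2)
mini_set!(col, 3, 2)
mini_set!(col, 2, 1)
mini_set!(col, 3, 1)            # reassigning moves position 3 from index[2] to index[1]
```

In this sketch the index must be sized with `mini_codom_hint!` before any value is assigned, which mirrors the role the docstring gives `codom_hint!` in the diff above.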