diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 4e3d3f78..3a6ce2fe 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-09-20T23:37:18","documenter_version":"1.0.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-09-20T23:59:28","documenter_version":"1.0.1"}} \ No newline at end of file diff --git a/dev/api/index.html b/dev/api/index.html index 98ccf6be..cdaa3aea 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,12 +1,12 @@ -Library Reference · GATlab.jl

Library Reference

Syntax

GATlab.Syntax.Scopes.BindingType

Binding{T, Sig}

A binding associates some T-typed value to a name and possibly some aliases, disambiguated by a signature in Sig in the case of overloading.

primary is an optional distinguished element of aliases value is the element sig is a way of uniquely distinguishing this element from others with the same name (for example, ⊗ : Ob x Ob -> Ob and Hom x Hom -> Hom)

source
GATlab.Syntax.Scopes.ContextType

A Context is anything which contains an ordered list of scopes.

Scopes within a context are referred to by level, which is their index within this list.

Contexts should overload:

  • getscope(c::Context, level::Int) -> Scope
  • nscopes(c::Context) -> Int
  • hastag(c::Context, tag::ScopeTag) -> Bool
  • hasname(c::Context, name::Symbol) -> Bool
  • getlevel(c::Context, tag::ScopeTag) -> Int
  • getlevel(c::Context, name::Symbol) -> Int
source
GATlab.Syntax.Scopes.IdentType

Ident

An identifier.

tag refers to the scope that this Ident is bound in lid indexes the scope that Ident is bound in name is an optional field for the sake of printing. A variable in a scope might be associated with several names

source
GATlab.Syntax.Scopes.LIDType

A LID (Local ID) indexes a given scope.

Currently, scopes assign LIDs sequentially – this is not a stable guarantee however, and in theory scopes could have "sparse" LIDs.

source
GATlab.Syntax.Scopes.ScopeType

Scope{T, Sig}

In GATlab, we handle overloading and shadowing with a notion of scope. Anything which binds variables introduces a scope, for instance a @theory declaration or a context. For example, a scope with 3 elements:

x::Int = 3
+Library Reference · GATlab.jl

Library Reference

Syntax

GATlab.Syntax.Scopes.BindingType

Binding{T, Sig}

A binding associates some T-typed value to a name and possibly some aliases, disambiguated by a signature in Sig in the case of overloading.

primary is an optional distinguished element of aliases value is the element sig is a way of uniquely distinguishing this element from others with the same name (for example, ⊗ : Ob x Ob -> Ob and Hom x Hom -> Hom)

source
GATlab.Syntax.Scopes.ContextType

A Context is anything which contains an ordered list of scopes.

Scopes within a context are referred to by level, which is their index within this list.

Contexts should overload:

  • getscope(c::Context, level::Int) -> Scope
  • nscopes(c::Context) -> Int
  • hastag(c::Context, tag::ScopeTag) -> Bool
  • hasname(c::Context, name::Symbol) -> Bool
  • getlevel(c::Context, tag::ScopeTag) -> Int
  • getlevel(c::Context, name::Symbol) -> Int
source
GATlab.Syntax.Scopes.IdentType

Ident

An identifier.

tag refers to the scope that this Ident is bound in lid indexes the scope that Ident is bound in name is an optional field for the sake of printing. A variable in a scope might be associated with several names

source
GATlab.Syntax.Scopes.LIDType

A LID (Local ID) indexes a given scope.

Currently, scopes assign LIDs sequentially – this is not a stable guarantee however, and in theory scopes could have "sparse" LIDs.

source
GATlab.Syntax.Scopes.ScopeType

Scope{T, Sig}

In GATlab, we handle overloading and shadowing with a notion of scope. Anything which binds variables introduces a scope, for instance a @theory declaration or a context. For example, a scope with 3 elements:

x::Int = 3
 y::String = "hello"
-x::String = "ex"

This is a valid scope even though there are name collisions, because the signature (in this case, a datatype) disambiguates. Of course, it may not be wise to disambiguate by type, because it is not always possible to infer expected type. In general, one should pick something that can be inferred, and Nothing is always a reasonable choice, which disallows any name collisions.

source
Base.:+Method

Deterministically combine two scope tags into a third

source
GATlab.Syntax.Scopes.hasidentMethod

hasident checks whether an identifier with specified data exists, by attempting to create it and returning whether or not that attempt failed.

source
GATlab.Syntax.Scopes.identMethod

ident creates an Ident from a context and some partial data supplied as keywords.

Keywords arguments:

  • tag::Union{ScopeTag, Nothing}. The tag of the scope that the Ident is in.
  • name::Union{Symbol, Nothing}. The name of the identifier.
  • lid::Union{LID, Nothing}. The lid of the identifier within its scope.
  • level::Union{Int, Nothing}. The level of the scope within the context.
  • sig::Any. The signature of the identifier, to disambiguate between multiple identifiers with the same name within the same scope.
  • strict::Bool. If strict is true, throw an error if not found, else return nothing.
  • isunique::Bool. If isunique is true, then don't use the signature to disambiguate, instead fail if their are multiple identifiers with the same name in a scope.
source
GATlab.Syntax.Scopes.renameMethod

rename(tag::ScopeTag, replacements::Dict{Symbol, Symbol}, x::T) where {T} -> T

Recurse through the structure of x, and change any name n in scope tag to get(replacements, n, n). Overload this function on structs that have names in them.

source
GATlab.Syntax.Scopes.retagMethod

retag(replacements::Dict{ScopeTag, ScopeTag}, x::T) where {T} -> T

Recurse through the structure of x, swapping any instance of a ScopeTag t with get(replacements, t, t). Overload this function on structs that have ScopeTags within them.

source
GATlab.Syntax.GATs.AlgSortsType

AlgSorts

A description of the argument sorts for a term constructor, used to disambiguate multiple term constructors of the same name.

source
GATlab.Syntax.GATs.AlgTermType

AlgTerm

One syntax tree to rule all the terms. Head can be a reference to an AlgTermConstructor, to a Binding{AlgType, Nothing}, or simply an AbstractConstant

source
GATlab.Syntax.GATs.GATType

GAT

A generalized algebraic theory. Essentially, just consists of a name and a list of GATSegments, but there is also some caching to make access faster. Specifically, there is a dictionary to map ScopeTag to position in the list of segments, and there are lists of all of the identifiers for term constructors, type constructors, and axioms so that they can be iterated through faster.

GATs allow overloading but not shadowing.

source
GATlab.Syntax.GATs.equationsMethod

Implicit equations defined by a context.

This function allows a generalized algebraic theory (GAT) to be expressed as an essentially algebraic theory, i.e., as partial functions whose domains are defined by equations.

References:

  • (Cartmell, 1986, Sec 6: "Essentially algebraic theories and categories with finite limits")
  • (Freyd, 1972, "Aspects of topoi")

This function gives expressions for computing each of the elements of context from the args, as well as checking that the args are well-typed.

Example:

equations({f::Hom(a,b), g::Hom(b,c)}, {a::Ob, b::Ob, c::Ob}, ThCategory)

waysofcomputing = Dict(a => [dom(f)], b => [codom(f), dom(g)], c => [codom(g)], f => [f], g => [g])

Algorithm:

Start from the arguments. We know how to compute each of the arguments; they are given. Each argument tells us how to compute other arguments, and also elements of the context

source
GATlab.Syntax.GATs.infer_typeMethod

Infer the type of the term of a term. If it is not in context, recurse on its arguments. The term constructor's output type yields the resulting type once its localcontext variables are substituted with the relevant AlgTerms.

          (x,y,z)::Ob, p::Hom(x,y), q::Hom(y,z)

E.g. given ––––––––––––––––––– id(x)⋅(p⋅q)

             (a,b,c)::Ob, f::Hom(a,b), g::Hom(b,c)

and output type: –––––––––––––––––– Hom(a,c)

We first recursively find {id(x) => Hom(x,x), p⋅q => Hom(x,z)}. We ultimately want an AlgTerm for everything in the output type's context such that we can substitute into Hom(a,c) to get the final answer. It will help to also compute the AlgType for everything in the context. We work backwards, since we start by knowing {f => id(x)::Hom(x,x), g=> p⋅q :: Hom(x,z)}. For a b and c, we use equations which tell us, e.g., that a = dom(f). So we can grab the first argument of the type of f (i.e. grab x from Hom(x,x)).

source
GATlab.Syntax.GATs.parsetypescopeMethod

Keep track of variables already bound (e.g. in local context) so that they need not be redefined, e.g. compose(f,g::Hom(b,c)) ⊣ [(a,b,c)::Ob, f::Hom(a,b)] (If f were not defined in the local context, it would be parsed as default.)

source
GATlab.Syntax.GATs.sortcheckMethod

sortcheck(ctx::Context, t::AlgTerm)

Throw an error if a the head of an AlgTerm (which refers to a term constructor) has arguments of the wrong sort. Returns the sort of the term.

source
GATlab.Syntax.GATs.sortcheckMethod

sortcheck(ctx::Context, t::AlgType)

Throw an error if a the head of an AlgType (which refers to a type constructor) has arguments of the wrong sort.

source
GATlab.Syntax.Presentations.PresentationType

A presentation has a set of generators, given by a TypeScope, and a set of equations among terms which can refer to those generators. Each element of eqs is a list of terms which are asserted to be equal.

source
GATlab.Syntax.ExprInterop.fromexprMethod

Parse, e.g.:

(a,b,c)::Ob
+x::String = "ex"

This is a valid scope even though there are name collisions, because the signature (in this case, a datatype) disambiguates. Of course, it may not be wise to disambiguate by type, because it is not always possible to infer expected type. In general, one should pick something that can be inferred, and Nothing is always a reasonable choice, which disallows any name collisions.

source
Base.:+Method

Deterministically combine two scope tags into a third

source
GATlab.Syntax.Scopes.hasidentMethod

hasident checks whether an identifier with specified data exists, by attempting to create it and returning whether or not that attempt failed.

source
GATlab.Syntax.Scopes.identMethod

ident creates an Ident from a context and some partial data supplied as keywords.

Keywords arguments:

  • tag::Union{ScopeTag, Nothing}. The tag of the scope that the Ident is in.
  • name::Union{Symbol, Nothing}. The name of the identifier.
  • lid::Union{LID, Nothing}. The lid of the identifier within its scope.
  • level::Union{Int, Nothing}. The level of the scope within the context.
  • sig::Any. The signature of the identifier, to disambiguate between multiple identifiers with the same name within the same scope.
  • strict::Bool. If strict is true, throw an error if not found, else return nothing.
  • isunique::Bool. If isunique is true, then don't use the signature to disambiguate, instead fail if their are multiple identifiers with the same name in a scope.
source
GATlab.Syntax.Scopes.renameMethod

rename(tag::ScopeTag, replacements::Dict{Symbol, Symbol}, x::T) where {T} -> T

Recurse through the structure of x, and change any name n in scope tag to get(replacements, n, n). Overload this function on structs that have names in them.

source
GATlab.Syntax.Scopes.retagMethod

retag(replacements::Dict{ScopeTag, ScopeTag}, x::T) where {T} -> T

Recurse through the structure of x, swapping any instance of a ScopeTag t with get(replacements, t, t). Overload this function on structs that have ScopeTags within them.

source
GATlab.Syntax.GATs.AlgSortsType

AlgSorts

A description of the argument sorts for a term constructor, used to disambiguate multiple term constructors of the same name.

source
GATlab.Syntax.GATs.AlgTermType

AlgTerm

One syntax tree to rule all the terms. Head can be a reference to an AlgTermConstructor, to a Binding{AlgType, Nothing}, or simply an AbstractConstant

source
GATlab.Syntax.GATs.GATType

GAT

A generalized algebraic theory. Essentially, just consists of a name and a list of GATSegments, but there is also some caching to make access faster. Specifically, there is a dictionary to map ScopeTag to position in the list of segments, and there are lists of all of the identifiers for term constructors, type constructors, and axioms so that they can be iterated through faster.

GATs allow overloading but not shadowing.

source
GATlab.Syntax.GATs.equationsMethod

Implicit equations defined by a context.

This function allows a generalized algebraic theory (GAT) to be expressed as an essentially algebraic theory, i.e., as partial functions whose domains are defined by equations.

References:

  • (Cartmell, 1986, Sec 6: "Essentially algebraic theories and categories with finite limits")
  • (Freyd, 1972, "Aspects of topoi")

This function gives expressions for computing each of the elements of context from the args, as well as checking that the args are well-typed.

Example:

equations({f::Hom(a,b), g::Hom(b,c)}, {a::Ob, b::Ob, c::Ob}, ThCategory)

waysofcomputing = Dict(a => [dom(f)], b => [codom(f), dom(g)], c => [codom(g)], f => [f], g => [g])

Algorithm:

Start from the arguments. We know how to compute each of the arguments; they are given. Each argument tells us how to compute other arguments, and also elements of the context

source
GATlab.Syntax.GATs.infer_typeMethod

Infer the type of the term of a term. If it is not in context, recurse on its arguments. The term constructor's output type yields the resulting type once its localcontext variables are substituted with the relevant AlgTerms.

          (x,y,z)::Ob, p::Hom(x,y), q::Hom(y,z)

E.g. given ––––––––––––––––––– id(x)⋅(p⋅q)

             (a,b,c)::Ob, f::Hom(a,b), g::Hom(b,c)

and output type: –––––––––––––––––– Hom(a,c)

We first recursively find {id(x) => Hom(x,x), p⋅q => Hom(x,z)}. We ultimately want an AlgTerm for everything in the output type's context such that we can substitute into Hom(a,c) to get the final answer. It will help to also compute the AlgType for everything in the context. We work backwards, since we start by knowing {f => id(x)::Hom(x,x), g=> p⋅q :: Hom(x,z)}. For a b and c, we use equations which tell us, e.g., that a = dom(f). So we can grab the first argument of the type of f (i.e. grab x from Hom(x,x)).

source
GATlab.Syntax.GATs.parsetypescopeMethod

Keep track of variables already bound (e.g. in local context) so that they need not be redefined, e.g. compose(f,g::Hom(b,c)) ⊣ [(a,b,c)::Ob, f::Hom(a,b)] (If f were not defined in the local context, it would be parsed as default.)

source
GATlab.Syntax.GATs.sortcheckMethod

sortcheck(ctx::Context, t::AlgTerm)

Throw an error if a the head of an AlgTerm (which refers to a term constructor) has arguments of the wrong sort. Returns the sort of the term.

source
GATlab.Syntax.GATs.sortcheckMethod

sortcheck(ctx::Context, t::AlgType)

Throw an error if a the head of an AlgType (which refers to a type constructor) has arguments of the wrong sort.

source
GATlab.Syntax.Presentations.PresentationType

A presentation has a set of generators, given by a TypeScope, and a set of equations among terms which can refer to those generators. Each element of eqs is a list of terms which are asserted to be equal.

source
GATlab.Syntax.ExprInterop.toexprFunction

toexpr(c::Context, t) -> Any

Converts GATlab syntax into an Expr that can be read in via fromexpr to get the same thing. Crucially, the output of this will depend on the order of the scopes in c, and if read back in with a different c may end up with different results.

source

Models

GATlab.Models.ModelInterface.implementsMethod

implements(m::Model, tag::ScopeTag) -> Union{ImplementationNotes, Nothing}

If m implements the GATSegment referred to by tag, then return the corresponding implementation notes.

source
GATlab.Syntax.ExprInterop.toexprFunction

toexpr(c::Context, t) -> Any

Converts GATlab syntax into an Expr that can be read in via fromexpr to get the same thing. Crucially, the output of this will depend on the order of the scopes in c, and if read back in with a different c may end up with different results.

source

Models

GATlab.Models.ModelInterface.implementsMethod

implements(m::Model, tag::ScopeTag) -> Union{ImplementationNotes, Nothing}

If m implements the GATSegment referred to by tag, then return the corresponding implementation notes.

source
GATlab.Models.ModelInterface.@instanceMacro

Usage:

struct TypedFinSetC <: Model{Tuple{Vector{Int}, Vector{Int}}}
   ntypes::Int
 end
 
@@ -27,7 +27,7 @@
 end
 
 @instance ThCategory{Tuple{Ob, Hom}, Hom} [model::SliceCat{Ob, Hom, C}] where {Ob, Hom, C<:Model{Tuple{Ob, Hom}}} begin
-end
source
GATlab.Models.SymbolicModels.GATExprType

Base type for expressions in the syntax of a GAT. This is an alternative to AlgTerm used for backwards compatibility with Catlab.

We define Julia types for each type constructor in the theory, e.g., object, morphism, and 2-morphism in the theory of 2-categories. Of course, Julia's type system does not support dependent types, so the type parameters are incorporated in the Julia types. (They are stored as extra data in the expression instances.)

The concrete types are structurally similar to the core type Expr in Julia. However, the term constructor is represented as a type parameter, rather than as a head field. This makes dispatch using Julia's type system more convenient.

source
Base.nameofMethod

Get name of GAT generator expression as a Symbol.

If the generator has no name, returns nothing.

source
GATlab.Models.SymbolicModels.functorMethod

Functor from GAT expression to GAT instance.

Strictly speaking, we should call these "structure-preserving functors" or, better, "model homomorphisms of GATs". But this is a category theory library, so we'll go with the simpler "functor".

A functor is completely determined by its action on the generators. There are several ways to specify this mapping:

  1. Specify a Julia instance type for each GAT type, using the required types tuple. For this to work, the generator constructors must be defined for the instance types.

  2. Explicitly map each generator term to an instance value, using the generators dictionary.

  3. For each GAT type (e.g., object and morphism), specify a function mapping generator terms of that type to an instance value, using the terms dictionary.

The terms dictionary can also be used for special handling of non-generator expressions. One use case for this capability is defining forgetful functors, which map non-generators to generators.

FIXME

source
GATlab.Models.SymbolicModels.@symbolic_modelMacro

@symbolic_model generates the free model of a theory, generated by symbols.

This is backwards compatible with the @syntax macro in Catlab.

One way of thinking about this is that for every type constructor, we add an additional term constructor that builds a "symbolic" element of that type from any Julia value. This term constructor is called "generator".

An invocation of @symbolic_model creates the following.

  1. A module with a struct for each type constructor, which has a single type

parameter T and two fields, args and type_args. Instances of this struct are thought of as elements of the type given by the type constructor applied to type_args. The type parameter refers to the term constructor that was used to generate the element, including the special term constructor :generator, which has as its single argument an Any.

For instance, in the theory of categories, we might have the following elements.

using .FreeCategory
+end
source
GATlab.Models.SymbolicModels.GATExprType

Base type for expressions in the syntax of a GAT. This is an alternative to AlgTerm used for backwards compatibility with Catlab.

We define Julia types for each type constructor in the theory, e.g., object, morphism, and 2-morphism in the theory of 2-categories. Of course, Julia's type system does not support dependent types, so the type parameters are incorporated in the Julia types. (They are stored as extra data in the expression instances.)

The concrete types are structurally similar to the core type Expr in Julia. However, the term constructor is represented as a type parameter, rather than as a head field. This makes dispatch using Julia's type system more convenient.

source
Base.nameofMethod

Get name of GAT generator expression as a Symbol.

If the generator has no name, returns nothing.

source
GATlab.Models.SymbolicModels.functorMethod

Functor from GAT expression to GAT instance.

Strictly speaking, we should call these "structure-preserving functors" or, better, "model homomorphisms of GATs". But this is a category theory library, so we'll go with the simpler "functor".

A functor is completely determined by its action on the generators. There are several ways to specify this mapping:

  1. Specify a Julia instance type for each GAT type, using the required types tuple. For this to work, the generator constructors must be defined for the instance types.

  2. Explicitly map each generator term to an instance value, using the generators dictionary.

  3. For each GAT type (e.g., object and morphism), specify a function mapping generator terms of that type to an instance value, using the terms dictionary.

The terms dictionary can also be used for special handling of non-generator expressions. One use case for this capability is defining forgetful functors, which map non-generators to generators.

FIXME

source
GATlab.Models.SymbolicModels.@symbolic_modelMacro

@symbolic_model generates the free model of a theory, generated by symbols.

This is backwards compatible with the @syntax macro in Catlab.

One way of thinking about this is that for every type constructor, we add an additional term constructor that builds a "symbolic" element of that type from any Julia value. This term constructor is called "generator".

An invocation of @symbolic_model creates the following.

  1. A module with a struct for each type constructor, which has a single type

parameter T and two fields, args and type_args. Instances of this struct are thought of as elements of the type given by the type constructor applied to type_args. The type parameter refers to the term constructor that was used to generate the element, including the special term constructor :generator, which has as its single argument an Any.

For instance, in the theory of categories, we might have the following elements.

using .FreeCategory
 x = Ob{:generator}([:x], [])
 y = Ob{:generator}([:y], [])
 f = Hom{:generator}([:f], [x, y])
@@ -84,4 +84,4 @@
 
 function ThCategory.compose(f::FreeCategory.Hom, g::FreeCategory.Hom; strict=true)
   associate_unit(FreeCategory.compose(f, g; strict), id)
-end
source

Utilities

+end
source

Utilities

diff --git a/dev/concepts/models/index.html b/dev/concepts/models/index.html index 91399a89..c1cced1a 100644 --- a/dev/concepts/models/index.html +++ b/dev/concepts/models/index.html @@ -1,2 +1,2 @@ -Model families · GATlab.jl

Model families

The semantics part of GATlab is the model family infrastructure.

Given a theory, one can declare a model family for that theory. This consists of the following.

  1. A struct T. Each value of the struct is a model of the theory. The struct subtypes Model{Th, Tuple{Ts...}}. Th is the theory that the struct is a model family for, and Ts specifies types for each of the type constructors in Theory.

  2. For each type constructor i in Theory, an overload of checkvalidity of the form

    checkvalidity(m::T, ::TypCon{i}, args..., val) = ...

    This checks that val is a valid instance of i applied to args..., assuming that args... have been previously checked to be valid.

    This can then be applied using, for instance

    checkvalidity(m, Category.Hom, x, y, f)

    using the singleton structs defined in Theories

  3. For each term constructor i in Theory, an overload of ap of the form

    ap(m::T, ::TrmCon{i}, args...) = ...

    Here args... is to be interpreted as elements of the full context of the term constructor, not just the direct arguments of the term constructor itself. The reason for this is that the implementation of ap might need the data of, say, the domain and codomain of morphisms in order to compose them, and because we have an indexed form of dependent types, this information is not available from the morphism data itself.

+Model families · GATlab.jl

Model families

The semantics part of GATlab is the model family infrastructure.

Given a theory, one can declare a model family for that theory. This consists of the following.

  1. A struct T. Each value of the struct is a model of the theory. The struct subtypes Model{Th, Tuple{Ts...}}. Th is the theory that the struct is a model family for, and Ts specifies types for each of the type constructors in Theory.

  2. For each type constructor i in Theory, an overload of checkvalidity of the form

    checkvalidity(m::T, ::TypCon{i}, args..., val) = ...

    This checks that val is a valid instance of i applied to args..., assuming that args... have been previously checked to be valid.

    This can then be applied using, for instance

    checkvalidity(m, Category.Hom, x, y, f)

    using the singleton structs defined in Theories

  3. For each term constructor i in Theory, an overload of ap of the form

    ap(m::T, ::TrmCon{i}, args...) = ...

    Here args... is to be interpreted as elements of the full context of the term constructor, not just the direct arguments of the term constructor itself. The reason for this is that the implementation of ap might need the data of, say, the domain and codomain of morphisms in order to compose them, and because we have an indexed form of dependent types, this information is not available from the morphism data itself.

diff --git a/dev/concepts/overview/index.html b/dev/concepts/overview/index.html index a35b0c03..4421bdb1 100644 --- a/dev/concepts/overview/index.html +++ b/dev/concepts/overview/index.html @@ -1,4 +1,4 @@ For Catlab users · GATlab.jl

For Catlab users

GATlab is a refactoring of the core GAT system in Catlab. There are three main differences between the GATlab GAT system and the Catlab GAT system, in addition to a variety of new features enabled by these differences.

1. Models as values

Instances (or models) of GATs in Catlab worked on a system analogous to Haskell type classes. That is, given a type and a GAT, that type could be involved in at most one instance of the GAT. However, there are many, many categories that have, for instance, integers as objects. The way to fix this was to make lots of "newtype" wrappers, like

struct FinSetInt
   n::Int
-end

This is not ideal, but the larger problem is that this made it difficult to have models of a GAT that were parameterized by other models, because there was no way of referring to a particular model. So constructions like slice categories, dual categories, or cospan categories were not easy to do.

In GATlab, we instead do models of GATs in a more "Standard ML" style. That is, each model is a value that can be passed around dynamically. Then the implementations of the GAT term constructors are attached via multiple dispatch to that value. In simple cases, that value might be the unique instance of a zero-field struct, but more generally there could be more data inside the value, as is the case for slice categories. If the struct has fields, then we can think of it like a "functor" in ML parlance (which has little to do with the standard categorical use of the word functor); i.e. it is a parameterized model of the theory.

Sometimes we refer to this by "categories as contexts", because the category (i.e., the model of the theory of categories) serves as a context for disambiguating which method to use, and this is a pun off of the famous category theory book "Categories in Context" by Emily Riehl.

2. E-Graphs

An e-graph is a data structure used to store congruence relations on a collections of terms. A congruence relation is an equivalence relation with the property that if $x_{i} \sim x_{i}'$, then $f(x_{1},\ldots,x_{n}) \sim f(x_{1}',\ldots,x_{n}')$ for any term constructor $f$. E-graphs are used for a variety of purposes, including:

  • finding simple representations of terms
  • deciding whether two terms are equal

The syntax systems we use for GATs in GATlab are built from the ground-up to be compatible with a purpose-built e-graph implementation. This allows us to make use of the axioms in a GAT to rewrite terms and perform equational reasoning.

Additionally, it allows us to perform a weak form of typechecking. Typechecking in general for a GAT requires deciding the equality of arbitrary terms, which is undecidable. However, we can typecheck "up to" the collection of equivalences that we have recorded in an e-graph. Any well-typed term can be typechecked using this procedure by first placing the necessary equalities in the e-graph and then running the type-checking algorithm. Moreover, whenever this algorithm finds a positive result, one can be sure that the term does in fact typecheck. This is analogous to the termination-checking in proof assistants; we circumvent undecidability by putting the burden on the user to give us some additional data to show that their program is valid. And in many circumstances, very little additional data is required. For instance, in the bare theory of categories, there are no term constructors that produce objects. So checking whether two objects are equal in a presentation of a category is a simple problem of computing an equivalence relation (instead of a congruence relation). Thus, typechecking is very easy in this circumstance.

3. Indexed dependent types in models

A morphism between the finite sets $\{1,\ldots,n\}$ and $\{1,\ldots,m\}$ is simply a length-$n$ vector whose entries are in $\{1,\ldots,m\}$. However, in Catlab we cannot make a model of the theory of categories which has Vector{Int} as its type for morphisms. This is because we require an implementation of dom and codom to have a model, and it is not possible to infer the size of the codomain from a Vector{Int}.

Mathematically, this can be interpreted as a requirement that models of GATs interpret the dependent types as fibered. I.e. the dependent type Hom(a::Ob,b::Ob) gets interpreted as a Julia type Hom with functions dom, codom from Hom to Ob.

The practical downside to this is that if you want to store several morphisms that have the same domain/codomain, you have to redundantly store the domain and codomain. So although a C-set consists of a collection of objects and morphisms in FinSet, we do not actually define it in this way in Catlab, because we don't want to redundantly store the objects in each morphism.

In GATlab, we fix this in the following way. When you declares a model in GATlab, you associate types to it that correspond to the "bare" data. I.e., we would just associate Vector{Int} to Hom in an implementation of FinSet. Then various methods expect you to pass in the domain and codomain of a morphism in addition to the data of the morphism, rather than assuming that you can infer the domain and codomain from the data.

Mathematically, we interpret this by saying that we interpret the dependent types as indexed. That is, we assign a set to each pair of objects, and those sets can intersect. I.e., Hom(a,b) could intersect non-trivially Hom(c,d). So whenever we handle a morphism, we need external data that tells us what the domain and codomain is.

Notice that from such an implementation, we can automatically derive a wrapper type which contains domain, codomain, and data. But we can't go the other way; we can't automatically "pick out" what is data and what is just recording domain/codomain from the fibered implementation.

Although mathematically, fibered and indexed dependent types are equivalent, they have different implementations, and we hope that the indexed point of view will end up being more natural for our purposes.

+end

This is not ideal, but the larger problem is that this made it difficult to have models of a GAT that were parameterized by other models, because there was no way of referring to a particular model. So constructions like slice categories, dual categories, or cospan categories were not easy to do.

In GATlab, we instead do models of GATs in a more "Standard ML" style. That is, each model is a value that can be passed around dynamically. Then the implementations of the GAT term constructors are attached via multiple dispatch to that value. In simple cases, that value might be the unique instance of a zero-field struct, but more generally there could be more data inside the value, as is the case for slice categories. If the struct has fields, then we can think of it like a "functor" in ML parlance (which has little to do with the standard categorical use of the word functor); i.e. it is a parameterized model of the theory.

Sometimes we refer to this by "categories as contexts", because the category (i.e., the model of the theory of categories) serves as a context for disambiguating which method to use, and this is a pun off of the famous category theory book "Categories in Context" by Emily Riehl.

2. E-Graphs

An e-graph is a data structure used to store congruence relations on a collections of terms. A congruence relation is an equivalence relation with the property that if $x_{i} \sim x_{i}'$, then $f(x_{1},\ldots,x_{n}) \sim f(x_{1}',\ldots,x_{n}')$ for any term constructor $f$. E-graphs are used for a variety of purposes, including:

The syntax systems we use for GATs in GATlab are built from the ground-up to be compatible with a purpose-built e-graph implementation. This allows us to make use of the axioms in a GAT to rewrite terms and perform equational reasoning.

Additionally, it allows us to perform a weak form of typechecking. Typechecking in general for a GAT requires deciding the equality of arbitrary terms, which is undecidable. However, we can typecheck "up to" the collection of equivalences that we have recorded in an e-graph. Any well-typed term can be typechecked using this procedure by first placing the necessary equalities in the e-graph and then running the type-checking algorithm. Moreover, whenever this algorithm finds a positive result, one can be sure that the term does in fact typecheck. This is analogous to the termination-checking in proof assistants; we circumvent undecidability by putting the burden on the user to give us some additional data to show that their program is valid. And in many circumstances, very little additional data is required. For instance, in the bare theory of categories, there are no term constructors that produce objects. So checking whether two objects are equal in a presentation of a category is a simple problem of computing an equivalence relation (instead of a congruence relation). Thus, typechecking is very easy in this circumstance.

3. Indexed dependent types in models

A morphism between the finite sets $\{1,\ldots,n\}$ and $\{1,\ldots,m\}$ is simply a length-$n$ vector whose entries are in $\{1,\ldots,m\}$. However, in Catlab we cannot make a model of the theory of categories which has Vector{Int} as its type for morphisms. This is because we require an implementation of dom and codom to have a model, and it is not possible to infer the size of the codomain from a Vector{Int}.

Mathematically, this can be interpreted as a requirement that models of GATs interpret the dependent types as fibered. I.e. the dependent type Hom(a::Ob,b::Ob) gets interpreted as a Julia type Hom with functions dom, codom from Hom to Ob.

The practical downside to this is that if you want to store several morphisms that have the same domain/codomain, you have to redundantly store the domain and codomain. So although a C-set consists of a collection of objects and morphisms in FinSet, we do not actually define it in this way in Catlab, because we don't want to redundantly store the objects in each morphism.

In GATlab, we fix this in the following way. When you declares a model in GATlab, you associate types to it that correspond to the "bare" data. I.e., we would just associate Vector{Int} to Hom in an implementation of FinSet. Then various methods expect you to pass in the domain and codomain of a morphism in addition to the data of the morphism, rather than assuming that you can infer the domain and codomain from the data.

Mathematically, we interpret this by saying that we interpret the dependent types as indexed. That is, we assign a set to each pair of objects, and those sets can intersect. I.e., Hom(a,b) could intersect non-trivially Hom(c,d). So whenever we handle a morphism, we need external data that tells us what the domain and codomain is.

Notice that from such an implementation, we can automatically derive a wrapper type which contains domain, codomain, and data. But we can't go the other way; we can't automatically "pick out" what is data and what is just recording domain/codomain from the fibered implementation.

Although mathematically, fibered and indexed dependent types are equivalent, they have different implementations, and we hope that the indexed point of view will end up being more natural for our purposes.

diff --git a/dev/concepts/scopes/index.html b/dev/concepts/scopes/index.html index 6a17ec7d..8fbc79a9 100644 --- a/dev/concepts/scopes/index.html +++ b/dev/concepts/scopes/index.html @@ -1,2 +1,2 @@ -Scopes · GATlab.jl

Scopes

Design rationale:

Namespacing, i.e. references which look like britain.economy.GDP, should be first class. This doesn't play nicely with the internal representation of names via deBruijn levels; we don't want to do a bunch of math to convert between flattened and nested versions of contexts. Conceptually, we should then switch our identifiers to be lists of deBruijn levels.

But there are also significant subtleties in the translation between named and nameless representations of identifiers. Also in practice if we are referencing something that might change over time, like a C-set, we don't want to use deBruijn levels, because we don't want have to change all of our identifiers when we delete something, so in the rest of this I will instead say "local identifier" instead of deBruijn level to refer to a nameless representation of an identifier.

(the name "local identifier" and also some of this philosophy is taken from chit)

In general, naming is given by a relation between symbols and bindings (as refered to by local identifiers). Each binding may be associated with zero or more symbols, and each symbol might be associated with zero or more bindings.

We name the failure of this relation to be a bijection in several ways.

  • If more than one symbol is related to a single binding, we say that this

binding has aliases. In this case, we should also have a primary alias used as a default.

  • If more than one binding is related to a single symbol, we have name overlap.

Name overlap can be resolved by overloading and shadowing, we will talk about this later on.

  • If no symbol is related to a binding, such as when the binding is created

programmatically, we need nameless references to be able to refer to the binding, which are also created programmatically.

  • If no binding is related to a symbol, then we have a name error; this name

is not valid in the current context and we must throw an error.

The most important thing to get right is name overlap; we discuss this first.

There are two ways in which to resolve name overlap. The first way is via overloading. In overloading, we disambiguate an ambiguous name via its context, for instance what type it is expected to be, or what arguments is it receiving. The second way is via shadowing. In shadowing, a name is resolved by making it point to the "most recently defined" thing with that name.

In GATlab, we handle overloading and shadowing with a notion of scope. Anything which binds variables introduces a scope, for instance a @theory declaration or a context. Each scope is identified with a ScopeTag, which is an opaque identifier (i.e. a UUID). We take this idea from Racket, but we don't need the more complex "scope sets" from Racket because GATs don't need to process the syntax of other GATs; if we start doing this we'll have to rethink the design, but we'll probably have bigger problems too.

Two variables with the same name within a single scope are meant to overload one another. For instance, this is what happens with mcompose in the theory of monoidal categories; it is overloaded for Hom and Ob, and this is what happens with d in the discrete exterior calculus; it is overloaded for all of the (k+1)-forms. Variables in different scopes with the same name shadow one another. When we parse an expression a + b, we do so in an ordered list of scopes, and the most recent scope "wins".

Parsing turns a Symbol into an Ident. The Ident contains the original Symbol for printing, but it also contains a reference to a scope via ScopeTag, and an local identifier ("lid") within that scope. Thus, the name in the Ident is never needed for later stages of programatic manipulation.

Scopes cache the relation between names and bindings, by providing a way to go from binding (as reified by local identifier) to a set of aliases with distinguished primary alias, and to go from name to the set of bindings that have that name that we need to disambiguate between in the case of an overload.

Because Idents are fully unambiguous even without their name, it is also possible to create an Ident with no name, which solves the problem of nameless references.

+Scopes · GATlab.jl

Scopes

Design rationale:

Namespacing, i.e. references which look like britain.economy.GDP, should be first class. This doesn't play nicely with the internal representation of names via deBruijn levels; we don't want to do a bunch of math to convert between flattened and nested versions of contexts. Conceptually, we should then switch our identifiers to be lists of deBruijn levels.

But there are also significant subtleties in the translation between named and nameless representations of identifiers. Also in practice if we are referencing something that might change over time, like a C-set, we don't want to use deBruijn levels, because we don't want have to change all of our identifiers when we delete something, so in the rest of this I will instead say "local identifier" instead of deBruijn level to refer to a nameless representation of an identifier.

(the name "local identifier" and also some of this philosophy is taken from chit)

In general, naming is given by a relation between symbols and bindings (as refered to by local identifiers). Each binding may be associated with zero or more symbols, and each symbol might be associated with zero or more bindings.

We name the failure of this relation to be a bijection in several ways.

  • If more than one symbol is related to a single binding, we say that this

binding has aliases. In this case, we should also have a primary alias used as a default.

  • If more than one binding is related to a single symbol, we have name overlap.

Name overlap can be resolved by overloading and shadowing, we will talk about this later on.

  • If no symbol is related to a binding, such as when the binding is created

programmatically, we need nameless references to be able to refer to the binding, which are also created programmatically.

  • If no binding is related to a symbol, then we have a name error; this name

is not valid in the current context and we must throw an error.

The most important thing to get right is name overlap; we discuss this first.

There are two ways in which to resolve name overlap. The first way is via overloading. In overloading, we disambiguate an ambiguous name via its context, for instance what type it is expected to be, or what arguments is it receiving. The second way is via shadowing. In shadowing, a name is resolved by making it point to the "most recently defined" thing with that name.

In GATlab, we handle overloading and shadowing with a notion of scope. Anything which binds variables introduces a scope, for instance a @theory declaration or a context. Each scope is identified with a ScopeTag, which is an opaque identifier (i.e. a UUID). We take this idea from Racket, but we don't need the more complex "scope sets" from Racket because GATs don't need to process the syntax of other GATs; if we start doing this we'll have to rethink the design, but we'll probably have bigger problems too.

Two variables with the same name within a single scope are meant to overload one another. For instance, this is what happens with mcompose in the theory of monoidal categories; it is overloaded for Hom and Ob, and this is what happens with d in the discrete exterior calculus; it is overloaded for all of the (k+1)-forms. Variables in different scopes with the same name shadow one another. When we parse an expression a + b, we do so in an ordered list of scopes, and the most recent scope "wins".

Parsing turns a Symbol into an Ident. The Ident contains the original Symbol for printing, but it also contains a reference to a scope via ScopeTag, and an local identifier ("lid") within that scope. Thus, the name in the Ident is never needed for later stages of programatic manipulation.

Scopes cache the relation between names and bindings, by providing a way to go from binding (as reified by local identifier) to a set of aliases with distinguished primary alias, and to go from name to the set of bindings that have that name that we need to disambiguate between in the case of an overload.

Because Idents are fully unambiguous even without their name, it is also possible to create an Ident with no name, which solves the problem of nameless references.

diff --git a/dev/concepts/theories/index.html b/dev/concepts/theories/index.html index 2b038839..d514b6db 100644 --- a/dev/concepts/theories/index.html +++ b/dev/concepts/theories/index.html @@ -11,4 +11,4 @@ struct compose <: TrmCon{3} end ... -end +end diff --git a/dev/index.html b/dev/index.html index a664e497..601ab084 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -GATlab.jl · GATlab.jl

GATlab.jl

GATlab.jl is a computer algebra system based on Generalized Algebraic Theories.

Roughly speaking, GATlab consists of two parts. The first part is concerned with symbolic constructs. The prototypical symbolic construct is simply a term, i.e. a syntax tree. In GATlab, every term is always relative to a context, in which the free variables in that term are typed. And the context and term themselves are relative to an underlying theory, which defines the allowable type and term constructors that can be used in contexts and terms, along with laws that allow one to rewrite terms into equivalent other terms.

We think about a term as a symbolic representation of a function from an assignment of values to the free variables to a value for the term. I.e., the term $x^{2} + y^{2}$ represents the function $\mathbb{R}^2 \to \mathbb{R}$ given by $(x,y) \mapsto x^{2} + y^{2}$. However, GATlab also supports higher-level symbolic constructs. For instance, we can consider "substitutions" between contexts, which map free variables in one context to terms in another, as symbolic functions, and we can consider a collection of equations between terms in a context as a symbolic relation. We can use these building blocks to build symbolic dynamical systems and other things along these lines. GATlab ensures that all of the manipulation of symbolic constructs is well-typed, so that you don't do something like accidently add a vector and a scalar.

The second part of GATlab is concerned with computational semantics. The core of computational semantics is models of theories. A model tells you:

  • When a Julia value is a valid instance of a type in a theory
  • How to apply term constructors to Julia values

Given a model for a theory, a term in a context, and values for each of the free variables in that context, one can produce a new value by recursively applying the term constructors; this is known as interpreting the term. One can also compile the term by producing a Julia function that takes as its arguments values for the free variables in the context and then computes the value of the term. This can be faster than interpreting, because one only has to walk the syntax tree at runtime, not compile time. Analogous operations of interpretation and compilation can also be defined for higher level symbolic constructs. Moreover, GATlab provides facilities for manipulation of models themselves. For instance, from a model of a ring, one can construct the model of a module over that ring, where the scalars are ring elements and the vectors are Vectors of ring elements.

GATlab is the new backend for Catlab, and we are currently working to replace the old implementation of GATs with this new one.

There are many programs which have influenced the development of GATlab, here we just list a few

+GATlab.jl · GATlab.jl

GATlab.jl

GATlab.jl is a computer algebra system based on Generalized Algebraic Theories.

Roughly speaking, GATlab consists of two parts. The first part is concerned with symbolic constructs. The prototypical symbolic construct is simply a term, i.e. a syntax tree. In GATlab, every term is always relative to a context, in which the free variables in that term are typed. And the context and term themselves are relative to an underlying theory, which defines the allowable type and term constructors that can be used in contexts and terms, along with laws that allow one to rewrite terms into equivalent other terms.

We think about a term as a symbolic representation of a function from an assignment of values to the free variables to a value for the term. I.e., the term $x^{2} + y^{2}$ represents the function $\mathbb{R}^2 \to \mathbb{R}$ given by $(x,y) \mapsto x^{2} + y^{2}$. However, GATlab also supports higher-level symbolic constructs. For instance, we can consider "substitutions" between contexts, which map free variables in one context to terms in another, as symbolic functions, and we can consider a collection of equations between terms in a context as a symbolic relation. We can use these building blocks to build symbolic dynamical systems and other things along these lines. GATlab ensures that all of the manipulation of symbolic constructs is well-typed, so that you don't do something like accidently add a vector and a scalar.

The second part of GATlab is concerned with computational semantics. The core of computational semantics is models of theories. A model tells you:

  • When a Julia value is a valid instance of a type in a theory
  • How to apply term constructors to Julia values

Given a model for a theory, a term in a context, and values for each of the free variables in that context, one can produce a new value by recursively applying the term constructors; this is known as interpreting the term. One can also compile the term by producing a Julia function that takes as its arguments values for the free variables in the context and then computes the value of the term. This can be faster than interpreting, because one only has to walk the syntax tree at runtime, not compile time. Analogous operations of interpretation and compilation can also be defined for higher level symbolic constructs. Moreover, GATlab provides facilities for manipulation of models themselves. For instance, from a model of a ring, one can construct the model of a module over that ring, where the scalars are ring elements and the vectors are Vectors of ring elements.

GATlab is the new backend for Catlab, and we are currently working to replace the old implementation of GATs with this new one.

There are many programs which have influenced the development of GATlab, here we just list a few

diff --git a/dev/stdlib/index.html b/dev/stdlib/index.html index 3318897d..bd1f91b8 100644 --- a/dev/stdlib/index.html +++ b/dev/stdlib/index.html @@ -1,2 +1,2 @@ -Standard Library · GATlab.jl

Standard Library

Categories

Our friend ThCategory is the main theory in this module.

You can specialize a theory by adding more axioms. In this case we can specialize the theory of categories to that of thin category by adding the axiom that two morphisms are equal if they have the same domain and codomain.

thineq := f == g :: Hom(A,B) ⊣ [A::Ob, B::Ob, f::Hom(A,B), g::Hom(A,B)]

Category Building Blocks

The remaining theories in this module are not necessarily useful, but go to show demonstrate the theory hierarchy can be built up in small increments.

+Standard Library · GATlab.jl

Standard Library

Categories

Our friend ThCategory is the main theory in this module.

You can specialize a theory by adding more axioms. In this case we can specialize the theory of categories to that of thin category by adding the axiom that two morphisms are equal if they have the same domain and codomain.

thineq := f == g :: Hom(A,B) ⊣ [A::Ob, B::Ob, f::Hom(A,B), g::Hom(A,B)]

Category Building Blocks

The remaining theories in this module are not necessarily useful, but go to show demonstrate the theory hierarchy can be built up in small increments.