edits to post and gitignore
Lucas Roberts authored and Lucas Roberts committed Oct 15, 2024
1 parent 2587da4 commit 781bf3b
Showing 2 changed files with 87 additions and 24 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -13,3 +13,6 @@ package-lock.json

# ignore mac fs indexing files
*.DS_store*

# for local rendering to reduce so many edits and load on gh servers
.jekyll-metadata
108 changes: 84 additions & 24 deletions _posts/2024-10-14-blog-post.md
@@ -1,5 +1,5 @@
---
title: 'On subsets and order'
title: 'On subsets and orderings'
date: 2024-10-14
permalink: /posts/2024/10/blog-post-1/
tags:
@@ -11,7 +11,7 @@ tags:
If you're using a recent version of SciPy
you might be using this idea already and not even know.

Bringing order to choas
Bringing order to partition counts
======

Subsets have many varieties and they have some aspects that are distinguishable
@@ -35,15 +35,17 @@ the third?
There is a decision to be made when adding the new item: first, do we make a
new set which contains only this new item or do we add the new item to one of
the existing non-empty subsets? For the first we have only 1 way to do this and
for the second option we have $k$ existing subsets (say) so we can pick any one
of these $k$.
for the second option we have $$k$$ existing subsets (say) so we can pick any one
of these $$k$$.

Now if we arrange these numbers as a list for a given number of distinct items
starting with 2 and moving to 3 we see that we have

n=2 -> [0,1,1]

```
n=1 -> [0,1,0,0]
n=2 -> [0,1,1,0]
n=3 -> [0,1,3,1]
```

many ways where the ith entry in the nth list tells us how many ways to make
i non-empty subsets of n distinct items.
@@ -53,16 +55,16 @@ Now if you're wondering where the 3 came from it is from choosing to add the
with only the new item and adding that to the singleton set that contains the
two previous items.

In general for $n$ distinct items into $i$ non-empty subsets we count these
In general for $$n$$ distinct items into $$i$$ non-empty subsets we count these
using this recursion. These subset numbers are listed in GKP's Concrete Mathematics
book as [Stirling numbers of the second kind](https://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind), where this recursion is given.

Two interesting things, the first is that there is no commonly agreed upon
notation for these numbers. I follow GKP and use the bracket/brace notation,
notation for these numbers. I follow [GKP](https://www.amazon.com/Concrete-Mathematics-Foundation-Computer-Science/dp/0201558025) and use the bracket/brace notation,

$${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$$,

for n>=k>0 and k>0. The above equation also codifies the recursion relation for
for $$0<k \leq n $$. The above equation also codifies the recursion relation for
these numbers. Now you can use this to compute any value you like in a computer
via what is called a [dynamic program](https://en.wikipedia.org/wiki/Dynamic_programming)
or DP.
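
As a small check of the recursion, using only the values from the rows listed earlier, the 3 in the $$n=3$$ row comes out as

$${3 \brace 2} = 2{2 \brace 2} + {2 \brace 1} = 2 \cdot 1 + 1 = 3$$

and one more step gives

$${4 \brace 2} = 2{3 \brace 2} + {3 \brace 1} = 2 \cdot 3 + 1 = 7$$.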
@@ -72,25 +74,70 @@ can do the DP in-place instead of using 2 adjacent rows of integers in separate
memory locations, one storing the values for $$n-1$$ and the other for $$n$$ that
is being populated in the current iteration.
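
For contrast, here is a minimal sketch of the two-row version that the in-place approach avoids. The function name `stirling2_two_rows` and the base case $${0 \brace 0} = 1$$ are assumptions made for illustration; this is not the SciPy code.

```python
def stirling2_two_rows(n: int, k: int) -> int:
    """S(n, k) via the recursion, keeping the rows for n-1 and n separately."""
    if k < 0 or k > n:
        return 0
    prev = [1]  # assumed base row for n = 0: S(0, 0) = 1
    for m in range(1, n + 1):
        curr = [0] * (m + 1)  # S(m, 0) = 0 for m >= 1
        for j in range(1, m + 1):
            above = prev[j] if j < len(prev) else 0  # S(m-1, m) = 0
            curr[j] = j * above + prev[j - 1]
        prev = curr  # the old row is only dropped after the new one is full
    return prev[k]
```

Both `prev` and `curr` are alive at the same time here, which is the extra memory the in-place traversal saves.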

# Code implementations

To do this in place you need to traverse the row of integers right to left, also
called traversing from the back in some circles.

The steps are:

(0) append a new entry to the list of integers at the end

(1) multiply that value by the index in place

(2) add the previous index value to the current indexed value
0. append a new entry to the list of integers at the end
1. multiply that value by the index in place
2. add the previous index value to the current indexed value

stop after index 1, assuming your list is 0-indexed.

This might not seem like much, since it is only 1 list vs 2 after all, but each item
in the list becomes a large integer fairly quickly, so the savings add up for large n.

The above logic is how I implemented the [stirling2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.stirling2.html) function is SciPy for the
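Here is a brief Python code snippet to illustrate:

```python
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind, computed in place on one row."""
    if k < 0 or k > n:
        return 0  # no way to split n items into k non-empty subsets
    if k == n:
        return 1
    s2 = [0, 1]  # row for n = 1
    for _ in range(1, n - 1):
        s2.append(1)  # complete the current row with its diagonal entry
        # advance to the next row from the back, so s2[j-1] is still the old value
        for j in range(len(s2) - 1, 1, -1):
            s2[j] *= j
            s2[j] += s2[j - 1]
    s2.append(1)  # diagonal entry S(n, n) = 1
    return s2[k]
```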

Or javascript if you prefer...

``` javascript
/**
 * @param {number} n
 * @param {number} k
 * @return {number}
 */
var stirling2 = function(n, k) {
  let row = [BigInt(0), BigInt(1)];
  // Note: you can also do the tiling via a parallelogram
  // to reduce computation...
  for (let r = 1; r < n - 1; r++) {
    // only grow the row until it reaches column k
    if (row.length <= k) {
      row.push(BigInt(1));
    }
    // traverse from the back so row[c-1] is still the old value
    for (let c = Math.min(row.length - 1, k); c > 1; c--) {
      row[c] *= BigInt(c);
      row[c] += row[c - 1];
    }
  }
  row.push(BigInt(1));
  return Number(row[k]);
};
```

The above logic is roughly how I implemented the [stirling2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.stirling2.html) function in SciPy for the
`exact=True` branch.

# Traverse in Reverse and Ordering partitions

The $$k$$ value in the recursion formula above tells us that we need to multiply
in place on index $$k$$ before adding the value from $$k-1$$. This means we must
traverse from the back.
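
To see why the direction matters, here is a small sketch that applies the three steps to the $$n=3$$ row in both directions. The helper name `advance_row` and its `from_back` flag are made up for this illustration.

```python
def advance_row(row: list[int], from_back: bool = True) -> list[int]:
    """Apply one update step to a copy of `row` and return the result."""
    new = row + [0]  # step 0: append a new entry
    indices = range(len(new) - 1, 0, -1) if from_back else range(1, len(new))
    for j in indices:
        new[j] *= j           # step 1: multiply in place by the index
        new[j] += new[j - 1]  # step 2: add the neighbour to the left
    return new

row3 = [0, 1, 3, 1]
print(advance_row(row3, from_back=True))   # [0, 1, 7, 6, 1]
print(advance_row(row3, from_back=False))  # [0, 1, 7, 10, 10]
```

Going front to back, index 3 picks up the already updated 7 instead of the old 3, so the row is wrong from that point on.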
@@ -112,13 +159,6 @@ To calculate these the recursion is easily seen as
$$b(n,k) = kb(n-1, k) + kb(n-1,k-1)$$.

In other words we're multiplying both items by $$k$$.
We can interpret this as each non-empty subset being distinguished in some way,
e.g. by different colors. This means 1 and then 2 as two non-empty subsets is
distinct from 2 and then 1 as two non-empty subsets.

Another way to interpret is as the number of ties in a ranking of $$n$$ elements
where we have $$k$$ distinct sets ranked. If there are no ties in a set then the
set is a singleton.

Now because the value $$k$$ appears on both terms, the recursion can go from either side,
e.g. you can traverse from the front, the usual way i=0, i=1, etc. and then
@@ -131,8 +171,28 @@ primary difference beside what the numbers actually count is that for
ordered Bell numbers of order $$k$$ you can iterate over a row in either direction
and still do the computation in place.
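
A minimal sketch of the $$b(n,k)$$ recursion on a single row follows. The function name `ordered_bell_row` and the base case $$b(0,0)=1$$ are assumptions for illustration. This version traverses from the back; the front-to-back variant is not shown.

```python
def ordered_bell_row(n: int) -> list[int]:
    """Return [b(n, 0), ..., b(n, n)] using b(n,k) = k*b(n-1,k) + k*b(n-1,k-1)."""
    row = [1]  # assumed base case: b(0, 0) = 1
    for _ in range(n):
        row.append(0)  # new entry for one more subset
        # right to left, so row[k-1] is still the previous row's value
        for k in range(len(row) - 1, 0, -1):
            row[k] = k * (row[k] + row[k - 1])
        row[0] = 0  # no way to put one or more items into 0 subsets
    return row
```

For example, `ordered_bell_row(3)` gives `[0, 1, 6, 6]`, which is each Stirling number in the $$n=3$$ row multiplied by $$k!$$.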

Note: In the SciPy implementation the Stirling number computation is vectorized
# Applications

For ordered Bell numbers of order $$k$$,
we can interpret this as each non-empty subset being distinguished in some way,
e.g. by different colors or distinct numbers on the subsets.
This means 1 and then 2 as two non-empty subsets is distinct from 2 and then 1
as two non-empty subsets.

Another way to interpret this is as the number of rankings of $$n$$ elements with ties,
where we have $$k$$ distinct sets ranked. If there are no ties in a set then the
set is a singleton.

The calculation of the number of possible ways to win a race with ties can come up when
calculating summary metrics on search ranking results. This is especially true
when you are working with human annotation teams and they are allowed to rank
the values with ties in the ground truth search results.
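
As a sanity check on the ranking interpretation, the brute-force sketch below counts every distinct ranking of $$n$$ items when ties are allowed. The function name `count_rankings_with_ties` is made up for this example, and the enumeration is only practical for small $$n$$.

```python
from itertools import product

def count_rankings_with_ties(n: int) -> int:
    """Count distinct rankings of n items when ties are allowed (brute force)."""
    seen = set()
    for ranks in product(range(n), repeat=n):
        # relabel so the rank values used are exactly 0, 1, ..., k-1
        dense = {r: i for i, r in enumerate(sorted(set(ranks)))}
        seen.add(tuple(dense[r] for r in ranks))
    return len(seen)
```

For $$n=3$$ this returns 13, which matches $$b(3,1)+b(3,2)+b(3,3) = 1+6+6$$ from the recursion.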

# Implementation Note

In the SciPy implementation the Stirling number computation is vectorized
and given the nature of the partially overlapping subproblems I used a [heap](https://en.wikipedia.org/wiki/Heap_(data_structure)) to
schedule the computation for the individual cells of the matrices. There isn't
a better way to do this that I'm aware of and still be exact.
For the approximate branch-the default-the code is using the [Lambert W function](https://en.wikipedia.org/wiki/Lambert_W_function) and uses the second order approximation from Temme.

For the approximate branch, which is the default, the code uses the [Lambert W function](https://en.wikipedia.org/wiki/Lambert_W_function) and the second-order approximation from [Temme](https://ir.cwi.nl/pub/5461); see section 4 of the linked paper for details.
