StableSet fork merge #92

Open
wants to merge 18 commits into
base: master
Commits
120fabc
.gitignore - add `.python-version`, `build`, `.idea`
idanmiara Feb 14, 2023
1f71480
.pre-commit-config.yaml added
idanmiara Feb 13, 2023
71d6030
requirements-dev.txt - added
idanmiara Feb 14, 2023
7660654
setup.py, pyproject.toml, tox.ini - update to drop support for Python…
idanmiara Feb 14, 2023
0ed6b3c
v5.2.0 Major refactor - adding `StableSet` implementation and many fe…
idanmiara Feb 14, 2023
8997dbc
setup.py - use a general purpose setup.py that takes the package deta…
idanmiara Feb 14, 2023
a1f6a86
test_ordered_set.py moved to test/test_ordered_set_1.py
idanmiara Feb 14, 2023
58ce4a5
test/__init__.py - add test types groups to be used across the differ…
idanmiara Feb 14, 2023
0c482dc
test/test_ordered_set_1.py - add many tests and parameterized the tes…
idanmiara Feb 14, 2023
596840e
test/test_ordered_set_2.py - added from https://github.com/simonperci…
idanmiara Feb 14, 2023
12f92bc
test/pytest_util.py - auxiliary module to help transition from `unitt…
idanmiara Feb 14, 2023
c773560
test/test_ordered_set_2.py - updated the tests from https://github.co…
idanmiara Feb 14, 2023
13e1a53
test/test_ordered_set_3.py - added from https://github.com/bustawin/o…
idanmiara Feb 13, 2023
c5bcb10
test/test_ordered_set_3.py - update the tests from https://github.com…
idanmiara Feb 14, 2023
11b4e5d
_test/_test_ordered_set_all.py - added to ease running all the tests …
idanmiara Feb 14, 2023
c92c219
ordered_set_benchmark.py - added to test the performance of different…
idanmiara Feb 14, 2023
c6860e3
README.md - updated with new features
idanmiara Feb 14, 2023
c1b4f8f
CHANGELOG.md - updated for v5.2.0
idanmiara Feb 14, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,3 +10,6 @@ dist/
htmlcov
.eggs
.venv
.python-version
build
.idea
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
files: 'ordered_set/'
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        args: ["--profile", "black"]
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

Significant changes in major and minor releases of this library:

## Version 5.2 (February 2023)

- Major refactor
- Added a StableSet implementation, as a base class for OrderedSet.
- Added many functions to OrderedSet, to be more complete and more compatible with other implementations.
  - `popitem(last: bool = True)`, similar to `dict.popitem` (note a minor incompatibility with another implementation, `orderedset`, which has the `last` keyword in its `pop` function)
  - `move_to_end(key)`, similar to `dict.move_to_end`
  - `__le__`, `__lt__`, `__ge__`, `__gt__`, to improve subset/superset testing
- Minimum Python version is 3.8 (because of `__reversed__`, which `dict` supports from Python 3.8)
- Fix: `OrderedSet.update` now raises a TypeError instead of a ValueError when the type of the input is incorrect
- Added many new tests, and all the tests from 2 other implementations.
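
The semantics listed for `popitem`, `move_to_end` and the comparison operators can be sketched with a small dict-backed class. This is only an illustration under stated assumptions: `TinyOrderedSet` is a hypothetical stand-in written for this note, not the class added by this pull request, and its internals are not taken from the diff.

```python
from typing import Hashable, Iterable, Iterator


class TinyOrderedSet:
    """Hypothetical stand-in; not the class added by this pull request."""

    def __init__(self, items: Iterable[Hashable] = ()) -> None:
        # A dict preserves insertion order, so its keys act as the set.
        self._items = dict.fromkeys(items)

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def popitem(self, last: bool = True) -> Hashable:
        """Remove and return the newest item, or the oldest if last=False."""
        if not self._items:
            raise KeyError("popitem from an empty set")
        # reversed() on a dict requires Python 3.8 or later.
        key = next(reversed(self._items)) if last else next(iter(self._items))
        del self._items[key]
        return key

    def move_to_end(self, key: Hashable) -> None:
        """Re-insert an existing key so that it becomes the newest item."""
        del self._items[key]  # raises KeyError if the key is missing
        self._items[key] = None

    def __le__(self, other: "TinyOrderedSet") -> bool:
        # Subset test: order is ignored, only membership matters.
        return all(item in other._items for item in self)

    def __lt__(self, other: "TinyOrderedSet") -> bool:
        # Proper subset: a subset that is strictly smaller.
        return len(self) < len(other) and self <= other


s = TinyOrderedSet("abcd")
s.move_to_end("a")
order_after_move = list(s)
newest = s.popitem()
oldest = s.popitem(last=False)
remaining = list(s)
is_proper_subset = TinyOrderedSet("cd") < TinyOrderedSet("abcd")
```

The `last` flag sits on `popitem` here, mirroring the changelog entry; the changelog notes that `orderedset` places it on `pop` instead.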

## Version 4.1 (January 2022)

- Packaged using flit. Wheels now exist, and setuptools is no longer required.
2 changes: 1 addition & 1 deletion MIT-LICENSE
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
-Copyright (c) 2012-2022 Elia Robyn Lake
+Copyright (c) 2012-2022 Elia Robyn Lake, 2023 Idan Miara

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
41 changes: 30 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
[![Pypi](https://img.shields.io/pypi/v/ordered-set.svg)](https://pypi.python.org/pypi/ordered-set)

A StableSet is a mutable set that remembers its insertion order.
It features fast O(1) insertion, deletion, iteration and membership testing,
but slow O(N) index lookup.

An OrderedSet is a mutable data structure that is a hybrid of a list and a set.
It remembers the order of its entries, and every entry has an index number that
can be looked up.
It remembers its insertion order so that every entry has an index that can be looked up.
It features O(1) index lookup, insertion, iteration and membership testing,
but slow O(N) deletion.

Both have similar interfaces but differ in their implementation and performance.
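
The StableSet side of this trade-off can be illustrated with a plain `dict`, which is insertion-ordered in Python 3.7 and later. This is a minimal sketch that assumes a dict-like backing store; the helpers `item_at` and `position_of` are hypothetical names introduced for the example and are not part of the package.

```python
from itertools import islice

# Assumption: a plain dict stands in for a dict-backed insertion-ordered set.
stable = dict.fromkeys(["x", "y", "z", "w"])

# O(1) on average: membership, insertion and deletion are hash-table operations.
assert "y" in stable
stable["v"] = None
del stable["y"]


def item_at(d: dict, position: int):
    # O(N): there is no positional index, so walk the keys in order.
    return next(islice(d, position, None))


def position_of(d: dict, key) -> int:
    # O(N): scan until the key is found.
    for i, k in enumerate(d):
        if k == key:
            return i
    raise KeyError(key)


order = list(stable)
third = item_at(stable, 2)
where_w = position_of(stable, "w")
```

Deleting `"y"` did not require touching any other entry, which is why deletion stays cheap; the price is that both positional helpers have to scan.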

## Installation

@@ -105,12 +112,23 @@ in OrderedSet).
## Authors

OrderedSet was implemented by Elia Robyn Lake (maiden name: Robyn Speer).
StableSet was implemented by Idan Miara, built upon the foundations of OrderedSet.
Jon Crall contributed changes and tests to make it fit the Python set API.
Roman Inflianskas added the original type annotations.


## Comparisons

A StableSet is a mutable set that remembers its insertion order.
It features fast O(1) insertion, deletion, iteration and membership testing,
but slow O(N) index lookup.

An OrderedSet is a mutable data structure that is a hybrid of a list and a set.
It remembers its insertion order so that every entry has an index that can be looked up.
It features O(1) index lookup, insertion, iteration and membership testing,
but slow O(N) deletion.

Both have similar interfaces but differ in their implementation and performance.

The original implementation of OrderedSet was a [recipe posted to ActiveState
Recipes][recipe] by Raymond Hettinger, released under the MIT license.

@@ -120,14 +138,15 @@ Hettinger's implementation kept its content in a doubly-linked list referenced by
dict. As a result, looking up an item by its index was an O(N) operation, while
deletion was O(1).

This version makes different trade-offs for the sake of efficient lookups. Its
content is a standard Python list instead of a doubly-linked list. This
This version of OrderedSet makes different trade-offs for the sake of efficient lookups.
Its content is a standard Python list instead of a doubly-linked list. This
provides O(1) lookups by index at the expense of O(N) deletion, as well as
slightly faster iteration.
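
The list-based layout described here can be sketched as a list of items plus a dict that maps each item to its position. This is an illustrative sketch, not the library's source: the class name `ListBackedSet` and its attribute names are assumptions made for the example.

```python
from typing import Dict, Hashable, List


class ListBackedSet:
    """Hypothetical sketch of the list-plus-dict layout; not the library's code."""

    def __init__(self) -> None:
        self.items: List[Hashable] = []     # position -> item
        self.map: Dict[Hashable, int] = {}  # item -> position

    def add(self, key: Hashable) -> int:
        # O(1): append to the list and record the new position.
        if key not in self.map:
            self.map[key] = len(self.items)
            self.items.append(key)
        return self.map[key]

    def index(self, key: Hashable) -> int:
        # O(1): item -> position through the dict.
        return self.map[key]

    def __getitem__(self, position: int) -> Hashable:
        # O(1): position -> item through the list.
        return self.items[position]

    def discard(self, key: Hashable) -> None:
        # O(N): every item after the removed one shifts left by one,
        # so its stored position has to be rewritten.
        if key not in self.map:
            return
        position = self.map.pop(key)
        del self.items[position]
        for later in self.items[position:]:
            self.map[later] -= 1


s = ListBackedSet()
for word in ["a", "b", "c", "d"]:
    s.add(word)
s.discard("b")
```

After `discard("b")`, the entries for `"c"` and `"d"` both had to be renumbered, which is the O(N) deletion cost the paragraph refers to.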

In Python 3.6 and later, the built-in `dict` type is inherently ordered. If you
ignore the dictionary values, that also gives you a simple ordered set, with
fast O(1) insertion, deletion, iteration and membership testing. However, `dict`
does not provide the list-like random access features of OrderedSet. You
would have to convert it to a list in O(N) to look up the index of an entry or
look up an entry by its index.
## Other implementations

The included implementation of OrderedSet is fully compatible with the following implementation:
* https://pypi.org/project/orderedset/ - by Simon Percivall (a faster, Cython-based implementation of `OrderedSet` that currently works only on Python < 3.9)

The included implementation of StableSet is fully compatible with the following implementation:
* https://pypi.org/project/Ordered-set-37/ - by Xavier Bustamante Talavera (a similar basic implementation of `StableSet`, but named `OrderedSet`)
110 changes: 110 additions & 0 deletions benchmarks/ordered_set_benchmark.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
import timeit
from functools import partial
from random import randint

from ordered_set import OrderedSet as OS2

try:
    from orderedset import OrderedSet as OS1
except ImportError:
    # currently orderedset fails to install on Python 3.10, 3.11
    # https://github.com/simonpercivall/orderedset/issues/36#issuecomment-1424309665
    print("orderedset is not installed, using ordered_set twice")
    OS1 = OS2
from ordered_set import StableSet as OS3
try:
    from sortedcollections import OrderedSet as OS4
except ImportError:
    print("sortedcollections is not installed, using ordered_set twice")
    OS4 = OS2


item_count = 10_000
item_range = item_count * 2
items = [randint(0, item_range) for _ in range(item_count)]
items_b = [randint(0, item_range) for _ in range(item_count)]

# Sanity checks: both OrderedSet implementations must agree before timing them.
oset1a = OS1(items)
oset2a = OS2(items)
oset1b = OS1(items_b)
oset2b = OS2(items_b)
assert oset1a.difference(oset1b) == oset2a.difference(oset2b)
assert oset1a.intersection(oset1b) == oset2a.intersection(oset2b)

oset1c = OS1(items)
oset2c = OS2(items)
oset1c.add(item_range + 1)
oset2c.add(item_range + 1)
assert oset1c == oset2c

for i in range(item_range):
    assert (i in oset1a) == (i in oset2a)
    if i in oset1a:
        assert oset1a.index(i) == oset2a.index(i)


def init_set(T, items) -> set:
    return T(items)


def init_set_list(T, items) -> list:
    return list(T(items))


def init_set_d(items) -> dict:
    return dict.fromkeys(items)


def init_set_d_list(items) -> list:
    return list(dict.fromkeys(items))


def update(s: set, items) -> set:
    s.update(items)
    return s


def update_d(s: dict, items) -> dict:
    d2 = dict.fromkeys(items)
    s.update(d2)
    return s


ordered_sets_types = [OS1, OS2, OS3, OS4]
set_types = [set] + ordered_sets_types

oss = [init_set(T, items) for T in set_types]
od = init_set_d(items)

# Every ordered implementation (and dict) must produce the same unique list.
osls = [init_set_list(T, items) for T in set_types[1:]] + [init_set_d_list(items)]
for x in osls:
    assert osls[0] == x

# The last type (OS4) is left out of the update checks and timings.
osls = [update(init_set(T, items), items_b) for T in ordered_sets_types[:-1]] + [
    update_d(init_set_d(items), items_b)
]
osls = [list(x) for x in osls]
for x in osls:
    assert osls[0] == x

number = 10000
repeats = 4
for i in range(repeats):
    print(f"----- {i} ------")

    print("-- init set like --")
    print(f"d: {timeit.timeit(partial(init_set_d, items),number=number)=}")
    for idx, T in enumerate(set_types):
        print(f"{idx}: {timeit.timeit(partial(init_set, T, items),number=number)=}")

    print("-- unique list --")
    # Time the list-producing dict helper so it is comparable to init_set_list.
    print(f"d: {timeit.timeit(partial(init_set_d_list, items),number=number)=}")
    for idx, T in enumerate(set_types):
        print(
            f"{idx}: {timeit.timeit(partial(init_set_list, T, items),number=number)=}"
        )

    print("-- update set like --")
    print(f"d: {timeit.timeit(partial(update_d, od, items_b),number=number)=}")
    for idx, oset in enumerate(oss[:-1]):
        print(f"{idx}: {timeit.timeit(partial(update, oset, items_b),number=number)=}")
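
A smaller variant of the timing pattern used in this benchmark can be tried without installing any of the compared packages. The sketch below is an assumption-laden illustration: it uses the built-in `set` and `dict` as stand-ins for the ordered set classes, a fixed seed, and arbitrary `number` and `repeat` values, and the helper `best_time` is a hypothetical name not present in the benchmark file.

```python
import timeit
from functools import partial
from random import Random

rng = Random(0)  # fixed seed so every run builds from the same items
items = [rng.randint(0, 2_000) for _ in range(1_000)]


def build_set(items):
    return set(items)


def build_dict(items):
    return dict.fromkeys(items)


def best_time(func, number=200, repeat=3):
    # timeit.repeat returns one total per repetition; the minimum is the
    # one least disturbed by other activity on the machine.
    return min(timeit.repeat(func, number=number, repeat=repeat))


results = {
    name: best_time(partial(func, items))
    for name, func in [("set", build_set), ("dict", build_dict)]
}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}s")
```

Taking the minimum over a few repetitions is one common way to reduce noise; the benchmark file instead prints every repetition, which keeps the run-to-run variation visible.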