Actions: huggingface/datatrove

Showing runs from all workflows
1,297 workflow runs

Add glob pattern for hash index
Test & Check Code Quality #350: Pull request #313 opened by jordane95
December 11, 2024 03:29 2m 45s jordane95:decont-glob
fix xhtml+xml
Test & Check Code Quality #349: Commit b701935 pushed by guipenedo
December 11, 2024 01:10 2m 59s main
fix xhtml+xml
Secret Leaks #177: Commit b701935 pushed by guipenedo
December 11, 2024 01:10 16s main
PyPI release
PyPI release #11: Manually run by guipenedo
December 6, 2024 18:24 3m 41s main
Update pyproject.toml
Test & Check Code Quality #348: Commit 842b241 pushed by guipenedo
December 6, 2024 18:23 3m 6s main
Update pyproject.toml
Secret Leaks #176: Commit 842b241 pushed by guipenedo
December 6, 2024 18:23 19s main
FineWeb-2: multilingual, numpy 2.0, minhash improvements (#285)
Test & Check Code Quality #347: Commit 8427759 pushed by guipenedo
December 6, 2024 16:04 2m 49s main
FineWeb-2: multilingual, numpy 2.0, minhash improvements (#285)
Secret Leaks #175: Commit 8427759 pushed by guipenedo
December 6, 2024 16:04 37s main
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #346: Pull request #285 synchronize by guipenedo
December 6, 2024 15:54 2m 37s multilingual
fix terminal punctuation in fineweb quality filter
Secret Leaks #174: Commit f0643b5 pushed by guipenedo
December 6, 2024 15:54 23s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #345: Pull request #285 synchronize by guipenedo
December 6, 2024 15:40 2m 53s multilingual
add rust tool readme
Secret Leaks #173: Commit 32d7321 pushed by guipenedo
December 6, 2024 15:40 23s multilingual
moved rust tool
Secret Leaks #172: Commit 684b5aa pushed by guipenedo
December 6, 2024 15:40 17s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #344: Pull request #285 synchronize by guipenedo
December 3, 2024 17:13 2m 58s multilingual
updated symbollinesformatter
Secret Leaks #171: Commit 10ca7db pushed by guipenedo
December 3, 2024 17:13 19s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #343: Pull request #285 synchronize by guipenedo
December 3, 2024 15:48 2m 45s multilingual
updated url filter blocklists
Secret Leaks #170: Commit 1d25288 pushed by guipenedo
December 3, 2024 15:48 39s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #342: Pull request #285 synchronize by guipenedo
November 29, 2024 22:25 2m 53s multilingual
remove dumb print
Secret Leaks #169: Commit 696e40f pushed by guipenedo
November 29, 2024 22:25 16s multilingual
Resolve issue 308
Test & Check Code Quality #341: Pull request #309 opened by habanoz
November 29, 2024 20:26 2m 5s habanoz:main
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #340: Pull request #285 synchronize by guipenedo
November 29, 2024 18:36 2m 28s multilingual
reuse word tokenizations between blocks
Secret Leaks #168: Commit b1ccdb8 pushed by guipenedo
November 29, 2024 18:36 21s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #339: Pull request #285 synchronize by guipenedo
November 29, 2024 13:50 2m 41s multilingual
fixes for empty folders
Secret Leaks #167: Commit 610560c pushed by guipenedo
November 29, 2024 13:50 15s multilingual
FineWeb-2: multilingual, numpy 2.0, minhash improvements
Test & Check Code Quality #338: Pull request #285 synchronize by guipenedo
November 28, 2024 14:50 2m 42s multilingual