dedup
Here are 25 public repositories matching this topic...
CLI utility to find near-duplicate images and remove all but the best copy.
Updated Nov 25, 2024 - Python
Parallel Patterns Implementation of PARSEC Benchmark Applications
Updated Dec 29, 2021 - C++
String deduplication package for Go
Updated Jan 10, 2024 - Go
Golang structured logging (slog) deduplication and sorting for use with JSON logging
Updated Mar 29, 2024 - Go
Distill large-scale web page text
Updated Jul 29, 2023 - C++
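📄 [优爱酷 visual web page data collection system] Uses visual scraping technology to intelligently identify web page element types such as images, text, links, HTML, and files. Supports running JavaScript, applying regular expressions, automatic scrolling, automatic pagination, and opening pop-up windows to collect data. Collection settings include automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and automatic saving. Also supports browser settings such as cookies and cache, proxy rotation (including for bypassing network restrictions), "category/keyword" collection, image renaming, and advanced options such as multithreaded collection; the VIP edition additionally supports scheduled collection.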
Updated Aug 12, 2019
Find (partial content) duplicate files.
Updated Dec 10, 2022 - Python
Project that takes two similar zip files and dedupes files that have the same timestamp in the older file.
Updated Oct 14, 2018 - Python
Sift duplicate whitespaces away!
Updated Jul 15, 2024 - Rust
BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.
Updated Aug 3, 2022 - C
A CLI tool for image analysis: checking image integrity, image deduplication, and image retrieval.
Updated Mar 27, 2024 - Rust
Detect and optionally delete duplicate files in a directory tree
Updated Jun 6, 2021 - Go