
Commit

* update readme for analyzer
HYLcool committed Dec 20, 2024
1 parent f8b9539 commit 45259e5
Showing 2 changed files with 6 additions and 2 deletions.
README.md (4 changes: 3 additions & 1 deletion)

````diff
@@ -340,7 +340,9 @@ dj-analyze --config configs/demo/analyzer.yaml
 dj-analyze --auto --dataset_path xx.jsonl [--auto_num 1000]
 ```
-- **Note:** Analyzer only compute stats of Filter ops. So extra Mapper or Deduplicator ops will be ignored in the analysis process.
+- **Note:** Analyzer only compute stats for Filters that produce stats or other OPs that produce tags/categories in meta. So other OPs will be ignored in the analysis process. We use the following registries to decorate OPs:
+- `NON_STATS_FILTERS`: decorate Filters that **DO NOT** produce any stats.
+- `TAGGING_OPS`: decorate OPs that **DO** produce tags/categories in meta field.
 ### Data Visualization
````

README_ZH.md (4 changes: 3 additions & 1 deletion)

````diff
@@ -316,7 +316,9 @@ dj-analyze --config configs/demo/analyzer.yaml
 dj-analyze --auto --dataset_path xx.jsonl [--auto_num 1000]
 ```
 
-* **Note**: Analyzer only computes the stats of Filter ops; other ops (e.g. Mapper and Deduplicator) are ignored during analysis.
+* **Note**: Analyzer only applies to Filter ops that produce statistics in the stats field, and to other ops that produce tags or category labels in the meta field. All other ops are ignored during analysis. We use the following two registries to decorate the relevant ops:
+* `NON_STATS_FILTERS`: decorates Filter ops that **cannot** produce any statistics.
+* `TAGGING_OPS`: decorates ops that can produce tags or category labels in the meta field.
 
 ### Data Visualization
 
````
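The selection rule in the updated note can be illustrated with a small, self-contained Python sketch. Everything in it is hypothetical: the `Registry` class, its `register_module` decorator, the `Filter`/`Mapper` base classes, and the op names are stand-ins written for this example, not Data-Juicer's actual implementation; only the registry names `NON_STATS_FILTERS` and `TAGGING_OPS` come from the README. Under those assumptions, an op is analyzed if it is registered in `TAGGING_OPS`, or if it is a Filter that is not registered in `NON_STATS_FILTERS`.

```python
# Hypothetical sketch of the README's rule; these classes and op names are
# illustrative stand-ins, NOT Data-Juicer's real registry or operators.


class Registry:
    """A tiny name -> class registry whose register_module() works as a decorator."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, op_name):
        def decorator(cls):
            self._modules[op_name] = cls  # remember the class under its op name
            return cls                    # return the class unchanged
        return decorator

    def __contains__(self, op_name):
        return op_name in self._modules


NON_STATS_FILTERS = Registry("non_stats_filters")
TAGGING_OPS = Registry("tagging_ops")


class Filter:
    """Hypothetical base class for filter ops."""


class Mapper:
    """Hypothetical base class for mapper ops."""


class TextLengthFilter(Filter):
    """Produces a stat, so it is left out of NON_STATS_FILTERS."""


@NON_STATS_FILTERS.register_module("suffix_filter")
class SuffixFilter(Filter):
    """A filter that produces no stats."""


@TAGGING_OPS.register_module("language_tagging_mapper")
class LanguageTaggingMapper(Mapper):
    """A non-filter op that writes a tag into the meta field."""


class WhitespaceMapper(Mapper):
    """Edits text only; produces neither stats nor tags."""


def ops_to_analyze(ops):
    """Return the names of the (name, class) ops the analyzer would keep."""
    kept = []
    for name, cls in ops:
        if name in TAGGING_OPS:
            kept.append(name)  # any op that produces tags/categories in meta
        elif issubclass(cls, Filter) and name not in NON_STATS_FILTERS:
            kept.append(name)  # a Filter that produces stats
    return kept


pipeline = [
    ("text_length_filter", TextLengthFilter),
    ("suffix_filter", SuffixFilter),
    ("language_tagging_mapper", LanguageTaggingMapper),
    ("whitespace_mapper", WhitespaceMapper),
]
print(ops_to_analyze(pipeline))
# → ['text_length_filter', 'language_tagging_mapper']
```

In this sketch only the exceptions are registered (Filters without stats, and non-Filter ops that tag), so a newly written Filter is analyzed by default unless it is explicitly marked otherwise.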

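According to the note, the ops that survive selection contribute two kinds of per-sample output: numeric stats and tags/categories in the meta field. The sketch below shows one way such outputs could be summarized (a mean per stat, a count per tag value). The `stats`/`meta` key names, the sample layout, and the `summarize` helper are assumptions made for illustration; the real analyzer's field names and output format may differ.

```python
# Hypothetical sketch: "stats" and "meta" keys and the sample layout are
# assumptions for illustration, not Data-Juicer's actual schema or output.
from collections import Counter
from statistics import mean


def summarize(samples):
    """Return (mean of each numeric stat, Counter of values for each meta tag)."""
    numeric = {}
    tags = {}
    for sample in samples:
        # Collect every numeric stat value under its stat name.
        for key, value in sample.get("stats", {}).items():
            numeric.setdefault(key, []).append(value)
        # Count how often each tag/category value appears per meta key.
        for key, value in sample.get("meta", {}).items():
            tags.setdefault(key, Counter())[value] += 1
    stats_summary = {key: mean(values) for key, values in numeric.items()}
    return stats_summary, tags


samples = [
    {"text": "hello world", "stats": {"text_len": 11}, "meta": {"language": "en"}},
    {"text": "你好", "stats": {"text_len": 2}, "meta": {"language": "zh"}},
    {"text": "bonjour", "stats": {"text_len": 7}, "meta": {"language": "fr"}},
    {"text": "hi", "stats": {"text_len": 2}, "meta": {"language": "en"}},
]
stats_summary, tag_counts = summarize(samples)
print(stats_summary)  # {'text_len': 5.5}
print(tag_counts)     # {'language': Counter({'en': 2, 'zh': 1, 'fr': 1})}
```

Numeric stats and categorical tags are kept apart here because they summarize differently: averaging makes sense for a length, while a language label is better described by a frequency count.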