Commit

Merge pull request #42 from commoncrawl/main
feat: make CCBot entry more accurate
glyn authored Sep 27, 2024
2 parents 60bdfa7 + a6de89e commit 2f67e77
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions robots.json
@@ -42,10 +42,10 @@
         "respect": "No"
     },
     "CCBot": {
-        "description": "Sources data that is made openly available and is used to train AI models.",
-        "frequency": "Unclear at this time.",
-        "function": "Provides crawl data for an open source repository that has been used to train LLMs.",
-        "operator": "[Common Crawl](https://commoncrawl.org)",
+        "description": "Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers).",
+        "frequency": "Monthly at present.",
+        "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.",
+        "operator": "[Common Crawl Foundation](https://commoncrawl.org)",
         "respect": "[Yes](https://commoncrawl.org/ccbot)"
     },
     "ChatGPT-User": {
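For reviewers who want to sanity-check an entry of this shape, here is a minimal sketch in Python. It assumes only what the hunk shows: each crawler entry carries the five keys `description`, `frequency`, `function`, `operator`, and `respect`. The helper name `missing_fields` and the inline JSON string are hypothetical and are not part of the repository.

```python
import json

# The five keys visible in the hunk; treating them as required is an assumption.
EXPECTED_FIELDS = ("description", "frequency", "function", "operator", "respect")

# Updated CCBot entry, copied from the added lines of the diff.
ROBOTS_JSON = """
{
    "CCBot": {
        "description": "Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers).",
        "frequency": "Monthly at present.",
        "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.",
        "operator": "[Common Crawl Foundation](https://commoncrawl.org)",
        "respect": "[Yes](https://commoncrawl.org/ccbot)"
    }
}
"""


def missing_fields(entry: dict) -> list:
    """Return the expected keys that are absent from one crawler entry."""
    return [field for field in EXPECTED_FIELDS if field not in entry]


entries = json.loads(ROBOTS_JSON)

# Map each crawler name to the keys it lacks; an empty list means complete.
report = {name: missing_fields(entry) for name, entry in entries.items()}

for name, missing in report.items():
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{name}: {status}")
```

Running the same check over the full robots.json would flag any entry that drops a key during an edit like this one.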
