sotab-benchmark

This repository contains the code for recreating the Schema.org Table Annotation Benchmark (SOTAB).

Schema.org Table Corpus

SOTAB is built from the Schema.org Table Corpus. To run the code that creates SOTAB, download all zip files from the top100 and minimum3 subsets of the Schema.org Table Corpus and place them in the directory data/stc_zip_files/
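Before starting the notebooks, it can help to confirm that the downloaded archives actually landed in data/stc_zip_files/ and are readable. The following is a minimal sketch, not part of the repository: the helper name `find_zip_archives` is hypothetical, and it assumes the archives sit directly in that directory with a `.zip` extension.

```python
import zipfile
from pathlib import Path


def find_zip_archives(directory):
    """Return (valid, invalid) lists of .zip paths found directly in `directory`.

    Hypothetical helper for illustration; not part of the SOTAB code.
    """
    root = Path(directory)
    valid, invalid = [], []
    if not root.is_dir():
        # Nothing to check if the directory has not been created yet.
        return valid, invalid
    for path in sorted(root.glob("*.zip")):
        # is_zipfile only checks for a zip signature, not full integrity.
        if zipfile.is_zipfile(path):
            valid.append(path)
        else:
            invalid.append(path)
    return valid, invalid


if __name__ == "__main__":
    ok, bad = find_zip_archives("data/stc_zip_files")
    print(f"{len(ok)} readable archives, {len(bad)} unreadable")
```

An archive reported as unreadable is most likely a truncated download and is worth fetching again before running the notebooks.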

Run download.sh to download the processed datasets for the VizNet corpus. The script also creates the data directory.

$ bash download.sh

SOTAB creation

To create the SOTAB datasets for Column Type Annotation (CTA) and Column Property Annotation (CPA), run the notebooks in the order listed below:

Language Detection
MatchColumnNamesToSchema.org
Expand properties-CreateTables
AnnotatingTables
TableSelection-CPA
Different-Formats-CPA
RandomColumns-CPA
TableSelection-CTA
CreatingSplits-CPA
CreatingSplits-CTA
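The notebooks are meant to be run interactively, but the same sequence could also be executed from a script. The sketch below is an assumption rather than something the repository provides: it relies on the `jupyter nbconvert` command-line tool being installed, and the function names `nbconvert_command` and `run_in_order` are hypothetical.

```python
import subprocess


def nbconvert_command(notebook):
    """Build the command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_in_order(notebooks, runner=subprocess.run):
    """Execute notebooks one after another, stopping at the first failure.

    `runner` defaults to subprocess.run and can be swapped out for a dry run.
    """
    for notebook in notebooks:
        # check=True raises CalledProcessError if a notebook fails,
        # so later steps never run on incomplete intermediate data.
        runner(nbconvert_command(notebook), check=True)
```

A call would look like `run_in_order(["first.ipynb", "second.ipynb"])`, where the file names are placeholders to be replaced with the actual notebook files in the order given above. Stopping at the first failure matters here because each step consumes the output of the previous one.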
