Single Column Profiling Algorithms

The research area of data profiling includes a large set of methods and processes to examine a given dataset and determine metadata about it (1). Typically, the results comprise various statistics about the columns and the relationships among them, in particular dependencies. Among the basic statistics about a column are data type, the number of unique values, maximum and minimum values, the number of null values, and the value distribution.
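Here is a minimal sketch of three of these basic statistics for one column: the number of null values, the number of unique values, and the value distribution. It assumes the column arrives as a list of strings in which null or an empty string marks a missing value. That null convention and the class name BasicColumnStats are illustrative assumptions, not taken from this repository.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative per-column statistics; the class and its names are hypothetical. */
final class BasicColumnStats {
    final int rowCount;
    final int nullCount;
    final int distinctCount;
    // Value distribution: each non-null value mapped to its frequency, sorted by value.
    final Map<String, Integer> distribution;

    private BasicColumnStats(int rowCount, int nullCount, Map<String, Integer> distribution) {
        this.rowCount = rowCount;
        this.nullCount = nullCount;
        this.distribution = distribution;
        this.distinctCount = distribution.size();
    }

    /** Assumption: null or "" counts as a missing value. */
    static boolean isNull(String v) {
        return v == null || v.isEmpty();
    }

    /** Single pass over the column: count nulls and tally every non-null value. */
    static BasicColumnStats of(List<String> column) {
        Map<String, Integer> freq = new TreeMap<>();
        int nulls = 0;
        for (String v : column) {
            if (isNull(v)) {
                nulls++;
            } else {
                freq.merge(v, 1, Integer::sum);
            }
        }
        return new BasicColumnStats(column.size(), nulls, freq);
    }

    /** Share of missing values in percent (0 for an empty column). */
    double nullPercentage() {
        return rowCount == 0 ? 0.0 : 100.0 * nullCount / rowCount;
    }
}
```

In this sketch, nulls are excluded from the distinct count. Whether a null should count as a distinct value is a design choice that a profiler has to make explicitly.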

This repository has two parts:

Single Column Data Profiler (SCDP)

It collects the following statistics about each column of the input dataset (a *.csv file):

Data type (REAL, SMALLINT, VARCHAR, ...)
Exact number and percentage of distinct values
Number and percentage of nulls
Top 10 most frequent values and their frequencies
Minimum, maximum, average, and standard deviation
...
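Several of the statistics in this list can be sketched for a single column as follows. This is not SCDP's code. The type-inference rules (the narrowest of SMALLINT, INTEGER, BIGINT, or REAL, falling back to VARCHAR), the treatment of empty strings as nulls, and the use of the population standard deviation are all assumptions made for illustration. The class name ColumnProfile is hypothetical.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative SCDP-style column profile; rules and names are assumptions. */
final class ColumnProfile {
    // Ordered from narrowest to widest so that compareTo picks the wider type.
    enum SqlType { SMALLINT, INTEGER, BIGINT, REAL, VARCHAR }

    private ColumnProfile() {}

    // Assumption: null or "" marks a missing value.
    static boolean isNull(String v) {
        return v == null || v.isEmpty();
    }

    /** Narrowest type that can hold a single non-null value. */
    static SqlType typeOf(String v) {
        try {
            long n = Long.parseLong(v.trim());
            if (n >= Short.MIN_VALUE && n <= Short.MAX_VALUE) return SqlType.SMALLINT;
            if (n >= Integer.MIN_VALUE && n <= Integer.MAX_VALUE) return SqlType.INTEGER;
            return SqlType.BIGINT;
        } catch (NumberFormatException notAnInteger) {
            // Fall through and try a floating-point parse.
        }
        try {
            Double.parseDouble(v);
            return SqlType.REAL;
        } catch (NumberFormatException notANumber) {
            return SqlType.VARCHAR;
        }
    }

    /** Widest per-value type over the column; all-null columns default to VARCHAR. */
    static SqlType inferType(List<String> column) {
        SqlType widest = null;
        for (String v : column) {
            if (isNull(v)) continue;
            SqlType t = typeOf(v);
            if (widest == null || t.compareTo(widest) > 0) widest = t;
        }
        return widest == null ? SqlType.VARCHAR : widest;
    }

    /**
     * {min, max, average, population standard deviation} over non-null values.
     * Call only on columns inferred as numeric; otherwise parseDouble throws.
     */
    static double[] numericSummary(List<String> column) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        List<Double> values = new ArrayList<>();
        for (String v : column) {
            if (isNull(v)) continue;
            double d = Double.parseDouble(v);
            stats.accept(d);
            values.add(d);
        }
        double mean = stats.getAverage();
        double sumSq = 0.0;
        for (double d : values) sumSq += (d - mean) * (d - mean);
        double std = values.isEmpty() ? Double.NaN : Math.sqrt(sumSq / values.size());
        return new double[] {stats.getMin(), stats.getMax(), mean, std};
    }

    /** The k most frequent non-null values, highest count first, ties broken by value. */
    static List<Map.Entry<String, Long>> topK(List<String> column, int k) {
        Map<String, Long> freq = new HashMap<>();
        for (String v : column) {
            if (!isNull(v)) freq.merge(v, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(freq.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```

Calling topK(column, 10) yields the top-10 list. Breaking ties by value keeps the output deterministic, which a plain HashMap iteration order would not.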

Metanome Tool and Profiling Algorithms

Metanome is a framework that handles both algorithms and datasets as external resources. The profiling algorithms in this repository have been developed to run within Metanome.
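This README does not describe Metanome's actual plug-in mechanism. The sketch below only illustrates the general idea of treating algorithms as external resources. It loads jar files from a folder with a standard URLClassLoader and instantiates an algorithm class by name. The ProfilingAlgorithm interface, the RowCountAlgorithm class, and AlgorithmLoader are all hypothetical and are not Metanome's API.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plug-in contract, NOT Metanome's actual algorithm interface. */
interface ProfilingAlgorithm {
    String name();
    List<String> execute(List<String> column);
}

/** Example plug-in: reports how many values a column holds. */
class RowCountAlgorithm implements ProfilingAlgorithm {
    public String name() {
        return "row-count";
    }

    public List<String> execute(List<String> column) {
        return List.of("rows=" + column.size());
    }
}

final class AlgorithmLoader {
    private AlgorithmLoader() {}

    /** Builds a class loader over every *.jar file found in the given directory. */
    static URLClassLoader loaderFor(File jarDir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] jars = jarDir.listFiles((dir, fileName) -> fileName.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL());
            }
        }
        // Parent delegation: classes already on the application classpath still resolve.
        return new URLClassLoader(urls.toArray(new URL[0]), AlgorithmLoader.class.getClassLoader());
    }

    /** Instantiates an algorithm by fully qualified class name; it needs a no-arg constructor. */
    static ProfilingAlgorithm instantiate(ClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.asSubclass(ProfilingAlgorithm.class).getDeclaredConstructor().newInstance();
    }
}
```

Loading by class name keeps the host framework independent of any particular algorithm. The host only needs to know the shared interface.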

Run the algorithms using Metanome GUI

Download the latest release of Metanome from the Metanome releases page, as well as the algorithms from the algorithm releases page
Unzip deployment/target/deployment-1.1-SNAPSHOT-package_with_tomcat.zip
In the unzipped folder, place the algorithm jar file in /WEB-INF/classes/algorithms and the datasets in /WEB-INF/classes/inputData
Start the run script: run.sh, or run.bat on Windows systems
Open a browser at http://localhost:8080/ and register both the algorithm and the dataset in the Metanome frontend
Choose the algorithm and data source, set the parameters, and run
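The placement step can also be scripted. The helper below is a hypothetical sketch (MetanomeDeploy is not part of Metanome). It copies an algorithm jar and datasets into the two folders named in the steps above. It assumes those paths are relative to the directory that contains WEB-INF; in some packages that directory may be nested inside the unzipped folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that mirrors the manual placement step. */
final class MetanomeDeploy {
    // Folder layout from the README, assumed relative to the directory containing WEB-INF.
    static final String ALGORITHM_DIR = "WEB-INF/classes/algorithms";
    static final String INPUT_DIR = "WEB-INF/classes/inputData";

    private MetanomeDeploy() {}

    /** Copies the algorithm jar and each dataset into the Metanome folder layout. */
    static void install(Path metanomeRoot, Path algorithmJar, Path... datasets) throws IOException {
        copyInto(metanomeRoot.resolve(ALGORITHM_DIR), algorithmJar);
        for (Path dataset : datasets) {
            copyInto(metanomeRoot.resolve(INPUT_DIR), dataset);
        }
    }

    /** Creates the target folder if needed and overwrites any file with the same name. */
    private static void copyInto(Path targetDir, Path file) throws IOException {
        Files.createDirectories(targetDir);
        Files.copy(file, targetDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

After copying, the algorithm and dataset still have to be registered in the Metanome frontend as described above.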

Development

MetanomeTestRunner is a project for running the algorithms during development. Because it is a Maven project, all the required Metanome libraries are downloaded automatically. If you want to build your own algorithm, have a look at this project.
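MetanomeTestRunner itself is not shown in this README. The sketch below is a hypothetical, framework-free stand-in for the same idea of running a profiling routine locally during development. It reads a CSV file into columns and applies any column-level function to each of them. It assumes a header row with unique column names and plain comma separation without quoted fields, so it is not a general CSV parser. LocalRunner and its methods are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical development harness; not part of MetanomeTestRunner. */
final class LocalRunner {
    private LocalRunner() {}

    /**
     * Reads a CSV with a header row into column lists keyed by header name.
     * Assumptions: unique header names, plain commas, no quoted fields.
     */
    static Map<String, List<String>> readColumns(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        Map<String, List<String>> columns = new LinkedHashMap<>();
        String headerLine = in.readLine();
        if (headerLine == null) return columns;
        String[] header = headerLine.split(",", -1);
        for (String h : header) columns.put(h, new ArrayList<>());
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue; // skip blank lines
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            for (int i = 0; i < header.length; i++) {
                // Cells missing at the end of a short row are recorded as null.
                columns.get(header[i]).add(i < cells.length ? cells[i] : null);
            }
        }
        return columns;
    }

    /** Applies a column-level profiling function to every column, preserving column order. */
    static <R> Map<String, R> profileAll(Map<String, List<String>> columns,
                                         Function<List<String>, R> profiler) {
        Map<String, R> results = new LinkedHashMap<>();
        columns.forEach((name, values) -> results.put(name, profiler.apply(values)));
        return results;
    }
}
```

Passing List::size as the profiler, for example, returns the row count of every column. A real per-column profiler would take its place.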

License

Metanome and all the algorithms developed by the developer group have the following license.

hazourahh/Single-Column-Profiler

Folders and files

Latest commit

History

Repository files navigation

Single Column Profiling Algorithms

Single Column Data Profiler (SCDP)

Metanome Tool and Profiling Algorithms

Run the algorithms using Metanome GUI

Development

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages