Python scripts for disambiguating patent data

The following collection of scripts performs pre- and post-processing on patent data as part of the patent inventor disambiguation process.

CURRENT:

(I) DATASET PREPARATION

(1) XML Parsing

a. Open XMLParse2008.py
b. Set variable flder = <folder that contains all XML raw files>
c. Run XMLParse2008.py
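
Before running the parser, it can help to confirm that the folder assigned to flder actually contains XML files. The following is a minimal sketch, not part of XMLParse2008.py: the helper name list_xml_files is hypothetical, and it assumes the raw files sit directly in the folder and end in .xml.

```python
import os


def list_xml_files(flder):
    """Return sorted paths of the .xml files directly inside flder.

    Hypothetical helper for a pre-flight check; XMLParse2008.py itself
    may discover its input files differently.
    """
    paths = []
    for name in sorted(os.listdir(flder)):
        path = os.path.join(flder, name)
        # Assumption: raw files carry an .xml extension (any letter case).
        if os.path.isfile(path) and name.lower().endswith(".xml"):
            paths.append(path)
    return paths


# Example use (the folder path is a placeholder):
# files = list_xml_files("/path/to/raw/xml")
# print(len(files), "XML files found")
```

An empty result suggests that flder points at the wrong folder or that the raw files use a different extension.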

(2) Data Cleaning

- scripts_v2.py must be in the same directory as all of the sqlite3 files produced by the XML Parsing step.
a. Run scripts_v2.py
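
To check that the sqlite3 files from the XML Parsing step are where scripts_v2.py expects them, a short inspection script like the one below may be useful. It is only a sketch: the function name summarize_sqlite_files and the *.sqlite3 file pattern are assumptions, so adjust the pattern to match the files the parser actually writes.

```python
import glob
import os
import sqlite3


def summarize_sqlite_files(directory, pattern="*.sqlite3"):
    """Map each matching database file name to its list of table names.

    Assumption: the parser output files match `pattern`; this helper is
    hypothetical and is not part of scripts_v2.py.
    """
    summary = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        summary[os.path.basename(path)] = [row[0] for row in rows]
    return summary


# Example use, run from the directory that holds scripts_v2.py:
# for name, tables in summarize_sqlite_files(".").items():
#     print(name, tables)
```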

(3) Table Consolidation

a. Run invpat.py
b.

(II) RESULTS ANALYSIS

From the command line, run bmVerify_v3.py.

Use python bmVerify_v3.py ? or python bmVerify_v3.py help for more information.

(III) OTHER SCRIPTS

Run from command line to create files:

python patentYear.py [year] [src]

python createFullSet.py [start_year] [end_year]
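
When several years are needed, the two commands can be generated in a loop. The sketch below only builds and prints the command lines. It rests on assumptions the text above does not confirm: that patentYear.py should be run once per year before createFullSet.py, and that end_year is inclusive. The helper name build_commands and the src value are placeholders.

```python
def build_commands(start_year, end_year, src):
    """Return the command lines for a range of years as lists of strings.

    Assumptions (not confirmed by the scripts' documentation):
    - patentYear.py is run once per year, before createFullSet.py;
    - end_year is inclusive.
    """
    commands = [
        ["python", "patentYear.py", str(year), src]
        for year in range(start_year, end_year + 1)
    ]
    commands.append(
        ["python", "createFullSet.py", str(start_year), str(end_year)]
    )
    return commands


# Print the commands for review; "SRC" is a placeholder for the [src] argument.
for command in build_commands(2000, 2002, "SRC"):
    print(" ".join(command))
```

Each list can be passed to subprocess.check_call once the printed commands look right.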
