README for replicating Howcroft & Demberg 2017

This repo documents how to replicate the findings reported in:

Howcroft, David M., and Vera Demberg. 2017. "Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking". Proc. of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Pages 958-968. Valencia, Spain, April 3-7, 2017. Association for Computational Linguistics.

ACL Anthology || PDF

The first two sections describe the resources used and how to obtain them. The Instructions section then describes how to use them to replicate our results. References come after that, followed by some metadata for this document.

Corpora

English and Simple English Wikipedia (ESEW)

The English and Simple English Wikipedia (ESEW) corpus was developed by Hwang et al. (2015). The resource is available from the authors on the project page.

For our work we used only the good alignments. You can download the file from the command line using:

wget http://ssli.ee.washington.edu/tial/projects/simplification/aligned-good-0.67.txt

The download is about 40 MB in size.

One Stop English (OSE)

The One Stop English corpus was developed by Sowmya Vajjala using data from [onestopenglish.com]. The corpus is available from her BitBucket repo: OSE Corpus.

You can fetch the data with:

wget https://bitbucket.org/nishkalavallabhi/complexity-features/raw/3cf60342c7ec82371ea2d0ef1bb290e7b0c9bac2/corpus/OSE-SentenceAlignedCorpus-ThreeLevel-2013toMid2015-FINAL.txt

The download is about 700 KB in size.
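The two downloads above can also be wrapped in a small helper that skips files already on disk. This is only a sketch and not part of the repo's setup.sh: the function name `fetch_corpus` and the `CORPUS_DIR` and `DRY_RUN` variables are hypothetical names introduced here for illustration; the URLs are the ones given above.

```shell
# Sketch only: fetch_corpus, CORPUS_DIR and DRY_RUN are illustrative names,
# not something provided by this repo.
ESEW_URL="http://ssli.ee.washington.edu/tial/projects/simplification/aligned-good-0.67.txt"
OSE_URL="https://bitbucket.org/nishkalavallabhi/complexity-features/raw/3cf60342c7ec82371ea2d0ef1bb290e7b0c9bac2/corpus/OSE-SentenceAlignedCorpus-ThreeLevel-2013toMid2015-FINAL.txt"

# Download one URL into $CORPUS_DIR (default: ./corpora) unless a
# non-empty file with the same name is already there.
fetch_corpus() {
    url="$1"
    dir="${CORPUS_DIR:-corpora}"
    target="$dir/${url##*/}"   # file name = last path component of the URL
    if [ -s "$target" ]; then
        echo "skip $target"
        return 0
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the command that would run, without touching the network.
        echo "wget -P $dir $url"
        return 0
    fi
    mkdir -p "$dir"
    wget -P "$dir" "$url"
}
```

Calling `fetch_corpus "$ESEW_URL"` and then `fetch_corpus "$OSE_URL"` would place both files under `corpora/`; setting `DRY_RUN=1` first prints the wget commands instead of running them.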

External Resources

For calculating surprisal and embedding depth

Our surprisal and embedding depth features are extracted by running the ModelBlocks parser in complexity output mode. The main distribution for the parser is on GitHub.

For calculating integration cost

Our integration cost features use a locally developed tool called icy-parses (formerly icToolDist). This is available on GitHub as well.

For calculating propositional idea density

Our propositional idea density features depend on the adapted IDD3 repo and therefore also on the Stanford dependency parser.

Instructions

Running setup.sh in a bash-like environment will fetch the corpora and these repos for you.
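Before running setup.sh, it may help to confirm that the command-line tools it is likely to need are installed. The check below is a sketch under assumptions: `wget` is used in the download commands above, while `git` is only a guess at how the repos are fetched, and `missing_tools` is a hypothetical helper name, not something provided by this repo.

```shell
# Sketch only: missing_tools is an illustrative helper.
# Print each named tool that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools wget git` prints nothing when both tools are available and otherwise lists the ones to install first.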

Under development: This README is still under development and will be supplemented with all of the necessary scripts to automate the replication of our results.

References

Howcroft, David M., and Vera Demberg. 2017. "Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking". Proc. of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Pages 958-968. Valencia, Spain, April 3-7, 2017. Association for Computational Linguistics. ACL Anthology || PDF

Hwang, William, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu. 2015. "Aligning Sentences from Standard Wikipedia to Simple Wikipedia". Proc. of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Pages 211-217. Denver, Colorado, USA. Association for Computational Linguistics. ACL Anthology || PDF

Metadata

Written by David M. Howcroft, April 2017.
