Showing 19 changed files with 91 additions and 246 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,2 @@ | ||
*~ | ||
__pycache__/ | ||
venv/ | ||
results/*.png | ||
results/*.txt | ||
processed_data/*.dat | ||
.snakemake | ||
*/*.log |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,49 +1,27 @@ | ||
# a list of all the books we are analyzing | ||
DATA = glob_wildcards('data/{book}.txt').book | ||
|
||
# this is for running on HPC resources | ||
localrules: all, make_archive | ||
|
||
# the default rule | ||
rule all: | ||
    input: | ||
        'zipf_analysis.tar.gz' | ||
        expand('statistics/{book}.data', book=DATA), | ||
        expand('plot/{book}.png', book=DATA) | ||
|
||
# count words in one of our books | ||
# logfiles from each run are put in .log files" | ||
rule count_words: | ||
    input: | ||
        wc='source/wordcount.py', | ||
        script='statistics/count.py', | ||
        book='data/{file}.txt' | ||
    output: 'processed_data/{file}.dat' | ||
    threads: 4 | ||
    log: 'processed_data/{file}.log' | ||
    shell: | ||
        ''' | ||
        python {input.wc} {input.book} {output} >> {log} 2>&1 | ||
        ''' | ||
    output: 'statistics/{file}.data' | ||
    conda: 'environment.yml' | ||
    log: 'statistics/{file}.log' | ||
    shell: 'python {input.script} {input.book} > {output}' | ||
|
||
# create a plot for each book | ||
rule make_plot: | ||
    input: | ||
        plotcount='source/plotcount.py', | ||
        book='processed_data/{file}.dat' | ||
    output: 'results/{file}.png' | ||
    shell: 'python {input.plotcount} {input.book} {output}' | ||
|
||
# generate summary table | ||
rule zipf_test: | ||
    input: | ||
        zipf='source/zipf_test.py', | ||
        books=expand('processed_data/{book}.dat', book=DATA) | ||
    output: 'results/results.txt' | ||
    shell: 'python {input.zipf} {input.books} > {output}' | ||
|
||
# create an archive with all of our results | ||
rule make_archive: | ||
    input: | ||
        expand('results/{book}.png', book=DATA), | ||
        expand('processed_data/{book}.dat', book=DATA), | ||
        'results/results.txt' | ||
    output: 'zipf_analysis.tar.gz' | ||
    shell: 'tar -czvf {output} {input}' | ||
        script='plot/plot.py', | ||
        book='statistics/{file}.data' | ||
    output: 'plot/{file}.png' | ||
    conda: 'environment.yml' | ||
    log: 'plot/{file}.log' | ||
    shell: 'python {input.script} --data-file {input.book} --plot-file {output}' |
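The new `count_words` rule calls `python {input.script} {input.book} > {output}`, but the body of `statistics/count.py` is not shown in this diff. The following is only a minimal sketch of a script that would fit that call and produce the `word count` lines seen in the `.data` files. The function name `most_common_words`, the punctuation stripping, and the lowercasing are assumptions, not the repository's actual implementation.

```python
import string
import sys
from collections import Counter


def most_common_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs.

    Assumption: words are split on whitespace, lowercased, and stripped
    of leading/trailing punctuation. The real script may tokenize differently.
    """
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return Counter(w for w in words if w).most_common(n)


def main(book_path):
    # Read the whole book and print one "word count" pair per line,
    # so the Snakemake rule can redirect stdout into the output file.
    with open(book_path, "r", encoding="utf-8") as handle:
        text = handle.read()
    for word, count in most_common_words(text):
        print(word, count)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Writing to stdout keeps the script independent of the output path, which matches the shell redirection used in the rule.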
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,9 @@ | ||
name: coderefinery | ||
name: word-count | ||
channels: | ||
- conda-forge | ||
- defaults | ||
- bioconda | ||
dependencies: | ||
- python>3.7 | ||
- click=7.1.2 | ||
- ipywidgets=7.6.3 | ||
- jupyterlab=3.0.14 | ||
- jupyterlab-git=0.30.0 | ||
- matplotlib=3.4.1 | ||
- numpy=1.20.2 | ||
- pandas=1.2.4 | ||
- pytest=6.2.3 | ||
- seaborn=0.11.1 | ||
- snakemake-minimal=6.2.1 | ||
- sphinx=3.5.4 | ||
- sphinx_rtd_theme=0.5.2 | ||
- pip | ||
# - pip: | ||
# - jupyterlab-github==2.0.0 | ||
- python>3.9 | ||
- click=8.1.3 | ||
- matplotlib=3.7.0 | ||
- snakemake-minimal=7.22.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
import matplotlib.pyplot as plt | ||
import click | ||
|
||
|
||
def plot_bar_chart(x_values, y_values, title, plot_file): | ||
    plt.figure(figsize=(10, 5)) | ||
    plt.bar(x_values, y_values) | ||
    plt.title(title) | ||
    plt.savefig(plot_file) | ||
|
||
|
||
@click.command() | ||
@click.option( | ||
    "--data-file", required=True, help="Input data file", type=click.Path(exists=True) | ||
) | ||
@click.option("--plot-file", required=True, help="Output plot file") | ||
def main(data_file, plot_file): | ||
    # read data from input_file | ||
    x_values = [] | ||
    y_values = [] | ||
    for line in open(data_file, "r").readlines(): | ||
        word, count = line.split() | ||
        x_values.append(word) | ||
        y_values.append(int(count)) | ||
|
||
    # now plot the data | ||
    plot_bar_chart(x_values, y_values, "10 most common words", plot_file) | ||
|
||
|
||
if __name__ == "__main__": | ||
    main() |
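The reading loop in `main` opens the data file without closing it and would raise on a blank line. One possible variant, sketched here under the assumption that each non-empty line holds exactly one word and one integer count, moves the parsing into a small helper so it can be checked without matplotlib. The name `parse_counts` is hypothetical and does not appear in the commit.

```python
def parse_counts(lines):
    """Split "word count" lines into parallel lists of words and integer counts.

    Blank lines are skipped; any other malformed line still raises ValueError.
    """
    words, counts = [], []
    for line in lines:
        if not line.strip():
            continue
        word, count = line.split()
        words.append(word)
        counts.append(int(count))
    return words, counts


# Example with the first three rows of one of the generated .data files.
sample = ["the 4044\n", "and 2807\n", "of 1907\n", "\n"]
words, counts = parse_counts(sample)
```

Inside `main`, the helper could be called as `parse_counts(handle)` within a `with open(data_file, "r") as handle:` block, so the file is closed as soon as parsing finishes.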
Empty file.
Empty file.
This file was deleted.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
the 4044 | ||
and 2807 | ||
of 1907 | ||
a 1594 | ||
to 1515 | ||
in 1221 | ||
i 974 | ||
was 695 | ||
it 680 | ||
for 675 |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
the 3822 | ||
of 2460 | ||
and 1723 | ||
to 1479 | ||
a 1308 | ||
in 997 | ||
is 894 | ||
that 652 | ||
by 607 | ||
it 573 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
the 12244 | ||
and 5566 | ||
to 5073 | ||
of 4952 | ||
a 4015 | ||
in 2699 | ||
we 2649 | ||
is 2302 | ||
it 2102 | ||
on 1861 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
the 4242 | ||
and 2469 | ||
of 2190 | ||
a 1319 | ||
to 1292 | ||
in 1175 | ||
i 621 | ||
is 564 | ||
as 524 | ||
on 513 |