benchmark doc

marrlab · Dec 10, 2024 · afb2cc5
1 parent 163ca91
commit afb2cc5

Showing 5 changed files with 18 additions and 9 deletions.
diff --git a/docs/doc_benchmark.md b/docs/doc_benchmark.md
@@ -132,8 +132,9 @@ In case that the benchmark is not entirely completed, the user can obtain partia
 explained below.
 
 
-### Obtain partial results
-If the benchmark is not yet completed (still running or has some failed jobs, e.g. BrokenPipe Error due to multiprocessing in PIL image reading), the `results.csv` file containing the aggregated results will not be created.
+### Aggregate obtained partial results
+If the benchmark is not yet completed (still running or has some failed jobs, e.g. Out of Memory, BrokenPipe Error due to multiprocessing in PIL image reading), the `results.csv` file containing the aggregated results will not be created.
+
 The user can then obtain the aggregated partial results with plots from the partially completed benchmark by running
 the following after cd into the DomainLab directory:
 ```commandline
@@ -144,9 +145,9 @@ e.g. `./zoutput/benchmarks/demo_benchmark`, where `demo_benchmark` is a name def
 
 Alternatively, one could use
 ```examples
-cat ./zoutput/benchmarks/[name of the benchmark]/rule_results/*.csv > result.csv
+sh scripts/sh_benchmark_partial_agg.sh OUTPUT_DIR/rule_results
 ```
-clean up the extra csv head generated and plot the csv using command below
+where `rule_results` is the subfolder that contains the csv result files of the partially finished benchmark. This script offers a faster way to aggregate the partial csv files: it merges them, writes a LaTeX table summarizing the results to `output_table_perf.tex` (preceded in the same file by a plain-text version of the table), and finally generates plots using the functionality described in the next section.
 
 ### Generate plots from .csv file
 If the benchmark is not completed, the `graphics` subdirectory might not be created. The user can then manually

diff --git a/scripts/generate_latex_table.py b/scripts/generate_latex_table.py
@@ -5,7 +5,7 @@
 import pandas as pd
 
 
-def gen_latex_table(raw_df, fname="table_perf.tex",
+def gen_latex_table(raw_df, fname="output_table_perf.tex",
                     group="method", str_perf="acc"):
     """
     aggregate benchmark csv file to generate latex table
@@ -15,6 +15,8 @@ def gen_latex_table(raw_df, fname="table_perf.tex",
     str_table = df_result.to_string()
     print(str_table)
     with open(fname, 'w') as file:
+        file.write(str_table)
+        file.write("\n")
         file.write(latex_table)
 
 

diff --git a/scripts/merge_csvs.sh b/scripts/merge_csvs.sh
@@ -1,10 +1,12 @@
 #!/bin/bash
+# $1 should be a folder with only csv files
+# $2 is the output file name
 
 # Define the directory containing the csv files
 directory=$1
 
 # Define the output CSV file
-output_file="merged_data.csv"
+output_file="${2:-merged_data.csv}"
 
 # Initialize the merged CSV file with the header from the first file
 find "$directory" -maxdepth 1 -type f -name "*.csv" | sort | head -1 | xargs head -n 1 > "$output_file"
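
For readers who prefer Python, the merge step can be sketched as follows. This is a minimal sketch, not the repository's implementation: the function name `merge_csvs` and its return value are hypothetical, and since only the header-initialization line of `merge_csvs.sh` is visible in this hunk, the sketch assumes the rest of the script appends the data rows of every csv file while skipping each file's header row.

```python
import csv
from pathlib import Path


def merge_csvs(directory, output_file="merged_data.csv"):
    """Merge every *.csv directly inside `directory` into `output_file`.

    Hypothetical helper: the header row of the first non-empty file (in
    sorted order) is written once, and the header row of every file is
    skipped when its data rows are appended.
    Returns the number of data rows written.
    """
    out_path = Path(output_file).resolve()
    # Non-recursive glob, comparable to `find -maxdepth 1 -name "*.csv"`.
    # The output file is excluded in case it lives in the same folder.
    csv_files = sorted(
        p for p in Path(directory).glob("*.csv")
        if p.is_file() and p.resolve() != out_path
    )
    if not csv_files:
        raise FileNotFoundError(f"no csv files found in {directory}")

    n_rows = 0
    header_written = False
    with open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for path in csv_files:
            with open(path, newline="") as fin:
                reader = csv.reader(fin)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to merge
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Called as `merge_csvs("OUTPUT_DIR/rule_results", "merged_data.csv")`, this yields one header line followed by all data rows, which avoids the manual header cleanup that the removed `cat ... > result.csv` approach required.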

diff --git a/scripts/sh_benchmark_partial_agg.sh b/scripts/sh_benchmark_partial_agg.sh
@@ -0,0 +1,7 @@
+# $1 should be the rule_results folder which contains several csv files
+# $2 is the merged csv file name (defaults to merged_data.csv)
+
+file_na_merged_csv="${2:-merged_data.csv}"
+sh scripts/merge_csvs.sh "$1" "$file_na_merged_csv"
+python scripts/generate_latex_table.py "$file_na_merged_csv"
+python main_out.py --gen_plots "$file_na_merged_csv" --outp_dir partial_agg_plots
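
The three steps of this script can also be laid out in Python, which makes the default for the second argument and the order of the steps explicit. This is only an illustrative sketch: the function `build_partial_agg_commands` and its `plot_dir` parameter are hypothetical, while the script paths and the `--gen_plots` and `--outp_dir` flags are copied from the diff above.

```python
import shlex


def build_partial_agg_commands(rule_results_dir,
                               merged_csv="merged_data.csv",
                               plot_dir="partial_agg_plots"):
    """Return the three pipeline commands, in order, as argument lists.

    Hypothetical helper mirroring sh_benchmark_partial_agg.sh:
    1. merge the csv files, 2. build the table, 3. generate the plots.
    """
    return [
        ["sh", "scripts/merge_csvs.sh", rule_results_dir, merged_csv],
        ["python", "scripts/generate_latex_table.py", merged_csv],
        ["python", "main_out.py", "--gen_plots", merged_csv,
         "--outp_dir", plot_dir],
    ]


if __name__ == "__main__":
    # Print the commands in shell-quoted form instead of running them.
    for argv in build_partial_agg_commands("OUTPUT_DIR/rule_results"):
        print(shlex.join(argv))
```

Each argument list could be passed to `subprocess.run(argv, check=True)` from the DomainLab root directory. Keeping every argument as a separate list element means a path containing spaces is handed over intact, without relying on shell quoting.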
diff --git a/scripts/sh_genplot.sh b/scripts/sh_genplot.sh