I'm submitting a bug report
What is the current behavior?
Adding Jupyter notebook files (.ipynb) to my codebase increased the execution time of detect-secrets from a few seconds to almost an hour. The CI/CD pipeline was barely usable because detect-secrets was so slow.
I analyzed the problem and found that the slow processing is caused by lines with a large number of consecutive spaces.
Jupyter notebooks may contain hundreds of such "problematic" lines, each with over 800 consecutive spaces, ending with a quote or a comma.
To reproduce the issue, I generated a file where each line has 100 more spaces than the previous line, ending with some ASCII characters.
Each of those lines is extracted to a file within a dedicated folder which detect-secrets has to analyze.
As you can see in the following output, detect-secrets has almost the same execution time (0.2 seconds) for lines that contain up to 400 consecutive spaces.
However, detect-secrets needs more than ten times as much time (2.2 seconds) for a line with 1000 consecutive spaces.
secrets-scanner@26f48f23a489:/tmp/notebooks/spaces$ ./scan.sh
Scanning file 1 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_aa
Time for /tmp/tmp.cTxM6THbjX/split_notebook_aa: 0.2 seconds
Scanning file 2 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ab
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ab: 0.2 seconds
Scanning file 3 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ac
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ac: 0.2 seconds
Scanning file 4 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ad
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ad: 0.2 seconds
Scanning file 5 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ae
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ae: 0.3 seconds
Scanning file 6 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_af
Time for /tmp/tmp.cTxM6THbjX/split_notebook_af: 0.5 seconds
Scanning file 7 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ag
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ag: 0.8 seconds
Scanning file 8 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ah
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ah: 1.1 seconds
Scanning file 9 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_ai
Time for /tmp/tmp.cTxM6THbjX/split_notebook_ai: 1.6 seconds
Scanning file 10 of 10: /tmp/tmp.cTxM6THbjX/split_notebook_aj
Time for /tmp/tmp.cTxM6THbjX/split_notebook_aj: 2.2 seconds
If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
Create a file create_notebook_file.sh with the following content:
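#!/bin/bash
# Generate notebook.ipynb: each line has 100 more leading spaces than the previous one.
output_file="notebook.ipynb"
: > "$output_file"            # start with an empty file
spaces=""
step="$(printf '%100s' '')"   # a string of exactly 100 spaces
for i in {1..10}
do
    echo "${spaces}foo" >> "$output_file"
    spaces+="$step"
done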
Make it executable: chmod +x create_notebook_file.sh
Create a file scan.sh with the following content:
Make it executable: chmod +x scan.sh
Execute the scripts:
./create_notebook_file.sh
./scan.sh
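For reference, the scan loop described above (one file per line of the notebook, each scanned and timed separately) can be sketched roughly as follows. This is a minimal sketch under assumptions, not necessarily identical to scan.sh: the function name scan_lines and the SCANNER variable are hypothetical, the detect-secrets CLI is assumed to be on PATH, and the timing relies on GNU date supporting %N.

```shell
#!/bin/bash
# Sketch only: scan_lines and SCANNER are illustrative names, not part of detect-secrets.
# Assumes GNU coreutils (split, date with %N) and the detect-secrets CLI on PATH.

scan_lines() {
  local input_file="$1"
  # SCANNER is a hypothetical override hook; the default is the real CLI call.
  local scanner="${SCANNER:-detect-secrets scan}"
  local tmp_dir file start end elapsed
  local count=0

  # Split the input into one file per line inside a temporary folder.
  tmp_dir="$(mktemp -d)"
  split -l 1 "$input_file" "$tmp_dir/split_notebook_"

  local files=("$tmp_dir"/split_notebook_*)
  local total=${#files[@]}

  for file in "${files[@]}"; do
    count=$((count + 1))
    echo "Scanning file $count of $total: $file"
    start=$(date +%s.%N)
    # Left unquoted on purpose so "detect-secrets scan" splits into words.
    $scanner "$file" > /dev/null
    end=$(date +%s.%N)
    # Wall-clock difference, rounded to one decimal place.
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f", e - s }')
    echo "Time for $file: $elapsed seconds"
  done

  rm -rf "$tmp_dir"
}
```

Calling scan_lines notebook.ipynb after running create_notebook_file.sh should print one "Scanning file" and one "Time for" line per notebook line, in the same shape as the output shown earlier.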
I'd expect a runtime of O(n) for a line with n consecutive spaces followed by a non-space ASCII character.
We are not able to use detect-secrets in our CI/CD pipeline if it takes so long to execute.