For convenience, when running in Terra (app.terra.bio), we automatically try a few things to open a remote BAM file (e.g., renewing Google Cloud authentication tokens, overriding the cURL CA bundle, etc.). This works, but it looks pretty ugly, especially when the accesses are parallelized: every thread has to go through all the warning messages, which generates a ton of stderr noise that might confuse a user. For example:
[2024-12-02 06:22:55] Hidive version 0.1.95
[2024-12-02 06:22:55] Cli { command: Fetch { output: "/dev/stdout", loci: ["chr22:42,096,498-42,174,483|CYP2D6-CYP2D7"], padding: 500, seq_paths: ["gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam"] } }
[2024-12-02 06:22:55] Intermediate data will be stored at "/cromwell_root/tmp.zW7LSA".
[2024-12-02 06:22:55] Fetching data...
[E::easy_errno] Libcurl reported error 60 (SSL peer certificate or SSH remote key was not OK)
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : Input/output error
[2024-12-02 06:22:57] Read 'gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam', attempt 2 (reauthorizing to GCS)
[E::easy_errno] Libcurl reported error 60 (SSL peer certificate or SSH remote key was not OK)
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : Input/output error
[2024-12-02 06:22:59] Read 'gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam', attempt 3 (overriding cURL CA bundle)
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : No such file or directory
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : No such file or directory
[2024-12-02 06:22:59] Read 'gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam', attempt 2 (reauthorizing to GCS)
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : No such file or directory
[2024-12-02 06:23:01] Read 'gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam', attempt 3 (overriding cURL CA bundle)
[E::hts_open_format] Failed to open file "gs://fc-1ee08173-e353-4494-ad28-7a3d7bd99734/resources/HPRC_grch38/HG00480.bam" : No such file or directory
...
    // Try to open the BAM file from the URL, with retries for authorization.
    let bam = match IndexedReader::from_url(seqs_url) {
        Ok(bam) => bam,
        Err(_) => {
            crate::elog!("Read '{}', attempt 2 (reauthorizing to GCS)", seqs_url);
            // If opening fails, try authorizing access to Google Cloud Storage.
            gcs_authorize_data_access();
            // Try opening the BAM file again.
            match IndexedReader::from_url(seqs_url) {
                Ok(bam) => bam,
                Err(_) => {
                    crate::elog!("Read '{}', attempt 3 (overriding cURL CA bundle)", seqs_url);
                    // If it still fails, guess the cURL CA bundle path.
                    local_guess_curl_ca_bundle();
                    // Try one last time to open the BAM file.
                    IndexedReader::from_url(seqs_url)?
                }
            }
        }
    };
    Ok(bam)
}
Each time we fail to access the file, we add another level of credential guessing. This is called within a block of code that retries remote file accesses with an exponential backoff, in case the problem is actually intermittent connectivity to the data rather than credentials or local configuration. It works for now, but it's ugly and brute-force.
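For reference, the outer retry loop behaves roughly like the sketch below. This is not the code in hidive; `retry_with_backoff`, its parameters, and the doubling policy are hypothetical names and choices made for illustration. The closure receives the attempt number so a caller could decide when to log.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: retry `op` up to `max_attempts` times, doubling the
/// delay after each failure. Returns the first success, or the last error
/// if every attempt fails.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // Wait, then double the delay for the next round.
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}
```

Because only the final error is returned, a caller could stay silent during intermediate attempts and report once at the end, which would cut down on hidive's own share of the noise (htslib's messages are a separate matter).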
We should improve this behavior. The messages are coming from htslib/rust-htslib, not hidive. So we could possibly capture and suppress stderr from htslib/rust-htslib. Or better yet, we could determine what credential renewals or environment configuration we need before trying to open the file, rather than relying on our current trial-and-error strategy.
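A minimal sketch of the "decide up front" idea, under some loud assumptions: that a `gs://` URL needs a non-empty `GCS_OAUTH_TOKEN` in the environment, and that TLS access needs `CURL_CA_BUNDLE` to name an existing file. Both rules are guesses based on the log above, not verified behavior (for instance, an unset `CURL_CA_BUNDLE` may be fine where cURL's default bundle works, and a token that is present may still be expired). `SetupStep`, `plan_setup`, and `plan_setup_from_process_env` are hypothetical names.

```rust
use std::path::Path;

/// Hypothetical: a setup step to perform before the first open attempt.
#[derive(Debug, PartialEq, Eq)]
enum SetupStep {
    /// Obtain a Google Cloud token (the job of `gcs_authorize_data_access`).
    AuthorizeGcs,
    /// Point cURL at a CA bundle (the job of `local_guess_curl_ca_bundle`).
    SetCaBundle,
}

/// Decide which setup steps a URL needs by inspecting the environment first.
/// `get_env` and `path_exists` are injected so the logic can be tested
/// without touching the real process environment or filesystem.
fn plan_setup(
    url: &str,
    get_env: impl Fn(&str) -> Option<String>,
    path_exists: impl Fn(&str) -> bool,
) -> Vec<SetupStep> {
    let mut steps = Vec::new();
    let is_gcs = url.starts_with("gs://");
    let uses_tls = is_gcs || url.starts_with("https://");

    // Assumption: a missing or empty token means we must authorize first.
    let has_token = get_env("GCS_OAUTH_TOKEN").map_or(false, |t| !t.is_empty());
    if is_gcs && !has_token {
        steps.push(SetupStep::AuthorizeGcs);
    }

    // Assumption: the bundle variable must name a file that exists.
    let bundle_ok = get_env("CURL_CA_BUNDLE").map_or(false, |p| path_exists(p.as_str()));
    if uses_tls && !bundle_ok {
        steps.push(SetupStep::SetCaBundle);
    }
    steps
}

/// Convenience wrapper that reads the real environment and filesystem.
fn plan_setup_from_process_env(url: &str) -> Vec<SetupStep> {
    plan_setup(url, |key| std::env::var(key).ok(), |p| Path::new(p).exists())
}
```

If something like this ran once before the parallel fetch starts, each thread would make a single open attempt against an already configured environment, so the attempt 2 and attempt 3 messages (and the htslib errors that trigger them) would not be repeated per thread.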
The code shown above is from hidive/src/skydive/src/stage.rs (lines 48 to 79 at ab1a300).