This repository has been archived by the owner on Aug 29, 2023. It is now read-only.
We were trying to use the tool for directory-level scans (using --dir) over a number of cloned repositories. For instance, scanning gitea results in the following:
$ license-scanner --dir gitea/
Error: failed to normalize data: invalid input text with control characters
We had a similar observation on a few more directories containing some non-textual files such as UI assets, binaries, etc.
Would it be possible to emit a warning for such files, skip them, and let the scanner continue with the remaining files? Or perhaps add a command-line argument to enable this behavior?
I have a workaround for this. There is a bit more to it that I need to untangle (probably not specific to this issue), but basically the spot below is where the error can be changed to log-and-continue.
I'll assign this to myself. There is a pending PR and another repo move that might delay this, though.
diff --git a/normalizer/normalizer.go b/normalizer/normalizer.go
--- a/normalizer/normalizer.go
+++ b/normalizer/normalizer.go
@@ -151,7 +151,13 @@
 	// Check if the text contains control characters indicative of binary or non-text files.
 	// match against /[\u0000-\u0007\u000E-\u001B]/
 	if ControlCharactersRE.MatchString(n.OriginalText) {
-		return fmt.Errorf("failed to normalize data: invalid input text with control characters")
+		if n.IsTemplate {
+			return fmt.Errorf("failed to normalize data: invalid input text with control characters")
+		} else {
+			Logger.Errorf("failed to normalize data: invalid input text with control characters")
+			n.NormalizedText = ""
+			return nil // continue
+		}
 	}
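For anyone who wants to try the log-and-continue idea outside the scanner, here is a minimal standalone sketch. It assumes only the character class quoted in the comment above (/[\u0000-\u0007\u000E-\u001B]/); the names controlCharsRE, looksBinary, input, and scannable are hypothetical and are not part of the license-scanner code.

```go
package main

import (
	"fmt"
	"regexp"
)

// controlCharsRE mirrors the character class quoted in the patch comment,
// written in Go's RE2 syntax. The variable name is hypothetical.
var controlCharsRE = regexp.MustCompile(`[\x{0000}-\x{0007}\x{000E}-\x{001B}]`)

// looksBinary reports whether text contains any of those control characters.
// Tab, newline, and carriage return fall outside the class, so ordinary
// text files are not flagged.
func looksBinary(text string) bool {
	return controlCharsRE.MatchString(text)
}

// input is a hypothetical stand-in for a file that has already been read.
type input struct {
	name string
	text string
}

// scannable returns the names of inputs that look like text. Inputs with
// control characters are reported with a warning and skipped instead of
// aborting the whole run.
func scannable(inputs []input) []string {
	var kept []string
	for _, in := range inputs {
		if looksBinary(in.text) {
			fmt.Printf("warning: skipping %s: control characters found\n", in.name)
			continue
		}
		kept = append(kept, in.name)
	}
	return kept
}

func main() {
	inputs := []input{
		{name: "LICENSE", text: "MIT License\n"},
		{name: "logo.png", text: "\x89PNG\x00\x01"},
	}
	fmt.Println(scannable(inputs))
}
```

The point of the sketch is only that the binary check becomes a per-file decision, so a single UI asset or binary no longer stops a directory scan.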
I tested your workaround, and it seems to resolve the issue for now. I also ran into another issue with a similar outcome: Error: file too large (4986500 > 1000000)
I tried changes similar to what you suggested for the earlier issue, like so:
diff --git a/identifier/identifier.go b/identifier/identifier.go
index 4750fa7..7bb47bd 100644
--- a/identifier/identifier.go
+++ b/identifier/identifier.go
@@ -109,7 +109,8 @@ func IdentifyLicensesInFile(filePath string, options Options, licenseLibrary *li
 		return IdentifierResults{}, err
 	}
 	if fi.Size() > 1000000 {
-		return IdentifierResults{}, fmt.Errorf("file too large (%v > 1000000)", fi.Size())
+		Logger.Errorf("file too large (%v > 1000000)", fi.Size())
+		return IdentifierResults{}, nil
 	}
 	b, err := ioutil.ReadFile(filePath)
Could you confirm whether this is the right way to handle the problem, or whether it should be handled differently? Also, would it be possible to incorporate this change as well?