-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#228] Implemented processing of xls files without transformation to csv #285
base: main
Are you sure you want to change the base?
Conversation
XLS and CSV files are now processed using adapters, will fix the HTML processing soon |
Processing HTML tables works again, though it still has its inelegant solution with a conversion to CSV. |
HTML tables are now being processed directly, without conversion. However, I'm not sure that won't introduce more bugs because HTML doesn't store info of all the cells in every row, so I have to use a workaround to remember if we should have merged cells in every place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my suggestions.
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/FileReaderAdapter.java
Outdated
Show resolved
Hide resolved
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/FileReaderAdapter.java
Outdated
Show resolved
Hide resolved
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public String[] getHeader() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should support skipHeader = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
...s-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/XLSFileReaderAdapter.java
Outdated
Show resolved
Hide resolved
logMissingQuoteError(); | ||
return getExecutionContext(inputModel, outputModel); | ||
} | ||
fileReaderAdapter.initialise(new ByteArrayInputStream(sourceResource.getContent()), sourceResourceFormat, processTableAtIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before it was implemented like:
ICsvListReader listReader = getCsvListReader(csvPreference);
if (listReader == null) {
logMissingQuoteError();
return getExecutionContext(inputModel, outputModel);
}
but there is no reason to have
if (listReader == null) {
logMissingQuoteError();
return getExecutionContext(inputModel, outputModel);
}
outside of getCsvListReader() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find it anywhere
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java
Outdated
Show resolved
Hide resolved
1f22c86
to
940f458
Compare
940f458
to
d53c74c
Compare
a4077a9
to
d2c8e84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments
s-pipes-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/TabularModule.java
Outdated
Show resolved
Hide resolved
...es-modules/module-tabular/src/main/java/cz/cvut/spipes/modules/util/StreamReaderAdapter.java
Outdated
Show resolved
Hide resolved
d2c8e84
to
2c14733
Compare
aba244e
to
03a9272
Compare
import java.util.List; | ||
|
||
public interface StreamReaderAdapter { | ||
void initialise(InputStream inputStream, ResourceFormat sourceResourceFormat, int tableIndex, boolean acceptInvalidQuoting, Charset inputCharset, StreamResource sourceResource) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if it makes sense to have acceptInvalidQuoting
for other adapters beside CSV ?
public interface StreamReaderAdapter { | ||
void initialise(InputStream inputStream, ResourceFormat sourceResourceFormat, int tableIndex, boolean acceptInvalidQuoting, Charset inputCharset, StreamResource sourceResource) throws IOException; | ||
String[] getHeader(Boolean skipHeader) throws IOException; | ||
boolean hasNextRow() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why thowing exception?
boolean hasNextRow() throws IOException; | ||
List<String> getNextRow() throws IOException; | ||
List<Region> getMergedRegions(); | ||
String getSheetLabel() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it throw exeption?
@Override | ||
public void initialise(InputStream inputStream, ResourceFormat sourceResourceFormat, int tableIndex, | ||
boolean acceptInvalidQuoting, Charset inputCharset, StreamResource sourceResource) throws IOException { | ||
//listReader = new CsvListReader(new InputStreamReader(inputStream), csvPreference); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
return sheet.getSheetName(); | ||
} | ||
|
||
public String fixNumberFormat (String cellValue){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
logMissingQuoteError(); | ||
return getExecutionContext(inputModel, outputModel); | ||
} | ||
streamReaderAdapter.initialise(new ByteArrayInputStream(sourceResource.getContent()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not make sense to have acceptInvalidQuoting
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to remove parts that are not common for all adapters in initialize method.
logMissingQuoteError(); | ||
return getExecutionContext(inputModel, outputModel); | ||
} | ||
fileReaderAdapter.initialise(new ByteArrayInputStream(sourceResource.getContent()), sourceResourceFormat, processTableAtIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find it anywhere
this.acceptInvalidQuoting = acceptInvalidQuoting; | ||
this.inputCharset = inputCharset; | ||
this.sourceResource = sourceResource; | ||
listReader = getCsvListReader(csvPreference); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see logMissingQuoteError();
} | ||
|
||
@Override | ||
public String[] getHeader(Boolean skipHeader) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public String[] getHeader(Boolean skipHeader) throws IOException { | |
public String[] getHeader(boolean skipHeader) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skipHeader == null should be handled on different level -- i.e. default value of the field skipHeader.
|
||
@Override | ||
public boolean hasNextRow() throws IOException { | ||
return ((listReader.read() != null) || (firstRow != null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot work properly because calling hasNextRow mutlipe times will result in reading different parts of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??Maybe?? the idea should be that we will read oneline upfront.
@MSattrtand please describe this in documentation of the module:
|
Resolves #228.
.xls files are now being processed separately, without conversion to .csv. This is preliminary code, I'll work on reducing the amount of duplicate code using adapters and refractor it.