-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Comment in ODS-file gets included in string cells #55200
Comments
pyZerrenner
added
Bug
Needs Triage
Issue that has not been reviewed by a pandas team member
labels
Sep 19, 2023
I can confirm that this bug exists on the main branch |
rhshadrach
added
IO Excel
read_excel, to_excel
and removed
Needs Triage
Issue that has not been reviewed by a pandas team member
labels
Sep 19, 2023
Thanks for the report! Further investigations and PRs to fix are welcome! |
take |
take |
5 tasks
Maybe just skip office:annotation? diff --git a/pandas/io/excel/_odfreader.py b/pandas/io/excel/_odfreader.py
index 277f64f636..48677468c7 100644
--- a/pandas/io/excel/_odfreader.py
+++ b/pandas/io/excel/_odfreader.py
@@ -206,7 +206,11 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
cell_value = cell.attributes.get((OFFICENS, "value"))
return float(cell_value)
elif cell_type == "string":
return self._get_cell_string_value(cell)
elif cell_type == "currency":
cell_value = cell.attributes.get((OFFICENS, "value"))
return float(cell_value)
@@ -228,8 +232,10 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
"""
from odf.element import Element
from odf.namespaces import TEXTNS
+ from odf.office import Annotation
from odf.text import S
+ office_annotation = Annotation().qname
text_s = S().qname
value = []
@@ -239,6 +245,8 @@ class ODFReader(BaseExcelReader["OpenDocument"]):
if fragment.qname == text_s:
spaces = int(fragment.attributes.get((TEXTNS, "c"), 1))
value.append(" " * spaces)
+ elif fragment.qname == office_annotation:
+ continue
else:
# recursive impl needed in case of nested fragments
# with multiple spaces Or extract only |
@dimastbk I tried these changes and it looks good, you should open a MR :) |
5 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Here is the example *.ods file for the example code: TableWithComment.ods. The file was create using LibreOffice Calc 7.4.7.2
The
print(df)
command produces the following outputIf a cell in the *.ods file has a comment and the cell content is a string (B1 and B7), the comment text and timestamp are appended in front of the cell content. This also applies to the header line. For cells containing numbers, the comment is ignored (A6 and B10).
(Note, that I am referring to comments inserted using LibreOffice Calc itself. This is unrelated to the
comment
argument ofread_excel
.)Expected Behavior
The ods-comments should be ignored and only the cell content read into the dataframe. The expected output from
print(df)
is(This is the output when all comments in the *.ods file are deleted)
Installed Versions
INSTALLED VERSIONS
commit : ba1cccd
python : 3.10.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : de_DE.cp1252
pandas : 2.1.0
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.15.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None
(The installed odfpy version is 1.4.1, which somehow is not shown in the above output)
Libre Office Version
Version: 7.4.7.2 (x64) / LibreOffice Community
Build ID: 723314e595e8007d3cf785c16538505a1c878ca5
CPU threads: 12; OS: Windows 10.0 Build 17763; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded
The text was updated successfully, but these errors were encountered: