Dataset: fix bug revealed by pandas 1.0 #561

mattwelborn · 2020-02-06T16:05:06Z

Description

CI is failing. This seems to be caused by df = df.groupby(["name"])["return_result"].sum(skipna=True). It appears that skipna was never a valid kwarg of groupby().sum(), but pandas did not validate it until 1.0. This PR removes skipna=True; the default behavior already skips NaNs.
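
A minimal sketch of the default behavior described above, using a made-up frame (the column names match the snippet in the description, the data is purely illustrative). It only demonstrates that groupby().sum() without skipna already ignores NaNs; whether passing skipna is accepted depends on the installed pandas version, so that call is left out.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the Dataset code.
df = pd.DataFrame(
    {
        "name": ["a", "a", "b", "b"],
        "return_result": [1.0, np.nan, 2.0, 3.0],
    }
)

# No skipna kwarg: NaN entries are dropped from each group's sum by default.
totals = df.groupby(["name"])["return_result"].sum()

print(totals)  # group "a" sums to 1.0, group "b" sums to 5.0
```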

Changelog description

Fixed a bug that caused errors with pandas v1.0.

Status

Code base linted
Ready to go

dgasmith · 2020-02-06T16:25:01Z

Can we bump minimum versions?

mattwelborn · 2020-02-06T16:28:12Z

> Can we bump minimum versions?

Of pandas? This PR should work with pre-1.0 pandas just fine. skipna was silently ignored in pre-1.0 pandas, so the behavior is the same.

mattwelborn · 2020-02-06T16:36:21Z

@dgasmith Any idea what might be going on here:

_____________________________ test_mol_pagination ______________________________
storage_socket = <qcfractal.storage_sockets.sqlalchemy_socket.SQLAlchemySocket object at 0x7f5a1daab518>
    def test_mol_pagination(storage_socket):
        """
            Test Molecule pagination
        """
    
        assert len(storage_socket.get_molecules()["data"]) == 0
        mol_names = [
            "water_dimer_minima.psimol",
            "water_dimer_stretch.psimol",
            "water_dimer_stretch2.psimol",
            "neon_tetramer.psimol",
        ]
    
        total = len(mol_names)
        molecules = []
        for mol_name in mol_names:
            mol = ptl.data.get_molecule(mol_name)
            molecules.append(mol)
    
        inserted = storage_socket.add_molecules(molecules)
    
        assert inserted["meta"]["n_inserted"] == total
    
>       ret = storage_socket.get_molecules(skip=1)
qcfractal/tests/test_storage.py:1198: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
qcfractal/storage_sockets/sqlalchemy_socket.py:732: in get_molecules
    MoleculeORM, query, limit=limit, skip=skip, exclude=["molecule_hash", "molecular_formula"]
qcfractal/storage_sockets/sqlalchemy_socket.py:382: in get_query_projection
    rdata = [dict(zip(_projection, row)) for row in data]
qcfractal/storage_sockets/sqlalchemy_socket.py:382: in <listcomp>
    rdata = [dict(zip(_projection, row)) for row in data]
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/orm/loading.py:101: in instances
    util.raise_from_cause(err)
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/util/compat.py:398: in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/util/compat.py:153: in reraise
    raise value
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/orm/loading.py:85: in instances
    for row in fetch
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/orm/loading.py:85: in <listcomp>
    for row in fetch
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/orm/loading.py:84: in <listcomp>
    keyed_tuple([proc(row) for proc in process])
../../../miniconda/envs/test/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py:1266: in process
    return process_value(impl_processor(value), dialect)
qcfractal/storage_sockets/models/sql_base.py:24: in process_result_value
    return msgpackext_loads(value)
../../../miniconda/envs/test/lib/python3.6/site-packages/qcelemental/util/serialization.py:113: in msgpackext_loads
    return msgpack.loads(data, object_hook=msgpackext_decode, raw=False)
msgpack/_unpacker.pyx:184: in msgpack._cmsgpack.unpackb
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   TypeError: a bytes-like object is required, not 'NoneType'
msgpack/_unpacker.pyx:135: TypeError
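
The traceback ends with the deserializer receiving None, which suggests a NULL column value reached process_result_value. One common way to guard against this is to pass None through before decoding. The sketch below is an assumption, not the change made in this PR: loads_or_none is a hypothetical helper, and json.loads stands in for msgpackext_loads so the example runs with the standard library only.

```python
import json
from typing import Any, Callable, Optional


def loads_or_none(value: Optional[bytes], decoder: Callable[[bytes], Any]) -> Any:
    """Decode a stored blob, passing NULL (None) column values through unchanged.

    Hypothetical helper for illustration; `decoder` would be the real
    deserializer (msgpackext_loads in the traceback above).
    """
    if value is None:
        # A sparse row has nothing stored in this column, so there is nothing to decode.
        return None
    return decoder(value)


# json.loads is only a stand-in decoder for this sketch.
empty = loads_or_none(None, json.loads)
filled = loads_or_none(b'{"symbols": ["Ne"]}', json.loads)

print(empty)
print(filled)
```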

dgasmith · 2020-02-06T21:14:34Z

@mattwelborn Yea, sparse molecule bug.

@doaa-altarawy What do you think of the fix?

mattwelborn · 2020-02-07T13:11:45Z

@dgasmith @doaa-altarawy I think we're back to database bugs now.

codecov · 2020-02-07T16:20:39Z

Codecov Report

Merging #561 into master will decrease coverage by 3.16%.
The diff coverage is 94.11%.

dgasmith · 2020-02-07T17:00:30Z

@mattwelborn Can you reproduce the single failure? I cannot seem to manage it locally.

dgasmith

Merging, this moves the bar forward. I will see if I can track down the remaining error.

Dataset: fix bug revealed by pandas 1.0

f0b2e81

doaa-altarawy approved these changes Feb 6, 2020


Storage: Allows handling of sparse molecule and msgpack None issues

eacdede

mattwelborn added 2 commits February 6, 2020 18:14

Dataset: fix another bug revealed by pandas 1.0

0ffc938

make format

c48bc96

Daniel G. A. Smith added 4 commits February 7, 2020 09:51

CI: Provides short errors for readability

ef8a89a

Storage: Patches up custom queries after sparse molecule

e5a6d3e

Testing: Minor patchups after sparse molecule in storage sockets

d45634c

Storage: Allows skipping of version checks when resetting the database

cbdaf7e

dgasmith force-pushed the pandas_1_bug branch from b3d1877 to cbdaf7e Compare February 7, 2020 15:40

CI: Bumps pydantic depends

fc46f64

dgasmith approved these changes Feb 8, 2020


dgasmith merged commit 01e3553 into MolSSI:master Feb 8, 2020

dgasmith mentioned this pull request Feb 8, 2020

[DNM] CI debug #557

Closed

dgasmith added this to the v0.13.1 milestone Feb 8, 2020
