When using hs3, MinIO, and store_delayed_as_dataset, the task never finishes #59

Closed
josephwinston opened this issue Jun 3, 2019 · 4 comments

Comments

@josephwinston

Below is an example that fails on Linux, MinIO, and kartothek 3.0.1. The program never executes print ("Finished").

The MinIO browser does show the creation of the bucket, the hierarchy, and the file.

I do not see any messages from 'dask-scheduler' except 'heartbeat_worker'.

What am I doing wrong? Is there any more information that I can provide to help debug the problem?

import sys

import pandas as pd

from storefact import get_store_from_url
from functools import partial

from kartothek.io.dask.delayed import store_delayed_as_dataset

def main(argv=None):

    if argv is None:
        argv = sys.argv

    # One-row example frame that is stored as a single partition.
    df = pd.DataFrame({
        'file': ['file 1'],
        'description': ['description 1'],
        'run': ['run 1'],
    })

    # Factory (callable) that creates the store on demand.
    store_factory = partial(get_store_from_url,
                            'hs3://secret:[email protected]:9000/kart?create_if_missing=true')

    print(df.head())

    input_list_of_partitions = [
        {
            'label': 'Run Record Description',
            'data': [
                ('MetaData', df),
            ]
        },
    ]

    print('Using Dask')
    task = store_delayed_as_dataset(input_list_of_partitions,
                                    store=store_factory,
                                    dataset_uuid='uuid-1',
                                    metadata={'dataset': 'dateset'},  # This is optional dataset metadata
                                    metadata_version=4)

    # This call never returns; "Finished" is never printed.
    task.compute()
    print("Finished")

    return

if __name__ == '__main__':
    sys.exit(main() or 0)
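
One way to gather more information from a script that hangs like this is the standard-library faulthandler module, which can write the tracebacks of all threads after a delay without interrupting the program. The following is only a sketch: run_with_hang_dump is a hypothetical helper name, and the sleeping lambda stands in for the blocking call (for example task.compute()).

```python
import faulthandler
import sys
import tempfile
import time


def run_with_hang_dump(fn, after_s, log_file=sys.stderr):
    """Call fn(); if it is still running after after_s seconds, write the
    tracebacks of all threads to log_file. The call is not interrupted."""
    faulthandler.dump_traceback_later(after_s, file=log_file)
    try:
        return fn()
    finally:
        # Cancel the pending dump if fn() returned in time.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for the hanging call: sleeps longer than the dump delay.
    with tempfile.TemporaryFile(mode="w+") as log:
        run_with_hang_dump(lambda: time.sleep(0.3), 0.1, log)
        log.seek(0)
        print(log.read())
```

In the reported script, the assumed usage would be run_with_hang_dump(task.compute, 60), which leaves a traceback on stderr showing where the call is stuck after one minute.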
@fjetter
Collaborator

fjetter commented Jun 3, 2019

Can you try creating any key in the storage with the factory?

store_factory = partial(get_store_from_url,
                        'hs3://secret:[email protected]:9000/kart?create_if_missing=true')
store_factory().put("test", b"value")

This way we can verify that the store properly connects to S3 and doesn't block.
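
To tell "the call blocks" apart from "the call is slow", the check can be wrapped in a time limit. The sketch below is an assumption-laden illustration, not part of storefact or simplekv: call_with_timeout is a hypothetical helper that runs any callable in a daemon thread and reports whether it finished.

```python
import threading


def call_with_timeout(fn, timeout_s=10.0):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, result, error). If finished is False the call is
    still blocked; the daemon thread is abandoned, not killed.
    """
    outcome = {}

    def runner():
        try:
            outcome["result"] = fn()
        except Exception as exc:  # report store errors instead of hiding them
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    finished = not worker.is_alive()
    return finished, outcome.get("result"), outcome.get("error")


# Assumed usage with the factory from this thread:
#   finished, _, error = call_with_timeout(
#       lambda: store_factory().put("test", b"value"), timeout_s=30)
#   print("returned" if finished else "still blocked after 30 s", error)
```

A daemon thread is used so that a permanently blocked call does not keep the interpreter from exiting.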

@josephwinston
Author

The call store_factory().put("test", b"value") correctly writes the value to the key test. However, the call itself also blocks and never returns.

The traceback when I interrupt the sample looks like this:

  File "test-hs3.py", line 60, in <module>
    sys.exit(main() or 0)
  File "test-hs3.py", line 32, in main
    store_factory().put("test", b"value") 
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/simplekv/__init__.py", line 150, in put
    return self._put(key, data)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/simplekv/net/botostore.py", line 121, in _put
    data, **self.__upload_args()
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/key.py", line 1442, in set_contents_from_string
    encrypt_key=encrypt_key)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/key.py", line 1309, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/key.py", line 762, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/key.py", line 963, in _send_file_internal
    query_args=query_args
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/connection.py", line 671, in make_request
    retry_handler=retry_handler
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/connection.py", line 940, in _mexe
    request.body, request.headers)
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/site-packages/boto/s3/key.py", line 891, in sender
    response = http_conn.getresponse()
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/josephwinston/anaconda3/envs/kartothek/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt
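
The traceback ends in getresponse(), that is, the request was sent and the client is waiting for a status line that never arrives. The sketch below reproduces that state with the standard library only, as an illustration and not as a statement about how boto or MinIO behave: start_silent_server and put_with_timeout are hypothetical helpers, and the local server simply accepts a connection and never replies.

```python
import http.client
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept one connection, never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    held = []  # keep the accepted socket open so the client sees no EOF

    def accept_and_stall():
        conn, _ = listener.accept()
        held.append(conn)

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return listener, held


def put_with_timeout(port, timeout_s):
    """Send a PUT and wait for the status line, giving up after timeout_s."""
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout_s)
    try:
        client.request("PUT", "/kart/test", body=b"value")
        client.getresponse()  # the call that blocks in the traceback above
        return "response received"
    except socket.timeout:
        return "timed out waiting for response"
    finally:
        client.close()


if __name__ == "__main__":
    listener, held = start_silent_server()
    print(put_with_timeout(listener.getsockname()[1], timeout_s=0.5))
    listener.close()
```

Without the timeout argument the same getresponse() call would wait indefinitely, which matches the hang observed here.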

@crepererum
Contributor

To me, that sounds like a bug in an underlying library (simplekv or boto) or an S3 server bug. It might be related to mbr/simplekv#84.

@fjetter
Collaborator

fjetter commented Jul 23, 2019

Closing this for now since it seems to be related to simplekv or boto. Feel free to reopen in case of new information.

@fjetter fjetter closed this as completed Jul 23, 2019