Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Utils and PathBuilder mods #85

Open
wants to merge 44 commits into
base: main
Choose a base branch
from
Open

HTTP Utils and PathBuilder mods #85

wants to merge 44 commits into from

Conversation

paulhamer-noaa
Copy link
Contributor

Linear Issue

IDSSE-1001

Changes

  • Introduce a HTTP utils based on the aws_utils implementation
  • Update pathbuilder to not fail when parsing file paths that do not conform to expected format

Explanation

Required supporting code for MRMS observation data ingest from NSSL directly

Create a generic HTTP class based on the aws_utils with similar issue/valid time handling
to deal with possible observation datasets.
Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't how much is in common with aws_utils, but if there is a sufficient about it would be better to create a base class for whatever is common and extend for aws and http

python/idsse_common/idsse/common/http_utils.py Outdated Show resolved Hide resolved
python/idsse_common/idsse/common/path_builder.py Outdated Show resolved Hide resolved
python/idsse_common/idsse/common/http_utils.py Outdated Show resolved Hide resolved

logger = logging.getLogger(__name__)

# pylint: disable=duplicate-code, broad-exception-caught
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# pylint: disable=duplicate-code, broad-exception-caught
# pylint: disable=duplicate-code, broad-exception-caught

If we have to disable linter we want to have as small of a scope as possible. So 'broad-exception-caught' should be for the specific line of code. I'm guessing that the 'duplicate-code' is because this module is very similar to the aws_utils.py, if that is the case maybe we should make a recourse_utils.py for the common and both aws_utils and http_utils extend that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised the possibility of an interface initially, many of the calls made by DAS would be the same for http and aws, at this point it can probably wait for a while.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the refactor we don't need to disable duplicate code, and broad exception should be applied directly to the except and not globally, and ideally we want to specify the specific exception(s) that might happen

@paulhamer-noaa
Copy link
Contributor Author

I don't how much is in common with aws_utils, but if there is a sufficient about it would be better to create a base class for whatever is common and extend for aws and http

I've added a protocol_utils.py as the base class and updated the aws_utils, http_utils code accordingly.

Geary-Layne
Geary-Layne previously approved these changes Dec 3, 2024
Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There are a few comments/suggestions.

python/idsse_common/idsse/common/aws_utils.py Outdated Show resolved Hide resolved
python/idsse_common/idsse/common/protocol_utils.py Outdated Show resolved Hide resolved
@@ -361,6 +361,9 @@ def parse_args(key: str, value: str, result: dict):
for i, _dir in enumerate(dirs):
res = re.search(r'{.*}', _dir)
if res:
parse_args(res.group(), vals[i][res.span()[0]:], time_args)
try :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to incorporate this exception handling in the new implementation of path builder, thanks for finding/fixing. Is there a test that exercises this code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My test was running it against the MRMS http feed so no.

python/idsse_common/test/test_aws_utils.py Show resolved Hide resolved
python/idsse_common/test/test_http_utils.py Outdated Show resolved Hide resolved
python/idsse_common/test/test_http_utils.py Outdated Show resolved Hide resolved
assert len(result) == 1
assert result[0] == EXAMPLE_ISSUE

def test_get_valids_all(http_utils: HttpUtils, httpserver: HTTPServer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this http_utils.get_valids doesn't find anything, is this a meaningful test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does return a list of files based on the pattern provided so I feel that this is handled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return an empty list because the sample filename you used don't have and valid info. If the files were structured list this:
'MRMS_MergedReflectivityQC_00.50_{issue.year:04d}{issue.month:02d}{issue.day:02d}'
'-{issue.hour:02d}{issue.minute:02d}{issue.second:02d}'_{valid.year:04d}{valid.month:02d}'
'{valid.day:02d}-{valid.hour:02d}{valid.minute:02d}{valid.second:02d}.grib2.gz'
It should return a valid datetime that matches the issue. We want http_utils to be generic enough to handle more than MRMS so we need to test that it can get valids.

#assert result[0] == (EXAMPLE_VALID, f'{EXAMPLE_URL}{EXAMPLE_PROD_DIR}{EXAMPLE_FILES[1]}')


def test_get_valids_with_wildcards(http_utils_with_wild: HttpUtils, httpserver: HTTPServer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this http_utils_with_wild.get_valids doesn't find anything, is this a meaningful test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll add a meaningful test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again if the filepath format contained valid this should return something. If you want, I'd be okay if we dropped this test unless it is need for code coverage. If you do drop it make sure to remove http_utils_with_wild feature if it isn't used anywhere else.

@paulhamer-noaa
Copy link
Contributor Author

paulhamer-noaa commented Dec 5, 2024 via email

@paulhamer-noaa
Copy link
Contributor Author

paulhamer-noaa commented Dec 5, 2024 via email

Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see two changes that needs to be made. 1) Remove module level pylint disable (line 23) from http_utils.py, and 2) change the test_get_valid... so that they return something (not empty lists) in test_http_utils.py.

python/idsse_common/idsse/common/http_utils.py Outdated Show resolved Hide resolved
@paulhamer-noaa
Copy link
Contributor Author

paulhamer-noaa commented Dec 12, 2024 via email

Copy link
Contributor

@Geary-Layne Geary-Layne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to merge now.

Creating mock file names with both issue and valid was a good choice. I don't understand how wildcards work, but didn't with old aws_utils. Maybe take a look because the mock wildcard instance looks strange to me with the '?' between minute and second.

@fixture
def http_utils_with_wild() -> HttpUtils:
    EXAMPLE_BASE_DIR = 'http://127.0.0.1:5000/data/'
    EXAMPLE_SUB_DIR = '3DRefl/MergedReflectivityQC_00.50/'
    EXAMPLE_FILE_BASE = ('MRMS_MergedReflectivityQC_00.50_{issue.year:04d}{issue.month:02d}{issue.day:02d}'
                         '-{issue.hour:02d}{issue.minute:02d}?{issue.second:02d}')
    EXAMPLE_FILE_EXT = '.grib2.gz'

@paulhamer-noaa
Copy link
Contributor Author

paulhamer-noaa commented Dec 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants