Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with JSON format and handling of non-existing file #13

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Matmaus
Copy link

@Matmaus Matmaus commented Mar 6, 2023

Hi, I stumbled upon three issues. They are independent, but I think they are too small, so making three separate PRs seems unnecessary to me.

  1. JSON can not be printed
    process_onenote_file does not print document in JSON if json_output parameter is set. Instead, it always returns it. This return value is not even used in the main.
    This fork fixes use of json_output parameter. If True, process_onenote_file will print document in JSON form instead of the default text form. The default text form will be used otherwise. No value will be returned.
    $ git checkout 74d7b05  # upstream main
    $ pyonenote --json --file <any_onenote_file>
    $ # nothing
    $ git checkout e363ca6  # forked main
    $ pyonenote --json --file <any_onenote_file>
    {"headers": {"guidFileType": ...
  2. Fix typo OneDocment -> OneDocument
    I think it was not intended at least based on the module's name.
  3. Fix handling of non-existing file
    $ git checkout 74d7b05  # upstream main
    $ pyonenote --file nonexisting_file
    Traceback (most recent call last):
      File "/home/user/pyOneNote/virtualenv/bin/pyonenote", line 33, in <module>
        sys.exit(load_entry_point('pyOneNote', 'console_scripts', 'pyonenote')())
      File "/home/user/pyOneNote/pyOneNote/Main.py", line 84, in main
        sys.exit("File: %s doesn't exist", args.file)
    TypeError: exit expected at most 1 argument, got 2
    $ git checkout e363ca6  # forked main
    $ pyonenote --file nonexisting_file
    File: 'nonexisting_file' doesn't exist

Matmaus added 3 commits March 6, 2023 11:30
`process_onenote_file` does not print document in JSON if `json_output`
parameter is set. Instead, it always returns it. This return value is
not even used in the main.

This commit fixes use of `json_output` parameter. If `True`,
`process_onenote_file` will print document in JSON form instead of the
default text form. The default text form will be used otherwise. No
value will be returned.
@DissectMalware
Copy link
Owner

This JSON object is meant to be used in another application in which pyonenote is called as a library.
However, I agree because it is a switch, user should see something when they use it.
I will try to think about this a little more to see how we can handle both scenarios.

@Matmaus
Copy link
Author

Matmaus commented Mar 9, 2023

If I may, I would suggest to always return either data (dictionary object) or event better document (OneDocument object). Then any user can easily convert it either into JSON (document.get_json()) or into string (json.dumps(document.get_json())). I think it is better to let an user decide.

Also, I think it could be better to have a separate function which will be expected to be used as API. Some function which never print anything. For example (only first ~11 lines are modified):

def process_onenote_file(file):
    if not check_valid(file):
        log.error("please provide valid One file")
        exit()

    file.seek(0)
    return OneDocument(file)

def process_and_print_onenote_file(file, output_dir, extension, json_output):
    document = process_onenote_file(file)
    data = document.get_json()
    if not json_output:
        print('Headers\n####################################################################')
        indent = '\t'
        for key, header in data['headers'].items():
            print('{}{}: {}'.format(indent, key, header))
        print('\n\nProperties\n####################################################################')
        indent = '\t'
        for propertySet in data['properties']:
            print('{}{}:'.format(indent, propertySet['type']))
            for property_name, property_val in propertySet['val'].items():
                print('{}{}: {}'.format(indent+'\t', property_name, str(property_val)))
            print("")
        print('\n\nEmbedded Files\n####################################################################')
        indent = '\t'
        for name, file in data['files'].items():
            print('{}{}:'.format(indent, name))
            print('\t{}Extension: {}'.format(indent, file['extension']))
            print('{}'.format( get_hex_format(file['content'][:256], 16, indent+'\t')))
        if extension and not extension.startswith("."):
            extension = "." + extension
        counter = 0
        for file_guid, file in document.get_files().items():
            with open(
                    os.path.join(output_dir,
                                 "file_{}{}{}".format(counter, file["extension"], extension)), "wb"
            ) as output_file:
                output_file.write(file["content"])
            counter += 1
    else:
        print(json.dumps(data))

@DissectMalware
Copy link
Owner

I think it is a good suggestion, will refactor the code based on your suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants