Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review #1

Open
wants to merge 39 commits into
base: empty
Choose a base branch
from
Open

Review #1

wants to merge 39 commits into from

Conversation

maltevogl
Copy link
Member

@maltevogl maltevogl commented Apr 20, 2022

The ticket for this code review is: DHCodeReview/DHCodeReview#4

This repository is ready for review.

raffazizzi and others added 30 commits August 28, 2019 17:56
@jdamerow
Copy link

@maltevogl I can't assign you as a reviewer (I assume because you opened the PR?). Consider you assigned ;)

@maltevogl maltevogl self-assigned this May 27, 2022
Copy link
Member Author

@maltevogl maltevogl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understanding the project was challenging, since I do not read arabic and have no idea about the base project. After reading the linked definition of the markdown format, things became a bit better. so keeping that in mind, please ignore any comments based on this potential misunderstanding.
one thing that I would prefer is a documentation generated with sphinx. The current linked doc is not visually clear. This could also help in a better understanding of what the code wants to achieve.
the start of the readme could go into more detail of what the main project is about to situate the repo, and also give an overview of the document class structure.
a main point is that I can not simple run the package with the testdata. this raises an error for me, while the test works fine. So either the usage section needs to adapted, or there is a bug?

```py
import oimdp

md_file = open("mARkdownfile", "r")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use with clause to make it better readable

Suggested change
md_file = open("mARkdownfile", "r")
with open("mARkdownfile", "r") as md_file:
text = md_file.read()
parsed = oimdp.parse(text)


`content`: a list of content structures

`get_clean_text()`: get the text stripped of markup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the package routine on the test data throws an error for me? "DoxographicalItem" has no value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the test explained below works fine ?


## Parsed structure

Please see [the docs](https://openiti.github.io/oimdp/), but here are some highlights:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally prefer using sphinx to create the docs as I find the current doc design hard to understand

return text_only


def parse_line(tagged_il: str, index: int, obj=Line, first_token=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This defs are highly complex. Code would be much more maintainable if these structures could be split in subroutines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc string is not very informative. I would expect a much longer doc for such a long def :-)

from .structures import SectionHeader, Editorial, DictionaryUnit, BioOrEvent
from .structures import DoxographicalItem, MorphologicalPattern, TextPart
from .structures import AdministrativeRegion, RouteOrDistance, Riwayat
from . import tags as t
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import this as a more informative string, that way it becomes easier to read latero n

"""Riwāyāt unit"""


class Document:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the original idea of using object oriented programming to represent the document could be made much stronger. If the main document always needs to have a magic number, this could become a function of the class not its own class. I guess all other things like parts, linenumber and so on are not allways there ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if possible you could add them as potential structures of the document and fill them in the process of parsing. that way the structure of the document class could become much clearer

from typing import List, Literal


class MagicValue:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is essential for the text document, you could move it to the main document class?

return self.value


class SimpleMetadataField:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to you need this type of classes in general? They seem to just copy the acutal text in the field and nothing more: Is the purpose a kind of tagging of the text parts?

os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
import unittest
import oimdp
from oimdp.structures import Age, BioOrEvent, Date, DictionaryUnit, Document, DoxographicalItem, Editorial, Hemistich, Hukm, Isnad, Line, Matn, Milestone, MorphologicalPattern, NamedEntity, OpenTagAuto, OpenTagUser, PageNumber, Paragraph, Riwayat, RouteDist, RouteFrom, RouteOrDistance, RouteTowa, SectionHeader, TextPart, Verse
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from oimdp.structures import . should be enough

filepath = os.path.join(
root, "test.md"
)
test_file = open(filepath, "r")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here again with open(): ... possible

@maltevogl
Copy link
Member Author

@raffazizzi @jdamerow Have a look at these things. Maybe we can also discuss some points here, that I have misunderstood. .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet