-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review #1
base: empty
Are you sure you want to change the base?
Conversation
…sts for BioOrEvent
…nic. Added some tests for NEs
… (former NamedEntity)
@maltevogl I can't assign you as a reviewer (I assume because you opened the PR?). Consider you assigned ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understanding the project was challenging, since I do not read arabic and have no idea about the base project. After reading the linked definition of the markdown format, things became a bit better. so keeping that in mind, please ignore any comments based on this potential misunderstanding.
one thing that I would prefer is a documentation generated with sphinx. The current linked doc is not visually clear. This could also help in a better understanding of what the code wants to achieve.
the start of the readme could go into more detail of what the main project is about to situate the repo, and also give an overview of the document class structure.
a main point is that I can not simple run the package with the testdata. this raises an error for me, while the test works fine. So either the usage section needs to adapted, or there is a bug?
```py | ||
import oimdp | ||
|
||
md_file = open("mARkdownfile", "r") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use with clause to make it better readable
md_file = open("mARkdownfile", "r") | |
with open("mARkdownfile", "r") as md_file: | |
text = md_file.read() | |
parsed = oimdp.parse(text) |
|
||
`content`: a list of content structures | ||
|
||
`get_clean_text()`: get the text stripped of markup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the package routine on the test data throws an error for me? "DoxographicalItem" has no value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the test explained below works fine ?
|
||
## Parsed structure | ||
|
||
Please see [the docs](https://openiti.github.io/oimdp/), but here are some highlights: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally prefer using sphinx to create the docs as I find the current doc design hard to understand
return text_only | ||
|
||
|
||
def parse_line(tagged_il: str, index: int, obj=Line, first_token=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defs are highly complex. Code would be much more maintainable if these structures could be split in subroutines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc string is not very informative. I would expect a much longer doc for such a long def :-)
from .structures import SectionHeader, Editorial, DictionaryUnit, BioOrEvent | ||
from .structures import DoxographicalItem, MorphologicalPattern, TextPart | ||
from .structures import AdministrativeRegion, RouteOrDistance, Riwayat | ||
from . import tags as t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import this as a more informative string, that way it becomes easier to read latero n
"""Riwāyāt unit""" | ||
|
||
|
||
class Document: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the original idea of using object oriented programming to represent the document could be made much stronger. If the main document always needs to have a magic number, this could become a function of the class not its own class. I guess all other things like parts, linenumber and so on are not allways there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if possible you could add them as potential structures of the document and fill them in the process of parsing. that way the structure of the document class could become much clearer
from typing import List, Literal | ||
|
||
|
||
class MagicValue: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is essential for the text document, you could move it to the main document class?
return self.value | ||
|
||
|
||
class SimpleMetadataField: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why to you need this type of classes in general? They seem to just copy the acutal text in the field and nothing more: Is the purpose a kind of tagging of the text parts?
os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))) | ||
import unittest | ||
import oimdp | ||
from oimdp.structures import Age, BioOrEvent, Date, DictionaryUnit, Document, DoxographicalItem, Editorial, Hemistich, Hukm, Isnad, Line, Matn, Milestone, MorphologicalPattern, NamedEntity, OpenTagAuto, OpenTagUser, PageNumber, Paragraph, Riwayat, RouteDist, RouteFrom, RouteOrDistance, RouteTowa, SectionHeader, TextPart, Verse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from oimdp.structures import . should be enough
filepath = os.path.join( | ||
root, "test.md" | ||
) | ||
test_file = open(filepath, "r") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here again with open(): ... possible
@raffazizzi @jdamerow Have a look at these things. Maybe we can also discuss some points here, that I have misunderstood. . |
The ticket for this code review is: DHCodeReview/DHCodeReview#4
This repository is ready for review.