Review #1

maltevogl · 2022-04-20T18:28:41Z

The ticket for this code review is: DHCodeReview/DHCodeReview#4

This repository is ready for review.


jdamerow · 2022-05-26T18:53:45Z

@maltevogl I can't assign you as a reviewer (I assume because you opened the PR?). Consider yourself assigned ;)

maltevogl

Understanding the project was challenging, since I do not read Arabic and am not familiar with the base project. After reading the linked definition of the mARkdown format, things became a bit clearer. With that in mind, please disregard any comments that stem from a misunderstanding on my part.
One thing I would prefer is documentation generated with Sphinx; the currently linked docs are not visually clear. Better docs would also make it easier to understand what the code is meant to achieve.
The start of the README could explain in more detail what the main project is about, to situate the repo, and also give an overview of the document class structure.
A main point: I cannot simply run the package on the test data. This raises an error for me, while the test itself passes. So either the usage section needs to be adapted, or there is a bug?

maltevogl · 2022-06-17T15:52:39Z

README.md

+```py
+import oimdp
+
+md_file = open("mARkdownfile", "r")


I would use a `with` statement to make this more readable (it also guarantees that the file gets closed).

Suggested change

-md_file = open("mARkdownfile", "r")
+with open("mARkdownfile", "r") as md_file:
+    text = md_file.read()
+    parsed = oimdp.parse(text)
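A minimal sketch of the same pattern as a reusable helper, assuming nothing about oimdp beyond the `oimdp.parse(text)` call shown in the README. `parse_file` is a hypothetical name, and the explicit `encoding="utf-8"` is an assumption about the input files; it is worth passing because `open()` otherwise falls back to a locale-dependent default, which can garble Arabic text on some systems.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_file(path: str, parse: Callable[[str], T], encoding: str = "utf-8") -> T:
    """Read a mARkdown file and pass its text to `parse` (hypothetical helper).

    The `with` block closes the file as soon as reading finishes, even if
    read() raises; parsing then runs on text that is already in memory.
    """
    with open(path, "r", encoding=encoding) as md_file:
        text = md_file.read()
    return parse(text)
```

Usage would then be `parsed = parse_file("mARkdownfile", oimdp.parse)`. Parsing after the `with` block releases the file handle before a potentially long parse.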

maltevogl · 2022-06-17T15:55:03Z

README.md

+
+`content`: a list of content structures
+
+`get_clean_text()`: get the text stripped of markup


Running the package routine on the test data throws an error for me: `"DoxographicalItem" has no value`.

But the test explained below works fine?

maltevogl · 2022-06-17T15:57:04Z

README.md

+
+## Parsed structure
+
+Please see [the docs](https://openiti.github.io/oimdp/), but here are some highlights:


I would personally prefer using Sphinx to create the docs, as I find the current doc design hard to understand.
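As a rough starting point, a minimal Sphinx `docs/conf.py` could look like the sketch below. The `docs/` layout, the theme, and the choice of extensions are assumptions; `autodoc` renders the existing docstrings, and `napoleon` lets them be written in Google or NumPy style.

```python
# docs/conf.py: minimal Sphinx configuration sketch (assumed layout).
import os
import sys

# Make the oimdp package importable for autodoc, assuming this file
# sits in docs/ one level below the repository root.
sys.path.insert(0, os.path.abspath(".."))

project = "oimdp"
extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # accept Google/NumPy-style docstrings
    "sphinx.ext.viewcode",  # link documented objects to highlighted source
]
autodoc_member_order = "bysource"  # keep classes in source-file order
html_theme = "alabaster"           # Sphinx's default theme
```

API pages could then be generated with `sphinx-apidoc -o docs/api oimdp` and the HTML built with `sphinx-build -b html docs docs/_build/html`, together with an `index.rst` that includes the generated pages in a toctree.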

maltevogl · 2022-06-17T16:11:16Z

oimdp/parser.py

+    return text_only
+
+
+def parse_line(tagged_il: str, index: int, obj=Line, first_token=None):


These functions are highly complex. The code would be much more maintainable if they were split into subroutines.

The docstring is not very informative. I would expect a much longer docstring for such a long function :-)
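To illustrate the kind of split meant here, a sketch in which `parse_line` only strips a leading token and dispatches each token to a small handler. Everything in it is hypothetical: `Line` is a simplified stand-in, the `PageV`/`Milestone` prefixes are placeholders rather than oimdp's actual tag logic (which lives in `oimdp/tags.py`), and treating `first_token` as a prefix to strip is an assumption about its role. The docstring shows the level of detail that would help for the real function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Line:
    """Simplified stand-in for oimdp's Line (hypothetical)."""
    text: str
    index: int
    parts: List[str] = field(default_factory=list)


def _parse_page_number(token: str, line: Line) -> None:
    """Record a page-number token (placeholder handler)."""
    line.parts.append(f"page:{token}")


def _parse_milestone(token: str, line: Line) -> None:
    """Record a milestone token (placeholder handler)."""
    line.parts.append(f"milestone:{token}")


def _parse_plain(token: str, line: Line) -> None:
    """Keep an untagged token as plain text."""
    line.parts.append(token)


# Placeholder prefixes for illustration only; not oimdp's real tags.
HANDLERS: Dict[str, Callable[[str, Line], None]] = {
    "PageV": _parse_page_number,
    "Milestone": _parse_milestone,
}


def parse_line(tagged_il: str, index: int, obj=Line,
               first_token: Optional[str] = None) -> Line:
    """Parse one tagged line into a Line-like object.

    Args:
        tagged_il: the raw line, including inline tags.
        index: position of the line in the document.
        obj: class to instantiate; Line or a subclass with the same fields.
        first_token: optional leading token to strip before parsing.

    Returns:
        An instance of ``obj`` whose ``parts`` holds one entry per token,
        produced by the first handler whose prefix matches the token.
    """
    text = tagged_il
    if first_token is not None and text.startswith(first_token):
        text = text[len(first_token):].lstrip()
    line = obj(text, index)
    for token in text.split():
        # Fall back to plain text when no placeholder prefix matches.
        handler = next(
            (h for prefix, h in HANDLERS.items() if token.startswith(prefix)),
            _parse_plain,
        )
        handler(token, line)
    return line
```

Each handler can then be tested and documented on its own, and a new tag type becomes a new entry in `HANDLERS` rather than another branch in one long function.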

maltevogl · 2022-06-17T16:14:26Z

oimdp/parser.py

+from .structures import SectionHeader, Editorial, DictionaryUnit, BioOrEvent
+from .structures import DoxographicalItem, MorphologicalPattern, TextPart
+from .structures import AdministrativeRegion, RouteOrDistance, Riwayat
+from . import tags as t


Import this under a more informative name than `t`; that way the code becomes easier to read later on.

maltevogl · 2022-06-20T08:44:45Z

oimdp/structures.py

+    """Riwāyāt unit"""
+
+
+class Document:


I think the original idea of using object-oriented programming to represent the document could be made much stronger. If the main document always needs to have a magic value, this could become part of the Document class instead of its own class. I guess all the other things, like parts, line numbers and so on, are not always there?

If possible, you could add them as optional structures of the document and fill them in during parsing. That way, the structure of the Document class could become much clearer.
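A sketch of that direction, assuming a dataclass-based design: the magic value becomes a required attribute of `Document`, and structures that are not always present start out empty and are filled while parsing. The `######OpenITI#` magic value and the `#META#` prefix follow the OpenITI mARkdown conventions as I understand them, and `from_text` is a hypothetical constructor; raw lines stand in for the parsed content structures.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List


@dataclass
class Document:
    """Hypothetical Document with the magic value folded in."""

    # Assumed OpenITI magic value; check against oimdp's MagicValue handling.
    MAGIC: ClassVar[str] = "######OpenITI#"

    magic_value: str
    # Optional parts start empty and are filled in while parsing.
    simple_metadata: List[str] = field(default_factory=list)
    content: List[object] = field(default_factory=list)

    @classmethod
    def from_text(cls, text: str) -> "Document":
        """Build a Document, rejecting text without the magic value."""
        lines = text.splitlines()
        if not lines or not lines[0].startswith(cls.MAGIC):
            raise ValueError("first line must carry the OpenITI magic value")
        doc = cls(magic_value=lines[0].strip())
        for line in lines[1:]:
            if line.startswith("#META#"):
                # Assumed metadata prefix; keep the field text after it.
                doc.simple_metadata.append(line[len("#META#"):].strip())
            elif line.strip():
                # Raw lines stand in for parsed content structures.
                doc.content.append(line)
        return doc
```

With this shape, a document without a magic value simply cannot be constructed through `from_text`, so a separate `MagicValue` class is no longer needed to enforce it.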

maltevogl · 2022-06-20T08:45:13Z

oimdp/structures.py

+from typing import List, Literal
+
+
+class MagicValue:


Since this is essential for the text document, could you move it into the main Document class?

maltevogl · 2022-06-20T08:47:02Z

oimdp/structures.py

+        return self.value
+
+
+class SimpleMetadataField:


Why do you need this type of class in general? They seem to just copy the actual text of the field and nothing more. Is the purpose a kind of tagging of the text parts?

maltevogl · 2022-06-20T08:48:57Z

tests/test.py

+    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
+import unittest 
+import oimdp
+from oimdp.structures import Age, BioOrEvent, Date, DictionaryUnit, Document, DoxographicalItem, Editorial, Hemistich, Hukm, Isnad, Line, Matn, Milestone, MorphologicalPattern, NamedEntity, OpenTagAuto, OpenTagUser, PageNumber, Paragraph, Riwayat, RouteDist, RouteFrom, RouteOrDistance, RouteTowa, SectionHeader, TextPart, Verse


from oimdp.structures import . should be enough

maltevogl · 2022-06-20T08:49:30Z

tests/test.py

+        filepath = os.path.join(
+            root, "test.md"
+        )
+        test_file = open(filepath, "r")


Here, again, a `with open(...)` block is possible.

maltevogl · 2022-06-20T09:01:19Z

@raffazizzi @jdamerow Have a look at these points. Maybe we can also discuss here the points I have misunderstood.

raffazizzi and others added 30 commits August 28, 2019 17:56

first commit

8291cf9

added hemistich for testing

4340c62

introduced actual unit testing

531cf56

Progress on parsing metatags, pages, paragraphs, verse, lines

0e773e1

organized tags, added more structures and phrase lv parsing

ee04544

Implemented phrase parts, though only ingesting page numbers and milestones for now

fba193d

better handling of default class argument for Line

2c858e7

added a setup.py for pypi

c5d430f

correct test dir

087405e

added exception

cd7606d

added test from release file

0f7a777

documentation

0c2cfd0

improved pages within paragraphs and biographies (with some tests)

654b0fc

fixed checking and tag replacement order for BioOrEvent; completed tests for BioOrEvent

16e22fe

made dictionary entry paragraph-like and added tests

ede6e85

flattened structures except for Lines

3167656

promoted editorial section match in ordre of lines loop, added tests

9acaa6d

promoted morphological pattern and wrote test

1fdd4b2

improved support for riwayat and added tests

5e2560e

implemented routes or distances and added tests

879d39c

added tests for section headers

3ff8441

improved section headers handling

dc620d7

improved verse parsing and added tests

9053a16

added test for milestone

781be2d

parsing named entities as line parts. Made an if structure more pythonic. Added some tests for NEs

dcc3430

stricted datatypes

2e8fe21

more tests for named entites

ac9e257

improved named entity parsing and added tests

e8e2f95

added support for open tags and added tests

05e22f9

added support for open auto tags and test

6d0ca75

raffazizzi and others added 9 commits December 2, 2021 14:32

updated readme

301ddd5

simple script for docs

517387e

updated version

d4e66a5

added license

cffe367

improved named entities parsing. Bumped version

007b4ca

major improvements on named entitiy handling. Introduced Date and Age (former NamedEntity)

0b769d4

upped version

9cb0dfb

requiring python 3.8 upped version

50719ee

Merge branch 'master' into review

a45b743

maltevogl self-assigned this May 27, 2022


maltevogl commented Apr 20, 2022 (edited by jdamerow)
