Data Structures
Here is a description of the most important and recurring data structures used in the Python source.
In the source code, the following might be of interest:
- /lib/orgparser.py → parse_orgmode_file(…): the main routine of the Org-mode parser
- /lib/htmlizer.py → sanitize_and_htmlize_blog_content(…): the main routine of the HTMLization process
- http://orgmode.org/worg/dev/org-syntax.html
- OLD: list of Org Mode elements: http://article.gmane.org/gmane.emacs.orgmode/67871
- Take a look at an Org-mode test-file (for unit testing) containing all implemented Org-mode syntax elements
Org elements: from ox-ascii.el
(Org-mode)
Org Element | [fn:earmarked] | [fn:lowprio] | implemented since | [fn:internalrepresentation] | HTML5 |
---|---|---|---|---|---|
external hyperlinks | <2014-01-30 Thu> | a | |||
internal links | <2014-03-03 Mon> | a | |||
bold | <2014-01-30 Thu> | b | |||
center-block | x | ||||
clock | x | ||||
code | <2014-01-30 Thu> | code | |||
drawer | x | ||||
dynamic-block | x | ||||
entity | |||||
example-block | <2014-01-30 Thu> | ['example-block', 'name or None', [u'first line', u'second line']] | FIXXME | ||
example “colon-block” | <2014-08-10 Sun> | ['colon-block', False, [u'first line', u'second line']] | pre | ||
export-block | x | ||||
export-snippet | x | ||||
fixed-width | x | ||||
footnote-definition | x | ||||
footnote-reference | x | ||||
headline | <2014-01-30 Thu> | ['heading', {'level': 3, 'title': u'my title'}] | section+header+h1 | ||
horizontal-rule | <2014-01-31 Fri> | ['hr'] | (ignored and only interpreted to mark end of standfirst) | ||
inline-src-block | x | ||||
inlinetask | x | ||||
inner-template | x | ||||
italic | x | ||||
item | |||||
keyword | x | ||||
latex-environment | <2014-01-30 Thu> | [fn:pypandoc] ['latex-block', 'name or None', [u'first line', u'second line']] | |||
latex-fragment | x | ||||
line-break | x | ||||
link | x | ||||
paragraph | <2014-01-30 Thu> | ['par', u'line1', u'line2'] | p | ||
plain-list | x | ['list-itemize', [u'first line', u'second line']] | ul+li | ||
plain-text | <2014-01-30 Thu> | see: paragraph | |||
planning | x | ||||
quote-block | <2014-01-30 Thu> | ['quote-block', 'name or None', [u'first line', u'second line']] | blockquote | ||
quote-section | ? | ||||
radio-target | x | ||||
section | <2014-01-30 Thu> | ['heading', {'title': u'Sub-heading foo', 'level': 3}] | h2, h3, … | ||
special-block | x | ||||
src-block | <2014-01-30 Thu> | ['src-block', 'name or None', [u'first line', u'second line']] | pre | ||
statistics-cookie | x | ||||
strike-through | x | ||||
subscript | x | ||||
superscript | x | ||||
table | x | [fn:pypandoc] | |||
table-cell | x | ||||
table-row | x | ||||
target | |||||
template | x | ||||
timestamp | x | ||||
underline | x | ||||
verbatim | x | pre | |||
verse-block | <2014-01-30 Thu> | ['verse-block', 'name or None', [u'first line', u'second line']] | pre | ||
html-block | <2014-01-30 Thu> | ['html-block', 'name or None', [u'first line', u'second line']] | pre (if no #+NAME: then insert directly!) | ||
tsfile-links | <2017-06-17 Sat> | ['cust_link_image', u'2017-03-11T18.29.20 Stars.jpg', {u'width': u'300', u'alt': u'Stars in a Tree', u'align': u'right'}] | figure, img + attributes, figcaption | ||
the rest | [fn:pypandoc] |
NOTE: OrgParser uses 'par' for anything it cannot interpret as something else.
[fn:earmarked] Planned to be implemented soon (or at all :-)
[fn:lowprio] This feature is low on my personal development list (may take some time or might never get implemented)
[fn:pypandoc] This element gets converted using pypandoc (and additional sanitizing)
[fn:internalrepresentation] usually in list: blog_data['id-of-entry']['content']
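To illustrate how the internal representation column relates to the HTML5 column, here is a minimal sketch. `render_element` is a hypothetical helper written for this page, not lazyblorg's actual htmlizer; it covers only a few element types and does nothing beyond basic escaping.

```python
from html import escape


def render_element(element):
    """Render one internal content element as an HTML5 snippet (illustrative only)."""
    kind = element[0]
    if kind == 'par':
        # ['par', u'line1', u'line2'] -> <p>
        return '<p>' + escape(' '.join(element[1:])) + '</p>'
    if kind == 'list-itemize':
        # ['list-itemize', [u'first line', u'second line']] -> <ul> + <li>
        items = ''.join('<li>' + escape(line) + '</li>' for line in element[1])
        return '<ul>' + items + '</ul>'
    if kind in ('src-block', 'verse-block', 'colon-block'):
        # [kind, name_or_none, [lines]] -> <pre>
        return '<pre>' + escape('\n'.join(element[2])) + '</pre>'
    if kind == 'quote-block':
        # ['quote-block', name_or_none, [lines]] -> <blockquote>
        return '<blockquote>' + escape('\n'.join(element[2])) + '</blockquote>'
    # Assumption: anything else is simply rejected in this sketch.
    raise ValueError('unsupported element type: %r' % kind)


print(render_element(['par', u'line1', u'line2']))
```

The real HTMLization in /lib/htmlizer.py handles many more cases (for example names of blocks and pypandoc conversion), so treat this only as a reading aid for the table.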
- Blocks: (beginning with BEGIN_)
For a complete list of content elements, please take a look at id:implemented-org-elements (above) FIXXME
blog_data is a Python list containing one dictionary per blog entry:
- FIXXME: add examples of:
- category
- other additional data
blog_data = \
    [{'level': 2,  ## number of asterisks
      'title': u'This is a blog entry about foo',
      'usertags': [u'tag1', u'tag2'],
      'autotags': {'language': 'english'},
      'id': u'lazyblorg-example-entry',  ## ID from PROPERTIES-drawer
      'finished-timestamp-history': [datetime1, datetime2, datetime3],
      'latestupdateTS': datetime,  ## most current time-stamp that changed (or overwrote) heading to DONE
      'firstpublishTS': datetime,  ## oldest time-stamp that changed heading to DONE
      'created': datetime,
      'content': [['par', u'This is the Org-mode content'],  ## 'par': paragraph containing anything that is not defined otherwise, like tables, ...
                  '\n',  ## change of paragraph
                  ['heading', {'level': 3, 'title': u'Another aspect'}],
                  ['html-block', 'its name or None', [u'first line', u'second line', u'', u'last line']],
                  ['list-itemize', [u'first line', u'second line']],
                  ['cust_link_image', u'2017-03-11T18.29.20 Stars.jpg', {u'width': u'300', u'alt': u'Stars in a Tree', u'align': u'right'}]
                  ]  # FIXXME: further elements
      }]
Thus:
blog_data[0].keys()
## ... results in:
# ['title',
# 'latestupdateTS',
# 'firstpublishTS',
# 'created',
# 'usertags',
# 'content',
# 'finished-timestamp-history',
# 'level',
# 'id']
blog_data[0]['content'] ## -> list of elements of content
# [['par', u'This is the Org-mode content'],
# ['heading', {'level': 3, 'title': u'Another aspect'}],
# ['list-itemize', [u'first line', u'second line']],
# ['table', u'FIXXME: followed by this table data'],
# ['image', u'FIXXME: followed by this image']]
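As a rough sketch of how this structure can be traversed, the snippet below collects heading titles and counts element types for one entry. The helper names `headings_of` and `count_element_types` are made up for this illustration and are not part of lazyblorg; the example entry is a shortened copy of the one above.

```python
def headings_of(entry):
    """Return the titles of all 'heading' elements of one blog entry."""
    titles = []
    for element in entry['content']:
        # Paragraph separators are plain strings such as '\n'; skip them.
        if isinstance(element, list) and element[0] == 'heading':
            titles.append(element[1]['title'])
    return titles


def count_element_types(entry):
    """Count how often each element type occurs in the content list."""
    counts = {}
    for element in entry['content']:
        if isinstance(element, list):
            counts[element[0]] = counts.get(element[0], 0) + 1
    return counts


example_entry = {
    'level': 2,
    'title': u'This is a blog entry about foo',
    'id': u'lazyblorg-example-entry',
    'content': [['par', u'This is the Org-mode content'],
                '\n',
                ['heading', {'level': 3, 'title': u'Another aspect'}],
                ['list-itemize', [u'first line', u'second line']]],
}

print(headings_of(example_entry))
print(count_element_types(example_entry))
```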
Example:
>>> metadata
{u'2013-08-22-testid': {'title': u"This is the title",
                        'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6),
                        'firstpublishTS': datetime.datetime(2013, 8, 22, 21, 6),
                        'checksum': 'b757f8478bffd6c70a474f213d6520de',
                        'created': datetime.datetime(2013, 8, 22, 21, 6)},
 u'2013-02-12-lazyblorg-example-entry': {'latestupdateTS': datetime.datetime(2013, 2, 14, 19, 2),
                                         'checksum': '24af2246a5121e829a0dbbd6e2425c15',
                                         'created': datetime.datetime(2013, 2, 12, 10, 58)}}
Keys of the dict: IDs of the entries:
>>> metadata.keys()
[u'2013-08-22-testid', u'2013-02-12-lazyblorg-example-entry']
Each entry (key = ID) holds a dict with the following entries:
- 'title': string containing the title of the blog entry
- 'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6)
  - most recent time-stamp from the LOGBOOK drawer that marked a change to a final state
- 'checksum': 'b757f8478bffd6c70a474f213d6520de'
  - md5 check-sum of: [title, tags, finished_timestamp_history, content]
- 'created': datetime.datetime(2013, 8, 22, 21, 6)
  - datetime object of the CREATED property from the PROPERTIES drawer
  - [ ] FIXXME: why not the first CLOSED time-stamp?
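A check-sum over those four fields could be computed roughly as follows. This is only a sketch: the function name `entry_checksum` is hypothetical, and serializing the list with `repr()` is an assumption made for the example, so the resulting values will not necessarily match the check-sums lazyblorg stores.

```python
import hashlib


def entry_checksum(title, tags, finished_timestamp_history, content):
    """Return an md5 hex digest over the four fields (illustrative only)."""
    # Assumption: repr() of the list serves as the serialization;
    # lazyblorg may serialize these fields differently.
    payload = repr([title, tags, finished_timestamp_history, content])
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


checksum = entry_checksum(u'This is the title', [u'tag1'], [],
                          [['par', u'This is the Org-mode content']])
print(checksum)
```

Comparing the stored check-sum with a freshly computed one is a cheap way to detect whether an entry changed since the last run.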
Example:
CLOSED: [2014-01-31 Fri 14:02]
:LOGBOOK:
- State "DONE" from "DONE" [2014-02-01 Sat 18:42]
- State "DONE" from "" [2014-01-30 Thu 14:02]
:END:
:PROPERTIES:
:CREATED: [2014-01-28 Tue 14:02]
:ID: 2014-01-27-lb-tests
:END:
What happens with the various time-stamps?
- most recent LOGBOOK entry of setting to DONE:
  - added to entry['finished-timestamp-history'] (which is a list)
  - overwrites entry['latestupdateTS'] if it is newer than the old one
    - entry['latestupdateTS'] is the most recent LOGBOOK entry of setting to DONE
  - overwrites entry['firstpublishTS'] if it is older than the old one
- CREATED:
  - entry['created']
- CLOSED:
  - ignored
- ID-timestamp:
  - ignored
After parsing entry from above:
- entry['created'] = [2014-01-28 Tue 14:02]
- entry['latestupdateTS'] = [2014-02-01 Sat 18:42]
  - note that entry['timestamp'] was renamed to entry['latestupdateTS'] on 2017-02-12
- entry['firstpublishTS'] = [2014-01-30 Thu 14:02]
  - Oldest entry of entry['finished-timestamp-history'] is the publication time-stamp!
- entry['finished-timestamp-history'] = [2014-02-01 Sat 18:42] and [2014-01-30 Thu 14:02]
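The time-stamp rules above can be sketched as follows. The regular expression and the function `apply_logbook` are assumptions made for this example and do not reproduce the actual code in /lib/orgparser.py; the weekday abbreviation is matched but not parsed, to stay independent of the locale.

```python
import re
from datetime import datetime

# Matches LOGBOOK lines such as: - State "DONE" from "" [2014-01-30 Thu 14:02]
DONE_LINE = re.compile(
    r'- State "DONE"\s+from\s+"[^"]*"\s+'
    r'\[(\d{4}-\d{2}-\d{2}) \w+ (\d{2}:\d{2})\]')


def apply_logbook(entry, logbook_lines):
    """Update the time-stamp fields of entry from DONE state changes."""
    for line in logbook_lines:
        match = DONE_LINE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1) + ' ' + match.group(2),
                                  '%Y-%m-%d %H:%M')
        entry.setdefault('finished-timestamp-history', []).append(stamp)
        # Newest DONE time-stamp wins for latestupdateTS ...
        if 'latestupdateTS' not in entry or stamp > entry['latestupdateTS']:
            entry['latestupdateTS'] = stamp
        # ... and the oldest one wins for firstpublishTS.
        if 'firstpublishTS' not in entry or stamp < entry['firstpublishTS']:
            entry['firstpublishTS'] = stamp
    return entry


entry = apply_logbook({}, [
    '- State "DONE" from "DONE" [2014-02-01 Sat 18:42]',
    '- State "DONE" from "" [2014-01-30 Thu 14:02]',
])
print(entry['latestupdateTS'], entry['firstpublishTS'])
```

Run on the two LOGBOOK lines of the example, this yields 2014-02-01 18:42 as entry['latestupdateTS'] and 2014-01-30 14:02 as entry['firstpublishTS'], matching the list above.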
The dict format of entries_timeline_by_published is:
- dict with year (int) as key, value = list of 12 MONTH
  - MONTH: list of 28-31 DAY
    - DAY: list of 0 to many entry-IDs
for year in sorted(entries_timeline_by_published.keys()):
    for month in enumerate(entries_timeline_by_published[year], start=0):
        # month = tuple(index, list of days)
        for day in enumerate(month[1], start=0):
            # day = tuple(index, list of IDs)
            for blogentry in day[1]:
                print(str(year) + '-' + str(month[0]) + '-' + str(day[0]) + " has entry: " + str(blogentry))
See Utils.__add_entry_to_entries_timeline_by_published(…) for how it is populated.
See utils_test.py > test_entries_timeline_by_published_functions(…) for how it is tested.
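For orientation, a structure of this shape could be built as in the sketch below. `add_to_timeline` is a hypothetical stand-in, not the actual Utils method, and the zero-based indexing (month - 1, day - 1) is an assumption derived from "list of 12 MONTH" and "list of 28-31 DAY".

```python
import calendar
from datetime import datetime


def add_to_timeline(timeline, entry_id, published):
    """Add entry_id to timeline[year][month - 1][day - 1] (illustrative only)."""
    year = published.year
    if year not in timeline:
        # One list per month, each holding one (initially empty) list per day.
        timeline[year] = [
            [[] for _ in range(calendar.monthrange(year, month)[1])]
            for month in range(1, 13)
        ]
    timeline[year][published.month - 1][published.day - 1].append(entry_id)
    return timeline


timeline = add_to_timeline({}, u'2014-01-27-lb-tests',
                           datetime(2014, 1, 30, 14, 2))
print(timeline[2014][0][29])
```

Pre-allocating every day keeps the iteration code simple, at the cost of storing many empty lists per year.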