Data Structures

Here is a description of the most important and recurring data structures used in the Python source.

In the source code, the following functions might be of interest:

/lib/orgparser.py → parse_orgmode_file(…)
- the main routine of the Org-mode parser
/lib/htmlizer.py → sanitize_and_htmlize_blog_content(…)
- the main routine of the HTMLization process

Org-Mode Element Overview

http://orgmode.org/worg/dev/org-syntax.html
- OLD: list of Org Mode elements: http://article.gmane.org/gmane.emacs.orgmode/67871
Take a look at the Org-mode test file (used for unit testing) that contains all implemented Org-mode syntax elements:
- This is the resulting HTML file
- Here is the live-HTML version on my blog

Org elements: from ox-ascii.el (Org-mode)

Org Element	[fn:earmarked]	[fn:lowprio]	implemented since	[fn:internalrepresentation]	HTML5
external hyperlinks			<2014-01-30 Thu>		a
internal links			<2014-03-03 Mon>		a
bold			<2014-01-30 Thu>		b
center-block		x
clock		x
code			<2014-01-30 Thu>		code
drawer		x
dynamic-block		x
entity
example-block			<2014-01-30 Thu>	['example-block', 'name or None', [u'first line', u'second line']]	FIXXME
example “colon-block”			<2014-08-10 Sun>	['colon-block', False, [u'first line', u'second line']]	pre
export-block		x
export-snippet		x
fixed-width		x
footnote-definition		x
footnote-reference		x
headline			<2014-01-30 Thu>	['heading', {'level': 3, 'title': u'my title'}]	section+header+h1
horizontal-rule			<2014-01-31 Fri>	['hr']	(ignored; only interpreted to mark the end of the standfirst)
inline-src-block		x
inlinetask		x
inner-template		x
italic		x
item
keyword		x
latex-environment			<2014-01-30 Thu>	[fn:pypandoc] ['latex-block', 'name or None', [u'first line', u'second line']]
latex-fragment		x
line-break		x
link	x
paragraph			<2014-01-30 Thu>	[‘par’, u’line1’, u’line2’]	p
plain-list	x			['list-itemize', [u'first line', u'second line']]	ul+li
plain-text			<2014-01-30 Thu>	see: paragraph
planning		x
quote-block			<2014-01-30 Thu>	['quote-block', 'name or None', [u'first line', u'second line']]	blockquote
quote-section		?
radio-target		x
section			<2014-01-30 Thu>	['heading', {'title': u'Sub-heading foo', 'level': 3}]	h2, h3, …
special-block		x
src-block			<2014-01-30 Thu>	['src-block', 'name or None', [u'first line', u'second line']]	pre
statistics-cookie		x
strike-through		x
subscript		x
superscript		x
table	x			[fn:pypandoc]
table-cell	x
table-row	x
target
template		x
timestamp		x
underline		x
verbatim	x				pre
verse-block			<2014-01-30 Thu>	['verse-block', 'name or None', [u'first line', u'second line']]	pre
html-block			<2014-01-30 Thu>	['html-block', 'name or None', [u'first line', u'second line']]	pre (if there is no #+NAME:, the content is inserted directly!)
tsfile-links			<2017-06-17 Sat>	['cust_link_image', u'2017-03-11T18.29.20 Stars.jpg', {u'width': u'300', u'alt': u'Stars in a Tree', u'align': u'right'}]	figure, img + attributes, figcaption
the rest				[fn:pypandoc]

NOTE: OrgParser uses 'par' for anything it cannot interpret as something else.

[fn:earmarked] Planned to be implemented soon (or at all :-)

[fn:lowprio] This feature is low on my personal development list (may take some time or might never get implemented)

[fn:pypandoc] This element gets converted using pypandoc (plus additional sanitizing)

[fn:internalrepresentation] usually in list: blog_data['id-of-entry']['content']
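To illustrate how the internal representation relates to the HTML5 column of the table, here is a minimal sketch of a renderer for a few element types. It is not lazyblorg's htmlizer: the function name render_element is hypothetical, headings are simplified so that their level maps directly to <hN> (the table actually lists section+header+h1 and h2, h3, …), and no pypandoc conversion or sanitizing is done.

```python
from html import escape


def render_element(element):
    """Render one content element (in its list form) to an HTML string.

    Hypothetical, simplified sketch following the HTML5 column of the table.
    """
    kind = element[0]
    if kind == 'par':
        # ['par', u'line1', u'line2', ...] -> one <p> with the lines joined
        return '<p>' + escape(' '.join(element[1:])) + '</p>'
    if kind == 'heading':
        # ['heading', {'level': 3, 'title': u'my title'}]
        # simplification: the heading level is used directly as <hN>
        level = element[1]['level']
        return '<h%d>%s</h%d>' % (level, escape(element[1]['title']), level)
    if kind == 'list-itemize':
        # ['list-itemize', [u'first line', u'second line']] -> ul+li
        items = ''.join('<li>%s</li>' % escape(item) for item in element[1])
        return '<ul>' + items + '</ul>'
    if kind in ('src-block', 'verse-block', 'colon-block'):
        # ['src-block', 'name or None', [lines]] -> pre
        return '<pre>' + escape('\n'.join(element[2])) + '</pre>'
    if kind == 'quote-block':
        return '<blockquote>' + escape('\n'.join(element[2])) + '</blockquote>'
    raise ValueError('unsupported element type: %r' % kind)
```

For example, render_element(['par', u'line1', u'line2']) returns '<p>line1 line2</p>'. Dispatching on the first list item is what makes the "type first, payload after" representation convenient.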

Blocks (beginning with #+BEGIN_):
- EXPORT (new with Org-mode 9)
- ASCII
- HTML (deprecated since Org-mode 9)
- LATEX (deprecated since Org-mode 9)
- QUOTE
- SRC
- VERSE
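A minimal sketch of how the block types listed above could be recognized in a list of Org-mode lines. The function find_blocks and its return format are hypothetical and not part of lazyblorg's OrgParser; the sketch assumes blocks are not nested and matches #+BEGIN_/#+END_ case-insensitively, since Org-mode accepts both upper- and lower-case keywords.

```python
import re

# block types from the list above
BLOCK_TYPES = ('EXPORT', 'ASCII', 'HTML', 'LATEX', 'QUOTE', 'SRC', 'VERSE')

BEGIN_RE = re.compile(
    r'^\s*#\+BEGIN_(' + '|'.join(BLOCK_TYPES) + r')\b(.*)$', re.IGNORECASE)


def find_blocks(lines):
    """Return a list of (block_type, parameters, body_lines) tuples.

    Hypothetical sketch: nested blocks are not supported, and an
    unterminated block simply runs to the end of the input.
    """
    blocks = []
    i = 0
    while i < len(lines):
        match = BEGIN_RE.match(lines[i])
        if not match:
            i += 1
            continue
        block_type = match.group(1).upper()
        parameters = match.group(2).strip()  # e.g. the language of a SRC block
        end_re = re.compile(r'^\s*#\+END_' + block_type + r'\s*$', re.IGNORECASE)
        body = []
        i += 1
        while i < len(lines) and not end_re.match(lines[i]):
            body.append(lines[i])
            i += 1
        blocks.append((block_type, parameters, body))
        i += 1  # skip the #+END_ line
    return blocks
```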

Representation of blog data

For a complete list of content elements, please take a look at the table of Org elements above (id:implemented-org-elements). FIXXME

blog_data is a Python list containing one dictionary entry per blog entry:

FIXXME: add examples of:
- category
- other additional data

blog_data = \
[ {'level': 2,                                                ## number of asterisks
   'title': u'This is a blog entry about foo',
   'usertags': [u'tag1', u'tag2'],
   'autotags': {'language': 'english'},
   'id': u'lazyblorg-example-entry',                          ## ID from PROPERTIES-drawer
   'finished-timestamp-history': [datetime1, datetime2, datetime3],
   'latestupdateTS': datetime,                                ## most current time-stamp that changed (or overwrote) heading to DONE
   'firstpublishTS': datetime,                                ## oldest time-stamp that changed heading to DONE
   'created': datetime,
   'content': [ ['par', u'This is the Org-mode content'],     ## 'par': paragraph containing anything that is not defined otherwise (tables, ...)
                '\n',    ## change of paragraph
                ['heading', {'level': 3, 'title': u'Another aspect'}],
                ['html-block', 'its name or None', [u'first line', u'second line', u'', u'last line']],
                ['list-itemize', [u'first line', u'second line']],
                ['cust_link_image', u'2017-03-11T18.29.20 Stars.jpg', {u'width': u'300', u'alt': u'Stars in a Tree', u'align': u'right'}]
              ]                                                    #FIXXME: further elements
} ]
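Because blog_data is a plain list, looking up an entry by its ID requires building an index first. The following sketch does that and lists the element types of an entry's content, treating the bare '\n' string as a change of paragraph. The helper names index_by_id and element_types are hypothetical, and the example entry contains only a subset of the keys shown above (its values are illustrative).

```python
import datetime

# Illustrative entry following the structure above (subset of keys only)
blog_data = [{
    'level': 2,
    'title': u'This is a blog entry about foo',
    'id': u'lazyblorg-example-entry',
    'created': datetime.datetime(2014, 1, 28, 14, 2),
    'content': [
        ['par', u'This is the Org-mode content'],
        '\n',  # change of paragraph
        ['heading', {'level': 3, 'title': u'Another aspect'}],
        ['list-itemize', [u'first line', u'second line']],
    ],
}]


def index_by_id(blog_data):
    """Map each entry's ID to its dict (blog_data itself is a plain list)."""
    return {entry['id']: entry for entry in blog_data}


def element_types(entry):
    """List the element types of an entry's content.

    Elements are lists whose first item is the type; a bare newline
    string is reported as 'paragraph-break'.
    """
    return [element[0] if isinstance(element, list) else 'paragraph-break'
            for element in entry['content']]
```

For the entry above, element_types(index_by_id(blog_data)['lazyblorg-example-entry']) yields ['par', 'paragraph-break', 'heading', 'list-itemize'].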

Thus:

blog_data[0].keys()
## ... results in:
# ['title',
#  'latestupdateTS',
#  'firstpublishTS',
#  'created',
#  'usertags',
#  'content',
#  'finished-timestamp-history',
#  'level',
#  'id']

blog_data[0]['content']  ## -> list of elements of content
# [['par', u'This is the Org-mode content'],
#  ['heading', {'level': 3, 'title': u'Another aspect'}],
#  ['list-itemize', [u'first line', u'second line']],
#  ['table', u'FIXXME: followed by this table data'],
#  ['image', u'FIXXME: followed by this image']]

Internal format of meta-data

Example:

>>> metadata
{u'2013-08-22-testid': {'title': u"This is the title", 'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6), 'firstpublishTS': datetime.datetime(2013, 8, 22, 21, 6), 'checksum': 'b757f8478bffd6c70a474f213d6520de', 'created': datetime.datetime(2013, 8, 22, 21, 6)},
 u'2013-02-12-lazyblorg-example-entry': {'latestupdateTS': datetime.datetime(2013, 2, 14, 19, 2), 'checksum': '24af2246a5121e829a0dbbd6e2425c15', 'created': datetime.datetime(2013, 2, 12, 10, 58)}}

The keys of the dict are the IDs of the entries:

>>> metadata.keys()
[u'2013-08-22-testid', u'2013-02-12-lazyblorg-example-entry']

Each entry (keyed by ID) holds a dict with the following entries:

'title': string containing the title of the blog entry
'latestupdateTS': datetime.datetime(2013, 8, 22, 21, 6)
- most recent time-stamp from the LOGBOOK drawer that marks a transition to a final state (e.g. DONE)
'firstpublishTS': datetime.datetime(2013, 8, 22, 21, 6)
- oldest time-stamp from the LOGBOOK drawer that marks a transition to a final state (the publication time-stamp)
'checksum': 'b757f8478bffd6c70a474f213d6520de'
- MD5 checksum of: [title, tags, finished_timestamp_history, content]
'created': datetime.datetime(2013, 8, 22, 21, 6)
- datetime object of the CREATED property from the PROPERTIES drawer
- [ ] FIXXME: why not the first CLOSED time-stamp?
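The checksum could be computed along the following lines. This is a sketch under explicit assumptions: the four fields are serialized with repr() before hashing, and the tags are taken from 'usertags'. lazyblorg's actual serialization is not described here, so the values produced by this sketch will differ from those in real metadata. The function name entry_checksum is hypothetical.

```python
import hashlib


def entry_checksum(entry):
    """MD5 hex digest over title, tags, finished-timestamp history and content.

    Assumption: fields are serialized with repr(); 'usertags' is used as tags.
    """
    fields = [entry['title'],
              entry['usertags'],
              entry['finished-timestamp-history'],
              entry['content']]
    return hashlib.md5(repr(fields).encode('utf-8')).hexdigest()
```

Any change to one of the four fields (for example, a new DONE time-stamp or edited content) yields a different checksum, while unrelated keys such as 'created' do not affect it.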

Time-stamps

Example:

CLOSED: [2014-01-31 Fri 14:02]
:LOGBOOK:
- State "DONE"       from "DONE"       [2014-02-01 Sat 18:42]
- State "DONE"       from ""           [2014-01-30 Thu 14:02]
:END:
:PROPERTIES:
:CREATED:  [2014-01-28 Tue 14:02]
:ID: 2014-01-27-lb-tests
:END:

What happens with the various time-stamps?

Each LOGBOOK entry that sets the state to DONE:
- is added to entry['finished-timestamp-history'] (which is a list)
- overwrites entry['latestupdateTS'] if it is newer than the current value
  - thus entry['latestupdateTS'] ends up as the most recent LOGBOOK entry of setting to DONE
- overwrites entry['firstpublishTS'] if it is older than the current value
CREATED:
- stored as entry['created']
CLOSED:
- ignored
ID-timestamp:
- ignored
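The rules for DONE time-stamps can be sketched as follows, using the LOGBOOK lines from the example above. The function apply_done_timestamps and the regular expression are hypothetical; the sketch only handles DONE transitions whose time-stamp includes a time of day, and it ignores CREATED, CLOSED and the ID.

```python
import datetime
import re

# Matches LOGBOOK lines such as:
#   - State "DONE"       from ""           [2014-01-30 Thu 14:02]
DONE_RE = re.compile(
    r'^\s*-\s+State\s+"DONE"\s+from\s+"[^"]*"\s+'
    r'\[(\d{4}-\d{2}-\d{2}) \w+ (\d{2}:\d{2})\]')


def apply_done_timestamps(entry, logbook_lines):
    """Update the time-stamp fields of entry from LOGBOOK lines (sketch)."""
    entry.setdefault('finished-timestamp-history', [])
    for line in logbook_lines:
        match = DONE_RE.match(line)
        if not match:
            continue
        ts = datetime.datetime.strptime(
            match.group(1) + ' ' + match.group(2), '%Y-%m-%d %H:%M')
        entry['finished-timestamp-history'].append(ts)
        # keep the newest DONE time-stamp ...
        if entry.get('latestupdateTS') is None or ts > entry['latestupdateTS']:
            entry['latestupdateTS'] = ts
        # ... and the oldest one, which is the publication time-stamp
        if entry.get('firstpublishTS') is None or ts < entry['firstpublishTS']:
            entry['firstpublishTS'] = ts
    return entry
```

Comparing each time-stamp against the current value, rather than sorting the history, means the result does not depend on the order of the LOGBOOK lines.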

After parsing the entry above:

entry['created'] = [2014-01-28 Tue 14:02]
entry['latestupdateTS'] = [2014-02-01 Sat 18:42]
- note that entry['timestamp'] was renamed to entry['latestupdateTS'] on 2017-02-12
entry['firstpublishTS'] = [2014-01-30 Thu 14:02]
- the oldest entry of entry['finished-timestamp-history'] is the publication time-stamp!
entry['finished-timestamp-history'] = [2014-02-01 Sat 18:42] and [2014-01-30 Thu 14:02]

entries_timeline_by_published

The dict format is:

dict with year (int) as key, value = list of 12 MONTH
MONTH: list of 28-31 DAY
DAY: list of 0 to many entry-IDs

for year in sorted(entries_timeline_by_published.keys()):
    # enumerate() yields (index, list of days) for each month
    for month_index, days in enumerate(entries_timeline_by_published[year]):
        # enumerate() yields (index, list of IDs) for each day
        for day_index, entry_ids in enumerate(days):
            for blogentry in entry_ids:
                # note: month_index and day_index are the raw (0-based) list indices
                print(str(year) + '-' + str(month_index) + '-' + str(day_index) + " has entry: " + str(blogentry))

See Utils.__add_entry_to_entries_timeline_by_published(…) for how it is populated.

See utils_test.py > test_entries_timeline_by_published_functions(…) for how it is tested.
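A sketch of how an entry could be added to entries_timeline_by_published, matching the format described above (12 MONTH lists with one DAY list per day of the month). The helper add_entry_to_timeline is hypothetical and is not the Utils method mentioned above; it assumes months and days are stored at 0-based indices (January and the 1st of a month at index 0), so check the Utils method for the actual convention.

```python
import calendar
import datetime


def add_entry_to_timeline(timeline, entry_id, published):
    """Add entry_id to timeline[year][month][day] (hypothetical helper).

    Assumption: month and day are stored at 0-based list indices,
    i.e. timeline[2014][0][29] holds the entries of 2014-01-30.
    """
    year = published.year
    if year not in timeline:
        # 12 MONTH lists, each with one (empty) DAY list per day of that month;
        # calendar.monthrange() returns (weekday of day 1, number of days)
        timeline[year] = [
            [[] for _ in range(calendar.monthrange(year, month)[1])]
            for month in range(1, 13)
        ]
    timeline[year][published.month - 1][published.day - 1].append(entry_id)
    return timeline
```

Using calendar.monthrange() keeps the "28-31 DAY" length correct for each month, including February in leap years.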
