Skip to content

Latest commit

 

History

History
377 lines (288 loc) · 17 KB

README.md

File metadata and controls

377 lines (288 loc) · 17 KB

External Dependencies

md2tex utility

  • clang, gcc or any working c compiler.

  • gnu make.

  • glibc, musl, or any c standard library.

drv.ltx driver

  • LuaTeX.

  • LuaTeX-ja and evangelion-jfm.

  • Macro-package float, xurl, listings, graphicx, lua-ul, fontspec, micortype, ragged2e, hyperref and luatexja.

All of these above are contained in a standard full TeXLive (or MacTeX) installation that is released after 2024.

make.py build system

  • Python3 (tested on 3.9.6).

About the md2tex utility

Features

This is a Markdown to (La)TeX parser implemented in C, based on MD4C, with the following features:

  • Compliance: It is compliant to the latest version of CommonMark specification thanks to MD4C. Currently, we supports CommonMark 0.31.

  • Extensions: It supports some widely used extensions, namely: table, strikethrough, underline, and TeX style equations.

  • Lenience: It follows completely the GIGO philosophy (garbage in, garbage out). It sees any sequence of bytes as valid input. It will not throw any error when parsing.

  • Performance: It is very fast, parsing most of our posts in less than 1 ms.

  • Portability: It is tested to run on Mach-O, iOS, and Linux. It should run on Windows, BSDs, and all other POSIX-compliant OSes which has a working C compiler.

  • Encoding: It understands UTF-8 to determine word boundaries, and case-insensitive matching of a link reference label (Unicode case folding). However, it will not translate HTML entities and numeric character references.

Build Instructions

POSIX-compliant OSes

Make sure you have a working C compiler (Clang/GCC/...) and standard library (GLibC/musl/...), together with GNU Make and sed tools.

Then simply run make to generate the executable md2tex.

Windows

You can install Cygwin and follow the same step, or you can use the fragile way.

cc -o md2tex md2tex.c md4c.c -O2 -Wall

MarkDown Spec

Generally, you should conform to the CommonMark spec when writing posts. However, because of the additional extensions, there are some extra rules.

Table

It supports GitHub-style tables.

Note that in the generated (La)TeX manuscript, the align information of the columns are currently ignored, and default to left align. It may be supported in the future.

Strikethrough

It supports ~delete~ (delete).

Underline

Underscore denotes an underline instead of an ordinary emphasis or strong emphasis. Thus, to get italics and bold, use * instead.

TeX-style Equation

TeX like inline math spans ($...$) and display math spans ($$...$$) ate supported. You are not required to escape ordinary dollars signs in most cases.

This is an inline math span: $a + b = c$.

This is a display math span:
$$
  a + b = c.
$$

This is a dollar sign: $, 12$; equivalent to \$, 12\$.

This is a double dollar sign: $$; equivalent to \$\$.

Common knowledge: math spans cannot be nested.
$$foo $bar$ baz$$ is equivalent to \$\$foo $bar$ baz\$\$ which only bar is rendered as equation.

Note: the opening delimiter or closing delimiter cannot be preceded or followed by an alphanumeric character.
x$a + b = c$ or $a + b = c$y will not be rendered as math (all $ rendered as \$)

Appendix

The TeX manuscript generated by this README for reference:

\title{"The c13n System Backend Introduction"}
\author{"rne"}
\date{"Sep 20, 2024"}
\maketitle
\chapter{External Dependencies}
\section{\texttt{md2tex} utility}
\begin{enumerate}
\item \verb!clang!, \verb!gcc! or any working c compiler.
\item \verb!gnu make!.
\item \verb!glibc!, \verb!musl!, or any c standard library.
\end{enumerate}
\section{\texttt{drv.ltx} driver}
\begin{enumerate}
\item LuaTeX.
\item LuaTeX-ja and evangelion-jfm.
\item Macro-package \verb!float!, \verb!xurl!, \verb!listings!, \verb!graphicx!, \verb!lua-ul!, \verb!fontspec!, \verb!micortype!, \verb!ragged2e!, \verb!hyperref! and \verb!luatexja!.
\end{enumerate}
All of these above are contained in a standard \textit{full} TeXLive (or MacTeX) installation that is released after 2024.\par
\section{\texttt{make.py} build system}
\begin{enumerate}
\item Python3 (tested on 3.9.6).
\end{enumerate}
\chapter{About the \texttt{md2tex} utility}
\section{Features}
This is a Markdown to (La)TeX parser implemented in C, based on MD4C, with the following features:\par
\begin{enumerate}
\item \textbf{Compliance}: It is compliant to the latest version of \href{http://spec.commonmark.org/}{CommonMark specification} thanks to MD4C. Currently, we supports CommonMark 0.31.
\item \textbf{Extensions}: It supports some widely used extensions, namely: table, strikethrough, underline, and TeX style equations.
\item \textbf{Lenience}: It follows completely the GIGO philosophy (garbage in, garbage out). It sees any sequence of bytes as valid input. It will not throw any error when parsing.
\item \textbf{Performance}: It is very fast, parsing most of our posts in less than 1 ms.
\item \textbf{Portability}: It is tested to run on Mach-O, iOS, and Linux. It should run on Windows, BSDs, and all other POSIX-compliant OSes which has a working C compiler.
\item \textbf{Encoding}: It understands UTF-8 to determine word boundaries, and case-insensitive matching of a link reference label (Unicode case folding). However, it will \textit{not} translate HTML entities and numeric character references.
\end{enumerate}
\section{Build Instructions}
\subsection{POSIX-compliant OSes}
Make sure you have a working C compiler (Clang/GCC/...) and standard library (GLibC/musl/...), together with GNU Make and sed tools.\par
Then simply run make to generate the executable \verb!md2tex!.\par
\subsection{Windows}
You can install Cygwin and follow the same step, or you can use the fragile way.\par
\begin{lstlisting}[language=sh]
cc -o md2tex md2tex.c md4c.c -O2 -Wall
\end{lstlisting}
\section{MarkDown Spec}
Generally, you should conform to the CommonMark spec when writing posts. However, because of the additional extensions, there are some extra rules.\par
\subsection{Table}
It supports \href{https://github.com/github/docs/blob/main/content/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables.md}{GitHub-style tables}.\par
Note that in the generated (La)TeX manuscript, the align information of the columns are currently ignored, and default to left align. It may be supported in the future.\par
\subsection{Strikethrough}
It supports \verb!~delete~! (\del{delete}).\par
\subsection{Underline}
Underscore denotes an underline instead of an ordinary emphasis or strong emphasis. Thus, to get \textit{italics} and \textbf{bold}, use \verb!*! instead.\par
\subsection{TeX-style Equation}
TeX like inline math spans (\verb!$...$!) and display math spans (\verb!$$...$$!) ate supported. You are not required to escape ordinary dollars signs in most cases.\par
\begin{lstlisting}[language=Markdown]
This is an inline math span: $a + b = c$.

This is a display math span:
$$
  a + b = c.
$$

This is a dollar sign: $, 12$; equivalent to \$, 12\$.

This is a double dollar sign: $$; equivalent to \$\$.

Common knowledge: math spans cannot be nested.
$$foo $bar$ baz$$ is equivalent to \$\$foo $bar$ baz\$\$ which only bar is rendered as equation.

Note: the opening delimiter or closing delimiter cannot be preceded or followed by an alphanumeric character.
x$a + b = c$ or $a + b = c$y will not be rendered as math (all $ rendered as \$)
\end{lstlisting}
\chapter{About the \texttt{drv[pst|mly].ltx} style sheet}
It works only under LuaLaTeX. No plan for porting. The \verb!pst! is for standalone posts while the \verb!mly! is for monthly.\par
\section{Text style}
We use \verb!\underLine! and \verb!\strikeThrough! to replace the LaTeX2e provided buggy \verb!\underline! and the undefined \verb!\del!.\par
\section{Patch}
Because SAX-like parser cannot elegantly capture the text between span delimiters, control sequence as listed below may appear in several circumstances.\par
\begin{lstlisting}[language=TeX]
\href{<url>}{} % from [](<url>)
\label{} % from an image that does not have a label
\end{lstlisting}
Instead of using further dark magic in the parser (i.e., modify the text callback), we handle this in LaTeX.\par
The control sequence \verb!\href! is a hard one, because it relies on nasty catcode modifications to read in url containing chars that normally need to be escaped in a TeX manuscript, and thus means we cannot simply gobble the url into a parameter and further patch it.
Also because of the complex infrastructure of hyperref, a custom \verb!\href! has been implemented to solve this problem. It should have the exact functionality of \verb!\href!, except when \verb!#2! is empty, we supply one which has value \verb!\url{#1}!.
This command is LuaTeX specific.\par
\begin{lstlisting}[language=TeX]
\def\inner@ifempty#1{\begingroup\toks0={#1}\edef\p@r@m{\the\toks0}%
 \expandafter\endgroup\ifx\p@r@m\empty\expandafter\@firstoftwo\else%
 \expandafter\@secondoftwo\fi}
\def\inner@makeother#1{\catcode`#112\relax}
\def\href{\leavevmode\bgroup\let\do\inner@makeother\dospecials\inner@href}
\begingroup\catcode`[=1\catcode`]=2\catcode`\{=12\catcode`\}=12
 \gdef\inner@href{#1}{#2}[\pdfextension startlink user[/Subtype/Link/A<</Type/Action/S/URI/URI(#1)>>]\inner@ifempty[#2][\url[#1]][#2]\pdfextension endlink \egroup]\endgroup
\end{lstlisting}
Documenting this is unnecessary I think.\par
Without utilizing catcode dark magic, \verb!\label! is a really easy one.\par
\begin{lstlisting}[language=TeX]
\begingroup\catcode`X=3\gdef\expnd@ifempty#1{%
 \ifX\detokenize{#1}X\expandafter\@firstoftwo\else%
 \expandafter\@secondoftwo\fi}\endgroup
\let\furui@label\label
\def\label#1{\expnd@ifempty{#1}\relax\furui@label{#1}}
\end{lstlisting}
\section{Headings}
To make life easier for injecting metadata from \verb!mdx!, also to enable sandboxed pdf generation for monthly, the \verb!drvmly.ltx! redefines some command.\par
\begin{lstlisting}[language=TeX]
\def\mlytitle#1{\title{#1}\author{c13n}\date{\today}\maketitle
                \gdef\title##1{\def\TITLE{##1\rlap{\quad\vtop{\normalsize\hbox{\AUTHOR}\hbox{\DATE}}}}}
                \gdef\author##1{\xdef\AUTHOR{##1}}
                \gdef\date##1{\xdef\DATE{##1}}
                \gdef\maketitle{\part{\TITLE}}}
\end{lstlisting}
\section{Blocks}
We include the \verb!float! package as every float uses \verb![H]!. As graphics are represented using the \verb!\image! control sequence in the parser, we have the following definition.\par
\begin{lstlisting}[language=TeX]
\setkeys{Gin}{width=.75\csname Gin@nat@width\endcsname,keepaspectratio}
\def\image#1{\includegraphics{#1}}
\end{lstlisting}
Another rather simple one is the thematic break \verb!\thematic!.\par
\begin{lstlisting}[language=TeX]
\newcommand{\thematic}{\vspace{2.5ex}\par\noindent%
 \parbox{\textwidth}{\centering{*}\\[-4pt]{*}\enspace{*}\vspace{2ex}}\par}
\end{lstlisting}
\section{CJK}
We currently support 中文 using \verb!luatexja!. To rebuild all posts, you only need a \textit{complete} TeXLive (or MacTeX) installation as the appropriate font is distributed with this repo.\par
Evangelion-JFM is used as the font metric.\par
\section{LuaTeX}
To silence the engine when batch compiling, some callback is modified:\par
\begin{lstlisting}[language=TeX]
\directlua{
 function be_quiet () end
 luatexbase.add_to_callback('start_run', be_quiet, 'stop start run')
 luatexbase.add_to_callback('stop_run', be_quiet, 'stop stop run')
 luatexbase.add_to_callback('start_page_number', be_quiet, 'stop start page')
 luatexbase.add_to_callback('stop_page_number', be_quiet, 'stop stop page')
 luatexbase.add_to_callback('start_file', be_quiet, 'stop start file')
 luatexbase.add_to_callback('stop_file', be_quiet, 'stop stop file')
 luatexbase.add_to_callback('show_warning_message', be_quiet, 'stop show warning message')
}
\end{lstlisting}
\section{LaTeX}
To the same reason, LaTeX (especially \verb!listings!) is asked to keep silent:\par
\begin{lstlisting}[language=TeX]
\RequirePackage[immediate]{silence}
\WarningsOff
\ErrorsOff[Listings]
\end{lstlisting}
\chapter{The build system \texttt{make.py}}
It's written in Python, and thus should be portable.\par
The only thing that should be mentioned about it is that please do not use \verb!webp! as image format.\par
Use \verb!python make.py post! to make posts (existing posts will not trigger rebuild) and \verb!python make.py batch! to make all the monthly.\par
For more details, see the comments. This is the only file which is commented carefully.\par

About the drv[pst|mly].ltx style sheet

It works only under LuaLaTeX. No plan for porting. The pst is for standalone posts while the mly is for monthly.

Text style

We use \underLine and \strikeThrough to replace the LaTeX2e provided buggy \underline and the undefined \del.

Patch

Because SAX-like parser cannot elegantly capture the text between span delimiters, control sequence as listed below may appear in several circumstances.

\href{<url>}{} % from [](<url>)
\label{} % from an image that does not have a label

Instead of using further dark magic in the parser (i.e., modify the text callback), we handle this in LaTeX.

The control sequence \href is a hard one, because it relies on nasty catcode modifications to read in url containing chars that normally need to be escaped in a TeX manuscript, and thus means we cannot simply gobble the url into a parameter and further patch it. Also because of the complex infrastructure of hyperref, a custom \href has been implemented to solve this problem. It should have the exact functionality of \href, except when #2 is empty, we supply one which has value \url{#1}. This command is LuaTeX specific.

\def\inner@ifempty#1{\begingroup\toks0={#1}\edef\p@r@m{\the\toks0}%
 \expandafter\endgroup\ifx\p@r@m\empty\expandafter\@firstoftwo\else%
 \expandafter\@secondoftwo\fi}
\def\inner@makeother#1{\catcode`#112\relax}
\def\href{\leavevmode\bgroup\let\do\inner@makeother\dospecials\inner@href}
\begingroup\catcode`[=1\catcode`]=2\catcode`\{=12\catcode`\}=12
 \gdef\inner@href{#1}{#2}[\pdfextension startlink user[/Subtype/Link/A<</Type/Action/S/URI/URI(#1)>>]\inner@ifempty[#2][\url[#1]][#2]\pdfextension endlink \egroup]\endgroup

Documenting this is unnecessary I think.

Without utilizing catcode dark magic, \label is a really easy one.

\begingroup\catcode`X=3\gdef\expnd@ifempty#1{%
 \ifX\detokenize{#1}X\expandafter\@firstoftwo\else%
 \expandafter\@secondoftwo\fi}\endgroup
\let\furui@label\label
\def\label#1{\expnd@ifempty{#1}\relax\furui@label{#1}}

Headings

To make life easier for injecting metadata from mdx, also to enable sandboxed pdf generation for monthly, the drvmly.ltx redefines some command.

\def\mlytitle#1{\title{#1}\author{c13n}\date{\today}\maketitle
                \gdef\title##1{\def\TITLE{##1\rlap{\quad\vtop{\normalsize\hbox{\AUTHOR}\hbox{\DATE}}}}}
                \gdef\author##1{\xdef\AUTHOR{##1}}
                \gdef\date##1{\xdef\DATE{##1}}
                \gdef\maketitle{\part{\TITLE}}}

Blocks

We include the float package as every float uses [H]. As graphics are represented using the \image control sequence in the parser, we have the following definition.

\setkeys{Gin}{width=.75\csname Gin@nat@width\endcsname,keepaspectratio}
\def\image#1{\includegraphics{#1}}

Another rather simple one is the thematic break \thematic.

\newcommand{\thematic}{\vspace{2.5ex}\par\noindent%
 \parbox{\textwidth}{\centering{*}\\[-4pt]{*}\enspace{*}\vspace{2ex}}\par}

CJK

We currently support 中文 using luatexja. To rebuild all posts, you only need a complete TeXLive (or MacTeX) installation as the appropriate font is distributed with this repo.

Evangelion-JFM is used as the font metric.

LuaTeX

To silence the engine when batch compiling, some callback is modified:

\directlua{
 function be_quiet () end
 luatexbase.add_to_callback('start_run', be_quiet, 'stop start run')
 luatexbase.add_to_callback('stop_run', be_quiet, 'stop stop run')
 luatexbase.add_to_callback('start_page_number', be_quiet, 'stop start page')
 luatexbase.add_to_callback('stop_page_number', be_quiet, 'stop stop page')
 luatexbase.add_to_callback('start_file', be_quiet, 'stop start file')
 luatexbase.add_to_callback('stop_file', be_quiet, 'stop stop file')
 luatexbase.add_to_callback('show_warning_message', be_quiet, 'stop show warning message')
}

LaTeX

To the same reason, LaTeX (especially listings) is asked to keep silent:

\RequirePackage[immediate]{silence}
\WarningsOff
\ErrorsOff[Listings]

The build system make.py

It's written in Python, and thus should be portable.

The only thing that should be mentioned about it is that please do not use webp as image format.

Use python make.py post to make posts (existing posts will not trigger rebuild) and python make.py batch to make all the monthly.

For more details, see the comments. This is the only file which is commented carefully.