An experimental parser for liblouis tables based on instaparse. Given a liblouis table the parser returns plain old Clojure data. This can be used for example to rewrite the table into any other format.
(require '[instaparse.core :as insta])
(def louis-parser (insta/parser (clojure.java.io/resource "louis.bnf")))
(louis-parser (slurp "~/src/liblouis/tables/ar-ar-comp8.utb"))
which will return
[:table
[:comment "afr#1#Afrikaans Uncontracted#za#Afrikaans onverkort"]
,,,
[:comment " <http://www.gnu.org/licenses/>."]
[:include "en-ueb-g1.ctb"]]
find ~/src/liblouis/tables -type f -print | grep -v -e '\.dic' -e 'Makefile' -e 'maketablelist.sh' -e 'README' | sort | xargs lein run
As it stands the parser can parse probably around 95% of the tables in the liblouis distribution. At the moment it has no support for
- continuation lines (da-dk-g16-lit.ctb, da-dk-g26-lit.ctb, da-dk-g26l-lit.ctb)
- huge tables cause a OutOfMemoryError (GC overhead limit exceeded) (zh-chn.ctb, zhcn-g1.ctb, zhcn-g2.ctb, ko-chars.cti, zh-tw.ctb, etc)
A lot of the EBNF grammar was basically re-used from louis-parser and its liblouis table grammar definition in the form of Parsing expression grammar.
Copyright © 2021 Swiss Library for the Blind, Visually Impaired and Print Disabled
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.