Skip to content
This repository has been archived by the owner on Sep 28, 2023. It is now read-only.

takeTill performs exponential backtracking in environments #122

Open
isovector opened this issue Nov 19, 2018 · 5 comments
Open

takeTill performs exponential backtracking in environments #122

isovector opened this issue Nov 19, 2018 · 5 comments

Comments

@isovector
Copy link

isovector commented Nov 19, 2018

parseLaTeX performs exponential amounts of work in the number of environments. I have two files, book2.txt and book3.txt which are roughly the same size in bytes, but the latter has ~3x as many environments. book2.txt parses quickly, but I haven't been patient enough to get book3.txt to.

The shiatsu program below is doing nothing more than reading the file and calling parseLaTeX.

➜  /tmp/ebook$ wc book2.txt
  6347  33541 245180 book2.txt

➜  /tmp/ebook$ cat book2.txt | grep "begin" | wc -l
232

➜  /tmp/ebook$ time shiatsu +RTS -p -RTS book2.txt - &> /dev/null
shiatsu +RTS -p -RTS book2.txt - &> /dev/null  2.23s user 0.10s system 99% cpu 2.343 total
➜  /tmp/ebook$ wc book3.txt                                             
  8385  41123 273877 book3.txt

➜  /tmp/ebook$ cat book3.txt | grep "begin" | wc -l
583

➜  /tmp/ebook$ time shiatsu +RTS -p -RTS book3.txt - &> /dev/null
^C
shiatsu +RTS -p -RTS book3.txt - &> /dev/null  43.29s user 0.33s system 99% cpu 43.780 total

Here's the profile information


COST CENTRE      MODULE                 SRC                                         %time %alloc

takeTill         Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:354:1-48           74.5   76.6
peekChar         Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:348:1-62            5.2    4.2
cmdArg           Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(222,1)-(232,38)    4.9    4.7
dolMath          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(304,1)-(312,7)     2.2    2.7
envBody.endenv   Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:189:9-75            2.0    2.0
latexBlockParser Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(116,1)-(123,5)     1.7    1.0
isSpecial        Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:335:1-29            1.5    0.0
cmdArgs          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(217,1)-(219,30)    1.4    1.4
envName          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(180,1)-(185,21)    1.2    1.5
text2            Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(144,1)-(147,35)    0.8    1.0
env              Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(163,1)-(177,22)    0.8    1.1
comment          Text.LaTeX.Base.Parser Text/LaTeX/Base/Parser.hs:(323,1)-(328,23)    0.7    1.1

                                                                                                                                   individual      inherited
COST CENTRE                          MODULE                  SRC                                               no.      entries  %time %alloc   %time %alloc

  parseLaTeX                         Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:96:1-45                 2793          0    0.0    0.0   100.0  100.0
   parseLaTeXWith                    Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(99,1)-(101,63)         2794          1    0.0    0.0   100.0  100.0
    latexParser                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:112:1-57                2796          0    0.0    0.0   100.0  100.0
     latexBlockParser                Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(116,1)-(123,5)         2799          0    1.6    1.0   100.0  100.0
      text                           Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:(131,1)-(138,71)        2801          0    0.2    0.2    76.5   75.5
       peekChar                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:348:1-62                2803          0    3.4    2.5     3.4    2.5
       takeTill                      Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:354:1-48                2808          0   67.3   66.8    72.9   72.8
        verbatimEnvironments         Text.LaTeX.Base.Parser  Text/LaTeX/Base/Parser.hs:72:5-24                 3124      14881    0.0    0.0     0.0    0.0
@isovector
Copy link
Author

isovector commented Nov 19, 2018

Actually, I think this might be mostly my fault. Here's one of the environments I was trying to expand:

\begin{code}
main :: IO ()
main = do
  bool <- read
  withSomeSBool (toSBool bool) $  !\annotate{1}!
    \(_ :: SBool b) ->  !\annotate{2}!
      runLogging @b program  !\annotate{3}!
\end{code}

which is a \VerbatimEnvironment in latex. I hadn't added it to the list of verbatim environments in hatex---which meant a lot of backtracking was going into this, even though it could never succeed.

@Daniel-Diaz
Copy link
Owner

What is the result if you use the following parser configuration?

ParserConf { verbatimEnvironments = ["verbatim", "code"] }

The problem might be with the \(_ :: SBool b) bit. The parser might think \( is the beginning of a mathematical expression.

@isovector
Copy link
Author

It works as expected with the given ParserConf.

@Daniel-Diaz
Copy link
Owner

Great! I'm glad that worked for you.

Is the example code above enough to break the parser using default configuration?

@isovector
Copy link
Author

The example code above does break the parser as expected. However, in long documents, this broken parser seems to result in the backtracking issue described above---it will keep consuming input outside of the code environment trying to fix the parse.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants