Skip to content

Latest commit

 

History

History
847 lines (556 loc) · 19.1 KB

paco.md

File metadata and controls

847 lines (556 loc) · 19.1 KB

Documentation

Table of contents

Usage

You can start using Paco combinators and parsers by including or extending Paco module:

# irb
include Paco

string("Paco").parse("Paco") #=> "Paco"
# extend module
module PacoParser
  extend Paco

  class << self
    def parse(io)
      string("Paco").parse(io)
    end
  end
end

PacoParser.parse("Paco") #=> "Paco"
# include in module
module PacoParser
  class << self
    include Paco

    def parse(io)
      string("Paco").parse(io)
    end
  end
end

PacoParser.parse("Paco") #=> "Paco"
# include in class
class PacoParser
  include Paco

  def initialize(str)
    @str = str
  end

  def parse(io)
    string(@str).parse(io)
  end
end

PacoParser.new("Paco").parse("Paco") #=> "Paco"

Paco::Combinators: Main methods

Paco::Combinators.not_followed_by(parser)

Returns a parser that runs the passed parser without consuming the input, and returns null if the passed parser does not match the input. Fails otherwise.

include Paco

example = letters.skip(not_followed_by(string("?"))).skip(remainder)

example.parse("Paco!") #=> "Paco"
example.parse("Paco?") #=> Paco::ParseError

Paco::Combinators.succeed(result)

Returns a parser that doesn't consume any input and always returns result.

include Paco

example = seq(succeed("Paco"), remainder)

example.parse("<3") #=> ["Paco", "<3"]

Paco::Combinators.failed(message)

Returns a parser that doesn't consume any input and always fails with passed message.

include Paco

failed("error") #=> Paco::ParseError

Paco::Combinators.lookahead(parser)

Returns a parser that runs the passed parser without consuming the input, and returns empty string.

include Paco

example = seq(lookahead(string("42")), digits)

example.parse("42") #=> ["", "42"]
example.parse("424") #=> ["", "424"]
example.parse("444") #=> Paco::ParseError

Paco::Combinators.alt(*parsers)

Accepts one or more parsers, and returns a parser that returns the value of the first parser that succeeds, backtracking in between.

include Paco

example = alt(string("true"), string("false"))

example.parse("true") #=> "true"
example.parse("false") #=> "false"
example.parse("null") #=> Paco::ParseError

Paco::Combinators.seq(*parsers, &block)

Accepts one or more parsers, and returns a parser that expects them to match in order, returns an array of all their results.

If block specified, passes results of the parses as an arguments to a block, and at the end returns its result.

include Paco

example = seq(string("pa"), string("co"))

example.parse("paco") #=> ["pa", "co"]
example.parse("Paco") #=> Paco::ParseError

example = seq_map(string("pa"), string("co")) { |x, y| y + x }

example.parse("paco") #=> "copa"
example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.many(parser)

Expects parser zero or more times, and returns an array of the results.

include Paco

example = many(digit)

example.parse("12") #=> ["1", "2"]
example.parse("") #=> []
example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.sep_by(parser, separator)

Returns a parser that expects zero or more matches for parser, separated by the parser separator. Returns an array of parser results.

include Paco

example = sep_by(digits, string(","))

example.parse("1,1,2,3,5,8,13,21") #=> ["1", "1", "2", "3", "5", "8", "13", "21"]
example.parse("1") #=> ["1"]
example.parse("") #=> []
example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.sep_by!(parser, separator)

Returns a parser that expects one or more matches for parser, separated by the parser separator. Returns an array of parser results.

include Paco

example = sep_by!(digits, string(","))

example.parse("1,1,2,3,5,8,13,21") #=> ["1", "1", "2", "3", "5", "8", "13", "21"]
example.parse("1") #=> ["1"]
example.parse("") #=> Paco::ParseError
example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.wrap(before, after, parser)

Expects the parser before before parser and after after `parser. Returns the result of the parser.

include Paco

example = wrap(string("{"), string("}"), letters)

example.parse("{Paco}") #=> "Paco"
example.parse("{Paco") #=> Paco::ParseError

Paco::Combinators.optional(parser)

Returns parser that returns result of the passed parser or nil if parser fails.

include Paco

example = optional(string("Paco"))

example.parse("Paco") #=> "Paco"
example.parse("") #=> nil
example.parse("paco") #=> Paco::ParseError

Paco::Combinators.lazy(desc = "", &block)

Accepts a block that returns a parser, which is evaluated the first time the parser is used. This is useful for referencing parsers that haven't yet been defined, and for implementing recursive parsers.

include Paco

example = lazy { failed("always fails") }

example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.index

Returns parser that returns Paco::Index representing the current offset into the parse without consuming the input. Paco::Index has a 0-based character offset attribute :pos and 1-based :line and :column attributes.

include Paco

example = seq(one_of("123\n ").many.join, index, remainder)

example.parse("1\n2\n3\n\n Paco") #=> ["1\n2\n3\n\n ", #<struct Paco::Index pos=8, line=5, column=2>, "Paco"] 

Paco::Combinators: Text related methods

Paco::Combinators.string(matcher)

Returns a parser that looks for a passed matcher string and returns its value on success.

include Paco

example = string("Paco")

example.parse("Paco") #=> "Paco"
example.parse("paco") #=> Paco::ParseError

Paco::Combinators.satisfy(&block)

Returns a parser that returns a single character if passed block result is truthy.

include Paco

example = satisfy { |ch| ch == ch.downcase }

example.parse("a") #=> "a"
example.parse("P") #=> Paco::ParseError
example.parse("") #=> Paco::ParseError

Paco::Combinators.take_while(&block)

Returns a parser that returns a string containing all the next characters that are truthy for the passed block.

Alias for Paco::Combinators.satisfy(&block).many.join.

include Paco

example = take_while { |ch| ch == ch.downcase }

example.parse("paco!") #=> "paco!"
example.parse("") #=> ""
example.parse("Paco") #=> Paco::ParseError

Paco::Combinators.one_of(matcher)

Returns a parser that looks for exactly one character from passed matcher, and returns its value on success.

include Paco

example = one_of("abc") # or one_of(%w[a b c]) 

example.parse("a") #=> "d"
example.parse("d") #=> Paco::ParseError
example.parse("") #=> Paco::ParseError

Paco::Combinators.none_of(matcher)

Returns a parser that looks for exactly one character NOT from passed matcher, and returns its value on success.

include Paco

example = none_of("abc") # or none_of(%w[a b c]) 

example.parse("d") #=> "d"
example.parse("a") #=> Paco::ParseError
example.parse("") #=> Paco::ParseError

Paco::Combinators.regexp(regexp, group: 0)

Returns a parser that looks for a match to the regexp and returns the entire text matched. The regexp will always match starting at the current parse location. When group is specified, it returns only the text in the specific regexp match group.

include Paco

example = regexp(/[a-z]*/i)

example.parse("Paco") #=> "Paco"
example.parse("") #=> ""
example.parse("42") #=> Paco::ParseError

Paco::Combinators.regexp_char(regexp)

Returns a parser that checks current character against the passed regexp.

include Paco

example = regexp_char(/\d/)

example.parse("4") #=> "4"
example.parse("42") #=> Paco::ParseError
example.parse("P") #=> Paco::ParseError
example.parse("") #=> Paco::ParseError

Paco::Combinators.any_char

Returns a parser that consumes and returns the next character of the input.

include Paco

any_char.parse("P") #=> "P"
any_char.parse("4") #=> "4"
any_char.parse("Paco") #=> Paco::ParseError
any_char.parse("") #=> Paco::ParseError

Paco::Combinators.remainder

Returns a parser that consumes and returns the entire remainder of the input.

include Paco

remainder.parse("") #=> ""
remainder.parse("Paco") #=> "Paco"

Paco::Combinators.eof

Returns a parser that matches end of file and returns nil.

include Paco

eof.parse("") #=> nil
eof.parse("\n") #=> Paco::ParseError

Paco::Combinators.cr

Returns a parser that checks for the "carriage return" (\r) character.

An alias for Paco::Combinators.string("\r")

include Paco

cr.parse("\r") #=> "\r"
cr.parse("\n") #=> Paco::ParseError
cr.parse("") #=> Paco::ParseError

Paco::Combinators.lf

Returns a parser that checks for the "line feed" (\n) character.

An alias for Paco::Combinators.string("\n")

include Paco

lf.parse("\n") #=> "\n"
lf.parse("\r") #=> Paco::ParseError
lf.parse("") #=> Paco::ParseError

Paco::Combinators.crlf

Returns a parser that checks for the "carriage return" character followed by the "line feed" character (\r\n).

An alias for Paco::Combinators.string("\r\n")

include Paco

crlf.parse("\r\n") #=> "\r\n"
crlf.parse("\r") #=> Paco::ParseError
crlf.parse("") #=> Paco::ParseError

Paco::Combinators.newline

Returns a parser that will match any kind of line ending.

An alias for Combinators.alt(Paco::Combinators.crlf, Paco::Combinators.lf, Paco::Combinators.cr).

include Paco

newline.parse("\r\n") #=> "\r\n"
newline.parse("\n") #=> "\n"
newline.parse("") #=> Paco::ParseError

Paco::Combinators.end_of_line

Returns a parser that will match any kind of line ending including end of file.

An alias for Paco::Combinators.alt(Paco::Combinators.newline, Paco::Combinators.eof).

include Paco

end_of_line.parse("") #=> nil
end_of_line.parse("\n") #=> "\n"
end_of_line.parse("P") #=> Paco::ParseError

Paco::Combinators.letter

Alias for Paco::Combinators.regexp_char(/[a-z]/i).

include Paco

letter.parse("p") #=> "P"
letter.parse("Paco") #=> Paco::ParseError
letter.parse("П") #=> Paco::ParseError
letter.parse("") #=> Paco::ParseError
letter.parse("42") #=> Paco::ParseError

Paco::Combinators.letters

Alias for Paco::Combinators.regexp(/[a-z]+/i).

include Paco

letters.parse("Paco") #=> "Paco"
letters.parse("Пако") #=> Paco::ParseError
letters.parse("") #=> Paco::ParseError
letters.parse("42") #=> Paco::ParseError

Paco::Combinators.opt_letters

Alias for Paco::Combinators.regexp(/[a-z]*/i).

include Paco

opt_letters.parse("Paco") #=> "Paco"
opt_letters.parse("") #=> ""
opt_letters.parse("Пако") #=> Paco::ParseError
opt_letters.parse("42") #=> Paco::ParseError

Paco::Combinators.digit

Alias for Paco::Combinators.regexp_char(/[0-9]/).

include Paco

digit.parse("4") #=> "4"
digit.parse("42") #=> Paco::ParseError
digit.parse("") #=> Paco::ParseError
digit.parse("Paco") #=> Paco::ParseError

Paco::Combinators.digits

Alias for Paco::Combinators.regexp(/[0-9]+/).

include Paco

digits.parse("42") #=> "42"
digits.parse("") #=> Paco::ParseError
digits.parse("Paco") #=> Paco::ParseError

Paco::Combinators.opt_digits

Alias for Paco::Combinators.regexp(/[0-9]*/).

include Paco

opt_digits.parse("42") #=> "42"
opt_digits.parse("") #=> ""
opt_digits.parse("Paco") #=> Paco::ParseError

Paco::Combinators.ws

Alias for Paco::Combinators.regexp(/\s+/).

include Paco

ws.parse("  \n  ") #=> "  \n  "
ws.parse("") #=> Paco::ParseError
ws.parse("Paco") #=> Paco::ParseError

Paco::Combinators.opt_ws

Alias for Paco::Combinators.regexp(/\s*/).

include Paco

opt_ws.parse("  \n  ") #=> "  \n  "
opt_ws.parse("") #=> ""
opt_ws.parse("Paco") #=> Paco::ParseError

Paco::Combinators.spaced(parser)

Alias for parser.trim(Paco::Combinators.opt_ws).

include Paco

example = spaced(letters)

example.parse("    Paco    ") #=> "Paco"
example.parse("    Paco") #=> "Paco"
example.parse("   ") #=> Paco::ParseError

Paco::Parser methods

Each combinator returns a Paco::Parser object and all of its methods (except Paco::Parser#parse) returns Paco::Parser object, so we can chain them.

Paco::Parser#parse(input)

Calls parser on the provided string input and returns a parsed result or raises a ParseError exception if parser fails or if input wasn't consumed completely.

include Paco

example = string("Paco")

example.parse("Paco") #=> "Paco"
example.parse("paco") #=> Paco::ParseError
example.parse("Paco!") #=> Paco::ParseError

Paco::Parser#or(other)

Returns a new parser which tries parser, and if it fails uses other.

include Paco

example = string("true").or(string("false"))

example.parse("true") #=> "true"
example.parse("false") #=> "false"
example.parse("null") #=> Paco::ParseError

Paco::Parser#skip(other)

Expects other parser to follow parser, but returns only the value of parser.

include Paco

example = letters.skip(opt_ws)

example.parse("Paco    ") #=> "Paco"
example.parse("Paco") #=> "Paco"
example.parse("   ") #=> Paco::ParseError

Paco::Parser#next(other)

Expects other parser to follow parser, but returns only the value of other parser.

include Paco

example = opt_ws.next(digits)

example.parse("42") #=> "42"
example.parse("   42") #=> "42"
example.parse("   ") #=> Paco::ParseError

Paco::Parser#fmap(other)

Transforms the output of parser with the given block.

include Paco

example = digits.fmap(&:to_i).fmap { |num| num + 1 }

example.parse("9") #=> 10

Paco::Parser#bind(other)

Returns a new parser which tries parser, and on success calls the block with the result of the parse, which is expected to return another parser, which will be called next. This allows you to dynamically decide how to continue the parse.

include Paco

example = letters.bind { |res| ws.next(string(res.reverse)) }

example.parse("redrum murder") #=> "murder"

Here's a more complicated example:

include Paco

char_pairs = {"[" => "]", "(" => ")", "{" => "}", "<" => ">"}

array_of_strings = string("%").next(any_char).bind do |char|
  end_char = char_pairs[char] || char

  many(satisfy { |ch| ch != end_char }.skip(opt_ws)).skip(string(end_char))
end

array_of_strings.parse("%[a b c]") #=> ["a", "b", "c"]
array_of_strings.parse("%(a b c)") #=> ["a", "b", "c"]
array_of_strings.parse("%|a b c|") #=> ["a", "b", "c"]

Paco::Parser#many

Expects parser zero or more times, and returns an array of the results.

include Paco

example = digit.many

example.parse("12") #=> ["1", "2"]
example.parse("") #=> []
example.parse("Paco") #=> Paco::ParseError

Paco::Parser#times(min, max = nil)

Returns a parser that runs parser between min and max times, and returns an array of the results. When max is not specified, max = min.

include Paco

example = digit.times(2, 3)

example.parse("12") #=> ["1", "2"]
example.parse("123") #=> ["1", "2", "3"]
example.parse("1") #=> Paco::ParseError
example.parse("1234") #=> Paco::ParseError

example = digit.times(2)

example.parse("12") #=> ["1", "2"]
example.parse("1") #=> Paco::ParseError
example.parse("123") #=> Paco::ParseError

Paco::Parser#at_least(num)

Returns a parser that runs parser at least num times, and returns an array of the results.

include Paco

example = digit.at_least(2)

example.parse("1234") #=> ["1", "2", "3", "4"]
example.parse("12") #=> ["1", "2"]
example.parse("1") #=> Paco::ParseError

Paco::Parser#at_most(num)

Returns a parser that runs parser at most num times, and returns an array of the results.

include Paco

example = digit.at_most(2)

example.parse("") #=> []
example.parse("12") #=> ["1", "2"]
example.parse("123") #=> Paco::ParseError

Paco::Parser#result(value)

Returns a new parser with the same behavior, but which returns passed value.

include Paco

example = string("true").result(true)

example.parse("true") #=> true
example.parse("false") #=> Paco::ParseError

Paco::Parser#fallback(value)

Returns a new parser which tries parser and, if it fails, returns value without consuming any input.

include Paco

example = digit.fallback("0")

example.parse("4") #=> "4"
example.parse("") #=> "0"

Paco::Parser#trim(other)

Expects other parser before and after parser. Returns the result of the parser.

include Paco

example = letters.trim(opt_ws)

example.parse("    Paco    ") #=> "Paco"
example.parse("    Paco") #=> "Paco"
example.parse("   ") #=> Paco::ParseError

Paco::Parser#wrap(before, after)

Expects the parser before before parser and after after `parser. Returns the result of the parser.

include Paco

example = letters.wrap(string("{"), string("}"))

example.parse("{Paco}") #=> "Paco"
example.parse("{Paco") #=> Paco::ParseError

Paco::Parser#not_followed_by(other)

Returns a parser that runs the passed other parser without consuming the input, and returns result of the parser if the passed one does not match the input. Fails otherwise.

include Paco

example = letters.not_followed_by(string("?")).skip(remainder)

example.parse("Paco!") #=> "Paco"
example.parse("Paco?") #=> Paco::ParseError

Paco::Parser#join(separator = "")

Returns a parser that runs parser and concatenate it results with the separator.

include Paco

many(letter).join(" ").parse("Paco") #=> "P a c o"
many(letter).join.parse("Paco") #=> "Paco"

Debugging

Pass with_callstack: true to the Paco::Parser#parse method to collect a callstack while parsing. To examine the callstack catch the ParseError exception:

begin
  string("Paco").parse("Paco!", with_callstack: true)
rescue Paco::ParseError => e
  pp e.callstack.stack # You will probably want to use `binding.irb` or `binding.pry`
end
#=>
# [
#   {:pos=>0, :status=>:start, :depth=>1, :parser=>"string(\"Paco\").skip(end of file)"},
#   {:pos=>0, :status=>:start, :depth=>2, :parser=>"seq(string(\"Paco\"), end of file)"},
#   {:pos=>0, :status=>:start, :depth=>3, :parser=>"string(\"Paco\")"},
#   {:pos=>4, :status=>:success, :depth=>2, :result=>"Paco", :parser=>"string(\"Paco\")"},
#   {:pos=>4, :status=>:start, :depth=>3, :parser=>"end of file"},
#   {:pos=>4, :status=>:failure, :depth=>2, :parser=>"end of file"}
# ]

Test helpers

Paco provides a special RSpec helper, add require "paco/rspec" to spec_helper.rb to enable #parse matcher:

subject { string("Paco") }

it { is_expected.to parse("Paco") } # just checks if parser succeeds
it { is_expected.to parse("Paco").as("Paco") } # checks if parser result is eq to value passed to `#as`
it { is_expected.to parse("Paco").fully } # checks if parser result is the same as value passed to `#parse`
it { is_expected.not_to parse("paco") } # checks if parser failed