Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse-zoneinfo: replace rule parser with simple state machine #172

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
82 changes: 41 additions & 41 deletions parse-zoneinfo/src/line.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,6 @@ use regex::{Captures, Regex};

pub struct LineParser {
rule_line: Regex,
day_field: Regex,
hm_field: Regex,
hms_field: Regex,
zone_line: Regex,
Expand Down Expand Up @@ -151,15 +150,6 @@ impl Default for LineParser {
)
.unwrap(),

day_field: Regex::new(
r##"(?x) ^
( ?P<weekday> \w+ )
( ?P<sign> [<>] = )
( ?P<day> \d+ )
$ "##,
)
.unwrap(),

hm_field: Regex::new(
r##"(?x) ^
( ?P<sign> -? )
Expand Down Expand Up @@ -413,6 +403,44 @@ pub enum DaySpec {
FirstOnOrAfter(Weekday, i8),
}

impl FromStr for DaySpec {
type Err = Error;

fn from_str(input: &str) -> Result<Self, Error> {
// Parse the field as a number if it vaguely resembles one.
if input.chars().all(|c| c.is_ascii_digit()) {
return Ok(DaySpec::Ordinal(input.parse().unwrap()));
}
// Check if it stars with ‘last’, and trim off the first four bytes if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check if it stars with ‘last’, and trim off the first four bytes if
// Check if it starts with ‘last’, and trim off the first four bytes if

// it does. (Luckily, the file is ASCII, so ‘last’ is four bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about ASCII with strip_prefix, right? This seems an old comment.

else if let Some(remainder) = input.strip_prefix("last") {
let weekday = remainder.parse()?;
return Ok(DaySpec::Last(weekday));
}

let weekday = match input.get(..3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, didn't know this method!

zic.c has the following comment for parsing a day column:

	/*
	** Day work.
	** Accept things such as:
	**	1
	**	lastSunday
	**	last-Sunday (undocumented; warn about this)
	**	Sun<=20
	**	Sun>=7
	*/

I think we should support parsing full weekday names like zic like we did with the regex, but maybe skip the last-{weekday} case.

Can you add a test for DaySpec::from_str?

Some(wd) => Weekday::from_str(wd)?,
None => return Err(Error::InvalidDaySpec(input.to_string())),
};

let dir = match input.get(3..5) {
Some(">=") => true,
Some("<=") => false,
_ => return Err(Error::InvalidDaySpec(input.to_string())),
};

let day = match input.get(5..) {
Some(day) => u8::from_str(day).map_err(|_| Error::InvalidDaySpec(input.to_string()))?,
None => return Err(Error::InvalidDaySpec(input.to_string())),
} as i8;

Ok(match dir {
true => DaySpec::FirstOnOrAfter(weekday, day),
false => DaySpec::LastOnOrBefore(weekday, day),
})
}
}

impl Weekday {
fn calculate(year: i64, month: Month, day: i8) -> Weekday {
let m = month as i64;
Expand Down Expand Up @@ -920,34 +948,6 @@ impl LineParser {
}
}

fn parse_dayspec(&self, input: &str) -> Result<DaySpec, Error> {
// Parse the field as a number if it vaguely resembles one.
if input.chars().all(|c| c.is_ascii_digit()) {
Ok(DaySpec::Ordinal(input.parse().unwrap()))
}
// Check if it stars with ‘last’, and trim off the first four bytes if
// it does. (Luckily, the file is ASCII, so ‘last’ is four bytes)
else if let Some(remainder) = input.strip_prefix("last") {
let weekday = remainder.parse()?;
Ok(DaySpec::Last(weekday))
}
// Check if it’s a relative expression with the regex.
else if let Some(caps) = self.day_field.captures(input) {
let weekday = caps.name("weekday").unwrap().as_str().parse().unwrap();
let day = caps.name("day").unwrap().as_str().parse().unwrap();

match caps.name("sign").unwrap().as_str() {
"<=" => Ok(DaySpec::LastOnOrBefore(weekday, day)),
">=" => Ok(DaySpec::FirstOnOrAfter(weekday, day)),
_ => unreachable!("The regex only matches one of those two!"),
}
}
// Otherwise, give up.
else {
Err(Error::InvalidDaySpec(input.to_string()))
}
}

fn parse_rule<'a>(&self, input: &'a str) -> Result<Rule<'a>, Error> {
if let Some(caps) = self.rule_line.captures(input) {
let name = caps.name("name").unwrap().as_str();
Expand All @@ -971,7 +971,7 @@ impl LineParser {
}

let month = caps.name("in").unwrap().as_str().parse()?;
let day = self.parse_dayspec(caps.name("on").unwrap().as_str())?;
let day = DaySpec::from_str(caps.name("on").unwrap().as_str())?;
let time = self.parse_timespec_and_type(caps.name("at").unwrap().as_str())?;
let time_to_add = self.parse_timespec(caps.name("save").unwrap().as_str())?;
let letters = match caps.name("letters").unwrap().as_str() {
Expand Down Expand Up @@ -1027,13 +1027,13 @@ impl LineParser {
(Some(y), Some(m), Some(d), Some(t)) => Some(ChangeTime::UntilTime(
y.as_str().parse()?,
m.as_str().parse()?,
self.parse_dayspec(d.as_str())?,
DaySpec::from_str(d.as_str())?,
self.parse_timespec_and_type(t.as_str())?,
)),
(Some(y), Some(m), Some(d), _) => Some(ChangeTime::UntilDay(
y.as_str().parse()?,
m.as_str().parse()?,
self.parse_dayspec(d.as_str())?,
DaySpec::from_str(d.as_str())?,
)),
(Some(y), Some(m), _, _) => Some(ChangeTime::UntilMonth(
y.as_str().parse()?,
Expand Down