problem_format: add C++ templates
Many templates have been floating around in the DMOJ community for
validation and input handling in checkers. This commit aims to
consolidate them. It has two main goals:

- Correct. Duh.
- Simple. Other templates that circulate, including the ones I have
  published, are too complex. People naively try and write their own. I
  am sick and tired of reading over incorrect validators.

  These templates forgo some principles of good design (such as
  object-oriented programming) in favour of pure simplicity. They should
  be simple enough that they are understandable by the broader
  community, and are not a black box. Hopefully this also dissuades
  re-writing.
Riolku committed Sep 7, 2024
1 parent 430149b commit efffd68
Showing 65 changed files with 662 additions and 0 deletions.
21 changes: 21 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,21 @@
name: build
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Install clang-format 12
      run: |
        wget -O clang-format https://github.com/DMOJ/clang-tools-static-binaries/releases/download/master-5ea3d18c/clang-format-12_linux-amd64
        chmod a+x ./clang-format
    - name: Run clang-format
      run: find sample_files/problem_setting \( -name '*.h' -or -name '*.cpp' -or -name '*.c' \) -print0 | xargs -0 ./clang-format --dry-run -Werror --color
  cpp_template_tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Run C++ template tests
      run: |
        cd sample_files/problem_setting/test
        ./run_test.sh
1 change: 1 addition & 0 deletions docs/_sidebar.md
@@ -24,6 +24,7 @@
- [Custom graders](problem_format/custom_graders.md)
- [Generators](problem_format/generator.md)
- [Problem examples](problem_format/problem_examples.md)
- [C++ Problem Setting Templates](problem_format/cpp_psetting_templates.md)

- About
- [License](about/LICENSE.md)
63 changes: 63 additions & 0 deletions docs/problem_format/cpp_psetting_templates.md
@@ -0,0 +1,63 @@
# C++ Problem Setting Templates - `cpp_psetting_templates`

There are three C++ input-handling templates provided for aiding problem setters. They are as follows:

- [Validator Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/validator.cpp)
- [Identical Checker/Interactor Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/identical_checker_interactor.cpp)
- [Standard Checker/Interactor Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/standard_checker_interactor.cpp)

## Validator

This is a template for validating the input data of problems. It aims to be simple and, of course, correct. It contains eight functions. The first three are whitespace functions:

- `void readSpace()` expects a space at the current position in the input, and aborts the program if there is not a space.
- `void readNewLine()` expects a newline at the current position in the input.
- `void readEOF()` expects the input file to end immediately at the current position.

The remaining five are for actual content:

- `std::string readToken(char min_char = 0, char max_char = 127)` returns the next token in the input stream. A token is defined as a whitespace-separated string. If the next character in the input is a whitespace character, this method aborts the program. The optional arguments `min_char` and `max_char` can be used to enforce a range on the characters in the token. For instance, `readToken('a', 'z')` reads a string of lowercase English letters.
- `std::string readLine(char min_char = 0, char max_char = 127)` returns the next line in the input stream. Specifically, it reads until it encounters a `\n`, and discards it (the newline is not part of the returned string). `min_char` and `max_char` are the same as for `readToken`. If `readLine` encounters an EOF, it fails.
- `long long readInt(long long lo, long long hi)` parses the next token as an integer. It aborts on overflow, malformed integers, and if the resultant integer is not in the range [lo, hi], inclusive. Leading zeroes and `-0` are not accepted.
- `long double readFloat(long double lo, long double hi, long double eps = 1e-9)` parses the next token as a float. It aborts on overflow, malformed floats, and if the resultant float is not in the range [lo, hi], inclusive, using the provided epsilon to perform the comparison. Scientific notation and NaNs are not accepted, nor are leading zeroes. `-0` is allowed. Trailing zeroes are also permitted.
- `std::vector<T> readIntArray(size_t N, long long lo, long long hi)` parses the next space-separated N integers into an array, and then reads a final newline. It must be given a template argument, which is the type of the array elements. For example, `readIntArray<int>(5, 1, 10)` reads five space-separated integers into a `std::vector<int>`, where each integer is in the range [1, 10], inclusive.
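
The integer and float formats described above can be written down as two patterns. The following is a minimal sketch, not the template's own code: it uses `std::regex` rather than the POSIX `regex.h` API the templates rely on, and the names `isStrictInt` and `isStrictFloat` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Integer tokens: "0", or an optional '-' followed by digits that do not
// start with '0'. This rejects "-0", "+5" and "007".
bool isStrictInt(const std::string &token) {
  static const std::regex re("(0|-?[1-9][0-9]*)");
  return std::regex_match(token, re);
}

// Float tokens: an optional '-', an integer part without leading zeroes, and
// an optional fraction with at least one digit. Exponents and words such as
// "nan" never match, while "-0" and trailing zeroes do.
bool isStrictFloat(const std::string &token) {
  static const std::regex re("-?(0|[1-9][0-9]*)(\\.[0-9]+)?");
  return std::regex_match(token, re);
}
```

Under these patterns `1.50` and `-0` are valid floats, while `1e5`, `01.5` and `1.` are not. Range and overflow checks are a separate step performed after the format check.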

A small caveat: `readToken` and `readLine` will throw if the string exceeds 10 million characters.

`readFloat()` will likely be of no use for many validators, and can be safely deleted. Similarly, `readIntArray` can be deleted if unneeded.
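
To illustrate how these calls compose, here is a hedged sketch of validating an input that consists of `N` on the first line followed by `N` integers on the second. It is not the template itself: `StrictReader` and `isValidInput` are hypothetical names, the reader works on an in-memory string, and it throws instead of aborting so that the behaviour can be checked in a single process. The bounds (`N` in [1, 5], values in [1, 10]) are made up for the example.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for the validator's read functions.
struct StrictReader {
  std::istringstream in;
  explicit StrictReader(const std::string &data) : in(data) {}

  void expect(bool ok) {
    if (!ok) throw std::runtime_error("invalid input");
  }
  void readSpace() { expect(in.get() == ' '); }
  void readNewLine() { expect(in.get() == '\n'); }
  void readEOF() { expect(in.get() == std::char_traits<char>::eof()); }

  long long readInt(long long lo, long long hi) {
    const int eof = std::char_traits<char>::eof();
    std::string token;
    for (int c = in.peek(); c != eof && !std::isspace(c); c = in.peek()) {
      token.push_back(static_cast<char>(in.get()));
    }
    // Accept "0", or an optional '-' followed by digits with no leading zero.
    std::size_t first = (!token.empty() && token[0] == '-') ? 1 : 0;
    expect(first < token.size());
    expect(token == "0" || token[first] != '0');
    for (std::size_t i = first; i < token.size(); i++) {
      expect(std::isdigit(static_cast<unsigned char>(token[i])) != 0);
    }
    long long value = 0;
    try {
      value = std::stoll(token);
    } catch (const std::out_of_range &) {
      expect(false);  // Too large for long long.
    }
    expect(lo <= value && value <= hi);
    return value;
  }
};

// Returns true only if the data matches the format exactly, byte for byte.
bool isValidInput(const std::string &data) {
  StrictReader r(data);
  try {
    long long n = r.readInt(1, 5);
    r.readNewLine();
    for (long long i = 0; i < n; i++) {
      r.readInt(1, 10);
      if (i != n - 1) r.readSpace();
    }
    r.readNewLine();
    r.readEOF();
  } catch (const std::runtime_error &) {
    return false;
  }
  return true;
}
```

With this sketch, `"3\n1 2 3\n"` is accepted, while a missing final newline, a doubled space, a leading zero, or an extra trailing newline are all rejected.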

## Checkers/Interactors

The next pair of templates are for checkers/interactors. The difference is the type of whitespace handling: the identical checker/interactor expects whitespace to match exactly. The standard checker/interactor handles whitespace like the `standard` checker.

The checkers and interactors are designed for the `coci` bridged checker/interactor type. However, updating the codes used and the order of command line parameters to work with other types should not be challenging.

Both files can be used for either checkers/interactors, with the following caveat: interactors MUST close `stdout` BEFORE calling `readEOF()`, so that the user process can terminate in case it _also_ expects an EOF. Checker stdout is used for feedback displayed to the user, and as such `stdout` should not be closed in this case. Validators also do not need to worry about this - only interactors do, and they should only call `readEOF()` once they have finished communicating with the user, to clean up and assert that the user didn't send any trailing data.
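
The ordering requirement for interactors can be sketched as follows. `finishInteraction` is a hypothetical helper, not part of the templates. It takes the two streams as parameters so that it can be tried on ordinary files, and it returns a boolean where the templates would exit with a verdict.

```cpp
#include <cassert>
#include <cstdio>

// `to_user` is the stream the interactor writes to (stdout in the templates)
// and `from_user` is the stream it reads from (stdin in the templates).
// Returns true if the submission sent no trailing data.
bool finishInteraction(std::FILE *to_user, std::FILE *from_user) {
  // Close the write side first: the submission then sees EOF and can exit.
  // Reading first could leave both processes waiting on each other.
  std::fclose(to_user);
  return std::fgetc(from_user) == EOF;
}
```

In an interactor built on the templates, the equivalent sequence would be `fclose(stdout)` followed by `readEOF()`.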

The general format of the checkers/interactors is the same as the validator, with a few changes:

- `readSpace(), readNewLine(), readEOF()`: Under the identical checker, these return Presentation Error if the check fails. Under the standard checker, these return WA.
- `readToken()`: Under the identical checker, this returns Presentation Error if the token is empty, and WA if any character is not in range.
- `readLine()`: Under the identical checker, this returns Presentation Error if an EOF is encountered, and WA if any character is not in range. This function cannot be used correctly under the standard checker, and so is not provided in that template.
- `readInt(), readIntArray(), readFloat()`: Returns WA if the token is malformed or out of range.

Additionally, two new functions are provided.

- `exitWA()` unconditionally exits with a WA verdict.
- `assertWA(bool)` takes a condition and exits with WA if the condition is false.

Under the identical checker, the corresponding functions `exitPE()` and `assertPE(bool)` are provided. Standard checkers should not use the Presentation Error code, as the builtin `standard` checker does not use it.

Finally, there is an empty function `errorHook()`. This function is called whenever the provided functions would exit with an error. It should be used to do custom handling, such as providing partial points for outputting part of an answer, or outputting `-1` in interactors to signal errors to the user submission.
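
As an illustration of the `-1` case, here is a minimal sketch of an `errorHook()` for an interactor. The global `to_user` is an assumption introduced for this example so the hook can be pointed at another stream; in the templates the interactor simply writes to `stdout`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Assumption: the stream used to talk to the submission. The templates use
// stdout directly; a variable is used here only to make the hook testable.
std::FILE *to_user = stdout;

// Tell the submission that something went wrong, so that it can terminate
// instead of waiting for a reply that will never come.
void errorHook() {
  std::fputs("-1\n", to_user);
  std::fflush(to_user);  // The hook runs right before exit, so flush now.
}

// Same shape as the template's exitWA(): run the hook, then exit with code 1.
void exitWA() {
  errorHook();
  std::exit(1);
}
```

Because every failing check goes through `exitWA()` (or `exitPE()` in the identical template), the hook runs exactly once on any error path without touching the read functions.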

## Standard Checker/Interactor Design

This section is purely for those interested in the design and inner workings of the standard checker/interactor routines.

The general overview is that `readSpace()` should read non-line whitespace characters, `readNewLine` should read whitespace and expect a line whitespace character, and `readEOF` should read all whitespace and check for EOF. Additionally, any leading whitespace in the input should be trimmed.

There are two major challenges with making a standard checker/interactor design ergonomic:
- Under interactors, it is not acceptable to consume all whitespace in the `readNewLine` method, as the user submission will likely output a single line and then wait for the interactor to send another query. If the interactor naively tried to consume all whitespace, it would block, and the user submission would TLE.
- After reading the end of the input, it's most ergonomic to have the checker read a newline, and then call `readEOF()`, as this is the canonical input format. However, the standard checker allows users to forgo the last newline, and if the `readNewLine()` method expected a newline, we would erroneously return WA.

To solve both of these problems, we employ a lazy whitespace checking scheme. `readSpace()` and `readNewLine()` simply set a flag for `readToken()`. `readToken()` then consumes the whitespace and validates it, before reading the token. Additionally, `readEOF()`, if called after `readNewLine()`, ignores the flag and consumes all whitespace, and then checks for EOF.
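
The lazy scheme can be sketched on an in-memory buffer. This is a simplified illustration rather than the template's implementation: `LazyReader` and `acceptsThreeTokens` are hypothetical names, failed checks throw `std::runtime_error` where the template would exit with WA, and the guard against calling two whitespace methods in a row is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

class LazyReader {
public:
  explicit LazyReader(const std::string &data) : data_(data) {}

  // Whitespace calls only record an expectation; nothing is consumed yet.
  void readSpace() { pending_ = SPACE; }
  void readNewLine() { pending_ = NEWLINE; }

  // Consumes the whitespace run, validates it against the pending
  // expectation, and only then reads the token itself.
  std::string readToken() {
    if (pending_ == NONE) {
      throw std::logic_error("two tokens read without whitespace between");
    }
    Flag expected = pending_;
    Run run = consumeWhitespace();
    if (expected == SPACE) require(run == NO_LINES);
    if (expected == NEWLINE) require(run == LINES);
    std::string token;
    while (pos_ < data_.size() && !isSpace(data_[pos_])) {
      token.push_back(data_[pos_++]);
    }
    require(!token.empty());
    return token;
  }

  // Ignores any pending expectation, so a missing final newline is fine.
  void readEOF() {
    consumeWhitespace();
    require(pos_ == data_.size());
  }

private:
  enum Flag { NONE, SPACE, NEWLINE, ALL };
  enum Run { NO_WHITESPACE, NO_LINES, LINES };

  static bool isSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  }
  static void require(bool ok) {
    if (!ok) throw std::runtime_error("WA");
  }
  Run consumeWhitespace() {
    Run run = NO_WHITESPACE;
    while (pos_ < data_.size() && isSpace(data_[pos_])) {
      char c = data_[pos_++];
      if (c == '\n' || c == '\r') run = LINES;
      else if (run == NO_WHITESPACE) run = NO_LINES;
    }
    pending_ = NONE;
    return run;
  }

  std::string data_;
  std::size_t pos_ = 0;
  Flag pending_ = ALL;  // At the start, any leading whitespace is skipped.
};

// Expects two tokens on the first line and one token on the second.
bool acceptsThreeTokens(const std::string &output) {
  LazyReader r(output);
  try {
    r.readToken(); r.readSpace(); r.readToken(); r.readNewLine();
    r.readToken(); r.readNewLine(); r.readEOF();
  } catch (const std::runtime_error &) {
    return false;
  }
  return true;
}
```

Here `"1 2\n3"` is accepted even though the final newline is missing, because `readEOF()` overrides the pending `readNewLine()`. Extra spaces and blank lines are tolerated, but `"1\n2\n3\n"` is rejected since a line break appears where only a space was expected.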
129 changes: 129 additions & 0 deletions sample_files/problem_setting/identical_checker_interactor.cpp
@@ -0,0 +1,129 @@
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

namespace regex_helpers {
regex_t compile(const char *pattern) {
regex_t re;
if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
throw std::runtime_error("Pattern failed to compile.");
}
return re;
}
bool match(regex_t re, const std::string &text) {
return regexec(&re, text.c_str(), 0, NULL, 0) == 0;
}
} // namespace regex_helpers

void errorHook();
void exitWA() {
errorHook();
std::exit(1);
}
void exitPE() {
errorHook();
std::exit(2);
}
void assertWA(bool condition) {
if (!condition) {
exitWA();
}
}
void assertPE(bool condition) {
if (!condition) {
exitPE();
}
}
void readSpace() { assertPE(getchar() == ' '); }
void readNewLine() { assertPE(getchar() == '\n'); }
void readEOF() { assertPE(getchar() == EOF); }
std::string readToken(char min_char = 0, char max_char = 127) {
static constexpr size_t MAX_TOKEN_SIZE = 1e7;
std::string token;
int c = getchar();
// EOF here also means an empty token, which is a Presentation Error.
assertPE(c != EOF && !isspace(c));
while (!isspace(c) && c != EOF) {
assertWA(token.size() < MAX_TOKEN_SIZE);
assertWA(min_char <= c && c <= max_char);
token.push_back(char(c));
c = getchar();
}
ungetc(c, stdin);
return token;
}
std::string readLine(char min_char = 0, char max_char = 127) {
static constexpr size_t MAX_LINE_SIZE = 1e7;
std::string line;
int c = getchar();
while (c != '\n') {
assertPE(c != EOF);
assertWA(line.size() < MAX_LINE_SIZE);
assertWA(min_char <= c && c <= max_char);
line.push_back(char(c));
c = getchar();
}
return line;
}
long long readInt(long long lo, long long hi) {
static regex_t re = regex_helpers::compile("^(0|-?[1-9][0-9]*)$");
std::string token = readToken();
assertWA(regex_helpers::match(re, token));

long long parsedInt;
try {
parsedInt = stoll(token);
} catch (const std::invalid_argument &) {
exitWA();
} catch (const std::out_of_range &) {
exitWA();
}
assertWA(lo <= parsedInt && parsedInt <= hi);
return parsedInt;
}
long double readFloat(long double min, long double max,
long double eps = 1e-9) {
// The integer part is "0" or a nonzero digit followed by any number of digits.
static regex_t re = regex_helpers::compile("^-?(0|[1-9][0-9]*)(\\.[0-9]+)?$");
std::string token = readToken();
assertWA(regex_helpers::match(re, token));
long double parsedDouble;
try {
parsedDouble = stold(token);
} catch (const std::invalid_argument &) {
exitWA();
} catch (const std::out_of_range &) {
exitWA();
}
assertWA(min - eps <= parsedDouble && parsedDouble <= max + eps);
return parsedDouble;
}
template <typename T>
std::vector<T> readIntArray(size_t N, long long lo, long long hi) {
std::vector<T> arr;
arr.reserve(N);
for (size_t i = 0; i < N; i++) {
arr.push_back(readInt(lo, hi));
if (i != N - 1) {
readSpace();
}
}
readNewLine();
return arr;
}
void errorHook() {}

// If this is a checker:
// int main(int argc, char **argv) {
// std::ifstream judge_input(argv[1]);
// freopen(argv[2], "r", stdin);
// std::ifstream judge_answer(argv[3]);
// }

// If this is an interactor:
// int main(int argc, char **argv) {
// std::ifstream judge_input(argv[1]);
// std::ifstream judge_answer(argv[2]);
// }
171 changes: 171 additions & 0 deletions sample_files/problem_setting/standard_checker_interactor.cpp
@@ -0,0 +1,171 @@
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

void assertWA(bool);

// Implementation of the tricky whitespace logic for standard checkers.
namespace standard_whitespace_detail {
enum WhitespaceFlag { NONE = 0, SPACE = 1, NEWLINE = 2, ALL = 3 };
WhitespaceFlag current_flag = ALL; // At checker start, consume all whitespace.

void pokeFlag(WhitespaceFlag flag) {
if (current_flag != NONE && (current_flag != NEWLINE || flag != ALL)) {
throw std::runtime_error("Never call two whitespace methods in a row, "
"except for readNewLine() followed by readEOF().");
}
current_flag = flag;
}

enum ConsumeResult {
NO_WHITESPACE,
NO_LINES,
LINES,
};
ConsumeResult consumeWhitespace() {
int c = getchar();
ConsumeResult result = NO_WHITESPACE;
while (isspace(c) && c != EOF) {
if (result == NO_WHITESPACE) {
result = NO_LINES;
}
if (c == '\r' || c == '\n') {
result = LINES;
}
c = getchar();
}
ungetc(c, stdin);
current_flag = NONE;
return result;
}

void preReadToken() {
switch (current_flag) {
case NONE:
throw std::runtime_error(
"Must not call readInt (or readToken, or readFloat) twice in a row!");
case SPACE:
assertWA(consumeWhitespace() == NO_LINES);
break;
case NEWLINE:
assertWA(consumeWhitespace() == LINES);
break;
case ALL:
consumeWhitespace();
break;
}
}
} // namespace standard_whitespace_detail

namespace regex_helpers {
regex_t compile(const char *pattern) {
regex_t re;
if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
throw std::runtime_error("Pattern failed to compile.");
}
return re;
}
bool match(regex_t re, const std::string &text) {
return regexec(&re, text.c_str(), 0, NULL, 0) == 0;
}
} // namespace regex_helpers

void errorHook();
void exitWA() {
errorHook();
std::exit(1);
}
void assertWA(bool condition) {
if (!condition) {
exitWA();
}
}
void readSpace() {
standard_whitespace_detail::pokeFlag(standard_whitespace_detail::SPACE);
}
void readNewLine() {
standard_whitespace_detail::pokeFlag(standard_whitespace_detail::NEWLINE);
}
void readEOF() {
standard_whitespace_detail::pokeFlag(standard_whitespace_detail::ALL);
standard_whitespace_detail::consumeWhitespace();
assertWA(getchar() == EOF);
}
std::string readToken(char min_char = 0, char max_char = 127) {
standard_whitespace_detail::preReadToken();
static constexpr size_t MAX_TOKEN_SIZE = 1e7;
std::string token;
int c = getchar();
assertWA(!isspace(c));
while (!isspace(c) && c != EOF) {
assertWA(token.size() < MAX_TOKEN_SIZE);
assertWA(min_char <= c && c <= max_char);
token.push_back(char(c));
c = getchar();
}
ungetc(c, stdin);
return token;
}
long long readInt(long long lo, long long hi) {
static regex_t re = regex_helpers::compile("^(0|-?[1-9][0-9]*)$");
std::string token = readToken();
assertWA(regex_helpers::match(re, token));

long long parsedInt;
try {
parsedInt = stoll(token);
} catch (const std::invalid_argument &) {
exitWA();
} catch (const std::out_of_range &) {
exitWA();
}
assertWA(lo <= parsedInt && parsedInt <= hi);
return parsedInt;
}
long double readFloat(long double min, long double max,
long double eps = 1e-9) {
// The integer part is "0" or a nonzero digit followed by any number of digits.
static regex_t re = regex_helpers::compile("^-?(0|[1-9][0-9]*)(\\.[0-9]+)?$");
std::string token = readToken();
assertWA(regex_helpers::match(re, token));
long double parsedDouble;
try {
parsedDouble = stold(token);
} catch (const std::invalid_argument &) {
exitWA();
} catch (const std::out_of_range &) {
exitWA();
}
assertWA(min - eps <= parsedDouble && parsedDouble <= max + eps);
return parsedDouble;
}
template <typename T>
std::vector<T> readIntArray(size_t N, long long lo, long long hi) {
std::vector<T> arr;
arr.reserve(N);
for (size_t i = 0; i < N; i++) {
arr.push_back(readInt(lo, hi));
if (i != N - 1) {
readSpace();
}
}
readNewLine();
return arr;
}
void errorHook() {}

// If this is a checker:
// int main(int argc, char **argv) {
// std::ifstream judge_input(argv[1]);
// freopen(argv[2], "r", stdin);
// std::ifstream judge_answer(argv[3]);
// }

// If this is an interactor:
// int main(int argc, char **argv) {
// std::ifstream judge_input(argv[1]);
// std::ifstream judge_answer(argv[2]);
// }