-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PgEntity convert null value into empty string #75
Comments
Unfortunately, the actual implementation seems to be a best effort / most efficient solution. Here is the question I left on stack-overflow. |
I've also experimented with the approach of using a pcre as a specific composite literal parser. Best I've got so far is the one I've found here https://regex101.com/library/oU3yV6 The idea was to keep the quotes in the result fields to distinguish between the empty string or a null value. With the following modified private function transformData($data, Projection $projection)
{
preg_match_all('/"(?:[^"]|"")*"|[^,\n]+|(?=,)(?<=,)|^|(?<=,)$/', $data, $values);
$definition = $projection->getFieldNames();
$out_values = [];
$values_count = count($values[0]);
for ($index = 0; $index < $values_count; $index++) {
$value = '' !== $values[0][$index]
? strtr(trim($values[0][$index], '"'), ['""' => '"'])
: null;
$out_values[$definition[$index]] = preg_match(':^{.*}$:', $value)
? stripcslashes($value)
: $value;
}
return $out_values;
} |
This reminds me the dark age of Pomm 1.x with the performance disaster associated with PHP’s |
I agree. This was more of an experiment I wanted to share. So what could be a solution then? It is obvious to me that |
The idea raised by Daniel Vérité is to create a state machine for that case. It’s imho a better idea than using a regular expression (which I did in the previous version of Pomm). |
Ok, I’ll sum up this thread of Stack-Overflow here about this bug. There is a mind gap between PHP and Postgres community. To handle polymorphism properly, Pg does type NULL (no defined value) so This behavior complies with the RFC4180. Even though this RFC is only informational and has been drew to make the mime-type There is (almost) no hope to patch the code of the underlying C function. Choices are now:
Any thoughts ? |
I try a short code to have that without regex. Not perfect, but it is a starting point. Examples at the end. Not tested yet on class Runner
{
const CHAR_SEP = ',';
const CHAR_DBQ = '"';
protected $inCell = false;
protected $cellDbq = false;
protected $endCellDbq = false;
protected $cell = null;
protected $lastIdx = 0;
public function __construct(int $lastIdx)
{
$this->lastIdx = $lastIdx;
}
protected function reinit() {
$this->cellDbq = false;
$this->inCell = false;
$this->endCellDbq = false;
}
protected function isLastChar(int $idx): bool
{
return $idx === $this->lastIdx;
}
public function look(int $idx, string $char)
{
if ($this->inCell) {
// Previous state IN CELL
if ($this->cellDbq) {
if (self::CHAR_DBQ === $char) {
$this->endCellDbq = true;
} elseif ($this->endCellDbq && self::CHAR_SEP === $char) {
$this->reinit();
} else {
$this->cell .= $char;
}
} else {
if (self::CHAR_SEP !== $char) {
$this->cell .= $char;
}
if (self::CHAR_SEP === $char) {
$this->reinit();
}
}
} else {
// Previous state NOT IN CELL
if (self::CHAR_SEP === $char) {
$this->reinit();
} else {
$this->inCell = true;
if (self::CHAR_DBQ === $char) {
$this->cellDbq = true;
$this->cell = '';
} else {
$this->cellDbq = false;
$this->cell .= $char;
}
}
}
}
public function canRelease(): bool
{
return !$this->inCell && !$this->cellDbq && !$this->endCellDbq;
}
public function getCell()
{
$out = $this->cell;
$this->cell = null;
return $out;
}
}
class CompositeValueExtractor
{
protected $encoding;
public function __construct(string $forceEncoding = '')
{
if (!empty($forceEncoding)) {
$this->encoding = $forceEncoding;
} else {
$this->encoding = mb_internal_encoding();
}
}
public function __invoke(string $str): array
{
$out = [];
$len = mb_strlen($str, $this->encoding);
$end = $len - 1;
$runner = new Runner($end);
for ($i = 0; $i <= $end; $i++) {
$char = mb_substr($str, $i, 1);
$runner->look($i, $char);
if ($runner->canRelease()) {
$out[] = $runner->getCell();
}
}
// end of string, so getting last cell content
$out[] = $runner->getCell();
return $out;
}
}
$tests = [
'cas"tordu,"in","",one,,"pika, chu",,"out"',
',"something",',
',',
',"",',
'28c2cf5f-4df9-4e6f-a5d6-a41eaba35285,"test","",'
];
$cve = new CompositeValueExtractor();
foreach ($tests as $test) {
$res = $cve($test);
var_dump($res);
} Results are:
What do you think about that? |
In some cases, when using a composite field type and
PgEntity
converter, a nullable field will be converted to an empty string. This is the case when the nullable field is a string type like varchar, text, uuid, ...The problem is that
PgEntity
usesstr_getcsv
to split the composite row value, essential like thisUnfortunately this will return an empty string, where it should not be.
Depending on the type declared in the
RowStructure
definition, an empty string may be interpreted as null value within the hydration plan (e.g. in case of boolean, numeric, int, ...), but it is not possible to determine the correct value for string types just from the empty string.As
PgComposite
also usesstr_getcsv
I would assume the same behaviour for that converter. But I've not tested that.Any idea how to avoid
str_getcsv
?The text was updated successfully, but these errors were encountered: