Investigate on different timezone returned by the connector #295

Open
utnaf opened this issue Feb 25, 2021 · 9 comments

@utnaf
Contributor

utnaf commented Feb 25, 2021

Given this PySpark script:

dtString = "2015-06-24T12:50:35"

# init_test runs the query on the db and returns a DataFrame
df = init_test(
    "CREATE (p:Person {datetime: datetime($datetime)})",
    {"datetime": dtString}
)

dt = datetime.datetime(2015, 6, 24, 12, 50, 35)
dtResult = df.select("datetime").collect()[0].datetime

print(dt)
print(dtResult)

assert dt == dtResult

the assertion fails; the two printed dates are:

2015-06-24 12:50:35                                                             
2015-06-24 14:50:35

Investigate whether the two-hour difference is a Spark connector writing/reading issue, or some sort of server/client timezone misconfiguration (one quick check is sketched below).
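One way to narrow this down is to compare the timezone each side is actually using. A minimal sketch, assuming an active SparkSession named spark and hypothetical connection details for the connector:

# Timezone Spark uses to render timestamps
print(spark.conf.get("spark.sql.session.timeZone"))

# Offset Neo4j attaches to a datetime when none is given
tz_df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "bolt://localhost:7687")
    .option("authentication.type", "basic")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "password")
    .option("query", "RETURN toString(datetime()) AS server_now")
    .load()
)
tz_df.show(truncate=False)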

/cc @conker84

@utnaf
Contributor Author

utnaf commented Feb 25, 2021

This Java test works

String localDateTime = "2007-12-03T10:15:30";
Dataset<Row> df = initTest("CREATE (p:Person {aTime: localdatetime('"+localDateTime+"')})");

Timestamp result = df.select("aTime").collectAsList().get(0).getTimestamp(0);
assertEquals(Timestamp.from(LocalDateTime.parse(localDateTime).toInstant(ZoneOffset.UTC)), result);

Behind the scenes we convert the DateTime to UTC both when reading from Neo4j (https://github.com/neo4j-contrib/neo4j-spark-connector/blob/4.0/common/src/main/scala/org/neo4j/spark/util/Neo4jUtil.scala#L110) and when writing to it (https://github.com/neo4j-contrib/neo4j-spark-connector/blob/4.0/common/src/main/scala/org/neo4j/spark/util/Neo4jUtil.scala#L159).

Specifying timezones in the Python test makes it green:

dtString = "2015-06-24T12:50:35+00:00"
df = init_test(
    "CREATE (p:Person {datetime: datetime('"+dtString+"')})")

dt = datetime.datetime(
    2015, 6, 24, 12, 50, 35, 0, datetime.timezone.utc)
dtResult = df.select("datetime").collect()[
    0].datetime.astimezone(datetime.timezone.utc)

print(dt)
print(dtResult)

assert dt == dtResult

I think we should improve the documentation on how to use timezones, but to me the Spark connector works as expected.

Any thoughts? @conker84 @moxious

@moxious
Contributor

moxious commented Feb 25, 2021

This is a tricky one, but based on what you've said, this doesn't seem surprising to me. Whenever you use the Cypher function localdatetime() you are subject to the timezone settings of whichever server is running Neo4j. You're certainly not guaranteed to get a UTC time.

Now, what would be surprising is if you created two timestamps with the same value in this way, took the explicit step of converting both to UTC, and they still disagreed. That would be a simple test to put in place, but I bet it passes.
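A minimal sketch of that check against the numbers in the original report, assuming the two-hour gap comes from the session rendering the timestamp in a UTC+2 zone (the actual session timezone isn't stated):

import datetime

expected = datetime.datetime(2015, 6, 24, 12, 50, 35, tzinfo=datetime.timezone.utc)
# What collect() printed, re-attached to the assumed UTC+2 session offset
returned = datetime.datetime(2015, 6, 24, 14, 50, 35,
                             tzinfo=datetime.timezone(datetime.timedelta(hours=2)))

# Comparing timezone-aware datetimes compares instants, so this passes
# if only the rendering differs.
assert expected == returned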

I think the confusion arises from expectations (maybe). You call localdatetime() in Cypher from the Spark connector and it feels like you're doing this on Spark, but of course you're not. That conversion is a computation happening on another machine (Neo4j), so there's no reason to expect the two to agree. Even system clocks can be wrong.

If I'm right about this, I'm not even sure what to put in the documentation. localdatetime() may work exactly as Neo4j documents it, and datetime in PySpark may work exactly as Python documents it.

@utnaf
Contributor Author

utnaf commented Jun 10, 2021

I think this will be fixed by #358

@moxious moxious added the bug label Jul 13, 2021
@AnhQuanTran

AnhQuanTran commented Sep 14, 2021

@utnaf I use Neo4j server 4.1.6, Neo4j Browser 4.2.1, and neo4j-connector-apache-spark_2.12-4.0.2_for_spark_3.jar. According to the issue above this should have been fixed as of version 4.0.1, but I still run into it.
I tried:

  • Setting the OS timezone on the Spark server to Asia/Ho_Chi_Minh (checked with the date command)
  • Setting the Spark SQL session timezone to Asia/Ho_Chi_Minh (checked with spark.conf.get("spark.sql.session.timeZone"))
  • Setting the OS timezone on the Neo4j server to Asia/Ho_Chi_Minh (checked with the date command)
  • Setting the Neo4j database timezone to Asia/Ho_Chi_Minh (checked with a query returning datetime())
  • Converting the datetime column in the DataFrame to Asia/Ho_Chi_Minh

When I write to Neo4j, the timezone is automatically converted to GMT/UTC (2021-08-19T03:02:28.569000000Z).
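Since the connector normalizes timestamps to UTC (see the Neo4jUtil links above), what gets written is the underlying instant, not the rendered wall-clock time. A minimal sketch to inspect it, assuming df is the DataFrame being written and its timestamp column is named datetime:

from pyspark.sql import functions as F

# Epoch seconds identify the instant regardless of the session timezone
# used for display.
df.select(
    F.col("datetime"),
    F.col("datetime").cast("long").alias("epoch_seconds"),
).show(truncate=False)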

@AnhQuanTran

@utnaf I tried neo4j-connector-apache-spark_2.12-4.0.3-pre_for_spark_3.jar, but it still does not work.

@utnaf
Contributor Author

utnaf commented Sep 14, 2021

Hi @AnhQuanTran, can you share the code you are trying to execute, the error you are seeing, and your expected result?

@AnhQuanTran

AnhQuanTran commented Sep 16, 2021

@utnaf This is my source code; df is a Spark DataFrame:

(df.write.format("org.neo4j.spark.DataSource")
    .option("url", neo4j_credential['url'])
    .option("authentication.type", "basic")
    .option("authentication.basic.username", neo4j_credential['username'])
    .option("authentication.basic.password", neo4j_credential['password'])
    .option("database", neo4j_credential['database'])
    .option("labels", "TEST")
    .option("batch.size", 20000)
    .mode("append")
    .save())

The datetime column in the df DataFrame:

[screenshot of the DataFrame, datetime column showing 2021-08-19 10:02:28.569]

The datetime attribute in Neo4j shows 2021-08-19T03:02:28.569000000Z.

No error is raised, but the datetime attribute is automatically converted to UTC+0; I expected it to match the Spark DataFrame, where the datetime column is 2021-08-19 10:02:28.569.

Where am I going wrong? Thank you.

@conker84
Contributor

Yes, it's expected because we internally convert Spark Timestamp types to UTC datetimes.
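Conceptually, the write path takes the instant represented by the Spark timestamp and stores it as a UTC datetime. A minimal Python sketch of that behaviour (not the connector's actual Scala code), assuming the session timezone is Asia/Ho_Chi_Minh:

import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Wall-clock value as rendered in the assumed session timezone
wall_clock = datetime.datetime(2021, 8, 19, 10, 2, 28, 569000,
                               tzinfo=ZoneInfo("Asia/Ho_Chi_Minh"))

# Same instant expressed in UTC, which is what ends up in Neo4j
print(wall_clock.astimezone(datetime.timezone.utc).isoformat())
# -> 2021-08-19T03:02:28.569000+00:00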

@AnhQuanTran

AnhQuanTran commented Sep 17, 2021

@conker84 So how can I keep the timezone in Neo4j the same as in the Spark DataFrame? Is there an option or configuration of the Neo4j Spark connector I can change?

@conker84 conker84 added enhancement and removed bug labels Jan 10, 2022