
I want to convert the timestamp column, which contains epoch time, into a human-readable datetime. from_unixtime is not giving me the correct date and time. Please help.

df = spark.createDataFrame([('1535934855077532656',), ('1535934855077532656',),('1535935539886503614',)], ['timestamp',])

df.show()
+-------------------+
|          timestamp|
+-------------------+
|1535934855077532656|
|1535934855077532656|
|1535935539886503614|
+-------------------+
df.withColumn('datetime',from_unixtime(df.timestamp,"yyyy-MM-dd HH:mm:ss:SSS")).select(['timestamp','datetime']).show(15,False)
+-------------------+----------------------------+
|timestamp          |datetime                    |
+-------------------+----------------------------+
|1535934855077532656|153853867-12-24 10:24:31:872|
|1535934855077532656|153853867-12-24 10:24:31:872|
|1535935539886503614|153875568-09-17 05:33:49:872|
+-------------------+----------------------------+

1 Answer


from_unixtime

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
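For reference, a value that really is in seconds converts cleanly. This is a minimal sketch: 1535934855 is just your first timestamp truncated to whole seconds, and the rendered string depends on the session time zone.

from pyspark.sql.functions import from_unixtime, lit

# 1535934855 = the first input value truncated to whole seconds
spark.range(1).select(
    from_unixtime(lit(1535934855), "yyyy-MM-dd HH:mm:ss").alias("datetime")
).show()
# +-------------------+
# |           datetime|
# +-------------------+
# |2018-09-03 02:34:15|   (in the same session time zone as the output below)
# +-------------------+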

Your data is clearly not expressed in seconds; 19-digit values like these look like nanoseconds.

from pyspark.sql.functions import from_unixtime

df.withColumn(
    'datetime',
    # nanoseconds -> seconds before handing the value to from_unixtime
    from_unixtime(df.timestamp / 1000 ** 3, "yyyy-MM-dd HH:mm:ss:SSS")
).show(truncate=False)

# +-------------------+-----------------------+
# |timestamp          |datetime               |
# +-------------------+-----------------------+
# |1535934855077532656|2018-09-03 02:34:15:000|
# |1535934855077532656|2018-09-03 02:34:15:000|
# |1535935539886503614|2018-09-03 02:45:39:000|
# +-------------------+-----------------------+

2 Comments

Thank you for answering. Sorry, I am new to PySpark. Yes, my timestamp is in nanoseconds. Can we get nanosecond precision using from_unixtime? Secondly, how do I get the time in a specific timezone, e.g. JST (Japan Time)? Thank you.
@Sun You can get at most millisecond output with date_format((df.timestamp / 1000.0 ** 3).cast("timestamp"), "yyyy-MM-dd HH:mm:ss:SSS"), and the time zone is determined through configuration (Spark Structured Streaming automatically converts timestamps to local time).
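A sketch of what that comment describes, assuming the session-level time zone setting (spark.sql.session.timeZone is a standard Spark SQL config; Asia/Tokyo is the zone for the JST example asked about):

from pyspark.sql.functions import date_format

# Render timestamps in JST, per the question in the first comment.
spark.conf.set("spark.sql.session.timeZone", "Asia/Tokyo")

df.withColumn(
    'datetime',
    # Casting the fractional seconds to timestamp keeps millisecond precision;
    # full nanosecond precision is not representable, since Spark timestamps
    # carry microseconds and SSS prints at most milliseconds.
    date_format((df.timestamp / 1000.0 ** 3).cast("timestamp"),
                "yyyy-MM-dd HH:mm:ss:SSS")
).show(truncate=False)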
