
I want to convert the timestamp column, which contains epoch time, into a human-readable datetime. from_unixtime is not giving me the correct date and time. Please help.

df = spark.createDataFrame([('1535934855077532656',), ('1535934855077532656',),('1535935539886503614',)], ['timestamp',])

df.show()
+-------------------+
|          timestamp|
+-------------------+
|1535934855077532656|
|1535934855077532656|
|1535935539886503614|
+-------------------+
df.withColumn('datetime',from_unixtime(df.timestamp,"yyyy-MM-dd HH:mm:ss:SSS")).select(['timestamp','datetime']).show(15,False)
+-------------------+----------------------------+
|timestamp          |datetime                    |
+-------------------+----------------------------+
|1535934855077532656|153853867-12-24 10:24:31:872|
|1535934855077532656|153853867-12-24 10:24:31:872|
|1535935539886503614|153875568-09-17 05:33:49:872|
+-------------------+----------------------------+

1 Answer


from_unixtime

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
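For reference, a value that really is in seconds converts cleanly. This is a minimal sketch: 1535934855 is just your first timestamp truncated to whole seconds, and the rendered string depends on the session time zone.

from pyspark.sql.functions import from_unixtime, lit

# 1535934855 = the first input value truncated to whole seconds
spark.range(1).select(
    from_unixtime(lit(1535934855), "yyyy-MM-dd HH:mm:ss").alias("datetime")
).show()
# +-------------------+
# |           datetime|
# +-------------------+
# |2018-09-03 02:34:15|   (in the same session time zone as the output below)
# +-------------------+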

Your data is clearly not expressed in seconds; 19-digit values like these look like nanoseconds.

from pyspark.sql.functions import from_unixtime

df.withColumn(
    'datetime',
    # nanoseconds -> seconds before handing the value to from_unixtime
    from_unixtime(df.timestamp / 1000 ** 3, "yyyy-MM-dd HH:mm:ss:SSS")
).show(truncate=False)

# +-------------------+-----------------------+
# |timestamp          |datetime               |
# +-------------------+-----------------------+
# |1535934855077532656|2018-09-03 02:34:15:000|
# |1535934855077532656|2018-09-03 02:34:15:000|
# |1535935539886503614|2018-09-03 02:45:39:000|
# +-------------------+-----------------------+

2 Comments

Thank you for answering. Sorry, I am new to PySpark. Yes, my timestamp is in nanoseconds. Can we get nanosecond precision using from_unixtime? Secondly, how do I get the time in a specific timezone, e.g. JST (Japan Time)? Thank you.
@Sun You can get at most millisecond output with date_format((df.timestamp / 1000.0 ** 3).cast("timestamp"), "yyyy-MM-dd HH:mm:ss:SSS"), and the time zone is determined through configuration (Spark Structured Streaming automatically converts timestamps to local time).
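A sketch of what that comment describes, assuming the session-level time zone setting (spark.sql.session.timeZone is a standard Spark SQL config; Asia/Tokyo is the zone for the JST example asked about):

from pyspark.sql.functions import date_format

# Render timestamps in JST, per the question in the first comment.
spark.conf.set("spark.sql.session.timeZone", "Asia/Tokyo")

df.withColumn(
    'datetime',
    # Casting the fractional seconds to timestamp keeps millisecond precision;
    # full nanosecond precision is not representable, since Spark timestamps
    # carry microseconds and SSS prints at most milliseconds.
    date_format((df.timestamp / 1000.0 ** 3).cast("timestamp"),
                "yyyy-MM-dd HH:mm:ss:SSS")
).show(truncate=False)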
