0

I have a table in SQL server. I am trying to append a spark dataframe from databricks to the SQL table. I am able to append the new rows but the problem is that it appends randomly to the SQL table. I want to append the new rows to the end of the SQL table. How do I do that?

SQL table

╔═════════════════════════════╗
║   Name      a   b   c   d   ║
╠═════════════════════════════╣
║1  Lisa      5   4   1   3   ║
║2  Timmy     3   2   7   2   ║  
║3  Rann      2   3   1   5   ║
║4  Bob       1   6   3   4   ║
╚═════════════════════════════╝

Spark dataframe

╔═════════════════════════════╗
║   Name      a   b   c   d   ║
╠═════════════════════════════╣
║1  Mark      10  3   2   4   ║
║2  Rudy      5   6   7   8   ║  
╚═════════════════════════════╝

What I want

╔═════════════════════════════╗
║   Name      a   b   c   d   ║
╠═════════════════════════════╣
║1  Lisa      5   4   1   3   ║
║2  Timmy     3   2   7   2   ║  
║3  Rann      2   3   1   5   ║
║4  Bob       1   6   3   4   ║
║5  Mark      10  3   2   4   ║
║6  Rudy      5   6   7   8   ║
╚═════════════════════════════╝

When I append using this code

df.write.mode("append").jdbc(url=jdbcurl, table="employee") 

I get this result in the SQL server. Mark and Rudy should be at the bottom but it appends randomly. Is there a parameter that I am suppose to add?

╔═════════════════════════════╗
║   Name      a   b   c   d   ║
╠═════════════════════════════╣
║1  Lisa      5   4   1   3   ║
║2  Rudy      5   6   7   8   ║
║3  Timmy     3   2   7   2   ║
║4  Mark      10  3   2   4   ║
║5  Rann      2   3   1   5   ║
║6  Bob       1   6   3   4   ║
╚═════════════════════════════╝
1
  • You can have a column within your dataframe that is integer and represent a primary key in sql table then you can see it in order when you do select. This is just a hack. Commented Jan 14, 2021 at 14:14

1 Answer 1

2

Tables in SQL are not ordered. It's a set. Put in other words, you can feel free to believe they are being added at the end, but there's no guarantee that they will come back out in any order based on insertion order.

If you want to get data out in a specific order, you should run a query with an ORDER BY on it it based on the order you wish to see them. That could be a column represented the time of insertion if you'd like.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.