I'm in charge of developing an application that sometimes needs to process massive amounts of data from a Greenplum database (a PostgreSQL derivative). The process involves a Java 8 program running on a server that fetches this data, processes it, and sends the results to another Greenplum database.
I already know that sending data is better done in batches, but what about receiving it? Currently, my program fetches all the data from Database A in one shot, which sometimes causes OutOfMemoryErrors because the dataset can be enormous.
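To make that concrete, here is a simplified sketch of what the program does today; the connection URLs, table and column names are placeholders rather than the real ones:

```java
import java.sql.*;

public class CurrentApproach {
    public static void main(String[] args) throws SQLException {
        try (Connection source = DriverManager.getConnection("jdbc:postgresql://dbA/source");
             Connection target = DriverManager.getConnection("jdbc:postgresql://dbB/target");
             Statement query = source.createStatement();
             // Read side: one big query. With autocommit on (the default), the
             // PostgreSQL/Greenplum JDBC driver materialises the whole result set
             // in client memory, which is where the OutOfMemoryError comes from.
             ResultSet rs = query.executeQuery("SELECT id, payload FROM big_table");
             // Write side: already batched, flushed every few thousand rows.
             PreparedStatement insert = target.prepareStatement(
                     "INSERT INTO results (id, processed_payload) VALUES (?, ?)")) {

            int pending = 0;
            while (rs.next()) {
                insert.setLong(1, rs.getLong("id"));
                insert.setString(2, process(rs.getString("payload")));
                insert.addBatch();
                if (++pending % 5000 == 0) {
                    insert.executeBatch();   // push a batch to Database B
                }
            }
            insert.executeBatch();           // flush the remainder
        }
    }

    private static String process(String payload) {
        return payload;                      // stand-in for the real processing step
    }
}
```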
I recently read about database cursors and how they are often presented as a magic solution for fetching large datasets. This seems like it could solve my exact problem.
However, I'm concerned about the trade-offs. These are legacy systems and I don't have administrative access to the servers. Database A is read-only for me, Database B is read-write, and both have critical resource-management needs that I cannot disrupt.
If I start using cursors, what would the impact be on Database A? How is the data actually batched on the server side? For context, I don't believe I have useful indexes on the tables I'm querying. I don't need the fetched data sorted or ordered in any particular way; I just need every row produced by my query to be processed exactly once, with nothing skipped and nothing duplicated.
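For what it's worth, my mental model of the cursor approach is an explicit server-side cursor that the client drains in chunks inside a single transaction, roughly like the sketch below (the cursor name, chunk size and query are made up); part of what I'm asking is what holding such a cursor open costs Database A:

```java
import java.sql.*;

public class ExplicitCursorSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection source = DriverManager.getConnection("jdbc:postgresql://dbA/source")) {
            source.setAutoCommit(false);       // a cursor only lives inside a transaction
            try (Statement stmt = source.createStatement()) {
                // NO SCROLL: forward-only, which is all I need and cheaper for the server.
                stmt.execute("DECLARE big_cur NO SCROLL CURSOR FOR "
                           + "SELECT id, payload FROM big_table");
                boolean more = true;
                while (more) {
                    // Pull one chunk at a time; only this chunk sits in client memory.
                    try (ResultSet rs = stmt.executeQuery("FETCH FORWARD 10000 FROM big_cur")) {
                        more = false;
                        while (rs.next()) {
                            more = true;
                            handle(rs.getLong("id"), rs.getString("payload"));
                        }
                    }
                }
                stmt.execute("CLOSE big_cur");
            }
            source.commit();                   // end the transaction that kept the cursor open
        }
    }

    private static void handle(long id, String payload) {
        // stand-in for processing + forwarding to Database B
    }
}
```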
EDIT
Thanks everyone - your answers helped me understand the topic better. Through further research, I discovered that my database library actually supports automatic cursor handling when certain requirements are met. Given the massive dataset sizes, I'm now exploring streams and iterators as potential solutions.
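In case it helps future readers, this is the direction I'm now experimenting with: relying on the driver's cursor-based fetching (autocommit off, forward-only statement, non-zero fetch size, which as far as I can tell is what my library's "requirements" boil down to) and exposing the rows as a Java 8 Stream. The row mapping and sizes below are my own placeholders:

```java
import java.sql.*;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class StreamingFetchSketch {

    /** Lazily adapts a cursor-backed ResultSet into a sequential Stream. */
    static <T> Stream<T> stream(ResultSet rs, RowMapper<T> mapper) {
        Spliterator<T> split = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                try {
                    if (!rs.next()) {
                        return false;                  // cursor exhausted
                    }
                    action.accept(mapper.map(rs));
                    return true;
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        return StreamSupport.stream(split, false);     // sequential: ResultSet is not thread-safe
    }

    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    public static void main(String[] args) throws SQLException {
        try (Connection source = DriverManager.getConnection("jdbc:postgresql://dbA/source")) {
            // Conditions for the driver to fetch through a server-side cursor:
            source.setAutoCommit(false);               // 1. autocommit off
            try (Statement stmt = source.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {   // 2. forward-only
                stmt.setFetchSize(10_000);             // 3. non-zero fetch size (rows per round trip)
                try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
                    stream(rs, r -> r.getString("payload"))
                            .map(StreamingFetchSketch::process)
                            .forEach(StreamingFetchSketch::sendToDatabaseB);
                }
            }
            source.commit();
        }
    }

    private static String process(String payload) { return payload; }

    private static void sendToDatabaseB(String processed) { /* batched insert elsewhere */ }
}
```

With this setup only about one fetch-size worth of rows should be resident at a time, and because the stream is sequential and the cursor is forward-only, every row comes through exactly once.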