
We have files in a specific format in HDFS, and we want to process the data extracted from these files in Spark. We have started writing a custom input format in order to create the RDD; this way we hope to be able to create an RDD from the whole file.
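For reference, here is a minimal sketch of how such a custom input format plugs into Spark, assuming a hypothetical `NetCdfInputFormat` that extends the new Hadoop API's `FileInputFormat` and emits `(LongWritable, Text)` records:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

object ReadNetCdf {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("netcdf-read"))

    // NetCdfInputFormat is our (hypothetical) subclass of
    // org.apache.hadoop.mapreduce.lib.input.FileInputFormat[LongWritable, Text];
    // every record it emits becomes one element of the resulting RDD.
    val rdd = sc.newAPIHadoopFile[LongWritable, Text, NetCdfInputFormat](
      "hdfs:///data/measurements.nc")

    println(rdd.count())
    sc.stop()
  }
}
```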

But each job only needs to process a small subset of the data contained in the file, and I know how to extract that subset very efficiently, much more efficiently than filtering a huge RDD.

How can I pass a query filter in the form of a String from my driver to my input format (the same way HiveContext does)?
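In case it helps to make the question concrete: the usual way to ship such a small parameter in plain Hadoop is the job Configuration, which Spark serializes to every task. A sketch of the driver side, where the key name `netcdf.filter` and the `NetCdfInputFormat` class are my own assumptions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def filteredRdd(sc: SparkContext, path: String, filter: String): RDD[(LongWritable, Text)] = {
  // Copy the context's Hadoop configuration and add the filter to it.
  val conf = new Configuration(sc.hadoopConfiguration)
  conf.set("netcdf.filter", filter) // e.g. "z=0;time=2017-01-01/2017-01-31"

  sc.newAPIHadoopFile(
    path,
    classOf[NetCdfInputFormat], // the hypothetical input format from above
    classOf[LongWritable],
    classOf[Text],
    conf)
}
```

On the other side, the RecordReader created by the input format can read the same key back from the TaskAttemptContext (fragment, lives inside the RecordReader subclass):

```scala
import org.apache.hadoop.mapreduce.{InputSplit, TaskAttemptContext}

override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
  val filter = context.getConfiguration.get("netcdf.filter")
  // parse the filter and restrict which matrix cells this reader emits
}
```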

Edit:

My file format is NetCDF, which stores huge matrices efficiently for multidimensional data, for example over x, y, z and time. A first approach would be to extract all values from the matrix and produce one RDD line per value. Instead, I'd like my input format to extract only a small subset of the matrix (maybe 0.01%) and build a small RDD to work with. The subset could be z = 0 and a small time period. I need to pass that time period to the input format, which will then retrieve only the values I'm interested in.
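For completeness, this is the kind of subset read I have in mind, sketched with the NetCDF-Java library (ucar.nc2); the variable name, dimension order and index bounds are made up, and the reader needs local (or HDFS-aware) access to the file:

```scala
import ucar.nc2.NetcdfFile

// Open the file (NetCDF-Java expects a local path or an HDFS-aware stream).
val ncfile = NetcdfFile.open("/local/copy/measurements.nc")
try {
  val temp = ncfile.findVariable("temperature") // assumed dims: (time, z, y, x)
  val dims = temp.getShape

  // Hyperslab: 10 time steps starting at step 100, z = 0, all of y and x.
  val origin = Array(100, 0, 0, 0)
  val shape  = Array(10, 1, dims(2), dims(3))
  val subset = temp.read(origin, shape) // reads only this slice from disk

  println(s"read ${subset.getSize} values instead of the whole matrix")
} finally {
  ncfile.close()
}
```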

I guess HiveContext does this when you pass it an SQL query: only the values matching the query end up in the RDD, not all the lines of the files.

Comments:
  • I'm sorry, I don't quite catch your question. Would you care to revise it with a concrete case? Broad questions tend to be closed. Thanks.
  • Would indexing seem relevant? stackoverflow.com/a/36113670/983722
