Newest 'orc' Questions

Advice

5 votes

1 reply

144 views

Parquet VS ORC In Iceberg

Hi, I have recently become interested in learning Iceberg. There is something I was not able to understand, so I thought I would ask here. I really want to know why Apache Parquet is the native file format used when ...

katz daniel

13

asked Nov 24, 2025 at 15:00

0 votes

1 answer

58 views

I'm writing repeated string values to a string column in an ORC file using Java and while reading the ORC file back, encounter a NullPointerException

When I write the same value to every row of a string column in an ORC file, only the first row returns the written value; reading the remaining rows raises a null pointer error. In some cases, ...

user1885418

41

asked Apr 8, 2025 at 15:08

0 votes

1 answer

168 views

Apache ORC buffer size too small

I face the attached problem when reading an ORC file. Is it possible to change this buffer size of 65536 to the needed one of 1817279? Which configuration values do I have to adapt in order to set ...

Ruben Hartenstein

1

asked Feb 3, 2025 at 16:21

1 vote

1 answer

152 views

How to use python to create ORC file compressed with ZLIB compression level 9?

I want to create an ORC file compressed with ZLIB compression level 9. Thing is, when using pyarrow.orc, I can only choose between "Speed" and "Compression" mode, and can't control ...

Y.S

1,862

asked Dec 23, 2024 at 10:57

0 votes

0 answers

172 views

Apache Beam code to write output in ORC format

I am new to Apache Beam, and I have a use case where I need to write Java streaming code to read from a Kafka topic (from which I extract some CustomObject.class) and output the entries to HDFS in ...

vamsi

325

asked Feb 5, 2024 at 4:46

1 vote

1 answer

2k views

I get a "Fatal Python error: Aborted" and no explanatory error message I can work with when I try to open a simple .orc file with pyarrow

I am using: Win 10 Pro; Intel(R) Xeon(R) W-1250 CPU @ 3.30GHz / 16 GB RAM; Anaconda Navigator 2.5.0; Python 3.10.13 in venv; pyarrow 11.0.0; pandas 2.1.1; running scripts in Spyder IDE 5.4.3. I want to open/...

Esat Becco

51

asked Jan 9, 2024 at 14:48

0 votes

1 answer

229 views

Read ORC files from AWS S3 bucket in Flink app

We are using Flink 1.13.5 and trying to read ORC files from an AWS S3 location. We are deploying our application in a self-managed Flink cluster. Please find the code below for ...

nirmal

107

asked Nov 10, 2023 at 8:54

0 votes

1 answer

203 views

binary format that allows to store multiple pandas dataframes with different columns, width, rows

I have about 200 pandas DataFrames, and every DataFrame has some unique columns, or maybe completely different columns. Example: df1 = pd.DataFrame({ 'Product': ['Apple', 'Banana', 'Orange', 'Mango'],...

Abdulrahman Sheikho

73

asked Nov 4, 2023 at 6:58

0 votes

0 answers

911 views

Detection and Cleaning of Strike-out Texts on Handwriting

I have images where text is struck out and replaced by the words that follow. Sometimes just one line is struck out; other times, multiple lines are. The expected output should be like this. remove ...

Do Chi Bao

31

asked Oct 16, 2023 at 3:54

0 votes

0 answers

79 views

In hadoop, why does the parquet format occupy higher memory than the original txt when I test?

I am testing the impact of different data formats on Hive query efficiency (Windows 10, on my desktop only). The original data is 400 txt files of almost the same size (total memory 169 MB). I first converted to ...

fei yang

3

asked Sep 24, 2023 at 16:45

0 votes

0 answers

102 views

Issue downloading/parsing ORC File from S3, or from Local Path

I have a deployed application that is supposed to download and parse an ORC file from an S3 bucket. I have tried multiple things, one of them being downloading the file locally in the app and trying to ...

FluffyGus

11

asked Sep 4, 2023 at 15:15

0 votes

0 answers

444 views

How can I optimize orc snappy compression in spark?

My ORC dataset with Snappy compression was 3.3 GB when it was originally constructed via a series of small writes to 128 KB files. It totals 400 million rows, has one timestamp column, and the rest ...

user19695124

11

asked Aug 23, 2023 at 13:41

1 vote

0 answers

127 views

Pyspark error while writing large dataframe to file

I am trying to write my DataFrame df_trans (which has about 10 million records) to a file and want to compare the performance of writing it to Parquet vs ORC vs CSV. df_trans.write.mode('overwrite').parquet(...

OhMoh24

71

asked Jul 20, 2023 at 21:43

0 votes

0 answers

209 views

To read orc file from GCS bucket

To read an ORC file from a GCS bucket, I'm using the code snippet below, where I create a Hadoop configuration and set the required file system attributes to use the GCS bucket: val hadoopConf = new ...

Nitish N Banakar

149

asked Jun 9, 2023 at 3:58

2 votes

1 answer

427 views

Reading orc does not trigger projection pushdown and predicate push down

I have a fileA in ORC with the following format: key id_1 id_2 value value_1 .... value_30. If I use the following config: 'spark.sql.orc.filterPushdown' : true And ...

olaf

347

asked Jun 6, 2023 at 6:49
