
I have a deployed application that is supposed to download and parse an ORC file from an S3 bucket.

I have tried multiple things. One of them was downloading the file locally inside the app and then creating an ORC Reader with the createReader method, passing an org.apache.hadoop.fs.Path that points to the local file. But every time I get:

- Unknown error occurred
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LocalFileSystem not found
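
The message comes from Hadoop's Configuration failing to load a class by name. As far as I understand, Configuration resolves class names through its own class loader, which defaults to the thread context class loader in effect when the Configuration object was created, so a class can be inside the jar and still be "not found". A minimal diagnostic sketch (JDK only; the class name ClassVisibilityCheck is hypothetical) that reports which class loaders can see LocalFileSystem when it is run from the same cron job:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Diagnostic only: reports which class loaders can see a given class name. */
public class ClassVisibilityCheck {

    /** Returns true if {@code loader} can load {@code className} without initializing it. */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // A null loader means the bootstrap class loader (documented Class.forName behavior).
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.LocalFileSystem";

        // Keep insertion order so the output is printed in a predictable sequence.
        Map<String, ClassLoader> loaders = new LinkedHashMap<>();
        loaders.put("thread context", Thread.currentThread().getContextClassLoader());
        loaders.put("this class", ClassVisibilityCheck.class.getClassLoader());
        loaders.put("system", ClassLoader.getSystemClassLoader());

        for (Map.Entry<String, ClassLoader> entry : loaders.entrySet()) {
            System.out.println(entry.getKey() + " loader (" + entry.getValue() + ") sees "
                    + name + ": " + isLoadable(name, entry.getValue()));
        }
    }
}
```

If the "thread context" line prints false while "this class" prints true, the jar is fine and the problem is which class loader Hadoop is using.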

My code is:

    final GetObjectRequest objectRequest = GetObjectRequest.builder()
        .bucket(s3Bucket)
        .key(fullPath)
        .build();

    try (final ResponseInputStream<GetObjectResponse> responseInputStream = s3Client.getObject(objectRequest);
         final FileOutputStream fileOutputStream = new FileOutputStream(downloadPath)) {

      // Download the S3 object to a local file
      IOUtils.copyLarge(responseInputStream, fileOutputStream);

      Configuration conf = new Configuration();
      conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
      // Note: "fs.https.impl" is set twice, so only the HttpsFileSystem value below takes effect;
      // the key Hadoop uses for local files is "fs.file.impl" (scheme "file"), not "fs.https.impl"
      conf.set("fs.https.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
      conf.set("fs.https.impl", org.apache.hadoop.fs.http.HttpsFileSystem.class.getName());

      // Open the downloaded local file with the ORC reader
      return createReader(new Path(downloadPath.toString()), readerOptions(conf));
    }

But I still get the error. This would have been much easier with a CSV file and a BufferedReader, but unfortunately that is not the case. I also don't want to read the file from S3 line by line and copy its contents into a temporary file, because that would hurt the application's performance.
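
For context on the temporary-file concern: an ORC file keeps its footer and postscript at the end of the file, so the reader needs random access and cannot simply consume the S3 response stream front to back. One bulk copy into a local file is therefore hard to avoid unless a seekable S3-backed Hadoop FileSystem (such as S3A from the hadoop-aws module) is used instead. A JDK-only sketch of that download step, copying the stream in one pass rather than line by line; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a stream (e.g. an S3 object body) to a temporary local file. */
public class TempFileDownload {

    /**
     * Writes everything from {@code in} to a new temp file and returns its path.
     * The input stream is closed before returning, so the file is complete when a reader opens it.
     */
    public static Path copyToTempFile(InputStream in, String suffix) throws IOException {
        Path target = Files.createTempFile("s3-download-", suffix);
        try (InputStream source = in) {
            // REPLACE_EXISTING is required because createTempFile already created an empty file.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the download fails.
            Files.deleteIfExists(target);
            throw e;
        }
        return target;
    }
}
```

Since the SDK's ResponseInputStream is an InputStream, the result of s3Client.getObject(objectRequest) could be passed in directly, and the returned file handed to createReader as new org.apache.hadoop.fs.Path(local.toUri()).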

I do have the orc dependency in my pom.xml, as well as hadoop-common.

Any kind of help would be greatly appreciated. Thanks!

  • It looks like this has very little to do with what your code does and more to do with how it's packaged and run. You are missing Hadoop classes from the classpath. How do you run the code? Commented Sep 4, 2023 at 18:58
  • A scheduled cronjob runs the above code. What other Hadoop libraries do you suggest I should have? Commented Sep 4, 2023 at 20:38
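
Building on the classpath comment: the code references org.apache.hadoop.fs.LocalFileSystem.class directly, so if the exception is thrown from createReader (after those conf.set lines have run), the class must already be loadable by the application's own class loader. One assumption worth testing is that the thread the cron scheduler uses has a different thread context class loader, which Hadoop's Configuration picks up when it is constructed. A JDK-only sketch of a helper (hypothetical name ContextClassLoaders) that temporarily swaps the context class loader and always restores it:

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: runs code with a specific thread context class loader. */
public final class ContextClassLoaders {

    private ContextClassLoaders() {}

    /** Calls {@code action} with {@code loader} as the context class loader, then restores the old one. */
    public static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            // Restore even on failure, so a pooled scheduler thread is not left modified.
            thread.setContextClassLoader(previous);
        }
    }
}
```

For example, Configuration conf = ContextClassLoaders.callWith(MyCronJob.class.getClassLoader(), Configuration::new); where MyCronJob is a placeholder for the job's own class. Hadoop's Configuration also has a setClassLoader(ClassLoader) method, so conf.setClassLoader(MyCronJob.class.getClassLoader()) may be the simpler thing to try first.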
