The Wayback Machine - https://web.archive.org/web/20200915091405/https://github.com/delta-io/delta/issues/314
Documentation error and recommend to add a new section under Storage/S3 for MinIO #314

Open
ramkumarkb opened this issue Feb 5, 2020 · 7 comments

Comments


@ramkumarkb ramkumarkb commented Feb 5, 2020

I have noticed a small error in the documentation around S3 configurations:
https://docs.delta.io/latest/delta-storage.html#amazon-s3

On the read part, it should be load and not save:
spark.read.format("delta").load("s3a://<your-s3-bucket>/<path>/<to>/<delta-table>")
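For reference, a minimal PySpark sketch of the distinction: the writer side ends in `save`, while the reader side ends in `load`. The helper names `write_delta` and `read_delta` are hypothetical, and the sketch assumes an existing SparkSession with Delta Lake and the S3A connector already configured.

```python
def write_delta(df, path, mode="append"):
    """Write a DataFrame as a Delta table: the writer side uses save()."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Read a Delta table: the reader side uses load(), not save()."""
    return spark.read.format("delta").load(path)


# Usage, assuming an existing SparkSession named `spark`:
#   table_path = "s3a://<your-s3-bucket>/<path>/<to>/<delta-table>"
#   write_delta(spark.range(5), table_path, mode="overwrite")
#   read_delta(spark, table_path).show()
```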

Also, I have successfully tested Delta 0.5.0 with on-premise S3 - https://min.io
There were some quirks around the S3 region setting: by default, Hadoop's S3A connector lacks a specific region-setting API, so the region is instead interpreted through spark.hadoop.fs.s3a.endpoint:

./spark-shell \
  --master spark://<spark-master>:7077 \
  --conf spark.hadoop.fs.s3a.endpoint=http://s3.<aws-region>.<minio-server:port> \
  --conf spark.hadoop.fs.s3a.access.key=<access.key> \
  --conf spark.hadoop.fs.s3a.secret.key=<secret.key> \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
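The same settings could also be applied programmatically. The following is a rough PySpark sketch, not a tested recipe: `minio_s3a_conf` and `build_session` are hypothetical helper names, the application name is arbitrary, and the Delta Lake and hadoop-aws jars are assumed to be on the classpath already.

```python
def minio_s3a_conf(endpoint, access_key, secret_key):
    """Return the spark.hadoop.fs.s3a.* settings from the command above as a dict."""
    return {
        # Surrounding whitespace is stripped so the endpoint is passed as one clean value.
        "spark.hadoop.fs.s3a.endpoint": endpoint.strip(),
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(conf, master=None):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so minio_s3a_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-minio-sketch")
    if master:
        builder = builder.master(master)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (placeholders as in the command above):
#   conf = minio_s3a_conf("http://s3.<aws-region>.<minio-server:port>",
#                         "<access.key>", "<secret.key>")
#   spark = build_session(conf, master="spark://<spark-master>:7077")
```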

I can contribute a section in the Storage Configuration around how to make Delta work with MinIO-S3 (apart from the AWS-S3 that is currently available), if that would be of any use to the community.

Also, how can one contribute to the docs.delta.io documentation?

Contributor

@tdas tdas commented Feb 5, 2020

Thank you for reporting the docs error! The docs are unfortunately not open source. I am happy to make the changes if you can write up a summary with the same template as it is for S3 in the docs.

Contributor

@tdas tdas commented Feb 5, 2020

Actually, on second thought, after learning more about MinIO, I am unsure how much testing is needed before we can document that Delta works reliably on MinIO. MinIO is, after all, a completely different storage system that happens to be S3 API compatible. That does not automatically guarantee that MinIO satisfies the 3 requirements for Delta to work with a storage system, as documented here - https://docs.delta.io/latest/delta-storage.html#storage-configuration
Different file systems have different quirks with respect to these 3 requirements, and hence require different log store implementations that fix or work around those quirks.

In summary, we can document MinIO only once it has been shown, through documentation, a custom LogStore implementation, and testing, that Delta is guaranteed to work correctly on MinIO. Can you provide these?


@harshavardhana harshavardhana commented Feb 6, 2020

As per MinIO's design, all 3 criteria are satisfied by MinIO.

Author

@ramkumarkb ramkumarkb commented Feb 6, 2020

TD,

Thanks for your reply.

As per the MinIO docs - https://docs.minio.io/docs/distributed-minio-quickstart-guide.html

MinIO follows strict read-after-write and list-after-write consistency model for all i/o operations both in distributed and standalone modes.
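As a rough illustration of what read-after-write and list-after-write mean in practice, here is a small probe sketch. Everything in it is hypothetical: `InMemoryStore` is a stand-in that only makes the example runnable, and a real check would wrap an actual S3 client behind the same `put`/`get`/`list` methods. A passing run is only a smoke test and cannot prove a consistency guarantee.

```python
import uuid


class InMemoryStore:
    """Stand-in object store used only to make the sketch runnable (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects.get(key)

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


def probe_consistency(store, prefix="probe/", rounds=100):
    """Return True if every write was immediately readable and immediately listed."""
    for _ in range(rounds):
        key = prefix + uuid.uuid4().hex
        payload = key.encode()
        store.put(key, payload)
        if store.get(key) != payload:      # read-after-write
            return False
        if key not in store.list(prefix):  # list-after-write
            return False
    return True
```

For example, `probe_consistency(InMemoryStore(), rounds=10)` exercises ten write/read/list cycles against the stand-in store.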

I have tested most of the Delta table operations (mentioned in the delta docs) with MinIO, and all the operations were successful:

  • create / write table
  • append table
  • batch updates to table
  • stream updates to table
  • history
  • time travel
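For readers who want to repeat a few of these checks, a sketch of some of the listed operations in PySpark follows. It covers only a subset (append, streaming append, history, time travel), the helper names are hypothetical, and it assumes a SparkSession already configured for Delta Lake and S3A, with the `delta` Python package available for the history call.

```python
def append_rows(df, path):
    """Append a batch DataFrame to a Delta table."""
    df.write.format("delta").mode("append").save(path)


def stream_append(stream_df, path, checkpoint_path):
    """Start a streaming append into a Delta table and return the query handle."""
    return (
        stream_df.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(path)
    )


def table_history(spark, path):
    """Return the table's commit history as a DataFrame."""
    # Imported lazily; requires the Delta Lake Python API on the path.
    from delta.tables import DeltaTable

    return DeltaTable.forPath(spark, path).history()


def read_version(spark, path, version):
    """Time travel: read the table as of a specific version number."""
    return spark.read.format("delta").option("versionAsOf", version).load(path)
```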

Is there a specific set of tests that you would like performed before this can be added to the documentation?

Author

@ramkumarkb ramkumarkb commented Feb 13, 2020

Hi TD,

Any updates on this issue? Thank you.


@nitisht nitisht commented Feb 13, 2020

@tdas it would be great to add MinIO as an S3-compatible target for Delta Lake in the documentation.


@joaquinvanschoren joaquinvanschoren commented Jul 1, 2020

Is MinIO now an S3-compatible target for Delta Lake? The documentation is not there, so I was wondering whether there are any concerns.

5 participants