Documentation error and recommend to add a new section under Storage/S3 for MinIO #314
Comments
|
Thank you for reporting the docs error! The docs are unfortunately not open source. I am happy to make the changes if you can write up a summary following the same template used for S3 in the docs.
|
Actually, on second thought, after learning more about MinIO, I am unsure how much testing is needed before we can document that Delta works reliably on MinIO. MinIO is, after all, a completely different storage system that happens to be S3 API compatible. This does not automatically guarantee that MinIO satisfies the 3 requirements for Delta to work with a storage system, as documented here - https://docs.delta.io/latest/delta-storage.html#storage-configuration In summary, we can document MinIO only when it has been proven, through documentation, a custom LogStore implementation, and testing, that Delta is guaranteed to work correctly on MinIO. Can you provide them?
|
As per MinIO's design, all 3 criteria are satisfied by MinIO.
|
TD, Thanks for your reply. As per the MinIO docs - https://docs.minio.io/docs/distributed-minio-quickstart-guide.html
I have tested most of the Delta table operations (mentioned in the delta docs) with MinIO, and all the operations were successful:
Is there a specific set of tests that you want performed before this can be added to the documentation?
|
Hi TD, any updates on this issue? Thank you,
|
@tdas it would be great to add MinIO as an S3-compatible target for Delta Lake in the documentation.
|
Is MinIO now an S3-compatible target for Delta Lake? The documentation is not there, so I was wondering whether there are any concerns.

I have noticed a small error in the documentation around S3 configurations:
https://docs.delta.io/latest/delta-storage.html#amazon-s3
On the read part, it should be load and not save:
spark.read.format("delta").load("s3a://<your-s3-bucket>/<path>/<to>/<delta-table>")
Also, I have successfully tested Delta 0.5.0 with on-premise S3 - https://min.io
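To make the load/save distinction concrete, here is a minimal PySpark sketch of a write followed by a read. The helper names `delta_table_path` and `write_then_read` are hypothetical and only for illustration, and the function assumes a `spark` session that is already configured for Delta and for the target S3 endpoint.

```python
def delta_table_path(bucket, *parts):
    """Build an s3a:// URI for a Delta table.

    `bucket` and `parts` are placeholders supplied by the caller;
    nothing here is specific to AWS S3 or MinIO.
    """
    # Drop stray slashes so the segments join cleanly.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "s3a://" + "/".join([bucket.strip("/")] + cleaned)


def write_then_read(spark, path):
    """Write a tiny Delta table to `path`, then read it back.

    Assumes `spark` is an existing SparkSession with the Delta Lake
    package on its classpath.
    """
    # Writing goes through DataFrameWriter, which ends in save().
    spark.range(5).write.format("delta").mode("overwrite").save(path)
    # Reading goes through DataFrameReader, which ends in load().
    return spark.read.format("delta").load(path)
```

A call such as `write_then_read(spark, delta_table_path("<your-s3-bucket>", "<path>", "<to>", "<delta-table>"))` exercises both directions in one go.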
There were some quirks around the S3 region settings (by default, Hadoop S3 lacks a specific region-setting API; instead, the region gets interpreted through spark.hadoop.fs.s3a.endpoint).
I can contribute a section in the Storage Configuration docs on how to make Delta work with MinIO-S3 (in addition to the AWS-S3 section that is currently available), if that would be of any use to the community.
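As a rough illustration of the endpoint-based setup described above, the following sketch collects the Spark settings one might pass when pointing the Hadoop S3A connector at a MinIO server. It is an assumption-laden sketch, not the configuration used in this issue: the helper names, the example endpoint `http://localhost:9000`, and the credentials are hypothetical, and the `spark.delta.logStore.class` value is the one the Delta docs list for S3 in this release line. It does not settle whether MinIO meets Delta's storage requirements.

```python
def minio_s3a_conf(endpoint, access_key, secret_key):
    """Return Spark conf entries for an S3-compatible endpoint.

    All three arguments are placeholders; the keys are standard
    Hadoop S3A properties prefixed with `spark.hadoop.`.
    """
    use_ssl = endpoint.startswith("https://")
    return {
        # The endpoint stands in for a region setting, per the issue text.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: the server is addressed by path, not by bucket subdomain.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
        # LogStore documented by Delta for S3; assumed to apply here as well.
        "spark.delta.logStore.class":
            "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore",
    }


def build_session(conf, app_name="delta-minio-sketch"):
    """Create a SparkSession from the conf dict (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like `spark = build_session(minio_s3a_conf("http://localhost:9000", "<access-key>", "<secret-key>"))`, with the Delta and hadoop-aws packages supplied separately.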
Also, how can one contribute to the docs.delta.io documentation?