[KED-926] Anaconda Intake Integration #26

Open
jamesmyatt opened this issue Jun 20, 2019 · 8 comments

@jamesmyatt jamesmyatt commented Jun 20, 2019

The intake project, led by Anaconda, provides rich functionality for data catalogs. Consider using that instead of a homebrew approach.
https://www.anaconda.com/intake-taking-the-pain-out-of-data-access/

@yetudada yetudada commented Jun 25, 2019 (Contributor)

Hi @jamesmyatt! Thank you so much for submitting this feature request. We've been checking out Intake since we released Kedro. Have you had any experience using it? What makes it great to use?

@jamesmyatt jamesmyatt commented Jun 25, 2019 (Author)

I haven't used it in anger, but I've been following it loosely since Anaconda announced it.

It looks like it overlaps significantly with your catalog, so it makes sense to avoid reinventing the wheel. It also integrates easily with other frameworks like Dask.

@tsanikgr tsanikgr commented Jun 25, 2019 (Contributor)

Hi @jamesmyatt, thanks for your suggestion. intake indeed looks very promising and we have it on our radar!

The biggest difference is that intake is for reading data only - the data catalog allows specifying both read & write datasets.

It should be fairly easy to create an IntakeDataSet; I believe integrating the two that way might be the best approach.
A more involved contribution might be populating a kedro.io.DataCatalog from an intake catalog.
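As a rough illustration of both options, here is a minimal, self-contained sketch. `IntakeDataSet` mirrors the `_load`/`_save`/`_describe` split of Kedro's `AbstractDataSet` but deliberately does not subclass it, so the snippet runs without Kedro installed. `FakeSource` is a hypothetical stand-in for an intake data source and relies only on intake sources exposing `read()`; `datasets_from_intake` is a hypothetical helper that builds a name-to-dataset dict.

```python
from typing import Any, Dict, Mapping


class FakeSource:
    """Hypothetical stand-in for an intake data source; only read() is used."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def read(self) -> Any:
        return self._data


class IntakeDataSet:
    """Read-only, Kedro-style wrapper around an intake source (sketch).

    Mirrors the _load/_save/_describe split of kedro.io.AbstractDataSet,
    but does not subclass it so that this snippet runs without Kedro.
    """

    def __init__(self, source: Any) -> None:
        self._source = source

    def _load(self) -> Any:
        # intake sources materialise their data via read().
        return self._source.read()

    def _save(self, data: Any) -> None:
        # This sketch treats intake sources as read-only and refuses to save.
        raise NotImplementedError("IntakeDataSet is read-only")

    def _describe(self) -> Dict[str, Any]:
        return {"source": repr(self._source)}

    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)


def datasets_from_intake(catalog: Mapping[str, Any]) -> Dict[str, IntakeDataSet]:
    """Hypothetical helper: wrap every entry of an intake-like catalog by name."""
    return {name: IntakeDataSet(catalog[name]) for name in catalog}


if __name__ == "__main__":
    data_sets = datasets_from_intake({"cars": FakeSource([1, 2, 3])})
    print(data_sets["cars"].load())  # prints [1, 2, 3]
```

With the real libraries, `catalog` would presumably be the object returned by `intake.open_catalog(...)`, indexed by entry name, and the resulting dict has the shape that `kedro.io.DataCatalog(data_sets=...)` accepts. How closely real intake entries match `FakeSource` across intake versions is an assumption worth checking.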

We would love contributions in this space if that is of interest to you! Please let us know if you plan on working on something so that we avoid duplication of work :)

Thank you again and welcome to our community!

@lorenabalan lorenabalan changed the title Consider using intake for data catalog [KED-926] Consider using intake for data catalog Aug 6, 2019
@lorenabalan lorenabalan commented Aug 6, 2019 (Contributor)

I've updated the title with our internal ticket number to keep track of this more easily. :)

@yetudada yetudada changed the title [KED-926] Consider using intake for data catalog [KED-926] Anaconda Intake Integration Oct 29, 2019
@yetudada yetudada added this to Parked in Opportunity Roadmap Oct 29, 2019
@ZainPatelQB ZainPatelQB mentioned this issue Feb 28, 2020
@martindurant martindurant commented Feb 28, 2020

I have only just now become aware of this issue. Please let me know what you need from Intake to ease its adoption, if you still think it's a good idea. Note that it would probably be easy to keep your existing catalog specifications but create an Intake Catalog from them. As discussed in the linked issue above, the most immediate advantage might be hooking into fsspec for loading from various storage backends (not that you necessarily need Intake to do this).
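To make "create an Intake Catalog from them" concrete, here is a hedged sketch that translates Kedro-style catalog entries (`type` plus `filepath`) into the `sources:` layout of an Intake catalog file (`driver` plus `args`). The type-to-driver table is an assumption for illustration only and far from exhaustive; `kedro_to_intake` is a hypothetical helper, and entries with no known driver are skipped rather than guessed.

```python
from typing import Any, Dict

# Assumed mapping from Kedro dataset types to Intake driver names (illustrative only).
KEDRO_TO_INTAKE_DRIVER = {
    "CSVLocalDataSet": "csv",
    "ParquetLocalDataSet": "parquet",  # the parquet driver comes from the intake-parquet plugin
}


def kedro_to_intake(kedro_catalog: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Translate Kedro catalog entries into an Intake catalog structure (sketch).

    Entries whose type has no known driver are skipped rather than guessed.
    """
    sources = {}
    for name, entry in kedro_catalog.items():
        driver = KEDRO_TO_INTAKE_DRIVER.get(entry.get("type"))
        if driver is None:
            continue
        sources[name] = {
            "driver": driver,
            "args": {"urlpath": entry["filepath"]},
        }
    return {"sources": sources}


if __name__ == "__main__":
    kedro_catalog = {
        "cars": {"type": "CSVLocalDataSet", "filepath": "data/01_raw/cars.csv"},
        "model": {"type": "PickleLocalDataSet", "filepath": "data/06_models/model.pkl"},
    }
    print(kedro_to_intake(kedro_catalog))
    # prints {'sources': {'cars': {'driver': 'csv', 'args': {'urlpath': 'data/01_raw/cars.csv'}}}}
```

Dumped to YAML (for example with PyYAML), the result should be loadable with `intake.open_catalog`; the Kedro YAML stays the source of truth and the Intake catalog is derived from it.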

(Also, Intake does write, but only in one specific data format for each "container", e.g., parquet for dataframe-like datasets https://intake.readthedocs.io/en/latest/persisting.html#export )

(EDIT: I am the maintainer of Intake, in case that wasn't obvious :) )

@ZainPatelQB ZainPatelQB commented Feb 28, 2020 (Contributor)

Re: your point about fsspec, that's exactly what we adopted in our latest release (without using Intake) and it's awesome, thanks for your work on it! 🎉
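The general idea of picking a storage backend from the path prefix can be sketched as follows. `parse_filepath` is a hypothetical, standard-library-only helper; it is not Kedro's actual code, which may handle more cases (for example Windows drive letters).

```python
from typing import Tuple


def parse_filepath(filepath: str, default_protocol: str = "file") -> Tuple[str, str]:
    """Split a path like 's3://bucket/key' into (protocol, path) (sketch).

    Paths without a '<protocol>://' prefix are treated as local files.
    """
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol, path
    return default_protocol, filepath


if __name__ == "__main__":
    print(parse_filepath("s3://my-bucket/raw/cars.csv"))  # prints ('s3', 'my-bucket/raw/cars.csv')
    print(parse_filepath("data/01_raw/cars.csv"))  # prints ('file', 'data/01_raw/cars.csv')
```

In a real dataset, the protocol would then be passed to `fsspec.filesystem(protocol)` and the path opened with that filesystem's `open()` method; remote protocols such as `s3` also need the matching backend package (for example `s3fs`) installed.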

@martindurant martindurant commented Feb 28, 2020

Glad to hear it!

@idanov idanov commented Feb 29, 2020 (Contributor)

@martindurant Glad to see you in this thread. We've been thinking internally about how best to integrate with intake, but since we've mainly been focusing on other things recently, we haven't made much progress on the ideas front. We're really open to ideas on how we can leverage intake beyond fsspec, which we found very useful indeed. Great work!
