[KED-926] Anaconda Intake Integration #26

jamesmyatt · 2019-06-20T12:10:09Z

The intake project, led by Anaconda, provides rich functionality for data catalogs. Consider using that instead of a homebrew approach.
https://www.anaconda.com/intake-taking-the-pain-out-of-data-access/

yetudada · 2019-06-25T13:25:03Z

Hi @jamesmyatt! Thank you so much for submitting this feature request. We've been checking out Intake since we released. Have you had any experience using it? What makes it great to use?

jamesmyatt · 2019-06-25T15:06:35Z

I haven't used it in anger, but I've been following it loosely since Anaconda announced it.

It looks like it has a significant overlap with your catalog and it makes sense to avoid re-inventing the wheel. It also has easy integration with other frameworks like Dask.

tsanikgr · 2019-06-25T15:48:36Z

Hi @jamesmyatt, thanks for your suggestion. intake indeed looks very promising and we have it on our radar!

The biggest difference is that intake is for reading data only - the data catalog allows specifying both read & write datasets.

It should be fairly easy to create an IntakeDataSet; I believe integrating the two that way might be the best approach.
A more involved contribution might be populating a kedro.io.DataCatalog from an intake catalog.
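
To make the two ideas above concrete, here is a minimal, hedged sketch. It does not import Kedro or Intake: `IntakeDataSet` and `datasets_from_catalog` are hypothetical names, and the only assumption about an Intake source is that it exposes a `.read()` method. A real implementation would presumably subclass Kedro's `AbstractDataSet` and implement `_load`, `_save`, and `_describe` instead.

```python
from typing import Any, Dict


class IntakeDataSet:
    """Hypothetical read-only adapter around an Intake-style source."""

    def __init__(self, source: Any) -> None:
        # Assumption: `source` is any object with a .read() method,
        # which is how Intake data sources return their data.
        self._source = source

    def load(self) -> Any:
        # Delegate loading to the wrapped source.
        return self._source.read()

    def save(self, data: Any) -> None:
        # Reading only, per the discussion above.
        raise NotImplementedError("IntakeDataSet is read-only in this sketch")

    def describe(self) -> Dict[str, Any]:
        return {"source": type(self._source).__name__}


def datasets_from_catalog(catalog: Any) -> Dict[str, IntakeDataSet]:
    """Wrap every entry of a mapping-like catalog (iterable of names, indexable by name)."""
    return {name: IntakeDataSet(catalog[name]) for name in catalog}
```

The resulting dictionary of name-to-dataset pairs is the kind of structure that could then be used to populate a data catalog; how that is wired into `kedro.io.DataCatalog` is left open here.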

We would love contributions in this space if that is of interest to you! Please let us know if you plan on working on something so that we avoid duplication of work :)

Thank you again and welcome to our community!

lorenabalan · 2019-08-06T14:05:54Z

I've updated the title with our internal ticket number to keep track of this more easily. :)

martindurant · 2020-02-28T21:36:04Z

I have only just now become aware of this issue. Please let me know what you need from Intake to ease its adoption, if you still think it's a good idea. Note that it's probably easy to use your existing prescriptions, but create an Intake Catalog from them. As discussed in the linked issue above, the most immediate advantage might be hooking into fsspec for loading from various storage backends (not that you necessarily need Intake to do this).
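
As a rough illustration of creating an Intake catalog from existing Kedro prescriptions, the sketch below converts Kedro-style catalog entries into the `sources` / `driver` / `args` layout used by Intake catalog files. The function name `to_intake_sources` and the type-to-driver mapping are assumptions made for the example, not part of either project, and only entries with a `filepath` and a mapped `type` are handled.

```python
from typing import Any, Dict

# Assumed mapping from Kedro dataset types to Intake driver names (illustrative only).
DRIVER_FOR_TYPE = {
    "pandas.CSVDataSet": "csv",
    "pandas.ParquetDataSet": "parquet",
}


def to_intake_sources(kedro_catalog: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build an Intake-style catalog dict from Kedro-style catalog entries."""
    sources: Dict[str, Any] = {}
    for name, entry in kedro_catalog.items():
        driver = DRIVER_FOR_TYPE.get(entry.get("type"))
        if driver is None:
            # Skip dataset types that have no assumed Intake equivalent.
            continue
        sources[name] = {"driver": driver, "args": {"urlpath": entry["filepath"]}}
    return {"sources": sources}
```

The returned dictionary could be dumped to YAML to produce a catalog file; load and save arguments, credentials, and versioning are deliberately ignored in this sketch.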

(Also, Intake does write, but only in one specific data format for each "container", e.g., parquet for dataframe-like datasets https://intake.readthedocs.io/en/latest/persisting.html#export )

(EDIT: I am the maintainer of Intake, in case that wasn't obvious :) )

ZainPatelQB · 2020-02-28T21:44:04Z

re: your point about using fsspec, that's exactly what we did in our latest release (without using Intake) and it's awesome, thanks for your work on it! 🎉

martindurant · 2020-02-28T21:52:56Z

Glad to hear it!

idanov · 2020-02-29T12:25:05Z

@martindurant Glad to see you in this thread. We've been thinking internally about how best to integrate with intake, but since we've been focusing mainly on other things recently, we haven't progressed much on the ideas front. We're really open to ideas on how we can leverage intake beyond fsspec, which we found very useful indeed. Great work!

jamesmyatt added the Issue: Feature Request label Jun 20, 2019

yetudada added the Type: Discussion label Jun 25, 2019

idanov added the good first issue label Jul 15, 2019

lorenabalan changed the title ~~Consider using intake for data catalog~~ [KED-926] Consider using intake for data catalog Aug 6, 2019

yetudada changed the title ~~[KED-926] Consider using intake for data catalog~~ [KED-926] Anaconda Intake Integration Oct 29, 2019

yetudada added the Type: Opportunity Roadmap label Oct 29, 2019

yetudada added this to Parked in Opportunity Roadmap Oct 29, 2019

ZainPatelQB mentioned this issue Feb 28, 2020

HTTP Store zarr-developers/zarr-python#373 (closed, 0 of 7 tasks complete)

yetudada removed this from Parked in Opportunity Roadmap Jul 24, 2020

yetudada added the Issue: EuroPython Sprint label Jul 24, 2020

yetudada added this to To do in EuroPython 2020 Sprint Jul 24, 2020

laisbsc assigned laisbsc and unassigned laisbsc Jul 25, 2020

quantumblacklabs / kedro
