Updated Nov 13, 2021 - Java

data-catalog

Here are 67 public repositories matching this topic...
We recently made dataset versions traversable via the dataset tab on the lineage page. We would like to do the same for job versions: start with a job, navigate across its versions, then navigate across the runs for a given job version. This intermediate page should also show detailed information about each job version. One prereq for this is
@cantzakas created the SQL query necessary to pull in the metadata (hyperqueryhq/whale#140); we just have to build the Greenplum extractor scaffolding. It should follow the exact same shape as the Postgres extractor.
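Since Greenplum is Postgres-compatible, a Postgres-shaped extractor can reuse the same `information_schema` metadata query. The sketch below is hypothetical (the function name and query shape are assumptions, not whale's actual API); the scaffolding would run this query over a live connection (e.g. with psycopg2) and emit one record per column.

```python
# Hypothetical sketch of the metadata query a Greenplum extractor could
# issue. Greenplum speaks the Postgres protocol, so the standard
# information_schema catalog works unchanged.

def build_greenplum_metadata_query(schema: str = "public") -> str:
    """Return a SQL query listing tables and columns for one schema."""
    return f"""
        SELECT c.table_name, c.column_name, c.data_type
        FROM information_schema.columns AS c
        WHERE c.table_schema = '{schema}'
        ORDER BY c.table_name, c.ordinal_position
    """
```

In practice the schema name should be passed as a bound query parameter rather than interpolated; it is inlined here only to keep the sketch self-contained.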
Motivation
Since odd-platform already supports Redshift and other sources, it would be awesome to support BigQuery integration as well.
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names, while a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, because the sample rows differ. This is the challenge with not scanning all of the data: it's a trade-off between performance/cost and accuracy, and there is no right answer.
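The run-to-run variation can be illustrated with a toy sampled scan. Everything here is illustrative (the function and the "numeric ratio" metric are made up): two runs seeded differently draw different sample rows, so the inferred statistic can differ even though the underlying data is identical.

```python
import random

def sample_scan(rows, sample_size, seed):
    """Toy deep scan: inspect only a random sample of the rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)
    # Toy inference: what fraction of the sampled values look numeric?
    return sum(1 for v in sample if isinstance(v, int)) / sample_size

# A mostly-numeric column with a few stray string values.
rows = list(range(95)) + ["n/a"] * 5

# Different seeds stand in for different scan runs: each draws its own
# sample, so the inferred ratio may disagree between runs.
run_a = sample_scan(rows, 10, seed=1)
run_b = sample_scan(rows, 10, seed=2)
```

Scanning all 100 rows would always give 0.95; sampling trades that certainty for speed, which is exactly the trade-off described above.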
Add more logging in all modules, emitting debug-level signals to improve observability.
Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where it ends up being None (still looking for the exact model).
Zarr does not accept that type of metadata:
```python
import xarray as xr

ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None
ds_test.to_zarr('test.zarr')
```

gives:
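A possible workaround, until the upstream writer stops emitting None, is to strip None-valued attrs before calling `to_zarr`. The helper below is hypothetical (not part of intake-esm or xarray); it works on any attrs-style mapping.

```python
# Hypothetical helper: drop None-valued entries from an attrs mapping,
# since Zarr cannot serialize None as attribute metadata.

def drop_none_attrs(attrs: dict) -> dict:
    """Return a copy of the attrs mapping with None values removed."""
    return {k: v for k, v in attrs.items() if v is not None}

attrs = {"intake_esm_varname": None, "units": "K"}
clean = drop_none_attrs(attrs)  # only the serializable attrs remain
```

Applied to the example above, `ds_test.attrs = drop_none_attrs(ds_test.attrs)` just before `to_zarr` would avoid the error.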
Deliverables
- add unit tests
- add extractor
- add README.md in `plugins/extractors/mariadb`, defining output
- register your extractor in `plugins/extractors/populate.go`
- add the extractor to the extractor list in `docs/reference/extractor.md`

Output must contain a Table:

| Field | Sample Value |
|---|---|
| urn | `my_database.my_t |
It would be nice to have a debug message for:
- lookup request with parameters
- lookup result(s)
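The two requested debug messages can be sketched with the standard `logging` module. The lookup function and its signature are assumptions for illustration; only the placement of the debug calls reflects the request above.

```python
import logging

logger = logging.getLogger("lookup")

def lookup(catalog: dict, key: str):
    """Hypothetical lookup emitting the two requested debug messages."""
    # 1) debug message for the lookup request with its parameters
    logger.debug("lookup request: key=%r", key)
    results = catalog.get(key, [])
    # 2) debug message for the lookup result(s)
    logger.debug("lookup results: %d match(es): %r", len(results), results)
    return results
```

Using `%r`-style lazy formatting keeps the messages free when the `DEBUG` level is disabled, so the extra logging costs nothing in production.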
pattern = catalog : dataset name : url : comment

- ocean: World Ocean Atlas: https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
- global carbon budget with https://github.com/edjdavid/intake-excel #22
- land: precipitation: https://psl.noaa.gov/data/gridded/tables/precipitation.html :
- Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go
Currently we only support DB store publishers (e.g. Neo4j, MySQL, Neptune). But it would be pretty easy to support message queue publishers (e.g. SQS, Kinesis, Event Hubs, Kafka) using the same interface, which would allow a push ETL model.
There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately wasn't merged. That PR could be used as an example of how to support t
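The idea of swapping a DB store for a message queue behind one publisher interface can be sketched as below. All names here are assumptions for illustration, not the actual amundsendatabuilder API; the in-memory queue stands in for SQS/Kinesis/Kafka.

```python
from abc import ABC, abstractmethod

class Publisher(ABC):
    """Minimal publisher interface: ETL code only ever calls publish()."""

    @abstractmethod
    def publish(self, record: dict) -> None:
        ...

class InMemoryQueuePublisher(Publisher):
    """Stand-in for a message-queue publisher: push records onto a queue."""

    def __init__(self):
        self.queue = []

    def publish(self, record: dict) -> None:
        self.queue.append(record)

# ETL code stays publisher-agnostic: a Neo4j/MySQL store publisher and a
# queue publisher are interchangeable implementations of Publisher.
pub = InMemoryQueuePublisher()
pub.publish({"table": "my_database.my_table"})
```

A real queue implementation would replace the list append with a client call (e.g. an SQS send), which is what makes the push ETL model a drop-in change.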