Updated Nov 13, 2021 - Java

data-catalog

Here are 67 public repositories matching this topic...
We recently made dataset versions traversable via the dataset tab on the lineage page. We would like to do the same for job versions: start with a job, navigate across its versions, then navigate across the runs for a given job version. This intermediate page should also show detailed information about each job version. One prereq for this is
@cantzakas created the SQL query necessary to pull in the metadata (hyperqueryhq/whale#140); we just have to build the Greenplum extractor scaffolding. It should follow the exact same shape as the Postgres extractor.
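Since Greenplum is Postgres-compatible, a Postgres-shaped extractor can reuse the same `information_schema` metadata query. The sketch below is hypothetical (the function name and query shape are assumptions, not whale's actual API); the scaffolding would run this query over a live connection (e.g. with psycopg2) and emit one record per column.

```python
# Hypothetical sketch of the metadata query a Greenplum extractor could
# issue. Greenplum speaks the Postgres protocol, so the standard
# information_schema catalog works unchanged.

def build_greenplum_metadata_query(schema: str = "public") -> str:
    """Return a SQL query listing tables and columns for one schema."""
    return f"""
        SELECT c.table_name, c.column_name, c.data_type
        FROM information_schema.columns AS c
        WHERE c.table_schema = '{schema}'
        ORDER BY c.table_name, c.ordinal_position
    """
```

In practice the schema name should be passed as a bound query parameter rather than interpolated; it is inlined here only to keep the sketch self-contained.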
Motivation
Since odd-platform already supports Redshift and other sources, it would be awesome to support BigQuery integration as well.
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names, while a deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, because the sample rows differ. This is the challenge with not scanning all of the data: it's a trade-off between performance/cost and accuracy, and there is no right answer.
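The run-to-run variation can be illustrated with a toy sampled scan. Everything here is illustrative (the function and the "numeric ratio" metric are made up): two runs seeded differently draw different sample rows, so the inferred statistic can differ even though the underlying data is identical.

```python
import random

def sample_scan(rows, sample_size, seed):
    """Toy deep scan: inspect only a random sample of the rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)
    # Toy inference: what fraction of the sampled values look numeric?
    return sum(1 for v in sample if isinstance(v, int)) / sample_size

# A mostly-numeric column with a few stray string values.
rows = list(range(95)) + ["n/a"] * 5

# Different seeds stand in for different scan runs: each draws its own
# sample, so the inferred ratio may disagree between runs.
run_a = sample_scan(rows, 10, seed=1)
run_b = sample_scan(rows, 10, seed=2)
```

Scanning all 100 rows would always give 0.95; sampling trades that certainty for speed, which is exactly the trade-off described above.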
Add more logging in all modules, emitting debug-level signals to improve observability.
Intake-esm adds the attribute intake_esm_varname to datasets, and I have encountered cases where it ends up being None (still looking for the exact model).
Zarr does not accept that type of metadata:
```python
import xarray as xr

ds_test = xr.DataArray(5).to_dataset(name='test')
ds_test.attrs['test'] = None
ds_test.to_zarr('test.zarr')
```

gives:
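A possible workaround, until the upstream writer stops emitting None, is to strip None-valued attrs before calling `to_zarr`. The helper below is hypothetical (not part of intake-esm or xarray); it works on any attrs-style mapping.

```python
# Hypothetical helper: drop None-valued entries from an attrs mapping,
# since Zarr cannot serialize None as attribute metadata.

def drop_none_attrs(attrs: dict) -> dict:
    """Return a copy of the attrs mapping with None values removed."""
    return {k: v for k, v in attrs.items() if v is not None}

attrs = {"intake_esm_varname": None, "units": "K"}
clean = drop_none_attrs(attrs)  # only the serializable attrs remain
```

Applied to the example above, `ds_test.attrs = drop_none_attrs(ds_test.attrs)` just before `to_zarr` would avoid the error.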
Deliverables
- add unit tests
- add extractor
- add README.md in `plugins/extractors/mariadb`, defining output
- register your extractor in `plugins/extractors/populate.go`
- add the extractor to the extractor list in `docs/reference/extractor.md`

Output must contain a Table:

| Field | Sample Value |
|---|---|
| urn | `my_database.my_t |
It would be nice to have a debug message for:
- lookup request with parameters
- lookup result(s)
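The two requested debug messages can be sketched with the standard `logging` module. The lookup function and its signature are assumptions for illustration; only the placement of the debug calls reflects the request above.

```python
import logging

logger = logging.getLogger("lookup")

def lookup(catalog: dict, key: str):
    """Hypothetical lookup emitting the two requested debug messages."""
    # 1) debug message for the lookup request with its parameters
    logger.debug("lookup request: key=%r", key)
    results = catalog.get(key, [])
    # 2) debug message for the lookup result(s)
    logger.debug("lookup results: %d match(es): %r", len(results), results)
    return results
```

Using `%r`-style lazy formatting keeps the messages free when the `DEBUG` level is disabled, so the extra logging costs nothing in production.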
pattern = catalog : dataset name : url : comment

- ocean: World Ocean Atlas: https://www.nodc.noaa.gov/OC5/woa18/ : different versions and variables via parameter #15
- global carbon budget with https://github.com/edjdavid/intake-excel #22
- land: precipitation: https://psl.noaa.gov/data/gridded/tables/precipitation.html :
- Mauna Loa CO2 netcdf ftp://aftp.cmdl.noaa.go
Currently we only support DB store publishers (e.g. Neo4j, MySQL, Neptune). But it would be pretty easy to support message queue publishers (e.g. SQS, Kinesis, Event Hubs, Kafka) using the same interface, which would allow a push ETL model.
There is a PR (amundsen-io/amundsendatabuilder#431) which unfortunately wasn't merged. That PR could be used as an example of how to support t
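The idea of swapping a DB store for a message queue behind one publisher interface can be sketched as below. All names here are assumptions for illustration, not the actual amundsendatabuilder API; the in-memory queue stands in for SQS/Kinesis/Kafka.

```python
from abc import ABC, abstractmethod

class Publisher(ABC):
    """Minimal publisher interface: ETL code only ever calls publish()."""

    @abstractmethod
    def publish(self, record: dict) -> None:
        ...

class InMemoryQueuePublisher(Publisher):
    """Stand-in for a message-queue publisher: push records onto a queue."""

    def __init__(self):
        self.queue = []

    def publish(self, record: dict) -> None:
        self.queue.append(record)

# ETL code stays publisher-agnostic: a Neo4j/MySQL store publisher and a
# queue publisher are interchangeable implementations of Publisher.
pub = InMemoryQueuePublisher()
pub.publish({"table": "my_database.my_table"})
```

A real queue implementation would replace the list append with a client call (e.g. an SQS send), which is what makes the push ETL model a drop-in change.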