The Wayback Machine - https://web.archive.org/web/20211202182634/https://github.com/topics/data-engineering
#

data-engineering

Here are 1,027 public repositories matching this topic...

superset
kamalkeshavani-aiinside
kamalkeshavani-aiinside commented Nov 30, 2021

A clear and concise description of what the bug is.

How to reproduce the bug

  1. Create a metric with a custom label
  2. Use this metric in Mixed Time-series Chart

Expected results

Custom label for metric is shown in legend.

Actual results

Metric name is shown in legend.

Screenshots

If applicable, add screenshots to help explain your problem.
![image](http

Aylr
Aylr commented Dec 28, 2020

Describe the bug
Data Docs columns shrink to one character in width when the query string is long.

To Reproduce
Steps to reproduce the behavior:

  1. make a batch from a long query string
  2. run validation
  3. render result to data docs
  4. See screenshot
    <img width="1525" alt="Data_documentation_compiled_by_Great_Expectations" src="https://user-images.githubusercontent.com/928247/103230647-30eca500-4
adchia
adchia commented Nov 4, 2021

When on demand feature views are specified at retrieval time (e.g. get_X_features), the output feature vectors include extra values such as request data or dependent feature vectors, even if users did not request those features.

Expected Behavior

Non-specified dependent feature values are not returned in output

Current Behavior

Non-specified dependent feature values are in output
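The expected behavior can be illustrated with a minimal sketch, assuming the retrieval step has already produced a plain dictionary of feature columns. The function `select_requested_features` and all feature names below are hypothetical and are not part of the Feast API; the sketch only shows the filtering step the report asks for.

```python
from typing import Dict, Iterable, List


def select_requested_features(
    computed: Dict[str, List[object]],
    requested: Iterable[str],
) -> Dict[str, List[object]]:
    """Keep only the feature columns the caller explicitly asked for."""
    wanted = list(requested)
    missing = [name for name in wanted if name not in computed]
    if missing:
        raise KeyError(f"requested features were not computed: {missing}")
    # Dependencies and request data stay in `computed` but are not returned.
    return {name: computed[name] for name in wanted}


# Hypothetical intermediate result: one requested feature plus its inputs.
computed = {
    "driver_stats:conv_rate": [0.5],            # dependent feature (not requested)
    "val_to_add": [10],                         # request data (not requested)
    "transformed:conv_rate_plus_val": [10.5],   # the feature the user requested
}

result = select_requested_features(computed, ["transformed:conv_rate_plus_val"])
```

Filtering at the very end keeps the dependent values available while the on demand transformation runs, and drops them only from what is returned.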

Steps to reprodu

lakeFS
ozkatz
ozkatz commented Nov 7, 2021

What

Being able to take a data object (or a prefix, such as a partition) and get back the commit that added or modified it.

Why

This is valuable lineage information that is currently available in lakeFS but not easily exposed; the feature would mimic the behavior of git blame.

How

Given the lakeFS API already supports listing the log of commits for an object or prefix (🎉), this could be a `

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

  • Updated Nov 15, 2021
anks7190
anks7190 commented Jan 27, 2021

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limitations on the size of the production code base.
Currently, if I look at the requirements.txt for just "pyjanitor", it is huge.
I don't think I require all the dependencies in my code.
How can I remove the unnecessary ones?
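One possible workaround, sketched here under stated assumptions, is to reimplement only the two helpers with the standard library. The functions `simple_clean_names` and `simple_collapse_levels` are hypothetical stand-ins that approximate the idea of clean_names() and collapse_levels(); they do not reproduce pyjanitor's exact behavior or options.

```python
import re
from typing import Iterable, List, Tuple


def simple_clean_name(name: object) -> str:
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    text = str(name).strip().lower()
    text = re.sub(r"[^0-9a-z]+", "_", text)
    return text.strip("_")


def simple_clean_names(names: Iterable[object]) -> List[str]:
    """Apply simple_clean_name to every column name."""
    return [simple_clean_name(n) for n in names]


def simple_collapse_levels(
    columns: Iterable[Tuple[object, ...]], sep: str = "_"
) -> List[str]:
    """Join multi-level column tuples into flat names, skipping empty levels."""
    return [
        sep.join(str(part) for part in col if str(part) != "")
        for col in columns
    ]


flat = simple_clean_names(["First Name", " Total $ (USD) "])
collapsed = simple_collapse_levels([("sales", "mean"), ("region", "")])
```

With pandas, the result could be assigned back with `df.columns = simple_clean_names(df.columns)`; iterating over a MultiIndex yields the tuples that `simple_collapse_levels` expects.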

davidradl
davidradl commented Nov 17, 2021

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A large amount of output goes to the log; this should not happen by default.

Expected Behavior

Much less content in the output of the FVT and the build by default.

Switching on debug in the logging configuration should then show all the output.
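The requested default (quiet unless debug is switched on) can be sketched generically as follows. This is an illustration in Python, not the project's actual logging configuration; the logger name `build` and the environment variable `APP_DEBUG` are assumptions made up for the example.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Set the 'build' logger to WARNING by default, DEBUG only when opted in."""
    env = os.environ if env is None else env
    # APP_DEBUG is a hypothetical switch; any truthy value enables debug output.
    debug_on = env.get("APP_DEBUG", "").lower() in {"1", "true", "yes"}
    level = logging.DEBUG if debug_on else logging.WARNING
    logging.getLogger("build").setLevel(level)
    return level


default_level = configure_logging({})
debug_level = configure_logging({"APP_DEBUG": "1"})
```

Keeping the verbose output behind an explicit switch leaves it available for troubleshooting without flooding the log on every build.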

Steps To Reproduce

run the build

Env
