data-engineering
Here are 1,027 public repositories matching this topic...
Description
It is not an actual bug, but in the documentation here -> https://docs.prefect.io/orchestration/concepts/api.html#queries
flow_run actually needs to be flow_runs.
Otherwise the query does not work for me.
Expected Behavior
Documentation should be updated.
Reproduction
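For reference, a minimal sketch of the kind of query this report is about, using the plural form the reporter says works; the endpoint URL and the selected fields are assumptions, not taken from the linked docs:

```python
import requests

# `flow_runs` (plural) per this report, rather than the `flow_run`
# currently shown in the documentation; fields and URL are assumptions.
query = """
query {
  flow_runs {
    id
    state
  }
}
"""

resp = requests.post("http://localhost:4200/graphql", json={"query": query})
print(resp.json())
```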
Describe the bug
Data docs columns shrink to 1-character width with a long query.
To Reproduce
Steps to reproduce the behavior:
- make a batch from a long query string (a reproduction sketch follows below)
- run validation
- render the result to data docs
- see the screenshot

(Screenshot: Data documentation compiled by Great Expectations, with the table columns collapsed to one character wide.)
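A rough reproduction sketch using the v2-style batch-kwargs API; the datasource, suite, and validation operator names are placeholders:

```python
import great_expectations as ge

context = ge.data_context.DataContext()

# A deliberately long query string; the rendered data docs columns
# collapse once the query runs to hundreds of characters.
long_query = (
    "SELECT " + ", ".join(f"col_{i}" for i in range(60)) +
    " FROM my_schema.my_table WHERE created_at >= '2021-01-01'"
)

batch = context.get_batch(
    {"datasource": "my_postgres_datasource", "query": long_query},
    expectation_suite_name="my_table.warning",
)
context.run_validation_operator("action_list_operator", assets_to_validate=[batch])
context.build_data_docs()  # inspect the rendered validation result page
```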
Tell us about the problem you're trying to solve
We can probably reduce the Docker image size of our Java-based connectors by using the ADD command instead of COPYing the tar archive. See this PR for an example.
Describe the solution you'd like
Use the ADD command to reduce the size of the Docker images: ADD extracts a local tar archive directly into the destination, so the archive itself never has to sit in one layer while its extracted contents sit in another.
When specifying on-demand feature views at retrieval time (e.g. get_X_features), the output feature vectors include, for example, request data or dependent feature vectors, even if users did not specify said features.
Expected Behavior
Non-specified dependent feature values are not returned in the output.
Current Behavior
Non-specified dependent feature values are returned in the output.
Steps to reproduce
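A hedged retrieval sketch; the feature view, feature names, and entity key mirror the Feast quickstart and are placeholders here:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Only the on-demand feature is requested, yet the response also carries
# the request data / dependent feature values it is computed from.
response = store.get_online_features(
    features=["transformed_conv_rate:conv_rate_plus_val1"],
    entity_rows=[{"driver_id": 1001, "val_to_add": 1}],
)
print(response.to_dict())
```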
What
Being able to take a data object (or prefix, like a partition) and get back the commit that added/modified it.
Why
This is valuable lineage information that is currently available in lakeFS but not easily exposed, and it mimics the behavior of git blame.
How
Given that the lakeFS API already supports listing the log of commits for an object or prefix (
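For illustration, a rough sketch of how a git blame-style lookup could be built on that commit-log listing today; the endpoint shape, the objects filter, and all names are assumptions based on the description above:

```python
import requests

LAKEFS_API = "http://localhost:8000/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")  # placeholder credentials

# List the commit log of a ref filtered to one object; the first (most
# recent) entry is effectively the commit that last added/modified it.
resp = requests.get(
    f"{LAKEFS_API}/repositories/my-repo/refs/main/commits",
    params={"objects": "collections/events/part-0001.parquet", "amount": 1},
    auth=AUTH,
)
resp.raise_for_status()
last_commit = resp.json()["results"][0]
print(last_commit["id"], last_commit["message"])
```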
If these are not class methods, the method is invoked for every test and a separate Spark session is created for each of those tests.
```python
import logging
import unittest
from pyspark.sql import SparkSession

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logging.getLogger('py4j').setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        # truncated in the original; a local session is the usual completion
        return SparkSession.builder.master('local[2]').getOrCreate()
```
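A usage sketch building on the class above (the test itself is hypothetical), showing why the class-method setup creates only one session per test class:

```python
class WordCountTest(PySparkTest):
    @classmethod
    def setUpClass(cls):
        # runs once for the whole class, not once per test
        cls.suppress_py4j_logging()
        cls.spark = cls.create_testing_pyspark_session()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_count(self):
        df = self.spark.createDataFrame([("a",), ("b",), ("a",)], ["word"])
        self.assertEqual(df.count(), 3)
```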
Hi,
I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
And there are limitations on the size of the production code base.
Currently, if I just look at the requirements.txt for "pyjanitor" alone, it is huge.
I don't think I require all of these dependencies in my code.
How can I remove the unnecessary ones?
Users can tell Ploomber to track changes to configuration files via resources_; however, to track changes we compute a file hash, which may take too long if the file is large.
We should show a warning when this happens: resources_ should not be used with large files.
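Not Ploomber's implementation, just a sketch of the kind of check the warning could use; the size threshold is made up:

```python
import hashlib
import warnings
from pathlib import Path

LARGE_RESOURCE_BYTES = 10 * 1024 * 1024  # illustrative threshold only


def hash_resource(path):
    """Hash a resources_ file, warning when it is large enough to slow builds."""
    path = Path(path)
    if path.stat().st_size > LARGE_RESOURCE_BYTES:
        warnings.warn(
            f"{path} is larger than {LARGE_RESOURCE_BYTES} bytes; "
            "resources_ is intended for small configuration files"
        )
    return hashlib.md5(path.read_bytes()).hexdigest()
```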
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
A large amount of output goes to the log; this should not happen by default.
Expected Behavior
Much less content in the output of the FVT and the build by default; seeing all of this output should require switching on debug in the logging configuration.
Steps To Reproduce
Run the build.
Env



When a custom label is configured for a metric, the chart legend still shows the metric name instead of the custom label.
How to reproduce the bug
Expected results
Custom label for metric is shown in legend.
Actual results
Metric name is shown in legend.
Screenshots
(Screenshot attached in the original issue, showing the metric name in the legend.)