pyspark

I have a simple regression task (using a LightGBMRegressor) where I want to penalize negative predictions more than positive ones. Is there a way to achieve this with the default LightGBM regression objectives (see https://lightgbm.readthedocs.io/en/latest/Parameters.html)? If not, is it possible to define and pass a custom regression objective (there are many examples for the default LightGBM model)?
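One way to read the question above is as a squared-error loss whose weight grows whenever the prediction itself falls below zero. The sketch below follows the `(y_true, y_pred) -> (grad, hess)` convention that the native Python LightGBM scikit-learn API accepts for callable objectives. The names `make_asymmetric_objective` and `negative_penalty` are hypothetical, the weighting scheme is an assumption about what "penalize negative predictions" means, and whether the Spark LightGBMRegressor accepts such a callable is not confirmed here.

```python
import numpy as np


def make_asymmetric_objective(negative_penalty=10.0):
    """Build a custom objective that up-weights errors on negative predictions.

    Hypothetical helper: the loss is w * (y_pred - y_true)**2 with
    w = negative_penalty where y_pred < 0 and w = 1 elsewhere.
    """

    def objective(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        residual = y_pred - y_true
        # Heavier weight wherever the model predicts a negative value.
        weight = np.where(y_pred < 0, negative_penalty, 1.0)
        grad = 2.0 * weight * residual  # first derivative w.r.t. y_pred
        hess = 2.0 * weight             # second derivative w.r.t. y_pred
        return grad, hess

    return objective


grad, hess = make_asymmetric_objective(10.0)([1.0, 1.0], [-1.0, 3.0])
print(grad, hess)
```

With the native Python package, a callable like this could be tried as `lightgbm.LGBMRegressor(objective=make_asymmetric_objective(10.0))`. The penalty is fixed through a closure so that the returned function takes exactly two arguments, which is the form the wrapper expects when no sample weights are passed to the objective.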

Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed.
I figured out that petastorm relies on pyarrow, which does support Kerberos authentication.

I hacked "petastorm/petastorm/hdfs/namenode.py" line 250
and replaced it with

driver = 'libhdfs'
return pyarrow.hdfs.c

If they are not class methods, the methods would be invoked for every test and a session would be created for each of those tests.

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
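The point about class methods can be illustrated without Spark. In the sketch below, `FakeSession` is a hypothetical stand-in for an expensive resource such as a Spark session, and `count_sessions` is a hypothetical helper. It compares a class-level fixture (`setUpClass`) with a per-test fixture (`setUp`) and counts how many sessions each approach creates.

```python
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive resource such as a Spark session."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so all tests share one session.
        cls.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so each test creates its own session.
        self.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run all tests of a TestCase class and return how many sessions were created."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    assert result.wasSuccessful()
    return FakeSession.created


shared_count = count_sessions(SharedSessionTest)
per_test_count = count_sessions(PerTestSessionTest)
print(shared_count, per_test_count)
```

The class-level fixture creates a single session for both tests, while the per-test fixture creates one per test, which is the cost the comment above warns about.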

User story

As a user, I want to quickly connect my Snowflake data warehouse with Kuwala to start applying transformations. I only want to enter my credentials and establish the connection. Once connected, I want to view the database schema to see all available tables. For every existing table, I want to see a preview of the data and the column types.

Acceptance criteria

The

These files belong to the Gimel Discovery Service, which is still a work in progress at PayPal and not yet open sourced. In addition, the logic in these files is outdated, so it does not make sense to keep them in the repo.

https://github.com/paypal/gimel/search?l=Shell
Remove --> gimel-dataapi/gimel-core/src/main/scripts/tools/bin/hbase/hbase_ddl_creator.sh

https://github.com/paypa

Pivot missing categories breaks FeatureSet/AggregatedFeatureSet

Summary

When defining a feature set, it is expected that the pivot will contain all categories and that, as a consequence, the resulting Source dataframe will be suitable for transformation. When that is not the case, FeatureSet and AggregatedFeatureSet break.

Feature related:

Age: legacy


Here are 1,928 public repositories matching this topic...

microsoft / SynapseML

JohnSnowLabs / spark-nlp

apache / incubator-linkis

ibis-project / ibis

jadianes / spark-py-notebooks

uber / petastorm

awesome-spark / awesome-spark

hi-primus / optimus

jupyter-incubator / sparkmagic

mahmoudparsian / data-algorithms-book

AlexIoannides / pyspark-example-project

WeBankFinTech / Scriptis

HariSekhon / DevOps-Python-tools

ankurchavda / SparkLearning

ericxiao251 / spark-syntax

lyhue1991 / eat_pyspark_in_10_days

kuwala-io / kuwala

ekampf / PySpark-Boilerplate

huseinzol05 / Gather-Deployment

awesome-spark / spark-gotchas

MrPowers / quinn

CamDavidsonPilon / tdigest

Morphl-AI / MorphL-Community-Edition

XD-DENG / Spark-practice

cluster-apps-on-docker / spark-standalone-cluster-on-docker

paypal / gimel

tirthajyoti / Spark-with-Python

quintoandar / butterfree

commoncrawl / cc-pyspark

ptyadana / SQL-Data-Analysis-and-Visualization-Projects
