bigdata

Hello,
Considering your amazing efficiency on pandas, numpy, and more, it would seem to make sense for your module to work with even bigger data, such as audio (for example .mp3 and .wav). This is something that would help a lot considering the nature of audio (i.e., where one of the lowest and most common sampling rates is still 44,100 samples/sec). For a use case, I would consider vaex.open('Hu
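
To give a sense of the data volume behind the 44,100 samples/sec figure, here is a minimal sketch that uses only Python's standard-library `wave` and `array` modules. It does not use vaex (audio support in vaex is what the request asks for), and the helper names `write_silence` and `read_samples` are hypothetical, chosen for illustration.

```python
import array
import io
import wave

SAMPLE_RATE = 44_100  # samples per second, the rate cited in the request


def write_silence(buf, seconds, rate=SAMPLE_RATE):
    """Write `seconds` of 16-bit mono silence as WAV data into a file-like object."""
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)   # mono
        writer.setsampwidth(2)   # 2 bytes per sample (16-bit PCM)
        writer.setframerate(rate)
        writer.writeframes(bytes(2 * rate * seconds))  # all-zero samples


def read_samples(buf):
    """Return (sample_rate, samples) for 16-bit mono WAV data."""
    with wave.open(buf, "rb") as reader:
        rate = reader.getframerate()
        raw = reader.readframes(reader.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)      # WAV is little-endian; byteswap on big-endian hosts
    return rate, samples


buf = io.BytesIO()
write_silence(buf, seconds=2)
buf.seek(0)
rate, samples = read_samples(buf)
print(rate, len(samples))
```

Two seconds of mono audio already yield 88,200 samples, so an hour of the same signal is roughly 159 million values per channel.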

What would you like to be added:

Task-level DAG scheduling policy

Why is this needed:

This feature provides the ability to customize the order in which tasks are launched.

The following scenarios come to mind so far:

MPI job: the master needs to wait for the workers to start before starting. If t
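
The ordering constraint in the MPI scenario can be sketched as a topological sort over a task dependency graph. This is only an illustration built on Python's standard-library `graphlib`, not the scheduler's actual API; the task names and the `launch_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the tasks that must start before it.
depends_on = {
    "master": {"worker-0", "worker-1"},
    "worker-0": set(),
    "worker-1": set(),
}


def launch_waves(graph):
    """Group tasks into waves; each task's dependencies all sit in earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as started, unblocking dependents
    return waves


waves = launch_waves(depends_on)
print(waves)
```

Tasks within one wave have no ordering constraint between them, so a scheduler would be free to launch them in parallel.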

This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features

Bucketizer has been implemented in dotnet/spark#378 but there are more features that should be implemented.

Feature Extractors
- TF-IDF
- Word2Vec (dotnet/spark#491)
- CountVectorizer (https://github.com/dotnet/spark/p

empty
notEmpty
length
lengthUTF8
char_length, CHAR_LENGTH
character_length, CHARACTER_LENGTH
lower, lcase
upper, ucase
lowerUTF8
upperUTF8
isValidUTF8
toValidUTF8
toValidUTF8(input_string)
repeat
reverse
reverseUTF8
format(pattern, s0, s1, …)
concat
concatA
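
Several names in this list come in plain and `UTF8` pairs. Assuming the usual convention that the plain variant works on bytes while the `UTF8` variant works on Unicode code points, the difference can be sketched in Python as follows. The helper names are hypothetical and only mirror the listed functions; they are not the implementation the issue asks for.

```python
def length_bytes(data: bytes) -> int:
    """Byte-oriented length (assumed behaviour of `length`)."""
    return len(data)


def length_utf8(data: bytes) -> int:
    """Code-point length (assumed behaviour of `lengthUTF8`); input must be valid UTF-8."""
    return len(data.decode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8 (assumed behaviour of `isValidUTF8`)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "héllo".encode("utf-8")  # "é" takes two bytes in UTF-8
print(length_bytes(text), length_utf8(text), is_valid_utf8(b"\xff"))
```

The two lengths differ as soon as the input contains a non-ASCII character, which is why both variants exist.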

Here are 1,497 public repositories matching this topic...

taosdata / TDengine

0xnr / awesome-bigdata

heibaiying / BigData-Notes

vaexio / vaex

wangzhiwubigdata / God-Of-BigData

douban / dpark

DTStack / flinkx

apache / hudi

apache / avro

volcano-sh / volcano

shzlw / poli

dotnet / spark

DTStack / flinkStreamSQL

Netflix / genie

griddb / griddb

jadianes / spark-py-notebooks

allwefantasy / mlsql

Dr11ft / BigDataGuide

hi-primus / optimus

YoongiKim / AutoCrawler

CheckChe0803 / BigData-Interview

microsoft / Mobius

tensorbase / tensorbase

panther-labs / panther

kubernetes-sigs / kube-batch

josonle / Coding-Now

jadianes / spark-movie-lens

gearpump / gearpump

fdv / running-elasticsearch-fun-profit

tal-tech / cds
