Updated Jul 30, 2021 - C
# bigdata
Here are 1,497 public repositories matching this topic...
An open-source big data platform designed and optimized for the Internet of Things (IoT).
A curated list of awesome big data frameworks, resources and other awesomeness.
data-science
data
awesome
database
data-stream
bigdata
series-database
data-visualization
data-warehouse
stream-processing
data-analytics
awesome-list
distributed-database
visualize-data
streaming-data
Updated Jul 24, 2021
Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.
Updated Jul 29, 2021 - Java
Upserts, Deletes And Incremental Processing on Big Data.
bigdata
stream-processing
data-integration
datalake
apachespark
hudi
apachehudi
incremental-processing
apacheflink
Updated Jul 30, 2021 - Java
hwdef commented Jul 19, 2021
What would you like to be added:
Task-level DAG scheduling policy
Why is this needed:
This feature provides the ability to customize the order in which tasks are launched.
The following scenarios come to mind so far:
- MPI job: the master needs to wait for the worker to start before starting. If t
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
sql
spring-boot
dashboard
reactjs
jdbc
reporting
bigdata
data-visualization
business-intelligence
sql-editor
Updated Jul 9, 2021 - Java
GoEddie commented Dec 30, 2019
This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features
Bucketizer has been implemented in dotnet/spark#378 but there are more features that should be implemented.
- Feature Extractors
- TF-IDF
- Word2Vec (dotnet/spark#491)
- CountVectorizer (https://github.com/dotnet/spark/p
Distributed Big Data Orchestration Service
java
distributed-systems
cloud
microservices
big-data
spring-boot
microservice
bigdata
configuration
orchestration
configuration-management
netflixoss
netflix-oss
Updated Jul 25, 2021 - Java
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Updated Jul 9, 2021 - C++
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
python
data-science
machine-learning
big-data
spark
notebook
ipython
bigdata
ipython-notebook
pyspark
mllib
data-analysis
Updated Apr 7, 2021 - Jupyter Notebook
The Programming Language Designed For Big Data and AI
Updated Jul 27, 2021 - JavaScript
data-science
machine-learning
spark
bigdata
data-transformation
pyspark
data-extraction
data-analysis
data-wrangling
dask
data-exploration
data-preparation
data-cleaning
data-profiling
data-cleansing
big-data-cleaning
data-cleaner
cudf
dask-cudf
Updated Jul 29, 2021 - Python
Google, Naver multiprocess image web crawler (Selenium)
python
crawler
google
deep-learning
bigdata
thread
selenium
chromedriver
customizable
image-crawler
multiprocess
Updated Jul 1, 2021 - Python
C# and F# language binding and extensions to Apache Spark
streaming
spark
apache-spark
csharp
fsharp
bigdata
dataset
spark-streaming
eventhubs
mapreduce
dataframe
rdd
dstream
mobius
kafka-streaming
near-real-time
Updated Jan 29, 2021 - C#
luluai-cn commented Jun 15, 2021
- empty
- notEmpty
- length
- lengthUTF8
- char_length, CHAR_LENGTH
- character_length, CHARACTER_LENGTH
- lower, lcase
- upper, ucase
- lowerUTF8
- upperUTF8
- isValidUTF8
- toValidUTF8
- toValidUTF8(input_string)
- repeat
- reverse
- reverseUTF8
- format(pattern, s0, s1, …)
- concat
- concatA
[DEPRECATED] Detect threats with log data and improve cloud security posture
react
python
go
graphql
aws
security
typescript
serverless
etl
bigdata
compliance
security-automation
auto-remediation
Updated Apr 6, 2021 - Go
A Kubernetes batch scheduler for high-performance workloads, e.g. AI/ML, BigData, HPC
Updated Jun 6, 2021 - Go
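Study notes, along with eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.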
Updated Jun 8, 2021 - Python
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Updated Jun 12, 2021 - Jupyter Notebook
Lightweight real-time big data streaming engine over Akka
Updated Jul 20, 2021 - Scala
A book about running Elasticsearch
Updated Mar 17, 2021
Data syncing in golang for ClickHouse.
Updated Jul 16, 2021 - Go


Hello,
Considering your amazing efficiency on pandas, numpy, and more, it would seem to make sense for your module to work with even bigger data, such as audio (for example .mp3 and .wav). This is something that would help a lot considering the nature of audio (i.e., one of the lowest and most common sampling rates is still 44,100 samples/sec). For a use case, I would consider vaex.open('Hu