Updated Apr 26, 2022 - Python
# data-mining
Here are 3,880 public repositories matching this topic...
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
data-science
machine-learning
data-mining
deep-learning
genetic-algorithm
deep-reinforcement-learning
machine-learning-from-scratch
science
data-science
machine-learning
data-mining
deep-learning
analytics
data-visualization
awesome-list
data-scientists
hacktoberfest
Updated May 12, 2022
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
python
machine-learning
information-retrieval
data-mining
ocr
deep-learning
image-processing
cnn
pytorch
lstm
optical-character-recognition
crnn
scene-text
scene-text-recognition
easyocr
Updated May 3, 2022 - Python
mpenkov commented Jun 22, 2021
In gensim/models/fasttext.py:
model = FastText(
    vector_size=m.dim,
    window=m.ws,
    epochs=m.epoch,
    negative=m.neg,
    # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
    # or model=3 supervi
bug
Issue described a bug
difficulty easy
Easy issue: required small fix
good first issue
Issue for new contributors (not required gensim understanding + very simple)
fasttext
Issues related to the FastText model
The "Python Machine Learning (1st edition)" book code repository and info resource
python
data-science
machine-learning
data-mining
neural-network
scikit-learn
machine-learning-algorithms
logistic-regression
Updated Jul 30, 2021 - Jupyter Notebook
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
machine-learning
data-mining
awesome
deep-learning
awesome-list
interpretability
privacy-preserving
production-machine-learning
mlops
privacy-preserving-machine-learning
explainability
responsible-ai
machine-learning-operations
ml-ops
ml-operations
privacy-preserving-ml
large-scale-ml
production-ml
large-scale-machine-learning
Updated May 12, 2022
fingoldo commented Mar 24, 2022
Problem:
_catboost.pyx in _catboost._set_features_order_data_pd_data_frame()
_catboost.pyx in _catboost.get_cat_factor_bytes_representation()
CatBoostError: Invalid type for cat_feature[non-default value idx=1,feature_idx=336]=2.0 : cat_features must be integer or string, real number values and NaN values should be converted to string.
Could you also print a feature name, not o
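The error message above asks for real-valued and NaN entries in categorical columns to be converted to strings. A minimal sketch of that conversion with pandas follows; the helper name `stringify_categoricals`, the column names, and the choice of "nan" as the placeholder for missing values are assumptions made for illustration and are not part of CatBoost.

```python
import pandas as pd


def stringify_categoricals(df, cat_columns):
    """Return a copy of df whose listed columns contain only strings.

    Hypothetical helper: floats become e.g. "2.0" and missing values
    become the literal string "nan" (an arbitrary placeholder).
    """
    out = df.copy()
    for col in cat_columns:
        out[col] = ["nan" if pd.isna(v) else str(v) for v in out[col]]
    return out


# Illustrative data: "color_id" is meant to be categorical but is stored as float.
df = pd.DataFrame(
    {"color_id": [1.0, 2.0, float("nan")], "price": [9.5, 3.0, 4.2]}
)
clean = stringify_categoricals(df, ["color_id"])
print(clean["color_id"].tolist())
```

The converted frame could then be passed to the model together with the same list of categorical column names; the original frame is left unchanged.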
Anomaly detection related books, papers, videos, and toolboxes
machine-learning
data-mining
awesome
awesome-list
outlier-detection
unsupervised-learning
fraud-detection
time-series-analysis
anomaly-detection
fraud
outlier
outlier-ensembles
graph-neural-networks
Updated Apr 6, 2022 - Python
msho-nb commented Mar 11, 2022
For the autoencoder in pyod, how do I adjust the learning rate?
fkiraly commented May 8, 2022
Since recently, the following deprecation warning started appearing in many places:
FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Due to sensitivity regarding dtype, that would mean tests, utilities, or estimators start breaking where construction starts with an empt
good first issue
Good for newcomers
maintenance
Continuous integration, unit testing & package distribution
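As the warning text itself suggests, passing a dtype explicitly when constructing an empty Series avoids relying on the changing default. This is a small sketch of that pattern, not the fix adopted in the project; the variable names are illustrative.

```python
import pandas as pd

# Explicit dtypes make the result independent of the pandas default
# for empty Series.
empty_float = pd.Series(dtype="float64")
empty_object = pd.Series(dtype="object")

print(empty_float.dtype, len(empty_float))
print(empty_object.dtype, len(empty_object))
```

Choosing "float64" keeps the older default behaviour, while "object" matches the default announced in the warning; which one is appropriate depends on what the surrounding code later stores in the Series.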
Updated Jan 17, 2022
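AI learning roadmap: nearly 200 hands-on case studies and projects, with free accompanying course materials, suitable for complete beginners and geared toward job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, tensorflow, machine-learning, deep-learning, data-analysis, data-mining, mathematics, data-science, artificial-intelligence, python, tensorflow, tensorflow2, caffe, keras, pytorch, algorithm, numpy, pandas, matplotlib, seaborn, nlp, and cv.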
python
nlp
data-science
machine-learning
data-mining
algorithm
caffe
deep-learning
tensorflow
numpy
cv
keras
mathematics
pandas
pytorch
seaborn
artificial-intelligence
data-analysis
matplotlib
tensorflow2
Updated Feb 6, 2020
A library of extension and helper modules for Python's data analysis and machine learning libraries.
python
data-science
machine-learning
data-mining
supervised-learning
unsupervised-learning
association-rules
Updated May 14, 2022 - Python
visualization
python
data-science
machine-learning
data-mining
random-forest
clustering
numpy
scikit-learn
regression
pandas
data-visualization
classification
scipy
orange
plotting
decision-trees
visual-programming
orange3
Updated May 13, 2022 - Python
Curated list of Python resources for data science.
python
data-science
machine-learning
data-mining
awesome
statistics
deep-learning
data-visualization
artificial-intelligence
datascience
data-analysis
awesome-list
deeplearning
bayes
Updated May 13, 2022
extract text from any document. no muss. no fuss.
Updated May 2, 2022 - HTML
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
machine-learning
data-mining
statistics
kafka
graph-algorithms
clustering
word2vec
regression
xgboost
classification
recommender
recommender-system
apriori
feature-engineering
flink
fm
flink-ml
graph-embedding
flink-machine-learning
Updated May 10, 2022 - Java
A curated list of awesome machine learning interpretability resources.
python
data-science
machine-learning
data-mining
awesome
r
awesome-list
transparency
fairness
accountability
interpretability
interpretable-deep-learning
interpretable-ai
interpretable-ml
explainable-ml
xai
fatml
interpretable-machine-learning
iml
machine-learning-interpretability
Updated Jan 5, 2022
List of tools & datasets for anomaly detection on time-series data.
machine-learning
data-mining
time-series
data-analysis
awesome-list
outlier-detection
anomaly-detection
temporal-data
Updated Apr 1, 2022
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Updated Mar 7, 2022 - Python
HTML5 based online tool to extract numerical data from plot images.
Updated May 15, 2022 - JavaScript
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
data-science
machine-learning
data-mining
statistics
reinforcement-learning
deep-learning
neural-network
hardware
paper
machine-learning-algorithms
statistical-learning
artificial-intelligence
game-theory
pattern-recognition
literature
silicon
learning-theory
Updated Aug 12, 2021
Updated Apr 17, 2022 - Markdown
10x faster matrix and vector operations
Updated Apr 30, 2022 - C++
nlp
data-science
machine-learning
data-mining
university
deep-learning
mooc
collections
neural-networks
russian
Updated Jan 12, 2021
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
cli
tsv
data-science
data-mining
statistics
csv
command-line
d
tabular-data
delimited-files
dlang
sampling
shuffle
uniq
reservoir-sampling
Updated Dec 8, 2021 - D
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
visualization
datavis
d3
java
groovy
data-science
data-mining
dataviz
javafx
data-visualization
data-analysis
d3js
datavisualization
Updated Feb 12, 2019 - JavaScript
novel deep learning research works with PaddlePaddle
Updated Mar 29, 2022 - Python


Summary
mypy shows some issues in LightGBM's Python package.
mypy \
    --exclude='python-package/compile/|python-package/build' \
    --ignore-missing-imports \
    python-package/
18 errors in 4 files