data-mining

One unit test in the R package is currently broken. Steps to reproduce on Mac:

export CXX=/usr/local/bin/g++-8 CC=/usr/local/bin/gcc-8
Rscript build_r.R
cd R-package/tests
Rscript testthat.R

This results in the following error at the end of the logs:

[LightGBM] [Info] Saving data to binary file /var/folders/xq/wktq4zdx4jd3qdpk34d28m940000gn/T//RtmpiY1DzV/lgb.Dataset_1555

Example (from TfidfTransformer)

if isinstance(docs[0], tuple):
    docs = [docs]
return [self.gensim_model[doc] for doc in docs]

This method expects a list of tuples, instead of an iterable. This means that the entire corpus has to be stored as a lis
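A minimal sketch of how the check in the excerpt above could be made to work on any iterable, by peeking at the first element instead of indexing `docs[0]`. The names `transform_lazily` and `DoublingModel` are hypothetical and not part of gensim; the stand-in model only mimics the `model[doc]` lookup used in the excerpt.

```python
from itertools import chain


class DoublingModel:
    """Hypothetical stand-in for a model that supports model[doc]."""

    def __getitem__(self, doc):
        return [(term_id, 2 * weight) for term_id, weight in doc]


def transform_lazily(model, docs):
    """Yield model[doc] for each document without requiring docs to be a list."""
    iterator = iter(docs)
    try:
        first = next(iterator)
    except StopIteration:
        return  # empty input: nothing to transform
    if isinstance(first, tuple):
        # docs was a single document (a sequence of tuples), as in the original check
        yield model[[first, *iterator]]
    else:
        # docs is a stream of documents; put the peeked one back in front
        for doc in chain([first], iterator):
            yield model[doc]
```

With this shape, a generator that reads documents from disk could be passed directly, and only one document would need to be in memory at a time.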

This class could be used instead of a cd (column description) file https://catboost.ai/docs/concepts/input-data_column-descfile.html when creating a Pool from a file. The class should have an init function and load and save methods, and the Pool init method should accept an object of this class in place of a cd file during initialization.
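One possible shape for such a class, as a rough sketch only. The name `ColumnDescription` and its layout are assumptions made for illustration, not CatBoost API. It assumes the tab-separated cd layout of column index, column type, and an optional feature name per line.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class ColumnDescription:
    """Hypothetical in-memory form of a cd file."""

    # Maps zero-based column index -> (column type, optional feature name)
    columns: dict = field(default_factory=dict)

    @classmethod
    def load(cls, path):
        """Read a tab-separated cd file into a ColumnDescription."""
        columns = {}
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if not row:
                    continue  # skip blank lines
                name = row[2] if len(row) > 2 else ""
                columns[int(row[0])] = (row[1], name)
        return cls(columns)

    def save(self, path):
        """Write the description back out, one column per line."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            for index in sorted(self.columns):
                col_type, name = self.columns[index]
                writer.writerow([index, col_type, name] if name else [index, col_type])
```

Until Pool accepts an object like this directly, one workaround would be to call `save` and pass the resulting path through Pool's existing `column_description` argument.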

The developer of the website I intend to scrape information from is sloppy and has left a lot of broken links.
When I execute an otherwise effective Ferret script on a list of pages, it stops altogether at every 404.
Is there a DOCUMENT_EXISTS or anything that would help the script go on?

I'm using the latest pyod version on PyPI. How can I generate simulated data where the x-axis is time? Thank you.
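One way to approach this without relying on any particular pyod helper is to build the series by hand. The sketch below uses only the standard library; the function name `simulate_time_series`, the sine-plus-noise signal, and the spike sizes are all arbitrary assumptions chosen for illustration.

```python
import math
import random


def simulate_time_series(n_points=200, outlier_fraction=0.05, seed=0):
    """Return (times, values, labels) for a noisy sine wave with injected spikes."""
    rng = random.Random(seed)
    n_outliers = int(n_points * outlier_fraction)
    outlier_steps = set(rng.sample(range(n_points), n_outliers))

    times, values, labels = [], [], []
    for t in range(n_points):
        # Regular signal: one sine period every 50 steps plus small Gaussian noise
        value = math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.1)
        is_outlier = t in outlier_steps
        if is_outlier:
            # Push the point well away from the signal, up or down
            value += rng.choice([-1, 1]) * rng.uniform(3, 5)
        times.append(t)
        values.append(value)
        labels.append(int(is_outlier))
    return times, values, labels
```

The `times` list serves as the x-axis for plotting, and the values (optionally paired with their time step) could be arranged as a samples-by-features array before being handed to a detector's `fit` method.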

https://github.com/rasbt/mlxtend/blob/115278bac14d7fc278885c0722da03f1c3b91604/mlxtend/frequent_patterns/apriori.py#L224

Processing 24785850 combinations | Sampling itemset size 6
Traceback (most recent call last):
File "***.py", line 116, in
frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True, verbose=1)

File "C:\ProgramData\Anaconda3\lib\site-

We're undergoing an internal software audit and have identified at least one textract component released under the Affero GPL: EbookLib.

Lawyers are getting a bit antsy over this. In general, compatibility with GPL means that code released under a different license (e.g. MIT) and combined with GPL'd code must be released under GPL. This might create a b

It would be nice to have a list of current contributors and update this list as more people add resources to this repo.

I'm using tsv-utils from the Arch Linux AUR to format some word frequency data from the New General Service List (NGSL) dataset. tsv-utils makes at least two errors that I can see when I run this command line:

tsv-select -f 1,7 NGSL+1.01+with+SFI.tsv | tsv-pretty | less

Adding -s 5 to tsv-pretty works around this problem. The TSV file was converted from the file NGSL+1.01+with

I'd like to cite WebPlotDigitizer in my paper. Citation would be much easier for BibTeX users if you could also provide a BibTeX entry on the page https://automeris.io/WebPlotDigitizer/citation.html

For example, something like the following:

@misc{Rohatgi2019,
  url = {https://automeris.io/WebPlotDigitizer},
  author = {Rohatgi, Ankit},
  title = {WebPlotDigitizer: Version 4.2},
  year = {2019}
}

When I click the Server Management button, I receive an Internal Server Error.
Error I noticed

Right now BigQueryIO doesn't offer a way to specify that the tables, when created, should be marked as time partitioned.

Documentation: https://cloud.google.com/bigquery/docs/creating-partitioned-tables

What I would like is something like:

...
                .apply(BigQueryIO.Write
                        .setTimePartitioning(TimePartitioning.Type.DAY)
                        .with

Here are 2,710 public repositories matching this topic...

eriklindernoren / ML-From-Scratch

academic / awesome-datascience

microsoft / LightGBM

RaRe-Technologies / gensim

rasbt / python-machine-learning-book

catboost / catboost

MontFerret / ferret

jivoi / awesome-ml-for-cybersecurity

EthicalML / awesome-production-machine-learning

yzhao062 / pyod

rasbt / mlxtend

deanmalmgren / textract

yzhao062 / anomaly-detection-resources

r0f1 / datascience

biolab / orange3

WZBSocialScienceCenter / pdftabextract

tangyudi / Ai-Learn

jphall663 / awesome-machine-learning-interpretability

rob-med / awesome-TS-anomaly-detection

demidovakatya / vvedenie-mashinnoe-obuchenie

PatMartin / Dex

eBay / tsv-utils

ankitrohatgi / WebPlotDigitizer

CIRCL / AIL-framework

zhenghaoz / gorse

sepandhaghighi / pycm

GoogleCloudPlatform / DataflowJavaSDK

alan-turing-institute / CleverCSV

tirthajyoti / Papers-Literature-ML-DL-RL-AI

404notf0und / AI-for-Security-Learning
