Focused crawls are collections of frequently updated web crawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Anomaly detection using LoOP (Local Outlier Probabilities), a local-density-based outlier detection method that assigns each point an outlier score in the range [0, 1].
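To make the [0, 1] score concrete, here is a minimal pure-Python sketch of the LoOP computation as it is usually described (probabilistic set distance, PLOF, then an error-function normalization). It is not taken from the project above: the function name `loop_scores`, the brute-force neighbour search, and the defaults `k=3` and `extent=3.0` are assumptions chosen for illustration.

```python
import math


def loop_scores(points, k=3, extent=3.0):
    """Return one Local Outlier Probability in [0, 1] per point (illustrative sketch)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")

    # k nearest neighbours of every point, as (distance, index) pairs (brute force).
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append(dists[:k])

    # Probabilistic set distance: extent times the RMS distance to the neighbours.
    pdist = [
        extent * math.sqrt(sum(d * d for d, _ in nb) / k) for nb in neighbors
    ]

    # PLOF: a point's own pdist relative to the mean pdist of its neighbours, minus 1.
    plof = []
    for i, nb in enumerate(neighbors):
        expected = sum(pdist[j] for _, j in nb) / k
        plof.append(pdist[i] / expected - 1.0 if expected > 0 else 0.0)

    # Normalize by the aggregate PLOF and squash into [0, 1] with the error function.
    nplof = extent * math.sqrt(sum(v * v for v in plof) / n)
    if nplof == 0:
        return [0.0] * n
    return [max(0.0, math.erf(v / (nplof * math.sqrt(2)))) for v in plof]


# Five tightly clustered points plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = loop_scores(points, k=3)
```

In this toy data set the distant point `(10, 10)` receives the highest score, while the clustered points score close to zero. For real workloads a spatial index would replace the quadratic neighbour search.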
A Java implementation of an algorithm that automatically identifies the set of outliers in a data set, removes them, and returns the remaining normal data.
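The description does not say which detection rule the Java project applies. As a rough illustration of the detect-then-remove workflow, the following Python sketch assumes Tukey's interquartile-range fences; the function name `split_outliers` and the multiplier `k=1.5` are hypothetical choices, not the project's API.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (normal, outliers) using IQR fences (assumed rule)."""
    # Quartiles via the standard library's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Keep values inside the fences; everything else is flagged as an outlier.
    normal = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return normal, outliers


normal, outliers = split_outliers([10, 12, 11, 13, 12, 11, 100])
```

Here the value `100` falls outside the fences and lands in `outliers`, and the other six values are kept in their original order.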
Predicting disease spread, a DrivenData competition. I am currently participating in this competition and used it as my submission for the second capstone project of the 'Professional Certificate in Data Science' course offered by Harvard University (HarvardX) on edX.
Applied analysis of the Bayesian Student-t "robust" regression model with a Jeffreys prior. Compares its model performance and the robustness of its posterior distributions with those of the Gaussian model when outliers are present.