Modeled the credit risk associated with consumer loans. Performed exploratory data analysis (EDA) and preprocessed continuous and discrete variables, choosing the technique according to the feature. Checked for missing values and cleaned the data. Built the probability of default (PD) model using logistic regression. Visualized all the results. Computed Weight of Evidence and price elasticities.
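As a rough illustration of the Weight of Evidence step mentioned above, here is a minimal sketch in plain Python. It assumes the common convention WoE = ln(share of non-defaulters in a category / share of defaulters in that category); the project itself may use the opposite sign or a different binning scheme. The function name `weight_of_evidence` and the toy data are hypothetical and are not taken from the project.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, defaults):
    """Return {category: WoE} for one discrete feature.

    Assumed convention: WoE = ln(good_share / bad_share), where
    "good" means no default (0) and "bad" means default (1).
    """
    goods = defaultdict(int)
    bads = defaultdict(int)
    for cat, is_default in zip(categories, defaults):
        if is_default:
            bads[cat] += 1
        else:
            goods[cat] += 1

    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad observation")

    woe = {}
    for cat in set(goods) | set(bads):
        if goods[cat] == 0 or bads[cat] == 0:
            # WoE is undefined for a category with no goods or no bads;
            # in practice such categories are usually merged or smoothed.
            raise ValueError(f"category {cat!r} has an empty good/bad cell")
        good_share = goods[cat] / total_good
        bad_share = bads[cat] / total_bad
        woe[cat] = math.log(good_share / bad_share)
    return woe


# Toy data (hypothetical): home ownership vs. default flag.
home = ["own"] * 4 + ["rent"] * 4
flag = [0, 0, 0, 1, 0, 1, 1, 1]
woe = weight_of_evidence(home, flag)
```

Under this convention a positive WoE marks a category with proportionally more non-defaulters than the portfolio as a whole, which is why the encoded feature pairs naturally with a logistic regression PD model.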
This project examines 100,000 credit card applications to detect anomalies and potential fraud in the dataset. All data manipulation and analysis are conducted in R. Featured analysis methods include Principal Component Analysis (PCA), a heuristic algorithm, and an autoencoder.
Using the 'Give Me Some Credit' data set (2012), this work illustrates some useful techniques in credit scoring modelling, namely GLM, SMOTE, CARET, CHAID, and MOB.