The ultimate objective of this project is to build a predictive model that helps banks estimate the probability of a customer defaulting in the future, based on historic data. A structured methodology is followed: the historic customer data is cleaned and split into training and test sets, feature independence is checked, and classification algorithms such as J48, Naïve Bayes, and MLP are applied. Baseline accuracy for the available dataset is calculated, the models are repeatedly trained and tested with hold-out and k-fold cross-validation, significant features are selected, and the models are re-evaluated to compare accuracy and precision, with model errors addressed via ROC analysis. The individual model outputs are captured and compared with each other to identify the best model for defaulter prediction.
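The workflow above maps naturally onto scikit-learn. The sketch below is a minimal illustration, not the project's actual code: it assumes a synthetic dataset in place of the real customer history, and uses scikit-learn's DecisionTreeClassifier (a CART implementation) as a stand-in for Weka's J48, GaussianNB for Naïve Bayes, and MLPClassifier for the MLP.

# Minimal sketch of the model-comparison workflow: baseline accuracy,
# hold-out split, k-fold cross-validation, and ROC AUC comparison.
# The synthetic dataset is a placeholder for the real customer data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score

# Placeholder for the cleaned historic customer data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold-out split for training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Baseline accuracy: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print(f"baseline accuracy: {baseline.score(X_test, y_test):.3f}")

models = {
    "J48 (decision tree stand-in)": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

for name, model in models.items():
    # 10-fold cross-validation on the training set.
    cv_acc = cross_val_score(model, X_train, y_train, cv=10).mean()
    # Re-fit on the full training set and score on the hold-out set.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    print(f"{name}: 10-fold CV acc={cv_acc:.3f}, "
          f"hold-out acc={accuracy_score(y_test, y_pred):.3f}, "
          f"precision={precision_score(y_test, y_pred):.3f}, "
          f"ROC AUC={roc_auc_score(y_test, y_prob):.3f}")

Comparing a 10-fold cross-validation score against the hold-out score, as done here, is one way to surface overfitting before selecting the final model.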
In this project, a regression model is used to provide a summary estimate of how strongly the crude death rate depends on factors such as current health expenditure, food safety, physician density, and legislation score, and to predict the crude death rate accordingly. A logistic regression model is also used to provide a summary estimate for predicting the type of education on the basis of sex and age.
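As a rough sketch of both models, the snippet below fits them with scikit-learn on a toy DataFrame. The column names (health_expenditure, food_safety, physician_density, legislation_score, sex, age) and the synthetic targets are assumptions standing in for the real dataset.

# Sketch: linear regression for crude death rate, logistic regression
# for education type. All data here is synthetic and illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "health_expenditure": rng.uniform(2, 12, n),   # assumed scale: % of GDP
    "food_safety": rng.uniform(0, 100, n),         # assumed 0-100 score
    "physician_density": rng.uniform(0.5, 5, n),   # per 1,000 people
    "legislation_score": rng.uniform(0, 10, n),
    "sex": rng.integers(0, 2, n),                  # 0 = female, 1 = male
    "age": rng.integers(18, 80, n),
})
# Toy targets; in the project these come from the real data.
df["crude_death_rate"] = 15 - 0.5 * df["health_expenditure"] + rng.normal(0, 1, n)
df["education_type"] = (df["age"] > 40).astype(int)  # binary education type

# Multiple linear regression: crude death rate on the four predictors.
predictors = ["health_expenditure", "food_safety",
              "physician_density", "legislation_score"]
lin = LinearRegression().fit(df[predictors], df["crude_death_rate"])
print("linear coefficients:", dict(zip(lin.feature_names_in_, lin.coef_)))

# Logistic regression: education type on sex and age.
log = LogisticRegression().fit(df[["sex", "age"]], df["education_type"])
print("P(type=1):", log.predict_proba(df[["sex", "age"]].head(3))[:, 1])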
• Visualized and analyzed data using statistical classification, data cleaning, and modeling in Tableau and SPSS.
• Used Pearson correlation to summarize features of consumer behavior and make predictions.
• Created an "Intellectual Consumption Index" to evaluate 395 students' behaviors using factor analysis; collaborated with team members to present to executives and secured 2nd place among 10 teams.
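A rough sketch of how such an index could be constructed, assuming pandas and scikit-learn in place of the SPSS workflow. The survey columns and random responses are illustrative placeholders, not the project's data.

# Sketch: Pearson correlation to summarize behavior features, then a
# one-factor model whose scores serve as a composite index.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder for the 395 students' survey responses.
df = pd.DataFrame(
    rng.normal(size=(395, 4)),
    columns=["books_per_month", "study_hours", "course_spend", "leisure_spend"])

# Pearson correlation matrix of the consumer-behavior features.
print(df.corr(method="pearson"))

# Standardize, then extract a single latent factor as the index.
X = StandardScaler().fit_transform(df)
fa = FactorAnalysis(n_components=1, random_state=0)
df["intellectual_consumption_index"] = fa.fit_transform(X)[:, 0]
print(df["intellectual_consumption_index"].describe())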