Data Mining Applications with R, Edition 1
By Yanchang Zhao and Yonghua Cen

Publication Date: 12 Dec 2013
Description

Data Mining Applications with R shows researchers and professionals how R, a free software environment for statistical computing and graphics, is used to solve a wide range of problems in industry. R is widely used to apply data mining techniques in government, finance, insurance, medicine, scientific research, and many other fields. This book presents 15 real-world case studies illustrating various techniques in rapidly growing areas. It is an ideal companion for data mining researchers in academia and industry looking for ways to turn this versatile software into a powerful analytic tool.

R code, data, and color figures for the book are provided at the RDataMining.com website.

Key Features

  • Helps data miners learn to use R in their specific area of work and see how R can be applied in different industries
  • Presents case studies of real-world applications, which help readers apply the techniques in their own work
  • Provides code examples and sample data so readers can learn the techniques by running the code themselves
About the authors
By Yanchang Zhao, Senior Data Mining Specialist, Australia, and Yonghua Cen
Table of Contents

Preface

Background

Objectives and Significance

Target Audience

Acknowledgments

Review Committee

Additional Reviewers

Foreword

References

Chapter 1. Power Grid Data Analysis with R and Hadoop

Abstract

1.1 Introduction

1.2 A Brief Overview of the Power Grid

1.3 Introduction to MapReduce, Hadoop, and RHIPE

1.4 Power Grid Analytical Approach

1.5 Discussion and Conclusions

Appendix

References

Chapter 2. Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization

Abstract

Acknowledgments

2.1 Introduction

2.2 Related Works

2.3 Motivations and Requirements

2.4 Probabilistic Framework of NB Classifiers

2.5 Two-Dimensional Visualization System

2.6 A Case Study: Text Classification

2.7 Conclusions

References

Chapter 3. Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content

Abstract

3.1 Introduction

3.2 How Many Messages and How Many Twitter-Users in the Sample?

3.3 Who Is Writing All These Twitter Messages?

3.4 Who Are the Influential Twitter-Users in This Sample?

3.5 What Is the Community Structure of These Twitter-Users?

3.6 What Were Twitter-Users Writing About During the Meeting?

3.7 What Do the Twitter Messages Reveal About the Opinions of Their Authors?

3.8 What Can Be Discovered in the Less Frequently Used Words in the Sample?

3.9 What Are the Topics That Can Be Algorithmically Discovered in This Sample?

3.10 Conclusion

References

Chapter 4. Text Mining and Network Analysis of Digital Libraries in R

Abstract

4.1 Introduction

4.2 Dataset Preparation

4.3 Manipulating the Document-Term Matrix

4.4 Clustering Content by Topics Using the LDA

4.5 Using Similarity Between Documents to Explore Document Cohesion

4.6 Social Network Analysis of Authors

4.7 Conclusion

References

Chapter 5. Recommender Systems in R

Abstract

5.1 Introduction

5.2 Business Case

5.3 Evaluation

5.4 Collaborative Filtering Methods

5.5 Latent Factor Collaborative Filtering

5.6 Simplified Approach

5.7 Roll Your Own

5.8 Final Thoughts

References

Chapter 6. Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection

Abstract

6.1 Introduction/Background

6.2 Business Problem

6.3 Proposed Response Model

6.4 Modeling Detail

6.5 Prediction Result

6.6 Model Evaluation

6.7 Conclusion

References

Chapter 7. Caravan Insurance Customer Profile Modeling with R

Abstract

7.1 Introduction

7.2 Data Description and Initial Exploratory Data Analysis

7.3 Classifier Models of Caravan Insurance Holders

7.4 Discussion of Results and Conclusion

Appendix A Details of the Full Data Set Variables

Appendix B Customer Profile Data-Frequency of Binary Values

Appendix C Proportion of Caravan Insurance Holders vis-à-vis other Customer Profile Variables

Appendix D LR Model Details

Appendix E R Commands for Computation of ROC Curves for Each Model Using Validation Dataset

Appendix F Commands for Cross-Validation Analysis of Classifier Models

References

Chapter 8. Selecting Best Features for Predicting Bank Loan Default

Abstract

8.1 Introduction

8.2 Business Problem

8.3 Data Extraction

8.4 Data Exploration and Preparation

8.5 Missing Imputation

8.6 Modeling

8.7 Model Evaluation

8.8 Finding and Model Deployment

8.9 Lessons and Discussions

Appendix Selecting Best Features for Predicting Bank Loan Default

References

Chapter 9. A Choquet Integral Toolbox and Its Application in Customer Preference Analysis

Abstract

9.1 Introduction

9.2 Background

9.3 Rfmtool Package

9.4 Case Study

9.5 Conclusions

References

Chapter 10. A Real-Time Property Value Index Based on Web Data

Abstract

Acknowledgments

10.1 Introduction

10.2 Housing Prices and Indices

10.3 A Data Mining Approach

10.4 Real Estate Pricing Models

10.5 Conclusion

References

Chapter 11. Predicting Seabed Hardness Using Random Forest in R

Abstract

Acknowledgments

11.1 Introduction

11.2 Study Region and Data Processing

11.3 Dataset Manipulation and Exploratory Analyses

11.4 Application of RF for Predicting Seabed Hardness

11.5 Model Validation Using rfcv

11.6 Optimal Predictive Model

11.7 Application of the Optimal Predictive Model

11.8 Discussion and Conclusions

Appendix AA Dataset of Seabed Hardness and 15 Predictors

Appendix BA R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model

References

Chapter 12. Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage

Abstract

Acknowledgments

12.1 Background

12.2 Challenges

12.3 Data Extraction and Exploration

12.4 Data Preprocessing

12.5 Modeling

12.6 Model Evaluation

12.7 Model Deployment

12.8 Lessons, Discussion, and Conclusions

References

Chapter 13. Crime Analyses Using R

Abstract

13.1 Introduction

13.2 Problem Definition

13.3 Data Extraction

13.4 Data Exploration and Preprocessing

13.5 Visualizations

13.6 Modeling

13.7 Model Evaluation

13.8 Discussions and Improvements

References

Chapter 14. Football Mining with R

Abstract

Acknowledgments

14.1 Introduction to the Case Study and Organization of the Analysis

14.2 Background of the Analysis: The Italian Football Championship

14.3 Data Extraction and Exploration

14.4 Data Preprocessing

14.5 Model Development: Building Classifiers

14.6 Model Deployment

14.7 Concluding Remarks

References

Chapter 15. Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization

Abstract

15.1 Introduction

15.2 Data Extraction from PCAP to CSV File

15.3 Data Importation from CSV File to R

15.4 Dimension Reduction Via PCA

15.5 Initial Data Exploration Via Graphs

15.6 Variables Scaling and Samples Selection

15.7 Clustering for Segmenting the FQDN

15.8 Building Routing Table Thanks to Clustering

15.9 Building Routing Table Thanks to Mixed Integer Linear Programming

15.10 Building Routing Table Via a Heuristic

15.11 Final Evaluation

15.12 Conclusion

References

Index

Book details
ISBN: 9780124115118
Page Count: 514
Retail Price: £58.99
  • Nisbet, Miner, Elder; Handbook of Statistical Analysis With Data Mining Applications; 9780123747655; May 2009; 864 pp; USD 92.95; GBP 56.99; EUR 66.95
  • Ross; Probability Models, 10e; 9780123756862; Dec 2009; 784 pp; USD 96.95
Audience

Researchers in academia and industry working in the field of data mining; postgraduate students interested in data mining; and data miners and analysts in industry, including government agencies, banking, insurance, retail, telecom, medicine, and scientific research.