Best Free Online Documents, Books and Tutorials about R and Data Mining
If you are looking for documents, books or tutorials on data mining and R programming to advance your knowledge, here is a list of the best ones available for free in various formats:
R Programming
- Quick-R
- Computing for Data Analysis (with R): a free online course
YouTube playlists for the videos of the course: week 1; week 2; week 3 and week 4.
- Data Analysis (with R): a free online course
- Advanced R, a book for R users who want to improve their programming skills and understanding of the language
- R Reference Card
- Google’s R Style Guide
- R Tips: lots of tips for R programming
- R Tutorial
- Handling and Processing Strings in R: an ebook in PDF format, 105 pages
- The R Manuals, including an Introduction to R, R Language Definition, R Data Import/Export, and other R manuals
- R You Ready?
- R for Beginners
- Econometrics in R
- Using R for Data Analysis and Graphics – Introduction, Examples and Commentary
- Lots of R Contributed Documents, including non-English documents
- The R Journal
- Learn R Toolkit
- Resources to help you learn and use R at UCLA
- R Tutorial – An R Introduction to Statistics
- Cookbook for R
- Slides for a couple of R short courses
- Tips on memory in R
- Slides on building R packages:
http://sites.stat.psu.edu/~dsy109/SOS_Talk.pdf
http://www.hsph.harvard.edu/statinformatics/soft/files/buildingrpackages.pdf
- Creating R Packages: A Tutorial
- 60+ R resources to improve your data skills
- R Cheat Sheets by RStudio, including
- Shiny Cheat Sheet
- Data Visualization Cheat Sheet
- Package Development Cheat Sheet
- Data Wrangling Cheat Sheet
- R Markdown Cheat Sheet
- R Markdown Reference Guide
Data Mining
- Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Lecture slides (in both PPT and PDF formats) and three sample chapters on classification, association and clustering are available at the above link.
- Data Mining – Concepts and Techniques (3rd edition) by Jiawei Han, Micheline Kamber & Jian Pei
Lecture slides in PPT format are provided for 13 chapters.
- Tutorial on Data Mining Algorithms by Ian Witten
- Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman
The whole book and lecture slides are free and downloadable in PDF format.
- Lecture notes of data mining course by Cosma Shalizi at CMU
R code examples are provided in some lecture notes, and also in the homework solutions.
It covers information retrieval, page rank, image search, information theory, categorization, clustering, transformations, principal components, factor analysis, nonlinear dimensionality reduction, regression, classification and regression trees, support vector machines, density estimation, mixture models, causal inference, etc.
- Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze at Stanford University
It covers text classification, clustering, web search, link analysis, etc. The book and lecture slides are free and downloadable in PDF format.
- Statistical Data Mining Tutorials by Andrew Moore
Dozens of tutorial slides in PDF format
- Tutorial on Spatial and Spatio-Temporal Data Mining
- Tutorial on Discovering Multiple Clustering Solutions
- Open-Source Tools for Data Mining
- An overview of data mining tools
Deep Learning
Decision Trees and Random Forest
Text Mining
- Text Mining Tutorial
It introduces various techniques at different levels of text processing, including word level, sentence level, document level and document-collection level. It covers stemming, stop words, document summarization, visualization, segmentation, categorization and clustering.
- An introduction to text mining by Ian Witten
- Slides for a tutorial on topic modeling by David M. Blei
- A video from a talk on dynamic and correlated topic models
Social Network Analysis and Graph Mining
- Textbook on Introduction to social network methods
- Information Diffusion In Social Networks: Observing and Influencing Societal Interests, a tutorial at VLDB’11
- Tools for large graph mining: structure and diffusion, a tutorial at WWW 2008
- Graph Mining: Laws, Generators and Tools
- Graph-Based User Behavior Modeling: From Prediction to Fraud Detection, a tutorial at KDD 2015
- Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach, a tutorial at KDD 2015
Association Rules
- Lecture Notes (slides in PDF) on association rules
Association Rules: Basic Concepts and Algorithms
Association Rules: Advanced Concepts and Algorithms
- A book chapter on association rules
- A comparison of over 20 interestingness measures for association rules
Outlier Detection
Sentiment Analysis
- A Taste of Sentiment Analysis – 105 pages of slides in PDF format
- Sentiment Analysis and Subjectivity, a book chapter by Bing Liu
MapReduce
- Data-Intensive Text Processing with MapReduce – a book of 175 pages in PDF format
- Lecture slides of a MapReduce course, part of the 2008 Independent Activities Period at MIT
- The paper that first introduced MapReduce in 2004, showing how MapReduce works
Data Mining with R
- An Introduction to Statistical Learning with Applications in R
- Data Mining with R – Learning by Case Studies
- Data Mining Algorithms In R
- One Page R: A Survival Guide to Data Science with R
- Statistics with R
- An Introduction to Data Science (with R)
Classification/Prediction with R
- An Introduction to Recursive Partitioning Using the RPART Routines
- R Functions for Regression Analysis
- Visualizing classifier performance with package ROCR
- Classification and Regression by randomForest in R
- Random forests for categorical dependent variables: an informal quick start R guide
Time Series Analysis with R
- An R Time Series Tutorial
- Time Series Analysis with R
- Using R (with applications in Time Series Analysis)
- Forecasting: Principles and Practice, a free online textbook on forecasting with R, by Prof. Rob J Hyndman
- R Functions for Time Series Analysis
Association Rule Mining with R
- Introduction to arules: A computational environment for mining association rules and frequent item sets
- Visualizing Association Rules: Introduction to arulesViz
- Association Rule Algorithms In R
Spatial Data Analysis with R
- Applied Spatio-temporal Data Analysis with FOSS: R+OSGeo
- Spatial Regression Analysis in R – A Workbook, tutorials and worked examples using R and its package spdep for spatial regression analysis
Text Mining with R
- Getting Started with Latent Dirichlet Allocation using RTextTools + topicmodels
- Text Mining Infrastructure in R
- Introduction to the tm Package: Text Mining in R
- Text Mining Handbook (with R code examples)
- Distributed Text Mining in R
Social Network Analysis with R
- Slides on Large-scale network analysis (with package igraph)
- R for networks: a short tutorial
- Social Network Analysis in R
- A detailed introduction to Social Network Analysis with package sna
- A statnet Tutorial
- Slides on Social network analysis with R
- Tutorials on using statnet for network analysis
- Package visNetwork: nice interactive visualisation of graphs and networks
Data Cleansing and Transformation with R
Case Studies with R
- Experiences with using R in credit risk at ANZ bank, a presentation by Hong Ooi at Melbourne R user group
Big Data, MapReduce and Hadoop with R
- Large Scale Distributed Data Science using Apache Spark, a tutorial at KDD 2015
- Using R with Hadoop
- Getting Started With R and Hadoop
- Tutorial on MapReduce programming in R with package rmr
- A presentation on Distributed Data Analysis with Hadoop and R
- A Video of Wordcount MapReduce in R
- RHIPE Tutorial
- Fetch data from HBASE database from R using rhbase package
Parallel Computing with R
- State of the Art in Parallel Computing with R, which provides an excellent overview and comparison of R packages for parallel computing, including packages for computer clusters, grid computing and multi-core systems
- Taking R to the Limit, Part I – Parallelization in R
- Taking R to the Limit, Part II – Large Datasets in R
- Big Data in R
- Massive data, shared and distributed memory, and concurrent programming: bigmemory and foreach
- High Performance Computing with R, by Pragnesh Patel
- High Performance Computing with R, by Drew Schmidt
- High Performance Computing with R, by Dirk Eddelbuettel
- R with High Performance Computing: Parallel processing and large memory
- Parallel Computing in R
- Parallel Computing with R using snow and snowfall
- Interacting with Data using the filehash Package for R
- Tutorial: Parallel computing using R package snowfall
- Easier Parallel Computing in R with snowfall and sfCluster