Course Outline
Introduction to Data Science for Big Data Analytics
- Data Science Overview
- Big Data Overview
- Data Structures
- Drivers and complexities of Big Data
- Big Data ecosystem and a new approach to analytics
- Key technologies in Big Data
- Data Mining process and problems
- Association Pattern Mining
- Data Clustering
- Outlier Detection
- Data Classification
Introduction to Data Analytics lifecycle
- Discovery
- Data preparation
- Model planning
- Model building
- Presentation/Communication of results
- Operationalization
- Exercise: Case study
From this point on, most of the training time (80%) is spent on examples and exercises in R and related big data technologies.
Getting started with R
- Installing R and RStudio
- Features of R language
- Objects in R
- Data in R
- Data manipulation
- Big data issues
- Exercises
Getting started with Hadoop
- Installing Hadoop
- Understanding Hadoop modes
- HDFS
- MapReduce architecture
- Hadoop related projects overview
- Writing programs in Hadoop MapReduce
- Exercises
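The MapReduce programming model named above can be illustrated with a small word-count sketch. It is written in plain Python and runs in a single process, so it only mimics the map, shuffle and reduce phases; it does not use Hadoop or any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big Data analytics"])
print(counts)
```

On a real cluster the mapper and reducer would run as separate tasks over HDFS blocks, and the framework would perform the grouping step between them.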
Integrating R and Hadoop with RHadoop
- Components of RHadoop
- Installing RHadoop and connecting with Hadoop
- The architecture of RHadoop
- Hadoop streaming with R
- Data analytics problem solving with RHadoop
- Exercises
Pre-processing and preparing data
- Data preparation steps
- Feature extraction
- Data cleaning
- Data integration and transformation
- Data reduction – sampling, feature subset selection
- Dimensionality reduction
- Discretization and binning
- Exercises and Case study
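As one illustration of the discretization and binning topic, the sketch below assigns numeric values to equal-width bins. It is a minimal Python example under stated assumptions: the helper name `equal_width_bins` and the sample ages are invented for illustration, and equal-width binning is only one of several possible schemes (equal-frequency binning is another).

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 58]  # made-up sample data
bins = equal_width_bins(ages, 4)
print(bins)
```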
Exploratory data analytic methods in R
- Descriptive statistics
- Exploratory data analysis
- Visualization – preliminary steps
- Visualizing single variable
- Examining multiple variables
- Statistical methods for evaluation
- Hypothesis testing
- Exercises and Case study
Data Visualizations
- Basic visualizations in R
- Packages for data visualization: ggplot2, lattice, plotly
- Formatting plots in R
- Advanced graphs
- Exercises
Regression (Estimating future values)
- Linear regression
- Use cases
- Model description
- Diagnostics
- Problems with linear regression
- Shrinkage methods, ridge regression, the lasso
- Generalizations and nonlinearity
- Regression splines
- Local polynomial regression
- Generalized additive models
- Regression with RHadoop
- Exercises and Case study
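The simplest case of the linear regression topic, a single predictor fitted by ordinary least squares, can be sketched as follows. This is an illustrative Python example, not course material; the function name `fit_line` and the data points are assumptions made for the sketch.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The closed form fails when all `xs` are equal, because the variance term `sxx` is then zero.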
Classification
- Classification-related problems
- Bayesian refresher
- Naïve Bayes
- Logistic regression
- K-nearest neighbors
- Decision tree algorithms
- Neural networks
- Support vector machines
- Diagnostics of classifiers
- Comparison of classification methods
- Scalable classification algorithms
- Exercises and Case study
Assessing model performance and selection
- Bias, Variance and model complexity
- Accuracy vs Interpretability
- Evaluating classifiers
- Measures of model/algorithm performance
- Hold-out method of validation
- Cross-validation
- Tuning machine learning algorithms with caret package
- Visualizing model performance with Profit, ROC and Lift curves
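The cross-validation topic can be illustrated by the index bookkeeping behind k-fold splitting. The sketch below is plain Python and is not the caret package's implementation; the name `k_fold_indices` is hypothetical, and the folds are taken in order without shuffling, which real workflows usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

Each sample appears in exactly one test fold, so averaging a metric over the folds uses every observation for validation once.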
Ensemble Methods
- Bagging
- Random Forests
- Boosting
- Gradient boosting
- Exercises and Case study
Support vector machines for classification and regression
- Maximal Margin classifiers
- Support vector classifiers
- Support vector machines
- SVMs for classification problems
- SVMs for regression problems
- Exercises and Case study
Identifying unknown groupings within a data set
- Feature Selection for Clustering
- Representative-based algorithms: k-means, k-medoids
- Hierarchical algorithms: agglomerative and divisive methods
- Probabilistic algorithms: EM
- Density-based algorithms: DBSCAN, DENCLUE
- Cluster validation
- Advanced clustering concepts
- Clustering with RHadoop
- Exercises and Case study
Discovering connections with Link Analysis
- Link analysis concepts
- Metrics for analyzing networks
- The PageRank algorithm
- Hyperlink-Induced Topic Search
- Link Prediction
- Exercises and Case study
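The PageRank topic can be illustrated with a short power-iteration sketch. It is a minimal Python version under stated assumptions: the damping factor of 0.85 and the fixed 50 iterations are conventional choices rather than values from this outline, the three-node graph is invented, and every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank; links maps each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # made-up example graph
ranks = pagerank(graph)
print(ranks)
```

Node C ends up with the highest score here because it is linked from both A and B.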
Association Pattern Mining
- Frequent Pattern Mining Model
- Scalability issues in frequent pattern mining
- Brute Force algorithms
- Apriori algorithm
- The FP-growth approach
- Evaluation of Candidate Rules
- Applications of Association Rules
- Validation and Testing
- Diagnostics
- Association rules with R and Hadoop
- Exercises and Case study
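The level-wise idea behind the Apriori algorithm can be sketched in a few lines of Python. This is an illustrative, unoptimized version: the function name `apriori`, the basket data and the 0.5 support threshold are assumptions made for the example, and it recounts support with a full scan of the transactions at every level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]  # made-up transactions
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what separates Apriori from brute force: the three-item candidate is discarded without counting because its subset {milk, butter} is not frequent.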
Constructing recommendation engines
- Understanding recommender systems
- Data mining techniques used in recommender systems
- Recommender systems with recommenderlab package
- Evaluating the recommender systems
- Recommendations with RHadoop
- Exercise: Building recommendation engine
Text analysis
- Text analysis steps
- Collecting raw text
- Bag of words
- Term Frequency–Inverse Document Frequency (TF-IDF)
- Determining Sentiments
- Exercises and Case study
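The bag-of-words and TF-IDF topics can be illustrated with a small Python sketch. Several TF-IDF variants exist; this one assumes term frequency normalized by document length and an unsmoothed logarithmic inverse document frequency. The function name `tf_idf` and the tokenized documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)  # bag-of-words counts for this document
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["big", "data", "analytics"],
        ["big", "data", "tools"],
        ["text", "analytics"]]  # made-up tokenized documents
scores = tf_idf(docs)
print(scores[2])
```

A term that occurs in every document gets weight zero under this variant, which is why rarer terms such as "text" outrank common ones.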
35 hours
Testimonials (2)
Intensity, Training materials and expertise, Clarity, Excellent communication with Alessandra
Marija Hornis Dmitrovic - Marija Hornis
Course - Data Science for Big Data Analytics
The example and training material were sufficient and made it easy to understand what you are doing.