**Introductory Section – first 2 days of the course**

**Day 1**

**Data Analysis Methodology – CRISP-DM** (Cross-Industry Standard Process for Data Mining)

- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment

Introduction to Minitab and Excel (History in Minitab & Change settings in Excel)

**Distributions**

- Types of data: continuous and attribute (ordinal, discrete and categorical)
- Normal Distribution
- Binomial distribution
- Poisson distribution
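
As a minimal sketch of the three distributions listed above, the following uses only the Python standard library. The function names and all numeric values (mean, standard deviation, sample size, defect rates) are hypothetical and chosen purely for illustration; the course itself works in Minitab and Excel.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k defectives in a sample of n."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k defects when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical values, for illustration only.
process = NormalDist(mu=100.0, sigma=5.0)
p_below_110 = process.cdf(110.0)              # normal: share of output below 110
p_two_defectives = binomial_pmf(2, 10, 0.1)   # binomial: 2 defectives out of 10
p_no_defects = poisson_pmf(0, 1.5)            # Poisson: no defects, mean 1.5 per unit

print(round(p_below_110, 4), round(p_two_defectives, 4), round(p_no_defects, 4))
```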

**Day 2**

**Hypothesis testing – Part I**

Hypothesis tests are used when you want to:

- compare the averages or medians of samples of data to decide if they are statistically different
- compare the standard deviations of samples of data to decide if their variation is statistically different
- compare proportions or percentages that came from different samples of data to decide if they are statistically different

**Principles of Hypothesis Testing**: what it is, and why and when we use it

**1 Sample T-Test**: for comparing the average of one sample against a specific target or historical average

**2 Sample T-Test**: for comparing the averages of two samples against each other

**One Way ANOVA**: for comparing the averages of 3 or more samples against each other

**Paired T-Test**: for comparing the averages of two samples that contain data that is linked in pairs
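
The test statistics behind the three t-tests above can be sketched as follows. This is an illustration under stated assumptions, not the course's method: the function names are hypothetical, and the two-sample version uses Welch's form (no equal-variance assumption), which is one of several possible choices. Turning a t statistic into a p-value needs the t distribution, which the Python standard library does not provide, so that step is left to a statistics package.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, target):
    """t statistic for comparing a sample average against a target value."""
    n = len(sample)
    return (mean(sample) - target) / (stdev(sample) / math.sqrt(n))


def two_sample_t(a, b):
    """Welch's t statistic for comparing the averages of two independent samples."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def paired_t(first, second):
    """t statistic for paired data: a one-sample test on the pairwise differences."""
    diffs = [y - x for x, y in zip(first, second)]
    return one_sample_t(diffs, 0.0)
```

A large absolute t value suggests the averages differ by more than sampling variation alone would explain. The paired version illustrates why linked data is treated differently: it analyses the differences within each pair rather than the two samples separately.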

**Exploratory data analysis via graphical tools**

- Time series
- Scatter
- Pareto
- Box and Whiskers

**Advanced Section – continued for another 2 days**

**Day 3**

**Measurement System Analysis (MSA)**

This is a technique for understanding the quality of data by challenging its sources and their potential errors.

- Gage R&R (Repeatability and Reproducibility) study for continuous data
- Attribute R&R study for attribute data

**Hypothesis Testing – Part II**

What hypothesis testing is, and why and when we use it.

**Levene's Test**: for comparing the standard deviations of 2 or more samples that are not normally distributed

**F-Test**: for comparing the standard deviations of 2 samples that are normally distributed

**Bartlett's Test**: for comparing the standard deviations of 3 or more samples that are normally distributed

**1 Proportion Test**: for comparing a proportion against a specific target or historical proportion

**2 Proportions Test**: for comparing 2 proportions against each other

**Chi-Square Test:** for comparing 3 or more proportions against each other.
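
As an illustration of the 2 Proportions Test, here is a minimal sketch that uses the normal approximation with a pooled proportion. The function name and the counts in the usage example are hypothetical, and statistical packages may apply a different method (for example, an exact test), so results can differ slightly.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for two proportions (normal approximation, pooled)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 defectives out of 100 versus 50 out of 100.
z, p_value = two_proportion_z_test(30, 100, 50, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) indicates that the two proportions are statistically different.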

**Graphical Tools – Part II**

**Histogram**: summarises the data into bars, with the most frequent values represented by the tallest bars, so that the overall shape of the distribution can be assessed

**Probability Plot**: used to decide if a sample of data fits a specific distribution

**Matrix Plot**: produces an array of scatter plots

**Box Plot**: shows the distribution of a sample of data as a box and whiskers

**Individual value Plot**: for comparing the distribution of several samples against each other

**Fitted Line Plot**: a scatter plot in which the relationship between the input and the output is represented mathematically by a single regression line, which can be linear or curved

**Statistical Process Control (SPC)**

**I-MR Chart**: for analysing individual data points of continuous data

**U Chart**: for analysing the count of defects per unit

**Xbar R Chart**: for analysing the averages of small sub-groups (2 to 5)

**P chart**: for analysing the proportions or percentages

**Xbar S Chart**: for analysing the averages of large sub-groups (more than 6)
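
To show how control limits are derived, here is a minimal sketch for the I-MR chart. It assumes moving ranges of span 2, for which the standard control chart constants are 2.66 (for the individuals chart) and 3.267 (for the moving range chart). The function name and the data are hypothetical.

```python
from statistics import mean


def imr_limits(values):
    """Return (LCL, centre line, UCL) for the individuals and moving range charts."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    centre = mean(values)
    return {
        # 2.66 = 3 / d2, where d2 = 1.128 for a moving range of span 2
        "I": (centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar),
        # 3.267 = D4 for span 2; the lower limit of the MR chart is 0
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
    }


# Hypothetical measurements, for illustration only.
limits = imr_limits([10, 12, 11, 13, 12])
print(limits)
```

Points outside these limits signal that the process may have changed and is worth investigating.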

**Regression**

- Simple regression: to model the relationship between one X variable and a response variable Y
- Multiple regression: to model the relationship between two to five X variables and a response variable Y
- Optimise response: to use a multiple regression model of two to five X variables and a response variable Y to identify the X values that optimise Y
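
Simple regression can be sketched with ordinary least squares as below. The function name and the data are hypothetical; multiple regression follows the same principle but needs matrix algebra and is normally left to a statistics package.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns R-squared."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variation in Y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
slope, intercept, r_squared = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r_squared)
```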

**Capability Analysis**

**Capability Analysis**: determine whether the process is capable of producing output that meets customer requirements

**Binomial Capability**: determine whether the % defective meets customer requirements

**Poisson Capability**: determine whether the defect rate (DPU) meets customer requirements
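
For continuous data, capability is usually summarised by comparing the width of the specification to the spread of the process. The sketch below is a simplification under a stated assumption: it uses the overall sample standard deviation, which some packages report as Pp and Ppk, whereas Cp and Cpk conventionally use a within-subgroup estimate. The function name, data and specification limits are hypothetical.

```python
from statistics import mean, stdev


def capability_indices(values, lsl, usl):
    """Return (potential, actual) capability using the overall standard deviation."""
    mu, sigma = mean(values), stdev(values)
    # Potential capability: specification width against 6 sigma of process spread.
    potential = (usl - lsl) / (6 * sigma)
    # Actual capability: distance from the mean to the nearer limit, against 3 sigma.
    actual = min(usl - mu, mu - lsl) / (3 * sigma)
    return potential, actual


# Hypothetical measurements and specification limits, for illustration only.
potential, actual = capability_indices([9, 10, 11], lsl=7, usl=16)
print(potential, actual)
```

When the process is off-centre, as in this example, the second index is lower than the first because the mean sits closer to one specification limit.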