Compute statistics related to the location of the process (center of the distribution), such as the mean and the median. The median is not sensitive to extreme values and should be considered when processes naturally produce extreme values.

Compute statistics related to the spread of the process, such as the range, standard deviation, and variance.

Compute confidence intervals around descriptive statistics to quantify the uncertainty in them. Smaller samples have larger uncertainty.
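A minimal sketch of the location, spread, and confidence-interval calculations above, using only the Python standard library. The function name `describe`, the sample data, and passing the t critical value in by hand (taken from a t-table) are assumptions made for illustration.

```python
import math
import statistics

def describe(sample, t_critical):
    """Return location, spread, and a confidence interval for the mean.

    t_critical is the two-sided Student's t critical value for the chosen
    confidence level and len(sample) - 1 degrees of freedom. It is supplied
    by the caller (e.g. from a t-table) so no external library is needed.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    stdev = statistics.stdev(sample)  # sample standard deviation (n - 1)
    half_width = t_critical * stdev / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "range": max(sample) - min(sample),
        "stdev": stdev,
        "variance": statistics.variance(sample),
        "ci_mean": (mean - half_width, mean + half_width),
    }

# Hypothetical cycle times containing one extreme value (30.0).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 30.0]
summary = describe(data, t_critical=2.571)  # 95% confidence, 5 degrees of freedom
print(summary)
```

In this hypothetical sample the single extreme value pulls the mean well above the median, which is the situation where the median is the better measure of location.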

In the Analyze phase, it is important to use tools and methods in the proper sequence. For example, graphical tools should always be used before statistical tools. Likewise, testing for normality and stability should always be done before hypothesis testing. This guide provides a general sequence in terms of tool usage during the analysis of data.

BOX PLOT: Estimate the process distribution's shape, center, upper tip, lower tip, and outliers. The box plot is most useful when multiple processes need to be compared.

HISTOGRAM: Serves the same purpose as the box plot, but the graphic is more intuitive.

MOVING AVERAGES CHART: View data over time and damp out data volatility.

PARETO CHART: Categorize items, such as defect types or causes, and prioritize them by frequency or impact.

RUN CHART: Look for nonrandom patterns of behavior.

SCATTER PLOT: Show the strength of the relationship between two variables.

TREND CHART: View data over time.
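As an illustration of how a moving averages chart damps volatility, the sketch below computes a simple (unweighted) moving average. The function name, the window size, and the data are hypothetical. Plotting is left out.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

daily_defects = [4, 9, 3, 8, 5, 10, 4]  # hypothetical data
smoothed = moving_average(daily_defects, window=3)
print(smoothed)
```

The smoothed series is shorter than the raw series by `window - 1` points, and a larger window damps more volatility at the cost of responding more slowly to real shifts.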

Use control charts to determine if the process is stable over time.

Control charts can show whether the improvement opportunity rests with special cause and/or common cause variation. If special cause variation is the primary issue, work with the process owners. If common cause variation is the primary issue, work with management.

Stability is also a prerequisite for testing normality, standard capability calculations, and hypothesis testing.
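A minimal sketch of a stability check with an individuals and moving range (X and Rm) chart. It uses the conventional chart constants 2.66 (3 / d2, with d2 = 1.128 for moving ranges of two points) and 3.267 (D4 for n = 2). It flags only points outside the control limits, not run rules, and the function name and data are hypothetical.

```python
import statistics

def individuals_chart(values):
    """Compute X and Rm chart limits and flag points outside the X limits."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = statistics.mean(values)
    mr_bar = statistics.mean(moving_ranges)
    ucl = center + 2.66 * mr_bar
    lcl = center - 2.66 * mr_bar
    # Indices of points beyond the limits: candidate special cause signals.
    signals = [i for i, v in enumerate(values) if v > ucl or v < lcl]
    return {
        "center": center,
        "ucl": ucl,
        "lcl": lcl,
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,  # upper limit for the moving range chart
        "signals": signals,
    }

# Hypothetical measurements with one unusual point (12.5).
measurements = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 12.5, 10.0, 9.9]
chart = individuals_chart(measurements)
print(chart)
```

Points listed in `signals` are candidates for special cause investigation. With no signals, the remaining variation is treated as common cause.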

Use tests of normality to determine if the process data is normally distributed.

Use graphical tools such as histograms and box plots for a visual indication. Use statistical tools such as Anderson-Darling and the moments test for skewness and kurtosis.

Normality is needed for X and Rm control charts as well as many statistical tests. If the data are non-normal, do not use X and Rm charts without first transforming the data, and use nonparametric tests when comparing multiple processes.
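The moment statistics mentioned above can be sketched as follows. This computes sample skewness and excess kurtosis from the central moments (population form, dividing by n). It does not implement the Anderson-Darling test or compare against critical values, which a statistics package would normally provide. The function name and data are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from the central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]      # hypothetical data with a long right tail
print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_skewed))
```

Values of both statistics near zero are consistent with normality. Large positive or negative values suggest a skewed or heavy/light-tailed distribution.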

Compute Cp and Cpk to determine how capable the process is of producing product/service that meets the customer's requirements.

Standard calculations for Cp and Cpk assume a stable, normally distributed process.

Compare Cp to Cpk to determine opportunity for improvement from centering the process vs. reducing common cause variation.

Compare Cpk to Ppk to determine opportunity from "instability" in the process.
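A minimal sketch of the capability comparisons above, under stated assumptions: individual measurements, short-term (within) sigma estimated as the average moving range divided by 1.128, and overall sigma estimated as the sample standard deviation. The function name, data, and specification limits are hypothetical.

```python
import statistics

def capability(values, lsl, usl):
    """Return Cp, Cpk (within sigma) and Ppk (overall sigma)."""
    mean = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_within = statistics.mean(moving_ranges) / 1.128  # d2 for n = 2
    sigma_overall = statistics.stdev(values)
    nearest_limit = min(usl - mean, mean - lsl)
    return {
        "cp": (usl - lsl) / (6 * sigma_within),
        "cpk": nearest_limit / (3 * sigma_within),
        "ppk": nearest_limit / (3 * sigma_overall),
    }

# Hypothetical data: off-center, with a shift partway through the series.
measurements = [10.0, 10.1, 10.0, 10.1, 10.3, 10.4, 10.3, 10.4]
result = capability(measurements, lsl=9.4, usl=10.6)
print(result)
```

In this hypothetical data set Cpk is below Cp because the process is off-center, and Ppk is below Cpk because the mid-series shift inflates the overall variation relative to the short-term variation.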

Use hypothesis tests to answer questions such as:

- Are some processes better than others?
- Statistically, how many processes are there?
- Did the process meet a goal/target?

All hypothesis tests assume the data comes from stable processes.

If all processes are normally distributed, use parametric tests such as the t-test and ANOVA. If one or more processes are non-normal, use nonparametric tests such as Mood's median test and the Sign test.
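As one illustration of a nonparametric test, the sketch below implements a one-sample, two-sided Sign test against a target median, which addresses the "did the process meet a goal/target?" question. It uses the exact binomial distribution and drops values equal to the target. The function name and data are hypothetical.

```python
import math

def sign_test(values, target):
    """Two-sided Sign test p-value for H0: median == target."""
    above = sum(1 for v in values if v > target)
    below = sum(1 for v in values if v < target)
    n = above + below  # values equal to the target are dropped
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical cycle times compared with a target median of 10.
cycle_times = [12, 14, 11, 15, 13, 16, 12, 14, 15, 13]
p_value = sign_test(cycle_times, target=10)
print(p_value)
```

A small p-value indicates the median differs from the target. Here all ten hypothetical values fall above 10, so the evidence against the target is strong.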

Use correlation to determine the strength of relationship between variables and learn which inputs are the key drivers of the process output.

Correlation can be seen graphically by creating scatter plots of each input variable vs. the output variable.

Determine strength of relationship statistically by computing the correlation coefficient (R) and the coefficient of determination (R^{2}).
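A minimal sketch of computing R and R^{2} for one input variable against the output, using the Pearson product-moment formula. The function name and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical input (oven temperature) and output (cure time).
temperature = [150, 160, 170, 180, 190, 200]
cure_time = [42.0, 39.5, 37.1, 34.8, 32.0, 30.2]
r = pearson_r(temperature, cure_time)
r_squared = r ** 2
print(r, r_squared)
```

Repeating the calculation for each input variable and ranking by R^{2} gives a first indication of which inputs are the key drivers, subject to the usual caution that correlation does not establish cause.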

Use regression to develop a mathematical model of the process so that it can be optimized.

Begin with the development of linear models. Stepwise regression can be used to determine which variables are statistically significant. If linear models do not return a sufficiently high R^{2}, consider:

- The use of second- or third-order models
- Introducing interactions
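The comparison between a linear and a second-order model can be sketched as below for a single input variable. This is an illustrative least-squares polynomial fit solved through the normal equations. It does not cover stepwise selection or interaction terms, and the function names and data are hypothetical.

```python
def fit_polynomial(x, y, order):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    size = order + 1
    # Normal equations A c = b: A[j][k] = sum(x^(j+k)), b[j] = sum(y * x^j).
    a = [[sum(xi ** (j + k) for xi in x) for k in range(size)] for j in range(size)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - known) / a[row][row]
    return coeffs

def r_squared(x, y, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(y) / len(y)
    predicted = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data with visible curvature.
x = [0, 1, 2, 3, 4, 5]
y = [1.0, 2.1, 5.2, 9.9, 17.1, 26.0]
r2_linear = r_squared(x, y, fit_polynomial(x, y, order=1))
r2_second = r_squared(x, y, fit_polynomial(x, y, order=2))
print(r2_linear, r2_second)
```

For this hypothetical data the second-order model returns a noticeably higher R^{2} than the linear model. Higher-order terms always raise R^{2} somewhat, so an added term should also be checked for statistical significance.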

Use Design of Experiments to develop a mathematical model of the process so that it can be optimized.

Generally, DOE is used only when regression does not sufficiently address optimization and/or the cost of DOE is relatively low. DOE, when properly run, has less noise in the data, resulting in more robust and accurate models relative to regression.

Use sequential design for maximum efficiency.
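A minimal sketch of a two-level full factorial design with coded levels (-1 for low, +1 for high) and the corresponding effect estimates. It does not cover fractional or sequential designs, and the function names and response values are hypothetical.

```python
from itertools import product

def full_factorial(n_factors):
    """Return every run of a two-level full factorial in coded units."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]

def effect(design, responses, columns):
    """Effect of one factor (one column) or an interaction (several columns).

    The effect is the mean response where the product of the coded levels
    is +1 minus the mean response where it is -1.
    """
    signs = []
    for run in design:
        sign = 1
        for c in columns:
            sign *= run[c]
        signs.append(sign)
    high = [r for s, r in zip(signs, responses) if s > 0]
    low = [r for s, r in zip(signs, responses) if s < 0]
    return sum(high) / len(high) - sum(low) / len(low)

design = full_factorial(2)      # runs: [-1,-1], [-1,1], [1,-1], [1,1]
responses = [20, 30, 40, 70]    # hypothetical measured output per run
effect_a = effect(design, responses, columns=[0])
effect_b = effect(design, responses, columns=[1])
effect_ab = effect(design, responses, columns=[0, 1])
print(effect_a, effect_b, effect_ab)
```

Because every combination of factor levels is run deliberately, each effect is estimated from balanced data, which is one reason a properly run DOE carries less noise than regression on historical data.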