Data Management  50% 
Navigate within the Data Management Studio Interface

Register a new QKB

Create and connect to a repository

Define a data connection

Specify Data Management Studio options

Access the QKB

Create a name value macro pair

Access the business rules manager

Access the appropriate monitoring report

Attach and detach primary tabs
Create, design and be able to explore data explorations and interpret results
Define and create data collections from exploration results
Create and explore a data profile

Create a data profile from different sources (text file, filtered table, SQL query)

Interpret results (frequency distribution & pattern)

Use collections from profile results
Design data standardization schemes

Build a scheme from profile results

Build a scheme manually

Update existing schemes
Create Data Jobs

Rename output fields

Add nodes and preview nodes

Run a data job

View a log and settings

Work with data job settings and data job displays

Best practices (how do you ensure that you are following a particular best practice): examples: insert notes, establish naming conventions

Work with branching

Join tables

Apply the Field layout node to control field order

Work with the Data Validation node:

Add it to the job flow

Specify properties/review properties

Edit settings for the Data Validation node

Work with data inputs

Work with data outputs

Profile data from within data jobs

Interact with the Repository from within Data Jobs

Determine how data is processed

Data job variables

Set Sorting properties for the Data Sorting node

Set appropriate advanced properties options for the Data Sorting Node
Apply a Standardization definition and scheme

Use a definition

Use a scheme

Be able to determine the differences between definition and scheme

Explain what happens when you use both a definition and scheme

Review and interpret standardization results

Be able to explain the different steps involved in the process of standardization
Apply Parsing definitions

Distinguish between different data types and their tokens

Review and interpret parsing results

Be able to explain the different steps involved in the process of parsing

Use parsing definition
Compare and contrast the differences between identification analysis and right fielding nodes

Review results

Explain the technique used for identification (process of the definition)
Apply the Gender Analysis node to determine gender

Use gender definition

Interpret results

Explain different techniques for accomplishing gender analysis
Create an Entity Resolution Job

Use a node in the data job that is the clustering node and explain why you would want to use it

Survivorship (surviving record identification)

Record rules

Field rules

Options for survivorship

Discuss and apply the Cluster Diff node

Apply Crossfield matching (new option)
Use the Match Codes Node to select match definitions for selected fields

Outline the various uses for match codes (join)

Use the definition

Interpret the results

Match versus match parsed

Explain the process for creating a match code

Select sensitivity for a selected match definition

Apply matching best practices
Define and create business rules

Use Business Rules Manager

Create a new business rule

Name/label rule

Specify type of rule

Define checks

Specify fields

Distinguish between different types of business rules

Apply business rules

Profile

Execute business rule node

Use of Expression Builder

Apply best practices
Describe the organization, structure and basic navigation of the QKB

Identify and describe locale levels (global, language, country)

Navigate the QKB (tab structure, copy definitions, etc.)

Identify data types and tokens
Be able to articulate when to use the various components of the QKB

Components include:

Regular expressions

Schemes

Phonetics library

Vocabularies

Grammar

Chop Tables
Define the processing steps and components used in the different definition types

Identify/describe the different definition types

Parsing

Standardization

Match

Identification

Casing

Extraction

Locale guess

Gender

Patterns

ANOVA and Regression  30% 
Verify the assumptions of ANOVA

Explain the central limit theorem and when it must be applied

Examine the distribution of continuous variables (histogram, boxwhisker, QQ plots)

Describe the effect of skewness on the normal distribution

Define H0, H1, Type I/II error, statistical power, pvalue

Describe the effect of sample size on pvalue and power

Interpret the results of hypothesis testing

Interpret histograms and normal probability charts

Draw conclusions about your data from histogram, boxwhisker, and QQ plots

Identify the kinds of problems may be present in the data: (biased sample, outliers, extreme values)

For a given experiment, verify that the observations are independent

For a given experiment, verify the errors are normally distributed

Use the UNIVARIATE procedure to examine residuals

For a given experiment, verify all groups have equal response variance

Use the HOVTEST option of MEANS statement in PROC GLM to asses response variance
Analyze differences between population means using the GLM and TTEST procedures

Use the GLM Procedure to perform ANOVA

CLASS statement

MODEL statement

MEANS statement

OUTPUT statement

Evaluate the null hypothesis using the output of the GLM procedure

Interpret the statistical output of the GLM procedure (variance derived from MSE, F value, pvalue R**2, Levene's test)

Interpret the graphical output of the GLM procedure

Use the TTEST Procedure to compare means
Perform ANOVA post hoc test to evaluate treatment affect

use the LSMEANS statement in the GLM or PLM procedure to perform pairwise comparisons

use PDIFF option of LSMEANS statement

use ADJUST option of the LSMEANS statement (TUKEY and DUNNETT)

Interpret diffograms to evaluate pairwise comparisons

Interpret control plots to evaluate pairwise comparisons

Compare/Contrast use of pairwise TTests, Tukey and Dunnett comparison methods

PLM
Detect and analyze interactions between factors

Use the GLM procedure to produce reports that will help determine the significance of the interaction between factors.

MODEL statement

LSMEANS with SLICE=option (Also using PROC PLM)

ODS SELECT

Interpret the output of the GLM procedure to identify interaction between factors:

pvalue

F Value

R Squared

TYPE I SS

TYPE III SS
Fit a multiple linear regression model using the REG and GLM procedures

Use the REG procedure to fit a multiple linear regression model

Use the GLM procedure to fit a multiple linear regression model
Analyze the output of the REG, PLM, and GLM procedures for multiple linear regression models

Interpret REG or GLM procedure output for a multiple linear regression model: convert models to algebraic expressions

Convert models to algebraic expressions

Identify missing degrees of freedom

Identify variance due to model/error, and total variance

Calculate a missing F value

Identify variable with largest impact to model

For output from two models, identify which model is better

Identify how much of the variation in the dependent variable is explained by the model

Conclusions that can be drawn from REG, GLM, or PLM output: (about H0, model quality, graphics)
Use the REG or GLMSELECT procedure to perform model selection

Use the SELECTION option of the model statement in the GLMSELECT procedure

Compare the different model selection methods (STEPWISE, FORWARD, BACKWARD)

Enable ODS graphics to display graphs from the REG or GLMSELECT procedure

Identify best models by examining the graphical output (fit criterion from the REG or GLMSELECT procedure)

Assign names to models in the REG procedure (multiple model statements)
Assess the validity of a given regression model through the use of diagnostic and residual analysis

Explain the assumptions for linear regression

From a set of residuals plots, asses which assumption about the error terms has been violated

Use REG procedure MODEL statement options to identify influential observations (Student Residuals, Cook's D, DFFITS, DFBETAS)

Explain options for handling influential observations

Identify colinearity problems by examining REG procedure output

Use MODEL statement options to diagnose collinearity problems (VIF, COLLIN, COLLINOINT)
Perform logistic regression with the LOGISTIC procedure

Identify experiments that require analysis via logistic regression

Identify logistic regression assumptions

logistic regression concepts (log odds, logit transformation, sigmoidal relationship between p and X)

Use the LOGISTIC procedure to fit a binary logistic regression model (MODEL and CLASS statements)
Optimize model performance through input selection

Use the LOGISTIC procedure to fit a multiple logistic regression model

LOGISCTIC procedure SELECTION=SCORE option

Perform Model Selection (STEPWISE, FORWARD, BACKWARD) within the LOGISTIC procedure
Interpret the output of the LOGISTIC procedure

Interpret the output from the LOGISTIC procedure for binary logistic regression models:

Model Convergence section

Testing Global Null Hypothesis table

Type 3 Analysis of Effects table

Analysis of Maximum Likelihood Estimates table

Association of Predicted Probabilities and Observed Responses

Visual Data Exploration  20% 
Examine, modify, and create data items

Create and use parameterized data items

Examine data item properties and measure details

Change data item properties

Create custom sorts

Create distinct counts

Create aggregated measures

Create calculated items

Create hierarchies

Create custom categories
Select and work with data sources

Work with multiple data sources

Change data sources

Refresh data sources
Create, modify, and interpret automatic chart visualizations in Visual Analytics Explorer

Identify default visualizations

Identify the properties available in an automatic chart
Create, modify, and interpret graph and table visualizations in Visual Analytics Explorer

Work with list table visualizations

Work with crosstab visualizations

Work with bar chart visualizations

Work with line chart visualizations

Work with scatter plot visualizations

Work with bubble plot visualizations

Work with histogram visualizations

Work with box plot visualizations

Work with heat map visualizations

Work with geo map visualizations

Work with treemap visualizations

Work with correlation matrix visualizations
Enhance visualizations with analytics within Visual Analytics Explorer

Add fit lines to visualizations

Create forecasts

Interpret word clouds
Interact with visualizations and explorations within Visual Analytics Explorer

Control appearance of visualizations within explorations

Add comments to visualizations and explorations

Use filters on data source and visualizations

Share explorations

Share visualizations
