# EMC Data Science Associate (E20-007) Certification Exam Sample Questions

These sample questions will help you become familiar with the style and structure of the EMC Data Science Associate (EMCDSA) (E20-007) exam. We encourage you to try our Demo EMC Data Science Associate Certification Practice Exam to gauge your understanding of the exam structure in an environment that simulates the EMC Data Science and Big Data Analytics Certification test environment.

To make your preparation for the EMC Data Science and Big Data Analytics (E20-007) exam easier, we strongly recommend that you use our Premium EMC Data Science Associate Certification Practice Exam. According to our survey of certified candidates, you can easily score 85% in your actual EMC Certification exam if you can score 100% in our premium EMC Data Science Associate Certification practice exams.

## EMC E20-007 Sample Questions:

Q1: Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon.

You have been asked to determine if offering a coupon to visitors to your website has any impact on their purchase decision. Which analysis method should you use?

Options:
A. One-way ANOVA

B. K-means clustering

C. Association rules

D. Student's t-test

Q2: Your customer provided you with 2,000 unlabeled records and asked you to separate them into three groups. What is the correct analytical method to use?

Options:
A. K-means clustering

B. Naive Bayesian classification

C. Linear regression

D. Logistic regression
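For readers who want to see what k-means clustering actually does with unlabeled records, here is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm). The nine 2-D points, the starting centroids, and the function name `kmeans` are all made up for illustration; a real analysis would normally use a library implementation and a more careful initialization.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on (x, y) tuples, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Hypothetical unlabeled records that fall into three visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
start = [(0, 0), (10, 10), (0, 10)]  # assumed starting centroids, k = 3

labels, centers = kmeans(points, start)
print(labels)
print(centers)
```

No labels are supplied at any point: the groups come only from the distances between records, which is what distinguishes clustering from the classification and regression options.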

Q3: Your company has 3 different sales teams. Each team's sales manager has developed incentive offers to increase the size of each sales transaction.

Any sales manager whose incentive program can be shown to increase the size of the average sales transaction will receive a bonus.

Data are available for the number and average sale amount for transactions offering one of the incentives as well as transactions offering no incentive.

The VP of Sales has asked you to determine analytically if any of the incentive programs has resulted in a demonstrable increase in the average sale amount.

Which analytical technique would be appropriate in this situation?

Options:
A. Wilcoxon Rank Sum Test

B. Student's t-test

C. One-way ANOVA

D. Multi-way ANOVA
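As background on the ANOVA options, a one-way ANOVA compares the means of several groups that differ in a single factor by forming an F statistic: the variance between group means divided by the variance within groups. The sketch below computes that statistic by hand. The sale amounts and group names are invented for illustration, and the helper name `one_way_anova_f` is an assumption, not part of any library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical sale amounts: one group per incentive, plus no incentive.
no_incentive = [100, 110, 120]
team_a = [110, 120, 130]
team_b = [120, 130, 140]
team_c = [110, 120, 130]

f_stat = one_way_anova_f([no_incentive, team_a, team_b, team_c])
print(f_stat)
```

The F statistic alone is not a decision: it still has to be compared against an F distribution with (k - 1, n - k) degrees of freedom to obtain a p-value, and a significant result only says that at least one group mean differs, not which one.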

Q4: Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?

Options:
A. Howl

B. Pig

C. Hive

D. HBase

Q5: Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data.

This colleague has previously worked extensively with SQL and databases. Which query interface would you recommend?

Options:
A. Howl

B. Hive

C. HBase

D. Pig

Q6: You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?

Options:
A. Ensure that the TaskTracker is running

B. Ensure that the JobTracker is running

C. Ensure that the NameNode is running

D. Ensure that a DataNode is running

Q7: You submit a MapReduce job to a Hadoop cluster. However, you notice that although the job was successfully submitted, it is not completing. What should be done to identify the issue?

Options:
A. Ensure NameNode is running

B. Ensure DataNode is running

D. Ensure JobTracker is running

Q8: You have used k-means clustering to classify the behavior of 100,000 customers for a retail store. You decide to use household income, age, gender and yearly purchase amount as measures.

You have chosen to use 8 clusters and notice that 2 clusters have only 3 customers assigned. What should you do?

Options:
A. Decrease the number of measures used

B. Decrease the number of clusters

C. Increase the number of clusters

Q9: You have two tables of customers in your database. Customers in cust_table_1 were sent an e-mail promotion last year, and customers in cust_table_2 received a newsletter last year.

Each customer can be entered only once per table. You want to create a table that includes all customers, and any of the communications they received last year. Which type of join would you use for this table?

Options:
A. Full outer join

B. Left outer join

C. Inner join

D. Cross join
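To compare how the four join types behave, the sketch below models the two tables as small Python dictionaries keyed by customer id and counts the rows each join returns. The customer ids, the stored values, and the helper `join` are hypothetical; in a database the same comparison would be written in SQL.

```python
# Hypothetical data: customer id -> communication received last year.
cust_table_1 = {1: "email promotion", 2: "email promotion", 3: "email promotion"}
cust_table_2 = {2: "newsletter", 3: "newsletter", 4: "newsletter"}


def join(left, right, how):
    """Join two id-keyed dicts; a missing side is filled with None."""
    if how == "inner":
        keys = left.keys() & right.keys()   # ids present in both tables
    elif how == "left":
        keys = left.keys()                  # every id from the left table
    elif how == "full":
        keys = left.keys() | right.keys()   # every id from either table
    else:
        raise ValueError(f"unsupported join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}


inner = join(cust_table_1, cust_table_2, "inner")
left = join(cust_table_1, cust_table_2, "left")
full = join(cust_table_1, cust_table_2, "full")
# A cross join pairs every left row with every right row, ignoring the id.
cross = [(a, b) for a in cust_table_1 for b in cust_table_2]

for name, rows in [("inner", inner), ("left", left), ("full", full), ("cross", cross)]:
    print(name, len(rows))
```

With this toy data, customer 1 appears only in the first table and customer 4 only in the second, so the row counts show directly which joins drop customers and which keep them.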

Q10: You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?

Options:
A. {banana, apple, grape, orange} must be a frequent itemset.

B. {banana, apple} => {orange} must be a relevant rule.

C. {grape} => {banana, apple} must be a relevant rule.

D. {grape, apple, orange} must be a frequent itemset.
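As background for reasoning about these options: in Apriori-style rule mining, a rule X => Y is built from the itemset X ∪ Y, so the support of the rule is the support of that combined itemset, and its confidence is support(X ∪ Y) / support(X). The sketch below computes both on a toy basket list. The six transactions and the helper names `support` and `confidence` are invented for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"banana", "apple", "grape"},
    {"apple", "orange", "grape"},
    {"apple", "orange", "grape"},
    {"banana", "apple", "grape"},
    {"banana", "orange"},
    {"apple"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs."""
    return support(lhs | rhs) / support(lhs)


rule_1_support = support({"banana", "apple", "grape"})
rule_2_support = support({"apple", "orange", "grape"})
all_four_support = support({"banana", "apple", "grape", "orange"})

print(rule_1_support, confidence({"banana", "apple"}, {"grape"}))
print(rule_2_support, confidence({"apple", "orange"}, {"grape"}))
print(all_four_support)
print(confidence({"grape"}, {"banana", "apple"}))
```

In this toy data both rules hold in every basket that contains their left-hand side, yet no basket contains all four fruits, and reversing a rule changes its confidence because the denominator becomes the support of the other side.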