01. In the MapReduce framework, what is the purpose of the Reduce function?
a) It aggregates the results of the Map function and generates processed output
b) It distributes the input to multiple nodes for processing
c) It writes the output of the Map function to storage
d) It breaks the input into smaller components and distributes them to other nodes in the cluster
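To illustrate how the Map and Reduce phases divide the work, here is a minimal single-process sketch of the classic word-count job. It only simulates the phases in memory (no Hadoop involved), and the function names map_phase, shuffle, reduce_phase and word_count are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate all values for one key into a single output record.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```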
02. What is an example of a null hypothesis?
a) that a newly created model provides a prediction of a null sample mean
b) that a newly created model provides a prediction of a null population mean
c) that a newly created model does not provide better predictions than the currently existing model
d) that a newly created model provides a prediction that will be well fit to the null distribution
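As a concrete example, the sketch below tests the null hypothesis that a new model does not predict better than an existing one, assuming per-example errors for both models on the same test set. It uses an exact paired sign-flip permutation test, which is only one of several reasonable choices (a paired t-test is another); the function name and the error values are hypothetical. A small p-value is evidence against the null hypothesis.

```python
from itertools import product


def paired_permutation_p_value(old_errors, new_errors):
    """One-sided exact p-value for H0: the new model is not better than the old one."""
    # A positive difference means the new model made a smaller error on that example.
    diffs = [old - new for old, new in zip(old_errors, new_errors)]
    observed = sum(diffs)
    # Under H0 each difference is equally likely to carry either sign, so
    # enumerate all 2**n sign assignments (only practical for small n).
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            as_extreme += 1
    return as_extreme / total


old_errors = [3, 4, 5, 4, 6, 5, 4, 5]
new_errors = [2, 3, 4, 3, 5, 4, 3, 4]
p = paired_permutation_p_value(old_errors, new_errors)
print(p)  # 0.00390625, i.e. 1/256
```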
03. You submit a MapReduce job to a Hadoop cluster. However, you notice that although the job was successfully submitted, it is not completing.
What should be done to identify the issue?
a) Ensure the DataNode is running
b) Ensure the NameNode is running
c) Ensure the JobTracker is running
d) Ensure the TaskTracker is running
04. How are window functions different from regular aggregate functions?
a) Rows retain their separate identities and the window function can access more than the current row.
b) Rows are grouped into an output row and the window function can access more than the current row.
c) Rows retain their separate identities and the window function can only access the current row.
d) Rows are grouped into an output row and the window function can only access the current row.
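The difference is easy to see in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory table (the sales table and its data are made up for illustration) and assumes SQLite 3.25 or later, the first release with window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

# Regular aggregate: GROUP BY collapses each region into one output row.
aggregate_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: every input row is kept, and SUM() OVER (...) can see
# the other rows in the same partition.
window_rows = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

print(aggregate_rows)  # [('east', 30), ('west', 5)]
print(window_rows)     # [('east', 10, 30), ('east', 20, 30), ('west', 5, 5)]
```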
05. Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming.
Which query interface would you recommend?
a) Hive
b) Pig
c) HBase
d) Howl
06. Before you build an ARMA model, how can you tell if your time series is weakly stationary?
a) The mean of the series is close to 0.
b) There appears to be a constant variance around a constant mean.
c) The series is normally distributed.
d) There appears to be no apparent trend component.
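A quick numeric check before fitting an ARMA model is to see whether the mean and variance stay roughly constant across the series. The sketch below is a crude heuristic, not a formal test such as the augmented Dickey-Fuller test; the chunk count, the tolerance and the function names are arbitrary, hypothetical choices, and the series are synthetic.

```python
import random
import statistics


def chunk_stats(series, n_chunks=4):
    # Split the series into equal-length chunks; return (mean, variance) per chunk.
    size = len(series) // n_chunks
    chunks = [series[i * size:(i + 1) * size] for i in range(n_chunks)]
    return [(statistics.fmean(c), statistics.pvariance(c)) for c in chunks]


def looks_weakly_stationary(series, n_chunks=4, tol=0.5):
    # Heuristic only: every chunk mean must stay within tol standard deviations
    # of the overall mean, and every chunk variance within tol * overall variance.
    overall_mean = statistics.fmean(series)
    overall_var = statistics.pvariance(series)
    overall_sd = overall_var ** 0.5
    for mean, var in chunk_stats(series, n_chunks):
        if abs(mean - overall_mean) > tol * overall_sd:
            return False
        if abs(var - overall_var) > tol * overall_var:
            return False
    return True


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # constant mean and variance
trend = [0.05 * t + x for t, x in enumerate(noise)]  # mean drifts upward over time

print(looks_weakly_stationary(noise))
print(looks_weakly_stationary(trend))
```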
07. You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing.
What should you do?
a) Ensure that the TaskTracker is running.
b) Ensure that the JobTracker is running.
c) Ensure that the NameNode is running.
d) Ensure that a DataNode is running.
08. How does Pig’s use of a schema differ from that of a traditional RDBMS?
a) Pig's schema requires that the data is physically present when the schema is defined
b) Pig's schema supports a single data type
c) Pig's schema is optional
d) Pig's schema is required for ETL
09. What is the primary function of the NameNode in Hadoop?
a) Keeps track of which MapReduce jobs have been assigned to each TaskTracker
b) Monitors the state of each JobTracker node and signals an event if unavailable
c) Runs some number of mapping tasks against its assigned data
d) Acts as a regulator/resolver among clients and DataNodes
10. For which class of problem is MapReduce most suitable?
a) Embarrassingly parallel
b) Minimal result data
c) Simple marginalization tasks
d) Non-overlapping queries
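An embarrassingly parallel job splits into pieces that need no communication with each other, so the only coordination is combining the partial results. The sketch below illustrates that shape with Python's concurrent.futures; the log-counting task and the function names are hypothetical, and a thread pool is used only to keep the example self-contained (CPU-bound work in CPython would typically use ProcessPoolExecutor instead).

```python
from concurrent.futures import ThreadPoolExecutor


def count_errors(log_chunk):
    # Each chunk is handled on its own: no shared state, no messages between workers.
    return sum(1 for line in log_chunk if "ERROR" in line)


def parallel_error_count(chunks, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(count_errors, chunks))
    # The only coordination step is combining the independent partial results.
    return sum(partial_counts)


chunks = [["ERROR a", "ok"], ["ok", "ERROR b", "ERROR c"], []]
print(parallel_error_count(chunks))  # 3
```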