01. A bank wants to build a system that tracks all ATM and online transactions in real time. The bank wants to build a personalized model of each customer's financial activity by incorporating enterprise data as well as social media data.
The system must be able to learn and adapt over time. These personalized models will be used for real-time promotions as well as for fraud and crime detection.
Given these requirements, which of the following would you recommend?
a) Spark
b) Hadoop
c) Cloudant
d) Netezza
02. Which of the following is a requirement for data retention and archival?
a) A format and storage repository for archived data
b) Public cloud
c) Hosting location
d) Solid-state technology
03. A large retailer (online and “brick & mortar”) processes data to analyze marketing campaigns for its loyalty club members. The current process takes weeks to process only 10% of the social data.
What is the most cost-effective platform for processing and analyzing campaign results from social data on a daily basis, using 100% of the data set?
a) Enterprise Data Warehouse
b) BigInsights Open Data Platform
c) High-Speed Mainframe Processing
d) In Memory Computing
04. The AQL query language is the easiest and most flexible tool to pull structured output from which of the following?
a) Hive data structures
b) Unstructured text
c) HBase schemas
d) JDBC connected relational data marts
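For context on what this question is testing: AQL (Annotation Query Language) is the declarative language IBM's Text Analytics engine uses to extract structured records from unstructured text. The sketch below is plain Python, not AQL; the sample text, regex, and field names are illustrative assumptions meant only to show the kind of rule-based extraction an AQL view performs.

```python
import re

# Conceptual analogue of an AQL extractor view: pull structured
# (name, phone) records out of free-form text. The pattern and sample
# text are illustrative; real AQL uses declarative views, not Python.
text = "Call Jane Doe at 555-1234 or John Smith at 555-9876."

pattern = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+) at (\d{3}-\d{4})")

# Each match becomes a structured record, much like a row in an AQL view.
records = [{"name": m.group(1), "phone": m.group(2)}
           for m in pattern.finditer(text)]

print(records)
# [{'name': 'Jane Doe', 'phone': '555-1234'},
#  {'name': 'John Smith', 'phone': '555-9876'}]
```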
05. Which of the following statements is TRUE regarding cloud computing solutions?
a) Cloud security is planned, developed, and layered on top of an application after the application development process is complete
b) Stateless applications are better candidates for cloud services than applications that maintain state
c) Cloud solutions rely on scale-up (vertical) scaling rather than scale-out (horizontal) scaling
d) Server virtualization is a requirement in a cloud implementation
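The reasoning behind option b can be made concrete: a stateless service keeps no session data in process memory, so any replica behind a load balancer can serve any request, which is what makes horizontal scaling in a cloud safe. A minimal sketch, with a plain dict standing in for an external session store such as Redis (all names here are hypothetical):

```python
# Stateless pattern: session state lives in an external store, not in
# the web process, so any replica can handle any request. The dict is
# a stand-in for a shared external store; names are illustrative.
session_store = {}

def handle_request(session_id: str, action: str) -> str:
    # Load state from the external store...
    state = session_store.get(session_id, {"cart": []})
    # ...mutate it...
    if action.startswith("add:"):
        state["cart"].append(action[len("add:"):])
    # ...and write it back. Nothing stays in this process, so the next
    # request can land on a completely different instance.
    session_store[session_id] = state
    return f"cart={state['cart']}"

print(handle_request("s1", "add:book"))  # cart=['book']
print(handle_request("s1", "add:lamp"))  # cart=['book', 'lamp']
```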
06. A telecommunications company needs a Big Data solution that can store and analyze multiple years' worth of call detail records (CDRs, approx. 17 billion events per day) containing switch, billing, and network event data for its millions of subscribers.
Which of the following would you recommend for these requirements?
a) InfoSphere DataStage
b) DB2
c) PureData System for Analytics
d) SPSS
07. A media company wants to measure the effectiveness of its advertising campaigns. Before releasing a movie, the company prepares and runs a promotional campaign.
Based on the response on Twitter and Facebook, it wants to decide whether or not to continue a particular campaign.
Which of the following should be selected to meet these requirements?
a) Hadoop
b) Streams
c) Unica
d) PureData System for Analytics
08. In designing a new Hadoop system for a customer, the option of using SAN storage versus DAS was raised. Which of the following would justify choosing SAN storage?
a) SAN storage provides better performance than DAS
b) SAN storage reduces or removes much of the HDFS complexity and its management issues
c) SAN storage removes the Single Point of Failure for the NameNode
d) SAN storage supports replication, reducing the need for 3-way replication
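The arithmetic behind option d is worth spelling out: HDFS defaults to 3-way replication (the dfs.replication setting), so raw capacity must be roughly triple the usable data size, while storage that replicates internally, as many SAN arrays do, can justify a lower HDFS replication factor. A back-of-envelope sketch with illustrative sizes:

```python
# Rough capacity arithmetic behind option d. Sizes are illustrative.
usable_data_tb = 100          # data the cluster must hold

hdfs_default_replication = 3  # HDFS default (dfs.replication)
san_replication = 1           # array-level replication handles redundancy

raw_with_das = usable_data_tb * hdfs_default_replication  # 300 TB
raw_with_san = usable_data_tb * san_replication           # 100 TB

print(f"DAS + 3-way HDFS replication: {raw_with_das} TB raw")
print(f"SAN with internal replication: {raw_with_san} TB raw")
```

The usual counterargument in Hadoop designs is that DAS preserves data locality, which is why SAN is normally justified on replication or management grounds rather than performance.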
09. As you explore the data for a BigSheets workbook, you must run the workbook against the full data set to get the most current results for analysis.
Which statement is TRUE regarding running and visualizing data in a workbook?
a) By default, the first sheet in your workbook is named the Results sheet
b) When you save and run the workbook, the data in a Child Workbook is the output for that workbook
c) When you add sheets to workbooks, saving the sheets runs the individual data for the sheet but not for the full workbook
d) You can create graphs for more than one sheet within the same workbook
10. Faced with a wide area network implementation, you need asynchronous remote updates. Which one of the following would best address this use case?
a) GPFS Active File Management allows data access and modifications even when the remote storage cluster is unavailable
b) HDFS cluster rebalancing is compatible with data rebalancing schemes; a scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold
c) GPFS file clones can be created from a regular file or from a file in a snapshot using the mmclone command
d) The HDFS NameNode keeps an image of the entire file system namespace and the file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories
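Option d's "4 GB is plenty" claim can be sanity-checked with the commonly quoted rule of thumb that each file, directory, or block costs on the order of 150 bytes of NameNode heap; the exact figure varies by Hadoop version, so treat it as an assumption:

```python
# Back-of-envelope check on option d. The ~150 bytes/object figure is
# a widely quoted rule of thumb, not an exact constant.
heap_bytes = 4 * 1024**3     # 4 GB of NameNode RAM
bytes_per_object = 150       # per file, directory, or block (approx.)

max_objects = heap_bytes // bytes_per_object
print(f"~{max_objects:,} namespace objects")  # roughly 28.6 million
```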