Hadoop

Q1. Partitioner controls the partitioning of what data?

  • final keys
  • final values
  • intermediate keys
  • intermediate values
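
For context, the Partitioner decides which reduce partition receives each intermediate (key, value) pair emitted by the map phase. A minimal sketch of a custom Partitioner (class name hypothetical):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each intermediate (key, value) pair to a reduce partition
// based on a non-negative hash of the intermediate key.
public class HashKeyPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}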

Q2. SQL Windowing functions are implemented in Hive using which keywords?

  • UNION DISTINCT, RANK
  • OVER, RANK
  • OVER, EXCEPT
  • UNION DISTINCT, EXCEPT

Q3. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?

  • Add a partitioned shuffle to the Map job.
  • Add a partitioned shuffle to the Reduce job.
  • Break the Reduce job into multiple, chained Reduce jobs.
  • Break the Reduce job into multiple, chained Map jobs.

Q4. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?

  • encrypted HTTP
  • unsigned HTTP
  • compressed HTTP
  • signed HTTP

Q5. MapReduce jobs can be written in which language?

  • Java or Python
  • SQL only
  • SQL or Java
  • Python or SQL

Q6. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?

  • Reducer
  • Combiner
  • Mapper
  • Counter
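
For context, the Combiner is registered on the Job in the driver and performs local aggregation of each map task's output before the shuffle. A short driver fragment (assuming a configured org.apache.hadoop.mapreduce.Job named job; IntSumReducer is the hypothetical Reducer class sketched under Q8):

// Local aggregation: the Combiner runs on each map task's output before
// the shuffle, shrinking the intermediate data sent over the network.
job.setCombinerClass(IntSumReducer.class); // often the Reducer class itself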

Q7. To verify job status, look for the value ___ in the ___.

  • SUCCEEDED; syslog
  • SUCCEEDED; stdout
  • DONE; syslog
  • DONE; stdout
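
Besides reading the logs, the final job state can be checked programmatically from the driver; a fragment (assuming a configured org.apache.hadoop.mapreduce.Job named job, inside a main that throws Exception):

// Block until the job finishes; true streams progress to stdout.
boolean completed = job.waitForCompletion(true);
// isSuccessful() reflects the SUCCEEDED/FAILED state recorded in the logs.
System.out.println(completed && job.isSuccessful() ? "SUCCEEDED" : "FAILED");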

Q8. Which line of code implements a Reducer method in MapReduce 2.0?

  • public void reduce(Text key, Iterator values, Context context){…}
  • public static void reduce(Text key, IntWritable[] values, Context context){…}
  • public static void reduce(Text key, Iterator values, Context context){…}
  • public void reduce(Text key, IntWritable[] values, Context context){…}
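
For reference, the actual MapReduce 2.0 (new API) Reducer contract passes the grouped values as an Iterable. A minimal sum Reducer sketch (class name hypothetical):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get(); // aggregate every value grouped under this key
    }
    context.write(key, new IntWritable(sum));
  }
}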

Q9. To get the total number of mapped input records in a map job task, you should review the value of which counter?

  • FileInputFormatCounter
  • FileSystemCounter
  • JobCounter
  • TaskCounter (NOT SURE)
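
A fragment reading that counter after a job completes (assuming a finished org.apache.hadoop.mapreduce.Job named job, inside a driver main that throws Exception):

import org.apache.hadoop.mapreduce.TaskCounter;

// TaskCounter.MAP_INPUT_RECORDS totals the input records consumed
// by all map tasks in the job.
long mapInputs = job.getCounters()
    .findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
System.out.println("map input records = " + mapInputs);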

Q10. Hadoop Core supports which CAP capabilities?

  • A, P
  • C, A
  • C, P
  • C, A, P

Q11. What are the primary phases of a Reducer?

  • combine, map, and reduce
  • shuffle, sort, and reduce
  • reduce, sort, and combine
  • map, sort, and combine

Q12. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the ___ service, which is ___.

  • Oozie; open source
  • Oozie; commercial software
  • Zookeeper; commercial software
  • Zookeeper; open source

Q13. For high availability, use multiple nodes of which type?

  • data
  • name
  • memory
  • worker

Q14. DataNode supports which type of drives?

  • hot swappable
  • cold swappable
  • warm swappable
  • non-swappable

Q15. Where does Spark execute jobs?

  • on disk of all workers
  • on disk of the master node
  • in memory of the master node
  • in memory of all workers

Q16. In a MapReduce job, where does the map() function run?

  • on the reducer nodes of the cluster
  • on the data nodes of the cluster (NOT SURE)
  • on the master node of the cluster
  • on every node of the cluster

Q17. To reference a master file for lookups during Mapping, what type of cache should be used?

  • distributed cache
  • local cache
  • partitioned cache
  • cluster cache
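
A sketch of the two halves of distributed cache usage (the HDFS path and class names are hypothetical):

// In the driver: ship a read-only master/lookup file to every task node.
job.addCacheFile(new java.net.URI("hdfs:///user/hue/lookup/master.txt"));

// In the Mapper: local copies are available once the task starts.
@Override
protected void setup(Context context) throws IOException, InterruptedException {
  java.net.URI[] cached = context.getCacheFiles();
  // ... load the lookup table from cached[0] before map() is called ...
}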

Q18. The skip bad records feature lets a certain set of bad input records be skipped when processing what type of data?

  • cache inputs
  • reducer inputs
  • intermediate values
  • map inputs

Q19. Which command imports data to Hadoop from a MySQL database?

  • spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --warehouse-dir user/hue/oozie/deployments/spark
  • sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
  • sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop
  • spark import --connect jdbc:mysql://mysql.example.com/spark --username spark --password spark --warehouse-dir user/hue/oozie/deployments/spark

Q20. In what form is Reducer output presented?

  • compressed (NOT SURE)
  • sorted
  • not sorted
  • encrypted

Q21. Which library should be used to unit test MapReduce code?

  • JUnit
  • XUnit
  • MRUnit
  • HadoopUnit

Q22. If you started the NameNode, then which kind of user must you be?

  • hadoop-user
  • super-user
  • node-user
  • admin-user

Q23. State _ between the JVMs in a MapReduce job

Q24. To create a MapReduce job, what should be coded first?

  • a static job() method
  • a Job class and instance (NOT SURE)
  • a job() method
  • a static Job class
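
A minimal driver sketch in which the Job instance is created first and everything else hangs off it (class names hypothetical; TokenMapper and IntSumReducer are the classes sketched under Q35 and Q8):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count"); // created first
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}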

Q25. To connect Hadoop to AWS S3, which client should you use?

  • S3A
  • S3N
  • S3
  • the EMR S3

Q26. HBase works with which type of schema enforcement?

  • schema on write
  • no schema
  • external schema
  • schema on read

Q27. HDFS files are of what type?

  • read-write
  • read-only
  • write-only
  • append-only

Q28. A distributed cache file path can originate from what location?

  • hdfs or ftp
  • http
  • hdfs or http
  • hdfs

Q29. Which library should you use to perform ETL-type MapReduce jobs?

  • Hive
  • Pig
  • Impala
  • Mahout

Q30. What is the output of the Reducer?

  • a relational table
  • an update to the input file
  • a single, combined list
  • a set of <key, value> pairs

The map function processes a key-value pair and emits a set of intermediate key-value pairs; the reduce function processes the values grouped by the same key and emits another set of key-value pairs as output.

Q31. To optimize a Mapper, what should you perform first?

  • Override the default Partitioner.
  • Skip bad records.
  • Break up Mappers that do more than one task into multiple Mappers.
  • Combine Mappers that do one task into large Mappers.

Q32. When implemented on a public cloud, with what does Hadoop processing interact?

  • files in object storage
  • graph data in graph databases
  • relational data in managed RDBMS systems
  • JSON data in NoSQL databases

Q33. In the Hadoop system, what administrative mode is used for maintenance?

  • data mode
  • safe mode
  • single-user mode
  • pseudo-distributed mode

Q34. In what format does RecordWriter write an output file?

  • <key, value> pairs
  • keys
  • values
  • <value, key> pairs

Q35. To what does the Mapper map input key/value pairs?

  • an average of keys for values
  • a sum of keys for values
  • a set of intermediate key/value pairs
  • a set of final key/value pairs
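
A minimal Mapper sketch emitting intermediate (word, 1) pairs (class name hypothetical):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, ONE); // one intermediate pair per token
    }
  }
}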

Q36. Which Hive query returns the first 1,000 values?

  • SELECT … WHERE value = 1000
  • SELECT … LIMIT 1000
  • SELECT TOP 1000 …
  • SELECT MAX 1000 …

Q37. To implement high availability, how many instances of the master node should you configure?

Q38. Hadoop 2.x and later implement which service as the resource coordinator?

  • Kubernetes
  • JobManager
  • JobTracker
  • YARN

Q39. In MapReduce, _ have _

  • tasks; jobs
  • jobs; activities
  • jobs; tasks
  • activities; tasks

Q40. What type of software is Hadoop Common?

  • database
  • distributed computing framework
  • operating system
  • productivity tool

Q41. If no reduction is desired, you should set the number of _ tasks to zero

  • combiner
  • reduce
  • mapper
  • intermediate
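
A one-line driver fragment (assuming a configured org.apache.hadoop.mapreduce.Job named job):

// Zero reduce tasks makes this a map-only job: mapper output is written
// straight to HDFS with no shuffle or sort phase.
job.setNumReduceTasks(0);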

Q42. MapReduce applications use which of these classes to report their statistics?

  • mapper
  • reducer
  • combiner
  • counter
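
A sketch of a Mapper reporting a custom statistic through the Counter API (enum and class names hypothetical):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  enum Quality { MALFORMED_RECORDS } // custom counter group and name

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.toString().isEmpty()) {
      // The framework aggregates this counter across all tasks
      // into the job-level totals.
      context.getCounter(Quality.MALFORMED_RECORDS).increment(1);
      return;
    }
    // ... emit pairs for well-formed records ...
  }
}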

Q43. _ is the query language, and _ is storage for NoSQL on Hadoop

  • HDFS; HQL
  • HQL; HBase
  • HDFS; SQL
  • SQL; HBase

Q44. MapReduce 1.0 _ YARN

  • does not include
  • is the same thing as
  • includes
  • replaces

Q45. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?

  • ControllerNode
  • DataNode
  • MetadataNode
  • NameNode

Q46. HQL queries produce which job types?

  • Impala
  • MapReduce
  • Spark
  • Pig

Q47. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?

1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2

  • as (text:CHAR[]); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
  • as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
  • as (text:CHAR[]); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);
  • as (text:CHARARRAY); upper_case = FOREACH data org.apache.pig.piggybank.evaluation.string.UPPER(TEXT);

Q48. In a MapReduce job, which phase runs after the Map phase completes?

  • Combiner
  • Reducer
  • Map2
  • Shuffle and Sort

Q49. Where would you configure the size of a block in a Hadoop environment?

  • dfs.block.size in hdfs-site.xml
  • orc.write.variable.length.blocks in hive-default.xml
  • mapreduce.job.ubertask.maxbytes in mapred-site.xml
  • hdfs.block.size in hdfs-site.xml

Q50. Hadoop systems are _ RDBMS systems.

  • replacements for
  • not used with
  • substitutes for
  • additions to

Q51. Which object can be used to distribute jars or libraries for use in MapReduce tasks?

  • distributed cache
  • library manager
  • lookup store
  • registry
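
A driver fragment (assuming a configured org.apache.hadoop.mapreduce.Job named job; the jar path is hypothetical):

import org.apache.hadoop.fs.Path;

// Ship a library jar through the distributed cache and add it to the
// classpath of every map and reduce task.
job.addFileToClassPath(new Path("/user/hue/lib/lookup-utils.jar"));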

Q52. To view the execution details of an Impala query plan, which function would you use?

  • explain
  • query action
  • detail
  • query plan

Q53. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?

  • partitioning
  • snapshot
  • replication
  • high availability


Q54. Hadoop Common is written in which language?

  • C++
  • C
  • Haskell
  • Java

Q55. Which file system does Hadoop use for storage?

  • NAS
  • FAT
  • HDFS
  • NFS

Q56. What kind of storage and processing does Hadoop support?

  • encrypted
  • verified
  • distributed
  • remote

Q57. Hadoop Common consists of which components?

  • Spark and YARN
  • HDFS and MapReduce
  • HDFS and S3
  • Spark and MapReduce

Q58. Most Apache Hadoop committers' work is done at which commercial company?

  • Cloudera
  • Microsoft
  • Google
  • Amazon

Q59. To get information about Reducer job runs, which object should be added?

  • Reporter
  • IntReadable
  • IntWritable
  • Writer

Q60. After changing the default block size and restarting the cluster, to which data does the new size apply?

  • all data
  • no data
  • existing data
  • new data

Q61. Which statement should you add to improve the performance of the following query?

SELECT
  c.id,
  c.name,
  c.email_preferences.categories.surveys
FROM customers c;

  • GROUP BY
  • FILTER
  • SUB-SELECT
  • SORT

Q62. What custom object should you implement to reduce IO in MapReduce?

  • Comparator
  • Mapper
  • Combiner
  • Reducer

Q63. You can optimize Hive queries using which method?

  • secondary indices
  • summary statistics
  • column-based statistics
  • a primary key index

Q64. If you are processing a single action on each input, what type of job should you create?

  • partition-only
  • map-only
  • reduce-only
  • combine-only

Q65. The simplest possible MapReduce job optimization is to perform which of these actions?

  • Add more master nodes.
  • Implement optimized InputSplits.
  • Add more DataNodes.
  • Implement a custom Mapper.

Q66. When you implement a custom Writable, you must also define which of these objects?

  • a sort policy
  • a combiner policy
  • a compression policy
  • a filter policy
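
For context, a custom key type is usually written as a WritableComparable so that, alongside serialization, it defines how the framework sorts intermediate keys during the shuffle. A sketch (class name hypothetical):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class YearTempPair implements WritableComparable<YearTempPair> {
  private int year;
  private int temp;

  @Override public void write(DataOutput out) throws IOException {
    out.writeInt(year);
    out.writeInt(temp);
  }

  @Override public void readFields(DataInput in) throws IOException {
    year = in.readInt();
    temp = in.readInt();
  }

  // The sort policy: intermediate keys are ordered by this comparison.
  @Override public int compareTo(YearTempPair o) {
    int c = Integer.compare(year, o.year);
    return c != 0 ? c : Integer.compare(temp, o.temp);
  }
}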

Q67. To copy a file into the Hadoop file system, what command should you use?

  • hadoop fs -copy <source> <destination>
  • hadoop fs -copy <destination> <source>
  • hadoop fs -copyFromLocal <source> <destination>
  • hadoop fs -copyFromLocal <destination> <source>

Q68. Delete a Hive _ table and you will delete the table _.

  • managed; metadata
  • external; data and metadata
  • external; metadata
  • managed; data

Q69. To see how Hive executed a JOIN operation, use the _ statement and look for the _ value.

  • EXPLAIN; JOIN Operator
  • QUERY; MAP JOIN Operator
  • EXPLAIN; MAP JOIN Operator
  • QUERY; JOIN Operator

Q70. Pig operates mainly in how many modes?

  • Two
  • Three
  • Four
  • Five

Q71. After loading data, _ and then run a(n) _ query for interactive queries.

  • invalidate metadata; Impala
  • validate metadata; Impala
  • invalidate metadata; Hive
  • validate metadata; Hive