CCA Spark and Hadoop Developer Exam

Last Update 22 hours ago Total Questions : 96

The CCA Spark and Hadoop Developer Exam content is now fully updated, with all current exam questions added 22 hours ago. Deciding to include CCA175 practice exam questions in your study plan goes far beyond basic test preparation.

You'll find that our CCA175 exam questions frequently feature detailed scenarios and practical problem-solving exercises that directly mirror industry challenges. Engaging with these CCA175 sample sets allows you to effectively manage your time and pace yourself, giving you the ability to finish any CCA Spark and Hadoop Developer Exam practice test comfortably within the allotted time.

Question # 1

Problem Scenario 76 : You have been given MySQL DB with following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of order table : (orderid, order_date, ordercustomerid, order_status)

.....

Please accomplish following activities.

1. Copy the "retail_db.orders" table to HDFS in a directory p91_orders.

2. Once the data is copied to HDFS, use pyspark to calculate the number of orders for each status.

3. Use all of the following methods to calculate the number of orders for each status. (You need to know all of these functions and their behavior for the real exam.)

- countByKey()

- groupByKey()

- reduceByKey()

- aggregateByKey()

- combineByKey()
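The five methods differ mainly in where the merging happens. The following is a plain-Python sketch, not PySpark itself, that models their semantics on a few hypothetical sample rows split into two pretend partitions. The helper names mirror the PySpark RDD methods, but the sample data, the partition split, and the helper implementations are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, in the column order listed in the scenario.
rows = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
    "5,2013-07-25 00:00:00.0,11318,COMPLETE",
    "6,2013-07-25 00:00:00.0,7130,COMPLETE",
]
# Two pretend partitions, to show per-partition vs. cross-partition merging.
partitions = [rows[:3], rows[3:]]


def to_pair(line):
    """Map a CSV line to a (status, 1) pair."""
    return (line.split(",")[3], 1)


def count_by_key(parts):
    """Like countByKey(): an action that returns a dict of key -> count."""
    counts = defaultdict(int)
    for part in parts:
        for key, _ in map(to_pair, part):
            counts[key] += 1
    return dict(counts)


def group_by_key(parts):
    """Like groupByKey() followed by summing each group of values."""
    groups = defaultdict(list)
    for part in parts:
        for key, value in map(to_pair, part):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


def combine_by_key(parts, create_combiner, merge_value, merge_combiners):
    """Like combineByKey(): combine inside each partition, then merge partitions."""
    partials = []
    for part in parts:
        acc = {}
        for key, value in map(to_pair, part):
            acc[key] = merge_value(acc[key], value) if key in acc else create_combiner(value)
        partials.append(acc)
    merged = {}
    for acc in partials:
        for key, comb in acc.items():
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged


def reduce_by_key(parts, func):
    """Like reduceByKey(): one function used for both merge steps."""
    return combine_by_key(parts, lambda v: v, func, func)


def aggregate_by_key(parts, zero, seq_op, comb_op):
    """Like aggregateByKey(): a zero value plus separate merge functions."""
    return combine_by_key(parts, lambda v: seq_op(zero, v), seq_op, comb_op)


add = lambda a, b: a + b
print(count_by_key(partitions))
print(group_by_key(partitions))
print(reduce_by_key(partitions, add))
print(aggregate_by_key(partitions, 0, add, add))
print(combine_by_key(partitions, lambda v: v, add, add))
```

All five calls produce the same per-status counts. On a real cluster, groupByKey() ships every value across the network before summing, while the other keyed transformations combine within each partition first.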

Question # 2

Problem Scenario 18 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Now accomplish following activities.

1. Create mysql table as below.

mysql --user=retail_dba --password=cloudera

use retail_db

CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name varchar(45), avg_salary int);

show tables;

2. Now export data from the hive table departments_hive01 into departments_hive02. While exporting, please note the following: wherever there is an empty string, it should be loaded as a null value in mysql.

Wherever there is a -999 value for an int field, it should be loaded as a null value.
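One possible way to express the null handling is through Sqoop's `--input-null-string` and `--input-null-non-string` export options. The sketch below only builds and prints the command line; it does not run Sqoop. The export directory (the default Hive warehouse path) and the `\001` field delimiter are assumptions about how departments_hive01 is stored, and the function name is hypothetical.

```python
import shlex


def sqoop_export_args():
    """Build an argument list for a Sqoop export with null substitution."""
    return [
        "sqoop", "export",
        "--connect", "jdbc:mysql://quickstart:3306/retail_db",
        "--username", "retail_dba",
        "--password", "cloudera",
        "--table", "departments_hive02",
        # Assumption: the Hive table lives in the default warehouse directory.
        "--export-dir", "/user/hive/warehouse/departments_hive01",
        # Assumption: the Hive table uses the default ^A field delimiter.
        "--input-fields-terminated-by", "\\001",
        # Empty strings in string columns become NULL in MySQL.
        "--input-null-string", "",
        # -999 in non-string columns becomes NULL in MySQL.
        "--input-null-non-string", "-999",
    ]


# Print a shell-quoted command that could be pasted into a terminal.
print(shlex.join(sqoop_export_args()))
```

On a machine with Sqoop installed, the same list could be passed to `subprocess.run`.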

Question # 3

Problem Scenario 23 : You have been given log generating service as below.

start_logs (It will generate continuous logs)

tail_logs (You can check what logs are being generated)

stop_logs (It will stop the log service)

Path where logs are generated using above service : /opt/gen_logs/logs/access.log

Now write a flume configuration file named flume3.conf. Using that configuration file, dump the logs into the HDFS file system in a directory called flumeflume3/%Y/%m/%d/%H/%M

(This means a new directory should be created every minute.) Please use the interceptors to provide timestamp information if the message header does not already contain it.

Also note that you have to preserve the existing timestamp if the message contains one. The Flume channel should have the following properties as well: it should commit after every 100 messages, use a non-durable/faster channel, and be able to hold a maximum of 1000 events.
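The requirements above map onto a handful of Flume properties: a timestamp interceptor with `preserveExisting`, a memory channel (non-durable), `capacity` for the 1000-event limit, and `transactionCapacity` for the 100-message commit. The following sketch renders one possible flume3.conf from a Python dictionary. The agent and component names (agent1, source1, sink1, channel1), the `exec` source, and the `tail -F` command are assumptions, not part of the scenario.

```python
# Hypothetical names: agent1, source1, sink1, channel1.
FLUME3_CONF = {
    "agent1.sources": "source1",
    "agent1.sinks": "sink1",
    "agent1.channels": "channel1",
    # Assumption: read the log file with an exec source running tail.
    "agent1.sources.source1.type": "exec",
    "agent1.sources.source1.command": "tail -F /opt/gen_logs/logs/access.log",
    # Timestamp interceptor: adds a timestamp header, keeps one if present.
    "agent1.sources.source1.interceptors": "i1",
    "agent1.sources.source1.interceptors.i1.type": "timestamp",
    "agent1.sources.source1.interceptors.i1.preserveExisting": "true",
    # HDFS sink: the escape sequences are resolved from the timestamp header.
    "agent1.sinks.sink1.type": "hdfs",
    "agent1.sinks.sink1.hdfs.path": "flumeflume3/%Y/%m/%d/%H/%M",
    "agent1.sinks.sink1.hdfs.fileType": "DataStream",
    # Memory channel: non-durable, holds 1000 events, commits 100 at a time.
    "agent1.channels.channel1.type": "memory",
    "agent1.channels.channel1.capacity": "1000",
    "agent1.channels.channel1.transactionCapacity": "100",
    # Wire the source and the sink to the channel.
    "agent1.sources.source1.channels": "channel1",
    "agent1.sinks.sink1.channel": "channel1",
}


def render(conf):
    """Render the dictionary as 'key = value' lines for a Flume properties file."""
    return "\n".join(f"{key} = {value}" for key, value in conf.items())


print(render(FLUME3_CONF))
```

Saved as flume3.conf, a file like this would typically be started with `flume-ng agent --conf-file flume3.conf --name agent1`.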

Question # 4

Problem Scenario 91 : You have been given data in json format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities.

1. Create the employee.json file locally.

2. Load this file on HDFS.

3. Register this data as a temp table in Spark using Python.

4. Write select query and print this data.

5. Now save back this selected data in json format.
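Spark's JSON reader expects, by default, one complete JSON object per line, so the layout of employee.json matters. The sketch below is a local stand-in written in plain Python rather than Spark: it writes the file in that one-object-per-line layout, reads it back, selects the two fields, and saves the result as JSON again. The temporary directory and the output file name are assumptions.

```python
import json
import os
import tempfile

employees = [
    {"first_name": "Ankit", "last_name": "Jain"},
    {"first_name": "Amir", "last_name": "Khan"},
    {"first_name": "Rajesh", "last_name": "Khanna"},
    {"first_name": "Priynka", "last_name": "Chopra"},
    {"first_name": "Kareena", "last_name": "Kapoor"},
    {"first_name": "Lokesh", "last_name": "Yadav"},
]

workdir = tempfile.mkdtemp()  # assumption: any writable local directory works
source_path = os.path.join(workdir, "employee.json")

# Step 1: create employee.json with one JSON object per line.
with open(source_path, "w") as handle:
    for employee in employees:
        handle.write(json.dumps(employee) + "\n")

# Local stand-in for loading the file and running a select query.
with open(source_path) as handle:
    loaded = [json.loads(line) for line in handle if line.strip()]
selected = [
    {"first_name": row["first_name"], "last_name": row["last_name"]}
    for row in loaded
]
for row in selected:
    print(row["first_name"], row["last_name"])

# Local stand-in for saving the selected data back in JSON format.
output_path = os.path.join(workdir, "employee_out.json")  # hypothetical name
with open(output_path, "w") as handle:
    for row in selected:
        handle.write(json.dumps(row) + "\n")
```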

Question # 5

Problem Scenario 34 : You have been given a file named spark6/user.csv.

Data is given below:

user.csv

id,topic,hits

Rahul,scala,120

Nikita,spark,80

Mithun,spark,1

myself,cca175,180

Now write Spark code in Scala which will remove the header and create an RDD of values as below, for all rows. Also, if the id is "myself", filter out that row.

Map(id -> om, topic -> scala, hits -> 120)
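The task asks for Scala, but the logic is the same in any language: split off the header, zip it with each data row to build a map, and drop rows whose id is "myself". The following plain-Python sketch illustrates only that logic on the data given above; it is not Spark code, and the variable names are hypothetical.

```python
csv_text = """id,topic,hits
Rahul,scala,120
Nikita,spark,80
Mithun,spark,1
myself,cca175,180"""

lines = csv_text.splitlines()

# The first line is the header; the remaining lines are data rows.
header = lines[0].split(",")
data_rows = [line.split(",") for line in lines[1:]]

# Pair each header name with the value in the same position.
records = [dict(zip(header, row)) for row in data_rows]

# Filter out the row whose id is "myself".
result = [record for record in records if record["id"] != "myself"]

for record in result:
    print(record)
```

Note that all values stay strings here, as they would after a plain split in Spark.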

Question # 6

Problem Scenario 2 :

There is a parent organization called "ABC Group Inc", which has two child companies named Tech Inc and MPTech.

Employee information for both companies is given in two separate text files as below. Please do the following activities for the employee details.

Techinc.txt

1,Alok,Hyderabad

2,Krish,Hongkong

3,Jyoti,Mumbai

4,Atul,Banglore

5,Ishan,Gurgaon

MPTech.txt

6,John,Newyork

7,alp2004,California

8,tellme,Mumbai

9,Gagan21,Pune

10,Mukesh,Chennai

1. Which command will you use to check all the available command line options on HDFS, and how will you get help for an individual command?

2. Create a new empty directory named Employee using the command line, and also create an empty file named Techinc.txt in it.

3. Load both companies' employee data into the Employee directory (how do you override an existing file in HDFS?).

4. Merge both employee data files into a single file called MergedEmployee.txt. The merged file should have a new line character at the end of each file's content.

5. Upload the merged file to HDFS and change the file permission on the HDFS merged file, so that the owner and group members can read and write, and other users can read the file.

6. Write a command to export an individual file as well as the entire directory from HDFS to the local file system.
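Two of the steps above involve a small calculation: merging file contents with a newline after each file (step 4) and turning "owner and group read/write, others read" into an octal mode (step 5). The sketch below works both out locally in plain Python; it does not call HDFS, and the helper name is hypothetical.

```python
import stat

tech_inc = "1,Alok,Hyderabad\n2,Krish,Hongkong\n3,Jyoti,Mumbai\n4,Atul,Banglore\n5,Ishan,Gurgaon"
mp_tech = "6,John,Newyork\n7,alp2004,California\n8,tellme,Mumbai\n9,Gagan21,Pune\n10,Mukesh,Chennai"


def merge_with_newlines(contents):
    """Concatenate file contents, adding a newline after each file."""
    return "".join(text + "\n" for text in contents)


merged = merge_with_newlines([tech_inc, mp_tech])
print(merged, end="")

# Owner read+write, group read+write, others read only.
mode = (
    stat.S_IRUSR | stat.S_IWUSR
    | stat.S_IRGRP | stat.S_IWGRP
    | stat.S_IROTH
)
print(oct(mode))  # prints 0o664
```

The resulting value, 664, is the mode that would be given to a chmod command for the merged file.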

Question # 7

Problem Scenario 17 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish below assignment.

1. Create a table in hive as below.

create table departments_hive01(department_id int, department_name string, avg_salary int);

2. Create another table in mysql using the below statement.

CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);

3. Copy all the data from the departments table to departments_hive01 using

insert into departments_hive01 select a.*, null from departments a;

Also insert the following records as below.

insert into departments_hive01 values(777, "Not known", 1000);

insert into departments_hive01 values(8888, null, 1000);

insert into departments_hive01 values(666, null, 1100);

4. Now import data from the mysql table departments_hive01 to this hive table. Please make sure that the data is visible using the below hive command. Also, while importing, if a null value is found for the department_name column, replace it with "" (empty string), and for the id column with -999.

select * from departments_hive;

Question # 8

Problem Scenario 9 : You have been given following mysql database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish following.

1. Import departments table in a directory.

2. Import the departments table again into the same directory (however, the directory already exists, hence it should not override and should append the results).

3. Also make sure your result fields are terminated by '|' and lines are terminated by '\n'.
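One way to approach these steps is a Sqoop import with `--target-dir`, `--fields-terminated-by`, and `--lines-terminated-by`, adding `--append` on the second run so the existing directory is extended rather than rejected. The sketch below only builds and prints the two command lines. The target directory name `departments` and the function name are assumptions.

```python
import shlex


def sqoop_import_args(target_dir, append=False):
    """Build an argument list for importing the departments table."""
    args = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://quickstart:3306/retail_db",
        "--username", "retail_dba",
        "--password", "cloudera",
        "--table", "departments",
        "--target-dir", target_dir,
        "--fields-terminated-by", "|",
        "--lines-terminated-by", "\\n",
    ]
    if append:
        # Second run: add new files to the existing directory.
        args.append("--append")
    return args


# Assumption: "departments" is a hypothetical HDFS directory name.
first_run = sqoop_import_args("departments")
second_run = sqoop_import_args("departments", append=True)
print(shlex.join(first_run))
print(shlex.join(second_run))
```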

Question # 9

Problem Scenario 55 : You have been given below code snippet.

val pairRDD1 = sc.parallelize(List(("cat", 2), ("cat", 5), ("book", 4), ("cat", 12)))

val pairRDD2 = sc.parallelize(List(("cat", 2), ("cup", 5), ("mouse", 4), ("cat", 12)))

operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.

Array[(String, (Option[Int], Option[Int]))] = Array((book,(Some(4),None)), (mouse,(None,Some(4))), (cup,(None,Some(5))), (cat,(Some(2),Some(2))), (cat,(Some(2),Some(12))), (cat,(Some(5),Some(2))), (cat,(Some(5),Some(12))), (cat,(Some(12),Some(2))), (cat,(Some(12),Some(12))))
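The desired output has the shape of a full outer join: every key from either side appears, matching keys yield every left/right combination, and a missing side shows up as None. In Spark's Scala API this is the result type of `fullOuterJoin` on two pair RDDs. The following plain-Python sketch models those semantics on the same data, using `None` in place of Scala's `None` and bare values in place of `Some(...)`. The function is a hypothetical local model, not the Spark implementation, and its output order may differ from Spark's.

```python
from collections import defaultdict

pair_rdd1 = [("cat", 2), ("cat", 5), ("book", 4), ("cat", 12)]
pair_rdd2 = [("cat", 2), ("cup", 5), ("mouse", 4), ("cat", 12)]


def full_outer_join(left, right):
    """Return (key, (left_value_or_None, right_value_or_None)) pairs."""
    left_by_key = defaultdict(list)
    right_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)
    for key, value in right:
        right_by_key[key].append(value)

    joined = []
    for key in sorted(set(left_by_key) | set(right_by_key)):
        # A key missing on one side contributes a single None on that side.
        left_values = left_by_key.get(key) or [None]
        right_values = right_by_key.get(key) or [None]
        joined.extend((key, (lv, rv)) for lv in left_values for rv in right_values)
    return joined


result = full_outer_join(pair_rdd1, pair_rdd2)
for row in result:
    print(row)
```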

Question # 10

Problem Scenario 82 : You have been given table in Hive with following structure (Which you have created in previous exercise).

productid int

code string

name string

quantity int

price float

Using SparkSQL accomplish following activities.

1. Select the name and quantity of all products having quantity <= 2000

2. Select the name and price of the products having code 'PEN'

3. Select all the products whose name starts with PENCIL

4. Select all products whose "name" begins with 'P', followed by any two characters, followed by a space, followed by zero or more characters
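The four queries are ordinary SQL, and the pattern in the last one maps onto LIKE wildcards: `_` matches exactly one character and `%` matches zero or more. The sketch below runs them against an in-memory SQLite table as a local stand-in for SparkSQL. The table name `products` and the sample rows are assumptions. SQLite's LIKE is case-insensitive for ASCII by default, unlike Spark's, so the sample data is chosen so that this difference does not affect the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products "
    "(productid INTEGER, code TEXT, name TEXT, quantity INTEGER, price REAL)"
)
# Hypothetical sample rows matching the table structure above.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "PEN", "Pen Red", 5000, 1.23),
        (1002, "PEN", "Pen Blue", 8000, 1.25),
        (1003, "PEC", "PENCIL 2B", 10000, 0.48),
        (1004, "PEC", "PENCIL 2H", 1500, 0.49),
    ],
)

# 1. Name and quantity of products with quantity <= 2000.
q1 = conn.execute(
    "SELECT name, quantity FROM products WHERE quantity <= 2000 ORDER BY productid"
).fetchall()

# 2. Name and price of products with code 'PEN'.
q2 = conn.execute(
    "SELECT name, price FROM products WHERE code = 'PEN' ORDER BY productid"
).fetchall()

# 3. Products whose name starts with PENCIL.
q3 = conn.execute(
    "SELECT productid FROM products WHERE name LIKE 'PENCIL%' ORDER BY productid"
).fetchall()

# 4. 'P', any two characters, a space, then zero or more characters.
q4 = conn.execute(
    "SELECT productid FROM products WHERE name LIKE 'P__ %' ORDER BY productid"
).fetchall()

for label, rows in (("q1", q1), ("q2", q2), ("q3", q3), ("q4", q4)):
    print(label, rows)
```

The same WHERE clauses can be used unchanged in a SparkSQL query against the Hive table.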
