DP-203 Microsoft Data Engineering on Microsoft Azure exact Exam Questions

Question # 4

What should you do to improve high availability of the real-time data processing solution?

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

Deploy a High Concurrency Databricks cluster.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

Set Data Lake Storage to use geo-redundant storage (GRS).

Full Access

Question # 5

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account

You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.

Which windowing function should you use?

a five-minute Tumbling window

a five-minute Sliding window

a five-minute Hopping window that has a one-minute hop

a five-minute Session window

Full Access

Question # 6

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 receives new data once every 24 hours.

You have the following function.

You have the following query.

The query is executed once every 15 minutes and the @parameter value is set to the current date.

You need to minimize the time it takes for the query to return results.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Create an index on the avg_f column.

Convert the avg_c column into a calculated column.

Create an index on the sensorid column.

Enable result set caching.

Change the table distribution to replicate.

Full Access

Question # 7

You have an Azure Synapse Analytics Apache Spark pool named Pool1.

You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.

You need to load the files into the tables. The solution must maintain the source data types.

What should you do?

Use a Get Metadata activity in Azure Data Factory.

Use a Conditional Split transformation in an Azure Synapse data flow.

Load the data by using the OPEHROwset Transact-SQL command in an Azure Synapse Anarytics serverless SQL pool.

Load the data by using PySpark.

Full Access

Question # 8

You are designing an Azure Databricks cluster that runs user-defined local processes. You need to recommend a cluster configuration that meets the following requirements:

• Minimize query latency.

• Maximize the number of users that can run queues on the cluster at the same time « Reduce overall costs without compromising other requirements

Which cluster type should you recommend?

Standard with Auto termination

Standard with Autoscaling

High Concurrency with Autoscaling

High Concurrency with Auto Termination

Full Access

Question # 9

You are building an Azure Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage.

You need to calculate the difference in readings per sensor per hour.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 10

You have an Azure Synapse Analytics dedicated SQL pool named pool1.

You plan to implement a star schema in pool1 and create a new table named DimCustomer by using the following code.

You need to ensure that DimCustomer has the necessary columns to support a Type 2 slowly changing dimension (SCD). Which two columns should you add? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

[HistoricalSalesPerson] [nvarchar] (256) NOT NULL

[EffectiveEndDate] [datetime] NOT NULL

[PreviousModifiedDate] [datetime] NOT NULL

[RowID] [bigint] NOT NULL

[EffectiveStartDate] [datetime] NOT NULL

Full Access

Question # 11

You have an Azure Data Factory instance that contains two pipelines named Pipeline1 and Pipeline2.

Pipeline1 has the activities shown in the following exhibit.

Pipeline2 has the activities shown in the following exhibit.

You execute Pipeline2, and Stored procedure1 in Pipeline1 fails.

What is the status of the pipeline runs?

Pipeline1 and Pipeline2 succeeded.

Pipeline1 and Pipeline2 failed.

Pipeline1 succeeded and Pipeline2 failed.

Pipeline1 failed and Pipeline2 succeeded.

Full Access

Question # 12

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytic dedicated SQL pool. The CSV file contains three columns named username, comment, and date.

The data flow already contains the following:

A source transformation.

A Derived Column transformation to set the appropriate types of data.

A sink transformation to land the data in the pool.

You need to ensure that the data flow meets the following requirements:

All valid rows must be written to the destination table.

Truncation errors in the comment column must be avoided proactively.

Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

To the data flow, add a sink transformation to write the rows to a file in blob storage.

To the data flow, add a Conditional Split transformation to separate the rows that will cause truncationerrors.

To the data flow, add a filter transformation to filter out rows that will cause truncation errors.

Add a select transformation to select only the rows that will cause truncation errors.

Full Access

Question # 13

You have a trigger in Azure Data Factory configured as shown in the following exhibit.

Use the drop-down menus to select the answer choice that completes each statement based upon the information presented in the graphic.

Full Access

Question # 14

You have a self-hosted integration runtime in Azure Data Factory.

The current status of the integration runtime has the following configurations:

Status: Running

Type: Self-Hosted

Version: 4.4.7292.1

Running / Registered Node(s): 1/1

High Availability Enabled: False

Linked Count: 0

Queue Length: 0

Average Queue Duration. 0.00s

The integration runtime has the following node details:

Name: X-M

Status: Running

Version: 4.4.7292.1

Available Memory: 7697MB

CPU Utilization: 6%

Network (In/Out): 1.21KBps/0.83KBps

Concurrent Jobs (Running/Limit): 2/14

Role: Dispatcher/Worker

Credential Status: In Sync

Use the drop-down menus to select the answer choice that completes each statement based on the information presented.

NOTE: Each correct selection is worth one point.

Full Access

Question # 15

You are designing an application that will use an Azure Data Lake Storage Gen 2 account to store petabytes of license plate photos from toll booths. The account will use zone-redundant storage (ZRS).

You identify the following usage patterns:

• The data will be accessed several times a day during the first 30 days after the data is created. The data must meet an availability SU of 99.9%.

• After 90 days, the data will be accessed infrequently but must be available within 30 seconds.

• After 365 days, the data will be accessed infrequently but must be available within five minutes.

Full Access

Question # 16

You have an Azure Data Factory pipeline named pipeline1 that includes a Copy activity named Copy1. Copy1 has the following configurations:

• The source of Copy1 is a table in an on-premises Microsoft SQL Server instance that is accessed by using a linked service connected via a self-hosted integration runtime.

• The sink of Copy1 uses a table in an Azure SQL database that is accessed by using a linked service connected via an Azure integration runtime.

You need to maximize the amount of compute resources available to Copy1. The solution must minimize administrative effort.

What should you do?

Scale up the data flow runtime of the Azure integration runtime.

Scale up the data flow runtime of the Azure integration runtime and scale out the self-hosted integration runtime.

Scale out the self-hosted integration runtime.

Full Access

Question # 17

You have the Azure Synapse Analytics pipeline shown in the following exhibit.

You need to add a set variable activity to the pipeline to ensure that after the pipeline’s completion, the status of the pipeline is always successful.

What should you configure for the set variable activity?

a success dependency on the Business Activity That Fails activity

a failure dependency on the Upon Failure activity

a skipped dependency on the Upon Success activity

a skipped dependency on the Upon Failure activity

Full Access

Question # 18

You have an Azure Data Factory version 2 (V2) resource named Df1. Df1 contains a linked service.

You have an Azure Key vault named vault1 that contains an encryption key named key1.

You need to encrypt Df1 by using key1.

What should you do first?

Add a private endpoint connection to vault 1.

Enable Azure role-based access control on vault 1.

Remove the linked service from Df1.

Create a self-hosted integration runtime.

Full Access

Question # 19

You have an Azure data factory.

You need to examine the pipeline failures from the last 60 days.

What should you use?

the Activity log blade for the Data Factory resource

the Monitor & Manage app in Data Factory

the Resource health blade for the Data Factory resource

Azure Monitor

Full Access

Question # 20

A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT appliance to communicate with the IoT devices.

The company must be able to monitor the devices in real-time.

You need to design the solution.

What should you recommend?

Azure Stream Analytics cloud job using Azure PowerShell

Azure Analysis Services using Azure Portal

Azure Data Factory instance using Azure Portal

Azure Analysis Services using Azure PowerShell

Full Access

Question # 21

You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.

You need to create the table to meet the following requirements:

• Provide the fastest Query time.

• Minimize data movement during queries.

Which type of table should you use?

hash distributed

heap

replicated

round-robin

Full Access

Question # 22

You are designing database for an Azure Synapse Analytics dedicated SQL pool to support workloads for detecting ecommerce transaction fraud.

Data will be combined from multiple ecommerce sites and can include sensitive financial information such as credit card numbers.

You need to recommend a solution that meets the following requirements:

Users must be able to identify potentially fraudulent transactions.

Users must be able to use credit cards as a potential feature in models.

Users must NOT be able to access the actual credit card numbers.

What should you include in the recommendation?

Transparent Data Encryption (TDE)

row-level security (RLS)

column-level encryption

Azure Active Directory (Azure AD) pass-through authentication

Full Access

Question # 23

You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1.

You need to determine the size of the transaction log file for each distribution of DW1.

What should you do?

On DW1, execute a query against the sys.database_files dynamic management view.

From Azure Monitor in the Azure portal, execute a query against the logs of DW1.

Execute a query against the logs of DW1 by using theGet-AzOperationalInsightsSearchResult PowerShell cmdlet.

On the master database, execute a query against thesys.dm_pdw_nodes_os_performance_counters dynamic management view.

Full Access

Question # 24

You have a SQL pool in Azure Synapse.

You discover that some queries fail or take a long time to complete.

You need to monitor for transactions that have rolled back.

Which dynamic management view should you query?

sys.dm_pdw_request_steps

sys.dm_pdw_nodes_tran_database_transactions

sys.dm_pdw_waits

sys.dm_pdw_exec_sessions

Full Access

Question # 25

You have a Microsoft SQL Server database that uses a third normal form schema.

You plan to migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQI pool.

You need to design the dimension tables. The solution must optimize read operations.

What should you include in the solution? to answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 26

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains columns named username, comment and date.

The data flow already contains the following:

• A source transformation

• A Derived Column transformation to set the appropriate types of data

• A sink transformation to land the data in the pool

You need to ensure that the data flow meets the following requirements;

• All valid rows must be written to the destination table.

• Truncation errors in the comment column must be avoided proactively.

• Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point

Add a select transformation that selects only the rows which will cause truncation errors.

Add a sink transformation that writes the rows to a file in blob storage.

Add a filter transformation that filters out rows which will cause truncation errors.

Add a Conditional Split transformation that separates the rows which will cause truncation errors.

Full Access

Question # 27

You plan to develop a dataset named Purchases by using Azure databricks Purchases will contain the following columns:

• ProductID

• ItemPrice

• lineTotal

• Quantity

• StorelD

• Minute

• Month

• Hour

• Year

• Day

You need to store the data to support hourly incremental load pipelines that will vary for each StoreID. the solution must minimize storage costs. How should you complete the rode? To answer, select the appropriate options In the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 28

You have an Azure data factory named ADM that contains a pipeline named Pipelwe1

Pipeline! must execute every 30 minutes with a 15-minute offset.

Vou need to create a trigger for Pipehne1. The trigger must meet the following requirements:

• Backfill data from the beginning of the day to the current time.

• If Pipeline1 fairs, ensure that the pipeline can re-execute within the same 30-mmute period.

• Ensure that only one concurrent pipeline execution can occur.

• Minimize de4velopment and configuration effort

Which type of trigger should you create?

schedule

event-based

manual

tumbling window

Full Access

Question # 29

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

A workload for data engineers who will use Python and SQL.

A workload for jobs that will run notebooks that use Python, Scala, and SOL.

A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

The data engineers must share a cluster.

The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?

Yes

Full Access

Question # 30

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 that contains an external table named Sales. Sales contains sales data. Each row in Sales

contains data on a single sale, including the name of the salesperson.

You need to implement row-level security (RLS). The solution must ensure that the salespeople can access only their respective sales.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 31

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 32

You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.

Which type of integration runtime should you use?

Azure-SSIS integration runtime

self-hosted integration runtime

Azure integration runtime

Full Access

Question # 33

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

change feed

soft delete

time-based retention

lifecycle management

Full Access

Question # 34

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 35

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 36

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

Full Access

Question # 37

You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.

What solution must meet the sales transaction dataset requirements.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Full Access

Question # 38

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area

NOTE: Each correct selection b worth one point.

Full Access

Question # 39

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction

dataset requirements.

What should you create?

a table that has an IDENTITY property

a system-versioned temporal table

a user-defined SEQUENCE object

a table that has a FOREIGN KEY constraint

Full Access

Question # 40

You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.

Which three Transaction-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

Full Access

Summer Sale Special 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: ex2p65

Exact2Pass Menu

Exact2Pass

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

SubFooter