Explanation: Solution :
Step 1 : Create hive table forflumeemployee.'
CREATE TABLE flumeemployee
(
name string, salary int, sex string,
age int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Step 2 : Create flume configuration file, with below configuration for source, sink and channel and save it in flume2.conf.
#Define source , sink , channel and agent,
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1 .sinks.sink1.channel = memory-channel
agent1.sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
agent1 .sinks.sink1.hdfs.tileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
Agent1 .sources.sourcel.channels = channell agent1 .sinks.sinkl.channel = channel1
Step 3 : Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file /home/cloudera/flumeconf/flume2.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter data line by line.
alok,100000.male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check the data is available in hive table or not.
step 7 : Stop flume service by pressing ctrl+c
Step 8 : Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;