Which of the V's of big data are we trying to better solve by adding Heron or Storm to Hadoop?

Name: which of the vs of big data are we trying to better solve on by adding heron or storm to hadoop 70754
Uploaded: 2023-08-18T11:07:08-08:00
Duration: 6 min 13 s
Channel: Madhur L
Description: which of the vs of big data are we trying to better solve on by adding heron or storm to hadoop 70754

Step 1: Identify the V's of big data. The commonly recognized V's include Volume, Velocity, Variety…


Transcript

00:01 Hello students, Hadoop provides various tools and technologies to facilitate the ingestion of streaming data.

00:08 One popular approach is to use Apache Kafka as the streaming data platform in conjunction with Hadoop components like HDFS and Apache Spark.

00:22 Here is a high-level overview of how streaming data can be ingested into a Hadoop cluster.

00:31 First, we'll choose a streaming platform.

00:38 Choose a streaming platform.

00:45 Streaming platform: Apache Kafka is a widely used streaming platform that provides reliable, scalable, distributed message processing.

00:59 It acts as a buffer between the data sources and the Hadoop components, ensuring data durability and availability.
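To make the buffering idea concrete, here is a minimal in-memory sketch. It is not Kafka's implementation or API: the broker is modeled as an append-only log (the hypothetical `AppendOnlyLog` class), the producer appends at its own pace, and the consumer reads from an offset it tracks itself. Because reading does not delete records, a slow or restarted consumer can catch up, and another consumer can replay the same data.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy stand-in for a Kafka log: records are appended, never removed on read."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> int:
        # Store the record and return the offset assigned to it.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int) -> list:
        # Non-destructive read: the same offsets can be read again later.
        return self.records[offset:offset + max_records]


log = AppendOnlyLog()

# A fast producer writes five readings before any consumer runs.
for i in range(5):
    log.append(f"reading-{i}")

# A slower consumer drains the log two records at a time, tracking its own offset.
consumed = []
offset = 0
while True:
    batch = log.read(offset, 2)
    if not batch:
        break
    consumed.extend(batch)
    offset += len(batch)

# A second consumer can replay from offset 0 because nothing was deleted.
replayed = log.read(0, 10)
```

The producer never waits for the consumer here; the log absorbs the difference in speed, which is the decoupling the transcript describes.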

01:08 Next, set up Kafka.

01:15 Install and configure Apache Kafka on a dedicated cluster or servers.

01:22 Kafka consists of producers that generate the data streams, brokers that store and distribute the data, and consumers that process the data.
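The three roles can be modeled as a toy, purely in memory. The names `ToyBroker`, `produce`, and `ToyConsumer` are hypothetical and do not correspond to any Kafka client API; the sketch only shows how responsibilities divide: producers hand records to the broker, the broker stores them per topic, and a consumer pulls only from the topics it subscribed to.

```python
from collections import defaultdict


class ToyBroker:
    """Stores records per topic; a hypothetical stand-in for Kafka brokers."""

    def __init__(self):
        self.topics = defaultdict(list)

    def store(self, topic, record):
        self.topics[topic].append(record)

    def fetch(self, topic, offset):
        # Return every record in `topic` from `offset` onward.
        return self.topics[topic][offset:]


def produce(broker, topic, record):
    """Producer role: generate a record and hand it to the broker."""
    broker.store(topic, record)


class ToyConsumer:
    """Consumer role: process records only from the subscribed topics."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.offsets = {topic: 0 for topic in topics}

    def poll(self):
        received = []
        for topic in list(self.offsets):
            new_records = self.broker.fetch(topic, self.offsets[topic])
            self.offsets[topic] += len(new_records)
            received.extend((topic, record) for record in new_records)
        return received


broker = ToyBroker()
sensor_consumer = ToyConsumer(broker, ["sensor-readings"])

produce(broker, "sensor-readings", "temp=21.5")
produce(broker, "app-logs", "user login")  # this consumer did not subscribe, so it never sees it
produce(broker, "sensor-readings", "temp=21.7")

first_poll = sensor_consumer.poll()
second_poll = sensor_consumer.poll()  # nothing new since the previous poll
```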

01:35 Next, create a Kafka topic.

01:45 Topics are logical channels for organizing data streams: producers publish data to specific topics, and consumers subscribe to those topics to consume the data.
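Topics are usually created with the `kafka-topics.sh` tool that ships in Kafka's `bin/` directory. The Python sketch below only builds the command line; it assumes the script is on `PATH` (some distributions name it `kafka-topics` without `.sh`), a broker at the placeholder address `localhost:9092`, and a single-broker development setup (one partition, replication factor 1). The helper names are hypothetical.

```python
import shlex
import subprocess


def create_topic_command(topic, bootstrap_server="localhost:9092",
                         partitions=1, replication_factor=1):
    """Build a kafka-topics.sh invocation that creates `topic`."""
    return [
        "kafka-topics.sh", "--create",
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


def create_topic(topic, **kwargs):
    # Actually runs the tool; needs Kafka installed and a reachable broker.
    subprocess.run(create_topic_command(topic, **kwargs), check=True)


cmd = create_topic_command("sensor-readings")
print(shlex.join(cmd))
# prints: kafka-topics.sh --create --topic sensor-readings --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything touches a live cluster.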

02:00 Next, produce the data.

02:07 Data sources such as sensors, applications, and external systems generate streaming data and send it to Kafka topics using producers.
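Kafka records carry their key and value as raw bytes, so a producer has to serialize each event before sending it. This sketch assumes JSON as the encoding and uses hypothetical field names (`sensor_id`, `value`, `ts`) for a sensor reading; the sensor ID doubles as the record key.

```python
import json
import time


def encode_reading(sensor_id, value, ts=None):
    """Serialize one sensor reading into a (key, value) pair of bytes."""
    payload = {
        "sensor_id": sensor_id,
        "value": value,
        "ts": time.time() if ts is None else ts,
    }
    key = sensor_id.encode("utf-8")
    # sort_keys makes the byte output deterministic for the same reading.
    value_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value_bytes


def decode_reading(value_bytes):
    """Consumer-side inverse of encode_reading."""
    return json.loads(value_bytes.decode("utf-8"))


key, value = encode_reading("sensor-7", 21.5, ts=1692380828.0)
```

With the kafka-python client, for example, the pair could then be passed to `KafkaProducer.send("sensor-readings", key=key, value=value)`; that network call is not exercised in this sketch.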

02:17 Kafka supports high-throughput data publishing.
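One technique behind that throughput is batching: Kafka producers group records into batches before sending them to a broker, and the real producer tunes this with settings such as `batch.size` and `linger.ms`. The sketch below models only size-based grouping; `batch_records` is a hypothetical helper, not part of any client library.

```python
def batch_records(records, max_batch_bytes):
    """Group encoded records into batches of at most max_batch_bytes.

    A single record larger than the limit is placed in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for record in records:
        # Close the current batch if adding this record would overflow it.
        if current and current_size + len(record) > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += len(record)
    if current:
        batches.append(current)
    return batches


records = [b"x" * 400 for _ in range(5)]
batches = batch_records(records, max_batch_bytes=1000)
```

Here five 400-byte records travel in three batches (2, 2, and 1 records) instead of five separate sends, which is where the per-request overhead savings come from.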

02:22 Once the data is published, you need to connect Kafka to other systems.

02:28 Kafka Connect is a framework for connecting external systems with Kafka topics...
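Connectors are typically registered by sending a JSON configuration to a Connect worker's REST API (`POST /connectors`). The sketch below builds, but does not send, such a request for an HDFS sink. It assumes Confluent's HDFS sink connector plugin is installed on the workers, and the worker URL, connector name, topic, and NameNode address are placeholders.

```python
import json
import urllib.request


def hdfs_sink_request(connect_url, name, topics, hdfs_url, flush_size=1000):
    """Build a POST /connectors request that registers an HDFS sink connector."""
    body = {
        "name": name,
        "config": {
            # Assumes Confluent's HDFS sink plugin is on the workers' plugin path.
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            "hdfs.url": hdfs_url,
            "flush.size": str(flush_size),
        },
    }
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = hdfs_sink_request("http://localhost:8083", "hdfs-sink",
                        ["sensor-readings"], "hdfs://namenode:8020")
# urllib.request.urlopen(req) would submit it to a running Connect worker.
```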