An Inside look into Spark: Big Data Beginners Guide

Big Data Beginners Guide

Let us go straight to a simple setup that worked for us in maintaining a big data cluster despite limited resources.

Big Data infrastructure components:

1) HDFS
2) YARN
3) AMBARI
4) ZOOKEEPER
5) HIVE SERVER
6) HBASE

The diagram below is helpful in understanding the big data landscape.

The above architecture has served our team very well and is a general architectural overview for a big data platform.

One can also separate out the data node roles, i.e. run YARN containers/executors and HBase region servers on different machines. To save on large machines, we decided to co-locate the executor and region server processes on the same node.

Tips and tricks to follow soon.

1) What is the ideal replication factor in Hadoop?

Always keep the default replication factor at 3. But to manage HDFS disk usage, set a file-specific replication factor for data that can tolerate fewer copies.
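
To get a feel for what a file-specific replication factor buys you, here is a rough sketch in Python. The paths, sizes and the choice of 2 copies for archive data are made-up assumptions for illustration, not numbers from our cluster. The helper only builds the standard `hdfs dfs -setrep` shell command as a list; it does not run it.

```python
DEFAULT_REPLICATION = 3  # cluster-wide default, as recommended above


def raw_usage_gb(datasets):
    """Return the raw disk space consumed across the cluster, in GB."""
    return sum(size_gb * replication for _, size_gb, replication in datasets)


def setrep_command(path, replication):
    """Build (but do not run) the HDFS shell call that changes replication for one path."""
    return ["hdfs", "dfs", "-setrep", "-w", str(replication), path]


# Hypothetical datasets: (path, logical size in GB, replication factor)
uniform = [
    ("/data/events/current", 500, DEFAULT_REPLICATION),
    ("/data/events/archive", 2000, DEFAULT_REPLICATION),
]
tuned = [
    ("/data/events/current", 500, DEFAULT_REPLICATION),
    ("/data/events/archive", 2000, 2),  # assumption: archive data can be re-created
]

saved = raw_usage_gb(uniform) - raw_usage_gb(tuned)
print(f"raw usage with replication 3 everywhere: {raw_usage_gb(uniform)} GB")
print(f"raw usage with archive at replication 2: {raw_usage_gb(tuned)} GB")
print(f"raw disk saved: {saved} GB")
print(" ".join(setrep_command("/data/events/archive", 2)))
```

Lowering replication trades durability and read locality for disk space, so it is usually worth considering only for data that is cold or easy to regenerate.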


Tuesday, 3 October 2017
