EDUREKA

Apache Hadoop Installation and Cluster Setup on AWS EC2 (Ubuntu) – Part 1

A guide to installing and setting up a Multi-Node Apache Hadoop Cluster on AWS EC2

edureka! 9/20/2013

A guide to setting up a Multi-Node Apache Hadoop Cluster on AWS EC2 (using free tier eligible servers)

Table of Contents

Introduction
1. Setting up the Cluster Infrastructure on AWS EC2
  1.1 Creating an AWS Free Account
    1.1.1 Sign up and register on AWS
    1.1.2 Use your correct contact number
    1.1.3 Choose a Plan for your usage
  1.2 Login to AWS
  1.3 Creating Cluster member servers
    1.3.1 Choose a free tier eligible instance
    1.3.2 Create a key pair
    1.3.3 Configure Security Group and Firewall settings
    1.3.4 Review the pre-launch
    1.3.5 Launch the servers
  1.4 Setup client access to AWS servers
    1.4.1 Generate the Public/Private KeyPair
    1.4.2 Import keypair and save public/private keys
    1.4.3 Access the AWS EC2 servers
    1.4.4 Setup WINSCP access to AWS EC2 servers

© 2013 Brain4ce Education Solutions Pvt. Ltd

Introduction

This document is a guide to setting up a Multi-Node Apache Hadoop cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) using ‘free tier usage eligible’ Ubuntu (t1.micro) servers. If you are new to both AWS and Hadoop, this guide comes in handy for quickly setting up a Multi-Node Apache Hadoop cluster on AWS EC2.

Note: AWS also provides a hosted solution for Hadoop, named Amazon Elastic MapReduce (EMR), but as of now only Pig and Hive are available on it, and it comes at a cost.

The guide describes the whole process in two parts:

Part 1: Setting up the Cluster Infrastructure on AWS EC2

This part is a step-by-step guide to setting up an AWS account and launching AWS EC2 free tier eligible Ubuntu servers. These servers will be used to set up a four-node Apache Hadoop cluster on the AWS EC2 cloud infrastructure.

Part 2: Installing Apache Hadoop and Setting up the Cluster

This part is a step-by-step guide to installing the prerequisites for Hadoop and configuring the cluster on the EC2 servers. It explains in detail the primary Hadoop configuration files, password-less SSH access, configuring the master and slaves, and starting and stopping the services.

Note: The configuration described here is intended for learning purposes only.

1. Setting up the Cluster Infrastructure on AWS EC2

This section describes the steps to create a free account and launch Ubuntu servers on AWS EC2 for the Apache Hadoop installation and cluster setup.

1.1 Creating an AWS Free Account

The first step is to create a free trial account on AWS. You can review the limits on free services at http://aws.amazon.com/free/

1.1.1 Sign up and register on AWS

You can sign up on AWS using your email ID and a credit card. Even though the AWS EC2 free tier eligible instances are available at no additional cost, you need to provide credit card details during account creation. As explained in the following image, your credit card will be billed if your monthly usage goes beyond the free tier, for example by using any additional AWS resource or service such as Elastic Block Store (EBS).

Figure 1-1 Specify your credit card details

1.1.2 Use your correct contact number

Please ensure that you provide a correct contact number, as AWS verifies your identity through a phone call to that number.

Figure 1-2 Verify the details

1.1.3 Choose a Plan for your usage

Choose the basic plan for trial usage. This plan is sufficient to create the cluster and experiment with it.

Figure 1-3 Choose a plan

1.2 Login to AWS

Log in to your AWS account and access the ‘AWS Management Console’.

Figure 1-4 AWS Management Console

Choose EC2 and open the EC2 Dashboard to create the cluster member servers.

Figure 1-5 EC2 Dashboard

1.3 Creating Cluster member servers

Click on ‘Launch Instance’ and choose ‘Classic Wizard’ to create, configure and launch your cluster servers.

1.3.1 Choose a free tier eligible instance

Choose an instance configuration. All the options marked with an orange star are free tier eligible (if used with a micro instance).

Figure 1-6 Quick Launch

Choose Ubuntu 12.04.2 LTS. Remember to change the number of instances to 4. This will create four Ubuntu instances simultaneously.

Figure 1-7 Choose instance details

Ensure that you choose the free tier for the setup. Keep the defaults, but change the root volume size to 5 or 6 GiB so that the total disk usage across the four servers (4 × 5 = 20 GiB, or 4 × 6 = 24 GiB) stays below the free tier limit of 30 GiB per month.

Figure 1-8 Instance details
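The storage arithmetic above can be checked with a few lines of shell. This is only a sketch: `total_gib` is a hypothetical helper, and the 30 GiB limit is the figure quoted in this guide, so check the current free tier terms before relying on it.

```shell
# Hypothetical helper: total root-volume storage for a set of identical servers.
total_gib() {
  # $1 = number of instances, $2 = root volume size in GiB
  echo $(( $1 * $2 ))
}

FREE_TIER_GIB=30   # limit quoted in this guide (an assumption, not read from AWS)

# Compare a few candidate root volume sizes for a four-server cluster.
for size in 5 6 8; do
  total=$(total_gib 4 "$size")
  if [ "$total" -le "$FREE_TIER_GIB" ]; then
    echo "4 x ${size} GiB = ${total} GiB: within the ${FREE_TIER_GIB} GiB limit"
  else
    echo "4 x ${size} GiB = ${total} GiB: exceeds the ${FREE_TIER_GIB} GiB limit"
  fi
done
```

With four servers, 5 GiB and 6 GiB volumes stay within the limit, while 8 GiB volumes (32 GiB in total) would not.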

Choose a name and add any other tags for billing or operations purposes.

Figure 1-9 Choose name and tags

1.3.2 Create a key pair

This is the most important part of creating and launching the AWS instances. AWS provides private/public key based access to the servers. You can choose a previously created key pair or create a new one. Here we create and download a fresh key pair. Keep the key pair file (.pem) safe on your PC, as it will be needed to access the servers.

Figure 1-10 Create and download a key pair
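If you later use the key from a Linux or macOS terminal with OpenSSH instead of PuTTY (an assumption, since this guide uses PuTTY on Windows), note that OpenSSH refuses private key files that other users can read. A minimal sketch for restricting the permissions follows; `protect_key` is a hypothetical helper, and the demo runs on a temporary stand-in file rather than a real key.

```shell
# Restrict a private key file so that only its owner can read it.
protect_key() {
  # $1 = path to the private key file
  chmod 400 "$1"
}

# Demo on a temporary stand-in file; use the path of your real .pem file instead.
KEY_FILE=$(mktemp)
protect_key "$KEY_FILE"
ls -l "$KEY_FILE" | cut -c1-10   # show only the permission bits
rm -f "$KEY_FILE"
```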

1.3.3 Configure Security Group and Firewall settings

You need to choose a security group to control access to the services on the servers. You can create a new group or use an existing one. Create a group with the default options and add ‘All TCP’, ‘All ICMP’ and ‘SSH (22)’ under the inbound rules. This allows ping, SSH, and other similar commands among the servers and from any other machine on the internet. These protocols and ports are also required for communication among the cluster servers. As this is a test setup, we allow access from everywhere for TCP, ICMP and SSH and do not concern ourselves with the details of individual server ports and security.

Figure 1-11 Configure firewall

1.3.4 Review the pre-launch

Review all the settings before you proceed with the server creation.

Figure 1-12 Review the server creation

1.3.5 Launch the servers

Launch the servers and review the Instances page for the newly launched servers.

Figure 1-13 Instance review at EC2 Dashboard

Rename the servers according to their roles in the cluster.

Figure 1-14 Rename the servers as per their roles

Here is the final list of instances:

Figure 1-15 Server details

Make a note of the public URL of each server, such as ‘ec2-54-212-38-184.us-west-2.compute.amazonaws.com’. These URLs will be used to access the servers from your PC and to monitor the HDFS health from your browser.

Figure 1-16 Server details
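An EC2 public name of this form embeds the server's public IPv4 address: ec2-54-212-38-184 corresponds to 54.212.38.184. The sketch below extracts the address with plain shell string handling. `public_ip_from_dns` is a hypothetical helper name, and it assumes the name follows the ec2-A-B-C-D pattern shown above.

```shell
# Derive the public IPv4 address from an EC2 public DNS name.
public_ip_from_dns() {
  # $1 = name such as ec2-54-212-38-184.us-west-2.compute.amazonaws.com
  name=${1#ec2-}       # drop the leading "ec2-"
  name=${name%%.*}     # keep only the first label, e.g. 54-212-38-184
  echo "$name" | sed 's/-/./g'   # turn the dashes into dots
}

public_ip_from_dns ec2-54-212-38-184.us-west-2.compute.amazonaws.com
# prints 54.212.38.184
```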

1.4 Setup client access to AWS servers

You need to set up password-less SSH access among the servers to set up the cluster, especially from the Master server to the Slave servers, so that the Master server can remotely start the DataNode and TaskTracker services on the Slave servers.

1.4.1 Generate the Public/Private KeyPair

Download ‘putty’ to access the AWS EC2 servers. Also download ‘puttygen’ to generate the public/private key pair from the ‘.pem’ file created in step “1.3.2 Create a key pair”.

1.4.2 Import keypair and save public/private keys

Open ‘puttygen’ and import the ‘.pem’ file downloaded to your PC in step “1.3.2 Create a key pair”.

Figure 1-17 Import the key pair

You can set a passphrase to protect your private key, or leave the passphrase fields blank to use the private key without one. The passphrase protects the servers from unauthorized access by anyone who gets hold of your machine and your private key. Each time you access a server with a passphrase-protected private key, you will be asked to enter the passphrase.

Figure 1-18 Create public/private keys

1.4.3 Access the AWS EC2 servers

Access the servers using the private key created in step “1.4.2 Import keypair and save public/private keys”, and note down their hostnames and IP addresses using the ifconfig command.

Figure 1-19 Add the private key to PuTTY
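PuTTY is a Windows client. From a Linux or macOS terminal, the equivalent OpenSSH login can be sketched as follows. The example rests on several assumptions: the key file name is hypothetical, `ubuntu` is the default login user on official Ubuntu images, and OpenSSH uses the original .pem file directly rather than the key converted by puttygen. The helper only prints the command so that you can review it before running it yourself.

```shell
KEY_FILE="$HOME/hadoop-cluster.pem"   # hypothetical path to the key from step 1.3.2
HOST="ec2-54-212-38-184.us-west-2.compute.amazonaws.com"   # example name from this guide

# Build the command that logs in and reports the hostname and network settings.
ssh_command() {
  # $1 = private key file, $2 = public name of the server
  printf "ssh -i %s ubuntu@%s 'hostname; ifconfig'\n" "$1" "$2"
}

# Print the command for review; copy it to the prompt to actually connect.
ssh_command "$KEY_FILE" "$HOST"
```

Note that ifconfig on an EC2 instance normally reports the private IP address, which differs from the public address embedded in the server's public name.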

You may receive the following error if you have not appropriately configured your security group in step 1.3.3.

Figure 1-20 Add the private key to PuTTY

Note the IP address and update the /etc/hosts file with the hostname and IP address.

Figure 1-21 Host IP address

Change the hostname to the public URL of the AWS EC2 server using the following command:

$ sudo hostname ec2-54-214-206-65.us-west-2.compute.amazonaws.com

Figure 1-22 Change hostname

Edit /etc/hosts to add the IP address and public hostname of your AWS EC2 server:

$ sudo vi /etc/hosts

Figure 1-23 Hostname change
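An /etc/hosts entry is a single line with the IP address followed by the hostname. The following sketch appends such a line only if the hostname is not already present. `add_host_entry` is a hypothetical helper, the IP address is a placeholder, and the demo writes to a temporary file rather than the real /etc/hosts.

```shell
# Append "IP<TAB>hostname" to a hosts file unless the hostname is already listed.
add_host_entry() {
  # $1 = hosts file, $2 = IP address, $3 = hostname
  # grep -F treats the hostname as a fixed string, so its dots are not wildcards.
  if ! grep -qF -- "$3" "$1"; then
    printf '%s\t%s\n' "$2" "$3" >> "$1"
  fi
}

HOSTS_FILE=$(mktemp)   # stand-in for /etc/hosts
# 10.0.0.11 is a placeholder; use the address reported on your own server.
add_host_entry "$HOSTS_FILE" 10.0.0.11 ec2-54-214-206-65.us-west-2.compute.amazonaws.com
add_host_entry "$HOSTS_FILE" 10.0.0.11 ec2-54-214-206-65.us-west-2.compute.amazonaws.com
cat "$HOSTS_FILE"      # the entry appears once, despite two calls
rm -f "$HOSTS_FILE"
```

Editing the real /etc/hosts requires root privileges, which is why the guide opens it with sudo.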

Also, repeat all the steps in this section (1.4.3) on the other three cluster servers to enable public access to these AWS EC2 servers.

1.4.4 Setup WINSCP access to AWS EC2 servers

Use the private key created in step “1.4.2 Import keypair and save public/private keys” to access the servers from your desktop with WinSCP, for any file upload or download between your PC and the servers.

Figure 1-24 Setup WinSCP

Copy the .pem file and other keys to the Master server using WinSCP.

You are now ready with the infrastructure to create your first Apache Hadoop cluster. Please review Part 2 of this guide to create the Apache Hadoop cluster.
