
Posts

Kubernetes Installation

Installing Minikube on Linux (Ubuntu 16.04):

Install the hypervisor (VirtualBox), if you haven't done so already:
$ sudo apt-get install virtualbox

Install Minikube. Download the latest release from the Minikube release page, then make it executable and copy it into the PATH:
$ curl -Lo minikube https://github.com/kubernetes/minikube/releases/download/v0.33.1/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/

Start Minikube with the minikube start command:
$ minikube start
Starting local Kubernetes v1.9.0 cluster...
Starting VM...
Downloading Minikube ISO
 142.22 MB / 142.22 MB [============================================] 100.00% 0s
Getting VM IP address...
Moving files into cluster...
Downloading localkube binary
 162.41 MB / 162.41 MB [============================================] 100.00% 0s
 0 B / 65 B [--------…
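Once minikube start finishes, the cluster can be verified from the same shell. A minimal check, assuming kubectl was installed alongside Minikube (the exact wording of the status output varies by Minikube version):

$ minikube status     # host, kubelet and apiserver should be reported as running
$ kubectl get nodes   # the single "minikube" node should show STATUS Ready
$ minikube stop       # shut the VM down when finished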
Recent posts

Ambari Installation

Set Up Password-less SSH

Steps:
1. Generate public and private SSH keys on the Ambari Server host.
$ ssh-keygen
2. Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts.
.ssh/id_rsa
.ssh/id_rsa.pub
3. Add the SSH Public Key to the authorized_keys file on your target hosts (a one-command alternative for steps 2-4 is sketched after this list).
$ cat id_rsa.pub >> authorized_keys
4. Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts.
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
5. From the Ambari Server, make sure you can connect to each host in the cluster using SSH without having to enter a password.
$ ssh root@<remote.target.host>
where <remote.target.host> is the host name of each host in your cluster.
6. If the following warning message displays during your first connection, enter yes:
Are you sure you want to continue connecting (yes/no)?
7. Reta…
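If OpenSSH's ssh-copy-id is available on the Ambari Server, steps 2-4 can be collapsed into one command. A sketch, assuming password login to the root account is still enabled on the target hosts:

$ ssh-copy-id root@<remote.target.host>   # appends ~/.ssh/id_rsa.pub to the target's ~/.ssh/authorized_keys
$ ssh root@<remote.target.host>           # should now log in without prompting for a password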

Apache Spark

Spark on Windows
==========================================

Step 1: Download and install the JDK
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Step 2: Download Spark
http://spark.apache.org/downloads.html
Choose the following options for the download:
1. Choose a Spark release - 1.6.1
2. Pre-built for Hadoop 2.6 or later
3. Direct Download
4. Click on spark-1.6.1-bin-hadoop2.6.tgz

Step 3: Extract the tar file

Step 4: Copy the contents of the tar file into the C:\spark\ folder

Step 5: Update log4j.properties to set the messages to WARN
Open C:\spark\conf\log4j.properties.template, set the property log4j.rootCategory=WARN, and save the file as log4j.properties

Step 6: Download winutils.exe from here
1. Create a folder C:\winutils\bin
2. Copy the winutils.exe file into C:\winutils\bin\winutils.exe

Step 7: Set the environment variables (tell Windows where Spark is); a command-prompt sketch follows these steps
SPARK_HOME = C:\spark
JAVA_HO…
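One way to set the Step 7 variables is from an Administrator Command Prompt with setx, which persists them for new sessions. A sketch, assuming the folder layout above; the JDK path is a placeholder for wherever the JDK was installed in Step 1:

setx SPARK_HOME C:\spark
setx HADOOP_HOME C:\winutils
setx JAVA_HOME "<path to your JDK installation>"
REM open a new Command Prompt so the variables are picked up, then verify Spark starts:
C:\spark\bin\spark-shell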

Platform LSF Installation on Ubuntu

LSF INSTALLATION STEPS
====================================================================================

STEP 1: Download the specific LSF package from the IBM resources website.
STEP 2: sudo tar -xvf lsfce9.1.3-ppc64le.tar
STEP 3: cd lsfce9.1.3-ppc64le
STEP 4: sudo vi install.config
Edit the following:
LSF_TOP="/usr/share/lsf"
LSF_ADMINS="lsf_admin"
LSF_CLUSTER_NAME="my_first_cluster"
LSF_TARDIR="/tmp"
STEP 5: Navigate inside the folder lsfce9.1.3-ppc64le and run the following command:
./lsfinstall -f install.config
Press "1" to accept the license.
Once it shows the success message, navigate to /usr/share/lsf/conf and run the following command:
source ./profile.lsf
Start the LSF daemons:
lsadmin limstartup
lsadmin resstartup
badmin hstartup
Once all the daemons are started, test the cluster by issuing the following commands:
lsid --> displays the cluster name and other information
lshosts --> displays the number o…
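After lsid and lshosts look healthy, a quick way to exercise the batch side of the cluster is to submit a trivial job. A sketch, assuming you are logged in as an LSF user with profile.lsf sourced as above:

bsub sleep 60     # submit a test job to the default queue
bjobs             # the job should appear as PEND and then RUN
bhosts            # batch host status; the local host should report "ok"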

Apache Hive 1.2.1 installation on Hadoop 2.7.1 in Ubuntu 14.04

Steps to follow for installation

Step 1: Download the Hive 1.2.1 tar ball from the link
Step 2: Extract apache-hive-1.2.1-bin.tar.gz using the command
root@ubuntu:/usr/local# tar -xvzf apache-hive-1.2.1-bin.tar.gz
Step 3: Move the extracted folder to the location /usr/local/
Step 4: Navigate inside /usr/local/apache-hive-1.2.1-bin
Step 5: Export HIVE_HOME using the following command
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# export HIVE_HOME="/usr/local/apache-hive-1.2.1-bin"
Step 6: Set the class-path of Hive 1.2.1
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# PATH=$PATH:$HIVE_HOME/bin
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# export PATH
Step 7: Make changes in hive-config.sh using the following command
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# vi bin/hive-config.sh
Add the following at the end:
export HADOOP_HOME=/usr/local/hadoop
Step 8: Start Hive using
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# bin/hive
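To confirm the Step 8 shell is wired to Hadoop correctly, a couple of smoke-test queries can be run non-interactively with hive -e. A sketch, assuming the Hadoop 2.7.1 daemons are already running; the table name "test" is only an example:

root@ubuntu:/usr/local/apache-hive-1.2.1-bin# bin/hive -e "CREATE TABLE IF NOT EXISTS test (id INT, name STRING);"
root@ubuntu:/usr/local/apache-hive-1.2.1-bin# bin/hive -e "SHOW TABLES;"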

How to Pull Twitter Data Using Apache Flume into HDFS

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications.

Flume lets Hadoop users make the most of valuable log data. Specifically, Flume allows users to:
Stream data from multiple sources into Hadoop for analysis
Collect high-volume Web logs in real time
Insulate themselves from transient spikes, when the rate of incoming data exceeds the rate at which data can be written to the destination
Guarantee data delivery
Scale horizontally to handle additional data volume

Flume's high-level architecture is focused on delivering a streamlined codebase that is easy to use and easy to extend. The project team has designed Flume…
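For the Twitter-to-HDFS use case in this post's title, a Flume agent needs a Twitter source, a channel, and an HDFS sink. Below is a minimal flume.conf sketch: the agent, channel, and sink names, the keywords, and the HDFS path are illustrative placeholders, and the four Twitter API credentials must be replaced with your own.

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
TwitterAgent.sources.Twitter.keywords = hadoop, bigdata

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

The agent is then started with flume-ng, pointing at this configuration file:
$ flume-ng agent --conf conf/ -f conf/flume.conf -n TwitterAgent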

WordCount using Eclipse

Starting Eclipse and running a MapReduce Program
===============================

Step 1: Open the home directory
Step 2: Go inside the eclipse directory
Step 3: Double click on the eclipse executable to open the Eclipse IDE
Step 4: Provide the workspace path
Next the Eclipse IDE starts and looks like this
Step 5: Just close the workbench
After closing the workbench you will see something like the figure below
Step 6: Now let's create a project
Right click in the Project Explorer
Select → New → Project → Map/Reduce Project
Next provide the project name → Wordcount in the Project name field
Next configure the Hadoop installation directory
Click on → Configure Hadoop installation directory
Next click → Browse → go to the Hadoop installation directory, i.e., /usr/local/hadoop → OK → OK → Next → Finish
Once done you will see something like the figure below…
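Once the Wordcount project builds in Eclipse, the usual way to run it on the cluster is to export it as a jar and launch it with the hadoop command. A sketch, assuming Hadoop lives at /usr/local/hadoop as above; the jar name, driver class name, and HDFS paths are hypothetical and should match what you exported:

$ hdfs dfs -mkdir -p /user/hduser/input
$ hdfs dfs -put sample.txt /user/hduser/input          # upload a local text file to count
$ hadoop jar wordcount.jar WordCount /user/hduser/input /user/hduser/output
$ hdfs dfs -cat /user/hduser/output/part-r-00000       # view the word counts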