Using SAP HANA we can connect to Hadoop via Smart Data Access, and the first prerequisite is a working Hadoop installation.
This blog walks through that Hadoop installation.
If all goes smoothly, it takes at most two hours.
Please follow the steps below:
Step-1:
1. Download a stable release ending with tar.gz (e.g. hadoop-1.2.1.tar.gz)
2. In Linux, create a new folder “/home/hadoop”
3. Move the downloaded file to the folder “/home/hadoop” using WinSCP or FileZilla.
4. In PuTTY, type: cd /home/hadoop
5. Type: tar xvf hadoop-1.2.1.tar.gz
Step-2:
Downloading and setting up Java:
1. Check if Java is already present
Type: java -version
2. If Java is not present, install it by following the steps below
3. Make a directory to install Java into (/usr/local/java)
4. Download the 64-bit Linux Java JDK and JRE archives ending with tar.gz from the link below:
http://oracle.com/technetwork/java/javase/downloads/index.html
5. Copy the downloaded files to the created folder
6. Extract the archives:
Type: cd /usr/local/java
Type: tar xvzf jdk*.tar.gz
Type: tar xvzf jre*.tar.gz
7. Add the PATH and home-directory variables at the end of the /etc/profile file:
JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export HADOOP_INSTALL
export PATH
8. Run the below commands so that Linux can understand where Java is installed:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
9. Test Java by typing: java -version
10. Reload the profile (type: . /etc/profile, or log out and back in), then check that JAVA_HOME is set by typing: echo $JAVA_HOME
Now we are done with the installation of Hadoop (standalone mode).
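The Step-2 checks can be bundled into one small script. This is only an illustrative sketch; it reports whether the tools set up above are reachable on PATH without failing hard:

```shell
#!/bin/sh
# Sketch: report whether the tools configured in Step-2 are on PATH.
# Prints "ok" or "missing" per tool instead of aborting.
for tool in java javac; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: missing"
  fi
done
```

If either line says "missing", re-check the /etc/profile entries and the update-alternatives commands before continuing.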
Step-3:
We can check whether we were successful by running a bundled example.
Go to the Hadoop installation directory
Type: mkdir input
Type: cp conf/*.xml input
Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Type: cat output/*
The matched lines are displayed on success. (Note: do not create the output directory yourself; the job creates it and aborts if it already exists.)
Step-4:
As a next step, to switch Hadoop to pseudo-distributed mode, change the configuration in the files below:
1. In the Hadoop installation folder, change the conf/core-site.xml file to:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2. Change conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3. Change conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
4. Edit the conf/hadoop-env.sh file and set:
export JAVA_HOME=/usr/local/java/jdk1.7.0_40
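The four edits above are easy to get subtly wrong, so a quick grep can confirm each value landed in the right file. A minimal sketch, assuming the install path used in this blog (adjust CONF if yours differs):

```shell
#!/bin/sh
# Sketch: confirm the pseudo-distributed settings from Step-4 are in place.
# Each entry is "filename:expected text"; the first ':' splits the pair.
CONF=/home/hadoop/hadoop-1.2.1/conf
for pair in "core-site.xml:hdfs://localhost:9000" \
            "hdfs-site.xml:dfs.replication" \
            "mapred-site.xml:localhost:9001"; do
  file=${pair%%:*}   # text before the first ':'
  want=${pair#*:}    # text after the first ':'
  if grep -q "$want" "$CONF/$file" 2>/dev/null; then
    echo "$file: ok"
  else
    echo "$file: check $want"
  fi
done
```

Any "check" line means the corresponding file still needs the edit from the list above.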
Step-5:
1. Setup password less ssh by running the below commands:
Type: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Type: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Check that passwordless ssh works
Type: ssh localhost (it should not ask for any password)
3. Format the name node:
Type: bin/hadoop namenode -format
Step-6:
To start all the Hadoop services:
Type: bin/start-all.sh
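One way to confirm the daemons actually came up is jps, which ships with the JDK installed in Step-2. A hedged sketch; the five process names are the Hadoop 1.x daemons, and after a successful start-all.sh the count should be 5:

```shell
#!/bin/sh
# Sketch: count how many Hadoop 1.x daemons jps reports as running.
# '|| true' keeps the script from failing when no daemon matches.
count=$(jps 2>/dev/null \
  | grep -cE 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker' \
  || true)
echo "hadoop daemons running: ${count:-0}"
```

If the count is below 5, check the logs under the logs/ directory of the installation before running the example.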
Now try the same example which we tried earlier, this time with the input in HDFS:
Type: bin/hadoop fs -put conf input
Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Type: bin/hadoop fs -cat output/*
It should give the same kind of output.
To stop all the Hadoop services:
Type: bin/stop-all.sh
With that, the installation of Hadoop is successful.