[Hadoop Training, Day 2] Installing Hadoop

태하팍 2013. 6. 4. 13:22

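For the Hadoop installation, the training was run on Oracle VirtualBox.

Unfortunately I can't walk through the whole setup here. The instructor had prepared and configured everything in advance, haha.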

1. Install the JDK

We were told that OpenJDK causes errors with Hadoop, so set up the Oracle (Sun) JDK instead.

2. Set JAVA_HOME

For a regular account you would normally set this in .bash_profile, but since this training was done as root, we set the PATH in /etc/profile.
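As a rough sketch, the additions to /etc/profile might look like the lines below. The JDK path is an assumption (the post does not say where the JDK was installed), and HADOOP_DIR is a made-up variable name; the Hadoop directory itself is the /home/root/hadoop-1.0.4 path that appears later in this post.

```shell
# Assumed JDK location; replace it with the directory where the Oracle JDK actually lives.
export JAVA_HOME=/usr/java/default

# Hadoop directory used later in this post (HADOOP_DIR is a hypothetical helper variable).
HADOOP_DIR=/home/root/hadoop-1.0.4

# Put the java and hadoop commands on the PATH for login shells.
export PATH="$JAVA_HOME/bin:$HADOOP_DIR/bin:$PATH"
```

After editing the file, running `source /etc/profile` (or logging in again) should make `java -version` and `hadoop version` available.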

3. Copy and install Hadoop

Hadoop had also been brought in advance, haha: a file named hadoop-1.0.4-bin.tar.gz.

Extract it with tar xvf hadoop-1.0.4-bin.tar.gz.

Frequently used Hadoop commands

Check the daemons

[root@localhost conf]# hadoop dfsadmin -report

start-all.sh

stop-all.sh

ssh
ssh-keygen
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

/media/sf_shared/sample_data

[root@localhost sample_data]# hadoop fs -put /media/sf_shared/sample_data/cite75_99.txt /usr/root/test/cite.txt

hadoop fs -put /media/sf_shared/sample_data/apat63_99.txt /usr/root/test/apat.txt

[root@localhost usr]# hadoop fs -lsr /
Warning: $HADOOP_HOME is deprecated.

drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root/hadoop-1.0.4
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root/hadoop-1.0.4/tmp
drwxr-xr-x   - root supergroup          0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred
drwx------   - root supergroup          0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system
-rw-------   2 root supergroup          4 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system/jobtracker.info
drwxr-xr-x   - root supergroup          0 2013-06-04 13:27 /usr
drwxr-xr-x   - root supergroup          0 2013-06-04 13:27 /usr/root
drwxr-xr-x   - root supergroup          0 2013-06-04 13:29 /usr/root/test
-rw-r--r--   2 root supergroup 236902953 2013-06-04 13:29 /usr/root/test/apat.txt
-rw-r--r--   2 root supergroup 264075414 2013-06-04 13:27 /usr/root/test/cite.txt

[root@localhost usr]# hadoop fsck /usr/root/test/cite.txt -files -blocks -locations -racks
Warning: $HADOOP_HOME is deprecated.

FSCK started by root from /192.168.56.102 for path /usr/root/test/cite.txt at Tue Jun 04 13:30:27 KST 2013
/usr/root/test/cite.txt 264075414 bytes, 4 block(s): OK
0. blk_1184698969449250244_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
1. blk_-4684336584511364745_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
2. blk_4237572107572422623_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
3. blk_-3592843786766369632_1005 len=62748822 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]

Status: HEALTHY
Total size:    264075414 B
Total dirs:    0
Total files:    1
Total blocks (validated):    4 (avg. block size 66018853 B)
Minimally replicated blocks:    4 (100.0 %)
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:    0 (0.0 %)
Mis-replicated blocks:        0 (0.0 %)
Default replication factor:    2
Average block replication:    2.0
Corrupt blocks:        0
Missing replicas:        0 (0.0 %)
Number of data-nodes:        2
Number of racks:        1
FSCK ended at Tue Jun 04 13:30:27 KST 2013 in 2 milliseconds
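The block list in the fsck output follows directly from the file size and the 64 MB (67108864-byte) block length it reports. A small shell sketch of that arithmetic, with function names made up for illustration:

```shell
# Number of HDFS blocks needed for a file: ceiling(size / block_size).
hdfs_block_count() {
    local size=$1 block=$2
    echo $(( (size + block - 1) / block ))
}

# Length of the final block (a full block when the size divides evenly).
hdfs_last_block_len() {
    local size=$1 block=$2
    local rem=$(( size % block ))
    if [ "$rem" -eq 0 ] && [ "$size" -gt 0 ]; then
        echo "$block"
    else
        echo "$rem"
    fi
}

hdfs_block_count 264075414 67108864      # cite.txt: prints 4
hdfs_last_block_len 264075414 67108864   # prints 62748822
```

Both values match the fsck report: four blocks, the last one with len=62748822.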

---------------------mapReduce's sample-----------------------------------

hadoop-examples-1.0.4.jar

jar <jar> run a jar file

[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar
Warning: $HADOOP_HOME is deprecated.

An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.

Usage: wordcount <in> <out>

[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar wordcount /usr/root/test/cite.txt /usr/root/test/output/wc
Warning: $HADOOP_HOME is deprecated.

13/06/04 13:39:55 INFO input.FileInputFormat: Total input paths to process : 1
13/06/04 13:39:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/04 13:39:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/04 13:39:55 INFO mapred.JobClient: Running job: job_201306041320_0001
13/06/04 13:39:56 INFO mapred.JobClient: map 0% reduce 0%
13/06/04 13:40:13 INFO mapred.JobClient: map 13% reduce 0%
13/06/04 13:40:16 INFO mapred.JobClient: map 19% reduce 0%
13/06/04 13:40:17 INFO mapred.JobClient: map 33% reduce 0%
13/06/04 13:40:19 INFO mapred.JobClient: map 39% reduce 0%
13/06/04 13:40:20 INFO mapred.JobClient: map 45% reduce 0%
13/06/04 13:40:22 INFO mapred.JobClient: map 52% reduce 0%
13/06/04 13:40:23 INFO mapred.JobClient: map 59% reduce 0%
13/06/04 13:40:25 INFO mapred.JobClient: map 66% reduce 0%
13/06/04 13:40:26 INFO mapred.JobClient: map 71% reduce 0%
13/06/04 13:40:28 INFO mapred.JobClient: map 78% reduce 0%
13/06/04 13:40:29 INFO mapred.JobClient: map 85% reduce 0%
13/06/04 13:40:32 INFO mapred.JobClient: map 92% reduce 0%
13/06/04 13:40:35 INFO mapred.JobClient: map 100% reduce 0%
13/06/04 13:41:08 INFO mapred.JobClient: map 100% reduce 8%
13/06/04 13:41:14 INFO mapred.JobClient: map 100% reduce 42%
13/06/04 13:41:15 INFO mapred.JobClient: map 100% reduce 67%
13/06/04 13:41:17 INFO mapred.JobClient: map 100% reduce 72%
13/06/04 13:41:20 INFO mapred.JobClient: map 100% reduce 76%
13/06/04 13:41:23 INFO mapred.JobClient: map 100% reduce 83%
13/06/04 13:41:26 INFO mapred.JobClient: map 100% reduce 87%
13/06/04 13:41:29 INFO mapred.JobClient: map 100% reduce 89%
13/06/04 13:41:30 INFO mapred.JobClient: map 100% reduce 93%
13/06/04 13:41:38 INFO mapred.JobClient: map 100% reduce 98%
13/06/04 13:41:39 INFO mapred.JobClient: map 100% reduce 100%
13/06/04 13:41:44 INFO mapred.JobClient: Job complete: job_201306041320_0001
13/06/04 13:41:44 INFO mapred.JobClient: Counters: 29
13/06/04 13:41:44 INFO mapred.JobClient:   Job Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Launched reduce tasks=2
13/06/04 13:41:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=198184
13/06/04 13:41:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient:     Launched map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient:     Data-local map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=81790
13/06/04 13:41:44 INFO mapred.JobClient:   File Output Format Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Bytes Written=297057497
13/06/04 13:41:44 INFO mapred.JobClient:   FileSystemCounters
13/06/04 13:41:44 INFO mapred.JobClient:     FILE_BYTES_READ=899236112
13/06/04 13:41:44 INFO mapred.JobClient:     HDFS_BYTES_READ=264084033
13/06/04 13:41:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1262356574
13/06/04 13:41:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=297057497
13/06/04 13:41:44 INFO mapred.JobClient:   File Input Format Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Bytes Read=264083609
13/06/04 13:41:44 INFO mapred.JobClient:   Map-Reduce Framework
13/06/04 13:41:44 INFO mapred.JobClient:     Map output materialized bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient:     Map input records=16522438
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce shuffle bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient:     Spilled Records=57419402
13/06/04 13:41:44 INFO mapred.JobClient:     Map output bytes=330165166
13/06/04 13:41:44 INFO mapred.JobClient:     CPU time spent (ms)=97010
13/06/04 13:41:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=673988608
13/06/04 13:41:44 INFO mapred.JobClient:     Combine input records=33041386
13/06/04 13:41:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=424
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce input records=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce input groups=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Combine output records=33037896
13/06/04 13:41:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=838295552
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce output records=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2260516864
13/06/04 13:41:44 INFO mapred.JobClient:     Map output records=16522438
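For intuition, what the wordcount example computes can be imitated on a small local file with standard Unix tools. This is only an analogy, not how Hadoop runs the job: tr plays the role of the map step (one token per line), sort stands in for the shuffle, and uniq -c for the reduce. The function name wordcount_local is made up for illustration.

```shell
# Count whitespace-separated tokens read from stdin; prints "token<TAB>count".
wordcount_local() {
    tr -s '[:space:]' '\n' |          # "map": emit one token per line
        awk 'NF' |                    # drop empty lines
        LC_ALL=C sort |               # "shuffle": bring equal tokens together
        uniq -c |                     # "reduce": count each group
        awk '{ print $2 "\t" $1 }'    # reorder to token, count
}

printf 'b a\na c\n' | wordcount_local
```

In the counters above, Map input records and Map output records are both 16522438, which is consistent with each input line being a single token with no whitespace in it.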

---------------------mapReduce's sample-----------------------------------

------------------------------add another server~-----------------------------------

1) Change the hostname

In /etc/sysconfig/network, add node03 as the hostname.

2) Extract the Hadoop tarball (tar xvf)

# On the node01 server

(1) Add node03 to conf/slaves

[root@localhost conf]# pwd
/home/root/hadoop-1.0.4/conf

[root@localhost conf]# vi slaves

add node03

(2) Add node03 to /etc/hosts

[root@localhost conf]# vi /etc/hosts

192.168.56.102 node01

192.168.56.103 node02
192.168.56.104 node03

(3) Copy /etc/profile, /etc/hosts, and the conf files from node01 to node03; also copy hosts and the conf files from node01 to node02

[root@localhost conf]# scp /etc/hosts node03:/etc/

[root@localhost conf]# scp /etc/hosts node02:/etc/

[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node03:/home/root/hadoop-1.0.4/conf/

[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node02:/home/root/hadoop-1.0.4/conf/

[root@localhost conf]# scp /etc/profile node03:/etc/

If you set up pssh, you don't have to do this copying by hand!
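Without pssh, the repeated scp calls can at least be generated by a small loop. The sketch below only prints the commands (a dry run) so they can be reviewed first; the function name is made up, and unlike the transcript above it pushes /etc/profile to every listed node, not just node03.

```shell
# Print (do not run) the scp commands that push node01's settings to each node.
print_sync_commands() {
    local conf_dir=/home/root/hadoop-1.0.4/conf   # conf directory used in this post
    local node
    for node in "$@"; do
        echo "scp /etc/hosts ${node}:/etc/"
        echo "scp /etc/profile ${node}:/etc/"
        echo "scp ${conf_dir}/* ${node}:${conf_dir}/"
    done
}

print_sync_commands node02 node03
```

Once the printed commands look right, they can be run by pasting them into the shell on node01.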

------------------------------add another server~-----------------------------------

-----------------------------start daemon----------------------------------------------

[root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/hadoop-daemon.sh start datanode

However, iptables was already running on node03, so it has to be stopped:

1) [root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/hadoop-daemon.sh stop datanode

2) [root@localhost conf]# ssh node03 service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]

3) [root@localhost conf]# hadoop dfsadmin -report

result : Datanodes available: 3 (3 total, 0 dead)
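To check the datanode count from a script rather than by eye, the summary line can be pulled out of the report. A minimal sketch, assuming the report contains a line of the form shown above; live_datanodes is a made-up helper name:

```shell
# Extract the live datanode count from `hadoop dfsadmin -report` output on stdin.
live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\) .*/\1/p'
}

# Example using the line reported above; on the cluster you would pipe
# `hadoop dfsadmin -report` into the function instead.
echo 'Datanodes available: 3 (3 total, 0 dead)' | live_datanodes   # prints 3
```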

-----------------------------start daemon----------------------------------------------

---------------balancing--------------------------------------

[root@localhost conf]# hadoop balancer -threshold 1
Warning: $HADOOP_HOME is deprecated.

13/06/04 14:14:15 INFO balancer.Balancer: Using a threshold of 1.0
Time Stamp               Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:14:16 INFO balancer.Balancer: 2 over utilized nodes: 192.168.56.102:50010 192.168.56.103:50010
13/06/04 14:14:16 INFO balancer.Balancer: 1 under utilized nodes: 192.168.56.104:50010
13/06/04 14:14:16 INFO balancer.Balancer: Need to move 349.76 MB bytes to make the cluster balanced.
13/06/04 14:14:16 INFO balancer.Balancer: Decided to move 161.66 MB bytes from 192.168.56.102:50010 to 192.168.56.104:50010
13/06/04 14:14:16 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:14:16 PM            0                 0 KB           349.76 MB          161.66 MB
13/06/04 14:17:09 INFO balancer.Balancer: Moving block 8103612282426249076 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:12 INFO balancer.Balancer: Moving block -1466839656797754499 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:24 INFO balancer.Balancer: Moving block -3592843786766369632 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:17:46 INFO balancer.Balancer: 1 over utilized nodes: 192.168.56.103:50010
13/06/04 14:17:46 INFO balancer.Balancer: 1 under utilized nodes: 192.168.56.104:50010
13/06/04 14:17:46 INFO balancer.Balancer: Need to move 123.12 MB bytes to make the cluster balanced.
13/06/04 14:17:46 INFO balancer.Balancer: Decided to move 161.66 MB bytes from 192.168.56.103:50010 to 192.168.56.104:50010
13/06/04 14:17:46 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:17:46 PM            1            187.84 MB           123.12 MB          161.66 MB
13/06/04 14:18:39 INFO balancer.Balancer: Moving block 715304243492723016 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:11 INFO balancer.Balancer: Moving block 4843232263157045166 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:12 INFO balancer.Balancer: Moving block -8956265055413503717 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.102:50010 is succeeded.
13/06/04 14:21:13 INFO balancer.Balancer: Moving block -5391126869938599807 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:21:16 INFO balancer.Balancer: 0 over utilized nodes:
13/06/04 14:21:16 INFO balancer.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 7.005833333333333 minutes
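The -threshold value is a percentage: the balancer treats the cluster as balanced once every datanode's utilization is within that many percentage points of the cluster-wide average. The sketch below is a simplified illustration of that rule, not the balancer's actual code; the function name and the utilization figures are hypothetical, since the post does not show per-node usage.

```shell
# Classify one node. Arguments: node utilization %, cluster average %, threshold %.
classify_node() {
    awk -v used="$1" -v avg="$2" -v thr="$3" 'BEGIN {
        if (used > avg + thr)      print "over-utilized"
        else if (used < avg - thr) print "under-utilized"
        else                       print "within threshold"
    }'
}

# Hypothetical utilization percentages with -threshold 1:
classify_node 6.5 4.0 1   # over-utilized
classify_node 0.0 4.0 1   # under-utilized
classify_node 4.5 4.0 1   # within threshold
```

This mirrors the log above: the freshly added node03 (192.168.56.104) starts out under-utilized and the two original nodes over-utilized, until enough blocks have moved that neither list has entries.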

---------------balancing--------------------------------------

----------------map, Reduce sample --------------------------

[root@localhost hadoop-1.0.4]# javac -classpath hadoop-core-1.0.4.jar:lib/commons-cli-1.2.jar -d sample_source/classes/ sample_source/sources/WordCount02.java

[root@localhost hadoop-1.0.4]# jar -cvf sample_source/WordCount.jar -C sample_source/classes .

[root@localhost hadoop-1.0.4]# hadoop jar sample_source/WordCount.jar WordCount02.WordCount02 /usr/root/test/cite.txt /usr/root/test/output/wc02

WordCount02.java

----------------map, Reduce sample --------------------------

A good site on Hadoop

http://fniko.tistory.com/4

License: Attribution-NonCommercial-NoDerivs