
[Hadoop Training, Day 2] Installing Hadoop

태하팍 2013. 6. 4. 13:22

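For the training, Hadoop was installed inside Oracle VirtualBox.

Unfortunately I can't walk through the whole setup, because the instructor had prepared and configured everything in advance. lol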

1. Install the JDK

  We were told that OpenJDK causes errors with Hadoop, so install the Oracle (Sun) JDK instead.


  With a regular account you would normally configure this in .bash_profile, but since this training ran everything as root, we set PATH in /etc/profile instead.
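
  The instructor's exact /etc/profile was not shown, so the following is only a rough sketch of what such entries might look like. The JAVA_HOME path is a placeholder I made up; the Hadoop path is the one that appears in the commands later in this post.

```shell
# Sketch of lines to append to /etc/profile (not the instructor's actual file).
# JAVA_HOME is a hypothetical location; point it at your real Oracle JDK directory.
export JAVA_HOME=/usr/java/default
# Hadoop install directory, taken from the paths used later in this post.
export HADOOP_HOME=/home/root/hadoop-1.0.4
# Make the java and hadoop commands available from any directory.
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin"
```

  After editing, running `source /etc/profile` and then `java -version` should confirm that the shell picks up the intended JDK.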

3. Copy and install Hadoop

   Hadoop had also been brought in ahead of time, as a file named hadoop-1.0.4-bin.tar.gz.

   Extract it with tar xvf hadoop-1.0.4-bin.tar.gz.

Frequently used Hadoop commands

Check that the daemons are running

[root@localhost conf]# hadoop dfsadmin -report

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/ >> ~/.ssh/authorized_keys


[root@localhost sample_data]# hadoop fs -put /media/sf_shared/sample_data/cite75_99.txt /usr/root/test/cite.txt

[root@localhost sample_data]# hadoop fs -put /media/sf_shared/sample_data/apat63_99.txt /usr/root/test/apat.txt

[root@localhost usr]# hadoop fs -lsr /
Warning: $HADOOP_HOME is deprecated.

drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root/hadoop-1.0.4
drwxr-xr-x   - root supergroup          0 2013-06-04 13:14 /home/root/hadoop-1.0.4/tmp
drwxr-xr-x   - root supergroup          0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred
drwx------   - root supergroup          0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system
-rw-------   2 root supergroup          4 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system/
drwxr-xr-x   - root supergroup          0 2013-06-04 13:27 /usr
drwxr-xr-x   - root supergroup          0 2013-06-04 13:27 /usr/root
drwxr-xr-x   - root supergroup          0 2013-06-04 13:29 /usr/root/test
-rw-r--r--   2 root supergroup  236902953 2013-06-04 13:29 /usr/root/test/apat.txt
-rw-r--r--   2 root supergroup  264075414 2013-06-04 13:27 /usr/root/test/cite.txt

[root@localhost usr]# hadoop fsck /usr/root/test/cite.txt -files -blocks -locations -racks
Warning: $HADOOP_HOME is deprecated.

FSCK started by root from / for path /usr/root/test/cite.txt at Tue Jun 04 13:30:27 KST 2013
/usr/root/test/cite.txt 264075414 bytes, 4 block(s):  OK
0. blk_1184698969449250244_1005 len=67108864 repl=2 [/default-rack/, /default-rack/]
1. blk_-4684336584511364745_1005 len=67108864 repl=2 [/default-rack/, /default-rack/]
2. blk_4237572107572422623_1005 len=67108864 repl=2 [/default-rack/, /default-rack/]
3. blk_-3592843786766369632_1005 len=62748822 repl=2 [/default-rack/, /default-rack/]

 Total size:    264075414 B
 Total dirs:    0
 Total files:    1
 Total blocks (validated):    4 (avg. block size 66018853 B)
 Minimally replicated blocks:    4 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    2
 Average block replication:    2.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        2
 Number of racks:        1
FSCK ended at Tue Jun 04 13:30:27 KST 2013 in 2 milliseconds
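
The block list above can be checked by hand. As a small worked example, this sketch takes the file size and the full-block length (67108864 bytes) from the fsck output and recomputes how many blocks the file needs and how long the last one is. The variable names are my own.

```shell
# Values copied from the fsck output above.
file_size=264075414     # total bytes of /usr/root/test/cite.txt
block_size=67108864     # length of each full block (64 MB)

# Ceiling division: number of blocks needed to hold the file.
blocks=$(( (file_size + block_size - 1) / block_size ))
# Whatever is left after the full blocks goes into the last block.
last_block=$(( file_size - (blocks - 1) * block_size ))

echo "blocks=$blocks last_block=$last_block"
# prints: blocks=4 last_block=62748822
```

Both numbers match the report: 4 blocks, with the last one (index 3) at len=62748822.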

---------------------mapReduce's sample-----------------------------------


jar <jar>            run a jar file

[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar
Warning: $HADOOP_HOME is deprecated.

An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.

Usage: wordcount <in> <out>

[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar wordcount /usr/root/test/cite.txt /usr/root/test/output/wc
Warning: $HADOOP_HOME is deprecated.

13/06/04 13:39:55 INFO input.FileInputFormat: Total input paths to process : 1
13/06/04 13:39:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/04 13:39:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/04 13:39:55 INFO mapred.JobClient: Running job: job_201306041320_0001
13/06/04 13:39:56 INFO mapred.JobClient:  map 0% reduce 0%
13/06/04 13:40:13 INFO mapred.JobClient:  map 13% reduce 0%
13/06/04 13:40:16 INFO mapred.JobClient:  map 19% reduce 0%
13/06/04 13:40:17 INFO mapred.JobClient:  map 33% reduce 0%
13/06/04 13:40:19 INFO mapred.JobClient:  map 39% reduce 0%
13/06/04 13:40:20 INFO mapred.JobClient:  map 45% reduce 0%
13/06/04 13:40:22 INFO mapred.JobClient:  map 52% reduce 0%
13/06/04 13:40:23 INFO mapred.JobClient:  map 59% reduce 0%
13/06/04 13:40:25 INFO mapred.JobClient:  map 66% reduce 0%
13/06/04 13:40:26 INFO mapred.JobClient:  map 71% reduce 0%
13/06/04 13:40:28 INFO mapred.JobClient:  map 78% reduce 0%
13/06/04 13:40:29 INFO mapred.JobClient:  map 85% reduce 0%
13/06/04 13:40:32 INFO mapred.JobClient:  map 92% reduce 0%
13/06/04 13:40:35 INFO mapred.JobClient:  map 100% reduce 0%
13/06/04 13:41:08 INFO mapred.JobClient:  map 100% reduce 8%
13/06/04 13:41:14 INFO mapred.JobClient:  map 100% reduce 42%
13/06/04 13:41:15 INFO mapred.JobClient:  map 100% reduce 67%
13/06/04 13:41:17 INFO mapred.JobClient:  map 100% reduce 72%
13/06/04 13:41:20 INFO mapred.JobClient:  map 100% reduce 76%
13/06/04 13:41:23 INFO mapred.JobClient:  map 100% reduce 83%
13/06/04 13:41:26 INFO mapred.JobClient:  map 100% reduce 87%
13/06/04 13:41:29 INFO mapred.JobClient:  map 100% reduce 89%
13/06/04 13:41:30 INFO mapred.JobClient:  map 100% reduce 93%
13/06/04 13:41:38 INFO mapred.JobClient:  map 100% reduce 98%
13/06/04 13:41:39 INFO mapred.JobClient:  map 100% reduce 100%
13/06/04 13:41:44 INFO mapred.JobClient: Job complete: job_201306041320_0001
13/06/04 13:41:44 INFO mapred.JobClient: Counters: 29
13/06/04 13:41:44 INFO mapred.JobClient:   Job Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Launched reduce tasks=2
13/06/04 13:41:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=198184
13/06/04 13:41:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient:     Launched map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient:     Data-local map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=81790
13/06/04 13:41:44 INFO mapred.JobClient:   File Output Format Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Bytes Written=297057497
13/06/04 13:41:44 INFO mapred.JobClient:   FileSystemCounters
13/06/04 13:41:44 INFO mapred.JobClient:     FILE_BYTES_READ=899236112
13/06/04 13:41:44 INFO mapred.JobClient:     HDFS_BYTES_READ=264084033
13/06/04 13:41:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1262356574
13/06/04 13:41:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=297057497
13/06/04 13:41:44 INFO mapred.JobClient:   File Input Format Counters
13/06/04 13:41:44 INFO mapred.JobClient:     Bytes Read=264083609
13/06/04 13:41:44 INFO mapred.JobClient:   Map-Reduce Framework
13/06/04 13:41:44 INFO mapred.JobClient:     Map output materialized bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient:     Map input records=16522438
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce shuffle bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient:     Spilled Records=57419402
13/06/04 13:41:44 INFO mapred.JobClient:     Map output bytes=330165166
13/06/04 13:41:44 INFO mapred.JobClient:     CPU time spent (ms)=97010
13/06/04 13:41:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=673988608
13/06/04 13:41:44 INFO mapred.JobClient:     Combine input records=33041386
13/06/04 13:41:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=424
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce input records=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce input groups=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Combine output records=33037896
13/06/04 13:41:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=838295552
13/06/04 13:41:44 INFO mapred.JobClient:     Reduce output records=16518948
13/06/04 13:41:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2260516864
13/06/04 13:41:44 INFO mapred.JobClient:     Map output records=16522438
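
To get a feel for what the wordcount job computes, here is a small local stand-in that runs without Hadoop. It is only a sketch: the function name local_wordcount and the sample input are made up, and it simply splits on whitespace and counts each token. It is not the code inside hadoop-examples-1.0.4.jar.

```shell
# Local imitation of a word count: split every line on whitespace,
# count each token, and print "token<TAB>count" sorted by token.
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w "\t" count[w] }' | sort
}

# Made-up sample lines with no spaces, so each line is one token.
printf '100,200\n100,300\n100,200\n' | local_wordcount
```

In the job log above, Map output records equals Map input records (16522438), which suggests that each line of cite.txt was counted as a single word, presumably because the lines contain no whitespace.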

---------------------mapReduce's sample-----------------------------------

------------------------------add another server~-----------------------------------

1) Change the host name of the new server


set it to node03

2) Extract the Hadoop archive on the new server (tar xvf)

# On the node01 server

(1) Add node03 to conf/slaves

[root@localhost conf]# pwd

[root@localhost conf]# vi slaves 

add a line containing node03

(2) Add node03 to /etc/hosts

[root@localhost conf]# vi /etc/hosts
(the file should have entries for node01, node02, and node03)

(3) Copy /etc/profile, /etc/hosts, and the conf files from node01 to node03, and also from node01 to node02

[root@localhost conf]# scp /etc/hosts node03:/etc/

[root@localhost conf]# scp /etc/hosts node02:/etc/

[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node03:/home/root/hadoop-1.0.4/conf/

[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node02:/home/root/hadoop-1.0.4/conf/

[root@localhost conf]# scp /etc/profile node03:/etc/

If you skip this setup, the new node will not work!
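
Typing the same scp lines for every node gets tedious, so here is a sketch of a helper that prints the copy commands for each node name you pass in. It is a dry run: nothing is copied until you decide to execute the output. The function name print_sync_commands and the CONF_DIR variable are my own; the paths are the ones used above.

```shell
# Hadoop conf directory, as used in the scp commands above.
CONF_DIR=/home/root/hadoop-1.0.4/conf

# Print (do not run) the copy commands for every node given as an argument.
print_sync_commands() {
  for node in "$@"; do
    echo "scp /etc/hosts $node:/etc/"
    echo "scp /etc/profile $node:/etc/"
    echo "scp $CONF_DIR/* $node:$CONF_DIR/"
  done
}

print_sync_commands node02 node03
```

Once the printed commands look right, piping them into a shell (`print_sync_commands node02 node03 | sh`) would run them one after another.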

------------------------------add another server~-----------------------------------

-----------------------------start daemon----------------------------------------------

[root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/ start datanode

However, iptables was already running on node03, so the firewall has to be turned off:

1) [root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/ stop datanode

2) [root@localhost conf]# ssh node03 service iptables stop
iptables: Flushing firewall rules: [  OK  ]
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Unloading modules: [  OK  ]

3) [root@localhost conf]# hadoop dfsadmin -report

result :  Datanodes available: 3 (3 total, 0 dead)
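
If you want to check the live datanode count from a script rather than by eye, a sketch like this could pull the number out of the report. The function name live_datanodes is made up, and it assumes the summary line looks exactly like the one shown above.

```shell
# Read a dfsadmin report on stdin and print the number of available datanodes.
# Assumes a summary line of the form "Datanodes available: 3 (3 total, 0 dead)".
live_datanodes() {
  sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}

# Example with the line from this post; on a real cluster you would pipe
# the output of "hadoop dfsadmin -report" into the function instead.
echo 'Datanodes available: 3 (3 total, 0 dead)' | live_datanodes
# prints: 3
```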

-----------------------------start daemon----------------------------------------------


[root@localhost conf]# hadoop balancer -threshold 1
Warning: $HADOOP_HOME is deprecated.

13/06/04 14:14:15 INFO balancer.Balancer: Using a threshold of 1.0
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:14:16 INFO balancer.Balancer: 2 over utilized nodes:
13/06/04 14:14:16 INFO balancer.Balancer: 1 under utilized nodes:
13/06/04 14:14:16 INFO balancer.Balancer: Need to move 349.76 MB bytes to make the cluster balanced.
13/06/04 14:14:16 INFO balancer.Balancer: Decided to move 161.66 MB bytes from to
13/06/04 14:14:16 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:14:16 PM            0                 0 KB           349.76 MB          161.66 MB
13/06/04 14:17:09 INFO balancer.Balancer: Moving block 8103612282426249076 from to through is succeeded.
13/06/04 14:17:12 INFO balancer.Balancer: Moving block -1466839656797754499 from to through is succeeded.
13/06/04 14:17:24 INFO balancer.Balancer: Moving block -3592843786766369632 from to through is succeeded.
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:17:46 INFO balancer.Balancer: 1 over utilized nodes:
13/06/04 14:17:46 INFO balancer.Balancer: 1 under utilized nodes:
13/06/04 14:17:46 INFO balancer.Balancer: Need to move 123.12 MB bytes to make the cluster balanced.
13/06/04 14:17:46 INFO balancer.Balancer: Decided to move 161.66 MB bytes from to
13/06/04 14:17:46 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:17:46 PM            1            187.84 MB           123.12 MB          161.66 MB
13/06/04 14:18:39 INFO balancer.Balancer: Moving block 715304243492723016 from to through is succeeded.
13/06/04 14:21:11 INFO balancer.Balancer: Moving block 4843232263157045166 from to through is succeeded.
13/06/04 14:21:12 INFO balancer.Balancer: Moving block -8956265055413503717 from to through is succeeded.
13/06/04 14:21:13 INFO balancer.Balancer: Moving block -5391126869938599807 from to through is succeeded.
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/
13/06/04 14:21:16 INFO balancer.Balancer: 0 over utilized nodes:
13/06/04 14:21:16 INFO balancer.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 7.005833333333333 minutes
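
About the -threshold 1 option: as I understand the balancer, a datanode counts as balanced when its disk utilization differs from the cluster-wide utilization by no more than the threshold, given in percent. The sketch below only illustrates that rule. The function name is_balanced and the utilization figures are made-up examples, not values from this cluster.

```shell
# Succeeds (exit 0) when |node utilization - cluster utilization| <= threshold.
# All three arguments are percentages.
is_balanced() {
  awk -v node="$1" -v cluster="$2" -v threshold="$3" 'BEGIN {
    diff = node - cluster
    if (diff < 0) diff = -diff
    exit !(diff <= threshold)
  }'
}

# Hypothetical numbers: cluster average 12.0%, threshold 1%.
is_balanced 12.4 12.0 1 && echo "12.4% is within the threshold"
is_balanced 15.0 12.0 1 || echo "15.0% is over utilized"
```

A smaller threshold means a more evenly spread cluster, but the balancer has to move more data to get there.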


----------------map, Reduce sample --------------------------

[root@localhost hadoop-1.0.4]# javac -classpath hadoop-core-1.0.4.jar:lib/commons-cli-1.2.jar -d sample_source/classes/ sample_source/sources/

[root@localhost hadoop-1.0.4]# jar -cvf sample_source/WordCount.jar -C sample_source/classes .

[root@localhost hadoop-1.0.4]# hadoop jar sample_source/WordCount.jar WordCount02.WordCount02 /usr/root/test/cite.txt /usr/root/test/output/wc02

----------------map, Reduce sample --------------------------

Good Hadoop-related sites

