[Hadoop Training, Day 2] Installing Hadoop
The hands-on training ran Hadoop inside Oracle VirtualBox VMs.
Unfortunately I can't cover the full environment setup here; the instructor came with everything already prepared and pre-configured, haha.
1. Install the JDK
OpenJDK reportedly throws errors with Hadoop, so set up the Oracle (Sun) JDK instead. A quick way to check which one is active is sketched below.
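A minimal check (the version strings in the comment are what each vendor typically prints; adjust expectations to your build):

# Oracle/Sun JDK prints "Java(TM) SE Runtime Environment",
# while OpenJDK prints "OpenJDK Runtime Environment".
java -version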
2. Set JAVA_HOME
For a normal user account you would put this in ~/.bash_profile, but since this training ran everything as root,
we set JAVA_HOME and PATH in /etc/profile instead; a sketch of the entries follows.
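Something like the following, assuming the JDK was installed under /usr/java/jdk1.6.0_45 (the exact path is an assumption; match it to your install):

# added to /etc/profile -- the JDK location here is illustrative only
export JAVA_HOME=/usr/java/jdk1.6.0_45
export PATH=$PATH:$JAVA_HOME/bin

Run source /etc/profile (or log in again) to pick up the change.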
3. Copy and install Hadoop
Hadoop, too, was already staged for us, haha: a file called hadoop-1.0.4-bin.tar.gz!!
Extract it with tar xvf hadoop-1.0.4-bin.tar.gz (see the sketch below for where it lands in this setup).
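Based on the paths that show up later in these notes, the tree lives under /home/root (a sketch, not the instructor's exact steps):

# extract into root's home so the install dir becomes /home/root/hadoop-1.0.4
cd /home/root
tar xvf hadoop-1.0.4-bin.tar.gz
ls hadoop-1.0.4/conf    # core-site.xml, hdfs-site.xml, mapred-site.xml, slaves, ...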
Frequently used Hadoop commands
Check the daemons / cluster status:
[root@localhost conf]# hadoop dfsadmin -report
start-all.sh
stop-all.sh
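On a brand-new install, a typical first-run sequence looks roughly like this (a sketch; note that -format wipes HDFS metadata, so run it only once on a fresh cluster):

hadoop namenode -format    # initialize HDFS (once, on the namenode)
start-all.sh               # start the HDFS and MapReduce daemons
jps                        # should list NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode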
SSH (passwordless login, so the start/stop scripts can reach every node):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
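Quick sanity check that the key is picked up (it should print the date without asking for a password):

ssh localhost date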
The sample data sits in the VirtualBox shared folder; load it into HDFS with hadoop fs -put:
/media/sf_shared/sample_data
[root@localhost sample_data]# hadoop fs -put /media/sf_shared/sample_data/cite75_99.txt /usr/root/test/cite.txt
hadoop fs -put /media/sf_shared/sample_data/apat63_99.txt /usr/root/test/apat.txt
[root@localhost usr]# hadoop fs -lsr /
Warning: $HADOOP_HOME is deprecated.
drwxr-xr-x - root supergroup 0 2013-06-04 13:14 /home
drwxr-xr-x - root supergroup 0 2013-06-04 13:14 /home/root
drwxr-xr-x - root supergroup 0 2013-06-04 13:14 /home/root/hadoop-1.0.4
drwxr-xr-x - root supergroup 0 2013-06-04 13:14 /home/root/hadoop-1.0.4/tmp
drwxr-xr-x - root supergroup 0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred
drwx------ - root supergroup 0 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system
-rw------- 2 root supergroup 4 2013-06-04 13:21 /home/root/hadoop-1.0.4/tmp/mapred/system/jobtracker.info
drwxr-xr-x - root supergroup 0 2013-06-04 13:27 /usr
drwxr-xr-x - root supergroup 0 2013-06-04 13:27 /usr/root
drwxr-xr-x - root supergroup 0 2013-06-04 13:29 /usr/root/test
-rw-r--r-- 2 root supergroup 236902953 2013-06-04 13:29 /usr/root/test/apat.txt
-rw-r--r-- 2 root supergroup 264075414 2013-06-04 13:27 /usr/root/test/cite.txt
[root@localhost usr]# hadoop fsck /usr/root/test/cite.txt -files -blocks -locations -racks
Warning: $HADOOP_HOME is deprecated.
FSCK started by root from /192.168.56.102 for path /usr/root/test/cite.txt at Tue Jun 04 13:30:27 KST 2013
/usr/root/test/cite.txt 264075414 bytes, 4 block(s): OK
0. blk_1184698969449250244_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
1. blk_-4684336584511364745_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
2. blk_4237572107572422623_1005 len=67108864 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
3. blk_-3592843786766369632_1005 len=62748822 repl=2 [/default-rack/192.168.56.103:50010, /default-rack/192.168.56.102:50010]
Status: HEALTHY
Total size: 264075414 B
Total dirs: 0
Total files: 1
Total blocks (validated): 4 (avg. block size 66018853 B)
Minimally replicated blocks: 4 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Tue Jun 04 13:30:27 KST 2013 in 2 milliseconds
---------------------MapReduce sample-----------------------------------
Hadoop ships with an examples jar, hadoop-examples-1.0.4.jar, which you run via the jar subcommand (from the hadoop usage text: jar <jar> -- run a jar file). Running it with no arguments lists the bundled example programs:
[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar
Warning: $HADOOP_HOME is deprecated.
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
Usage: wordcount <in> <out>
[root@localhost hadoop-1.0.4]# hadoop jar hadoop-examples-1.0.4.jar wordcount /usr/root/test/cite.txt /usr/root/test/output/wc
Warning: $HADOOP_HOME is deprecated.
13/06/04 13:39:55 INFO input.FileInputFormat: Total input paths to process : 1
13/06/04 13:39:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/04 13:39:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/04 13:39:55 INFO mapred.JobClient: Running job: job_201306041320_0001
13/06/04 13:39:56 INFO mapred.JobClient: map 0% reduce 0%
13/06/04 13:40:13 INFO mapred.JobClient: map 13% reduce 0%
13/06/04 13:40:16 INFO mapred.JobClient: map 19% reduce 0%
13/06/04 13:40:17 INFO mapred.JobClient: map 33% reduce 0%
13/06/04 13:40:19 INFO mapred.JobClient: map 39% reduce 0%
13/06/04 13:40:20 INFO mapred.JobClient: map 45% reduce 0%
13/06/04 13:40:22 INFO mapred.JobClient: map 52% reduce 0%
13/06/04 13:40:23 INFO mapred.JobClient: map 59% reduce 0%
13/06/04 13:40:25 INFO mapred.JobClient: map 66% reduce 0%
13/06/04 13:40:26 INFO mapred.JobClient: map 71% reduce 0%
13/06/04 13:40:28 INFO mapred.JobClient: map 78% reduce 0%
13/06/04 13:40:29 INFO mapred.JobClient: map 85% reduce 0%
13/06/04 13:40:32 INFO mapred.JobClient: map 92% reduce 0%
13/06/04 13:40:35 INFO mapred.JobClient: map 100% reduce 0%
13/06/04 13:41:08 INFO mapred.JobClient: map 100% reduce 8%
13/06/04 13:41:14 INFO mapred.JobClient: map 100% reduce 42%
13/06/04 13:41:15 INFO mapred.JobClient: map 100% reduce 67%
13/06/04 13:41:17 INFO mapred.JobClient: map 100% reduce 72%
13/06/04 13:41:20 INFO mapred.JobClient: map 100% reduce 76%
13/06/04 13:41:23 INFO mapred.JobClient: map 100% reduce 83%
13/06/04 13:41:26 INFO mapred.JobClient: map 100% reduce 87%
13/06/04 13:41:29 INFO mapred.JobClient: map 100% reduce 89%
13/06/04 13:41:30 INFO mapred.JobClient: map 100% reduce 93%
13/06/04 13:41:38 INFO mapred.JobClient: map 100% reduce 98%
13/06/04 13:41:39 INFO mapred.JobClient: map 100% reduce 100%
13/06/04 13:41:44 INFO mapred.JobClient: Job complete: job_201306041320_0001
13/06/04 13:41:44 INFO mapred.JobClient: Counters: 29
13/06/04 13:41:44 INFO mapred.JobClient: Job Counters
13/06/04 13:41:44 INFO mapred.JobClient: Launched reduce tasks=2
13/06/04 13:41:44 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=198184
13/06/04 13:41:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/06/04 13:41:44 INFO mapred.JobClient: Launched map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient: Data-local map tasks=4
13/06/04 13:41:44 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=81790
13/06/04 13:41:44 INFO mapred.JobClient: File Output Format Counters
13/06/04 13:41:44 INFO mapred.JobClient: Bytes Written=297057497
13/06/04 13:41:44 INFO mapred.JobClient: FileSystemCounters
13/06/04 13:41:44 INFO mapred.JobClient: FILE_BYTES_READ=899236112
13/06/04 13:41:44 INFO mapred.JobClient: HDFS_BYTES_READ=264084033
13/06/04 13:41:44 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1262356574
13/06/04 13:41:44 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=297057497
13/06/04 13:41:44 INFO mapred.JobClient: File Input Format Counters
13/06/04 13:41:44 INFO mapred.JobClient: Bytes Read=264083609
13/06/04 13:41:44 INFO mapred.JobClient: Map-Reduce Framework
13/06/04 13:41:44 INFO mapred.JobClient: Map output materialized bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient: Map input records=16522438
13/06/04 13:41:44 INFO mapred.JobClient: Reduce shuffle bytes=363133337
13/06/04 13:41:44 INFO mapred.JobClient: Spilled Records=57419402
13/06/04 13:41:44 INFO mapred.JobClient: Map output bytes=330165166
13/06/04 13:41:44 INFO mapred.JobClient: CPU time spent (ms)=97010
13/06/04 13:41:44 INFO mapred.JobClient: Total committed heap usage (bytes)=673988608
13/06/04 13:41:44 INFO mapred.JobClient: Combine input records=33041386
13/06/04 13:41:44 INFO mapred.JobClient: SPLIT_RAW_BYTES=424
13/06/04 13:41:44 INFO mapred.JobClient: Reduce input records=16518948
13/06/04 13:41:44 INFO mapred.JobClient: Reduce input groups=16518948
13/06/04 13:41:44 INFO mapred.JobClient: Combine output records=33037896
13/06/04 13:41:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=838295552
13/06/04 13:41:44 INFO mapred.JobClient: Reduce output records=16518948
13/06/04 13:41:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2260516864
13/06/04 13:41:44 INFO mapred.JobClient: Map output records=16522438
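Once the job completes, the result can be read straight out of HDFS (a sketch; with the 2 reduce tasks launched above, the output is split across files like part-r-00000 and part-r-00001):

hadoop fs -ls /usr/root/test/output/wc
hadoop fs -cat /usr/root/test/output/wc/part-r-00000 | head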
---------------------MapReduce sample-----------------------------------
------------------------------add another server~-----------------------------------
1) Change the host name
Edit /etc/sysconfig/network and set the name to node03 (sketched below).
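What the file ends up containing on a RHEL/CentOS-style guest (the NETWORKING line is the usual default, shown for completeness):

# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node03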
2) Extract the Hadoop tarball on the new node (tar xvf); one possible way is sketched below.
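Assuming the tarball is still sitting in /home/root on node01 (an assumption; stage it however you like):

scp /home/root/hadoop-1.0.4-bin.tar.gz node03:/home/root/
ssh node03 'cd /home/root && tar xvf hadoop-1.0.4-bin.tar.gz'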
# on the node01 server
(1) Add node03 to conf/slaves (the resulting file is sketched after this step)
[root@localhost conf]# pwd
/home/root/hadoop-1.0.4/conf
[root@localhost conf]# vi slaves
add the line: node03
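The slaves file is just one hostname per line; assuming node01 and node02 were already listed (both were serving as datanodes earlier), it ends up as:

node01
node02
node03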
(2) Add node03 to /etc/hosts
[root@localhost conf]# vi /etc/hosts
192.168.56.102 node01
192.168.56.103 node02
192.168.56.104 node03
(3) Copy /etc/profile, /etc/hosts, and the conf files from node01 to node03 (and keep node02 in sync as well):
[root@localhost conf]# scp /etc/hosts node03:/etc/
[root@localhost conf]# scp /etc/hosts node02:/etc/
[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node03:/home/root/hadoop-1.0.4/conf/
[root@localhost conf]# scp /home/root/hadoop-1.0.4/conf/* node02:/home/root/hadoop-1.0.4/conf/
[root@localhost conf]# scp /etc/profile node03:/etc/
If you have pssh set up, you can skip this file-by-file copying; see the sketch below.
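With the pssh tools installed, the same fan-out copy becomes a one-liner per file (a sketch; hosts.txt listing node02 and node03 is a hypothetical file for illustration):

# pscp comes with pssh and copies a local file to every host in the list
pscp -h hosts.txt /etc/hosts /etc/
pscp -h hosts.txt /etc/profile /etc/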
------------------------------add another server~-----------------------------------
-----------------------------start daemon----------------------------------------------
[root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/hadoop-daemon.sh start datanode
But it won't register properly: iptables is already running on node03 and blocks the datanode, so stop the firewall and redo it:
1) [root@localhost conf]# ssh node03 /home/root/hadoop-1.0.4/bin/hadoop-daemon.sh stop datanode
2) [root@localhost conf]# ssh node03 service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
3) Start the datanode again (same hadoop-daemon.sh start datanode command as above), then verify:
[root@localhost conf]# hadoop dfsadmin -report
Result: Datanodes available: 3 (3 total, 0 dead)
-----------------------------start daemon----------------------------------------------
---------------balancing--------------------------------------
Rebalance blocks across the datanodes. The threshold is the allowed deviation, in percent, of each node's utilization from the cluster average, so 1 is fairly strict:
[root@localhost conf]# hadoop balancer -threshold 1
Warning: $HADOOP_HOME is deprecated.
13/06/04 14:14:15 INFO balancer.Balancer: Using a threshold of 1.0
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:14:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:14:16 INFO balancer.Balancer: 2 over utilized nodes: 192.168.56.102:50010 192.168.56.103:50010
13/06/04 14:14:16 INFO balancer.Balancer: 1 under utilized nodes: 192.168.56.104:50010
13/06/04 14:14:16 INFO balancer.Balancer: Need to move 349.76 MB bytes to make the cluster balanced.
13/06/04 14:14:16 INFO balancer.Balancer: Decided to move 161.66 MB bytes from 192.168.56.102:50010 to 192.168.56.104:50010
13/06/04 14:14:16 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:14:16 PM 0 0 KB 349.76 MB 161.66 MB
13/06/04 14:17:09 INFO balancer.Balancer: Moving block 8103612282426249076 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:12 INFO balancer.Balancer: Moving block -1466839656797754499 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:24 INFO balancer.Balancer: Moving block -3592843786766369632 from 192.168.56.102:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:17:46 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:17:46 INFO balancer.Balancer: 1 over utilized nodes: 192.168.56.103:50010
13/06/04 14:17:46 INFO balancer.Balancer: 1 under utilized nodes: 192.168.56.104:50010
13/06/04 14:17:46 INFO balancer.Balancer: Need to move 123.12 MB bytes to make the cluster balanced.
13/06/04 14:17:46 INFO balancer.Balancer: Decided to move 161.66 MB bytes from 192.168.56.103:50010 to 192.168.56.104:50010
13/06/04 14:17:46 INFO balancer.Balancer: Will move 161.66 MBbytes in this iteration
Jun 4, 2013 2:17:46 PM 1 187.84 MB 123.12 MB 161.66 MB
13/06/04 14:18:39 INFO balancer.Balancer: Moving block 715304243492723016 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:11 INFO balancer.Balancer: Moving block 4843232263157045166 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:12 INFO balancer.Balancer: Moving block -8956265055413503717 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.102:50010 is succeeded.
13/06/04 14:21:13 INFO balancer.Balancer: Moving block -5391126869938599807 from 192.168.56.103:50010 to 192.168.56.104:50010 through 192.168.56.103:50010 is succeeded.
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.104:50010
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.103:50010
13/06/04 14:21:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.56.102:50010
13/06/04 14:21:16 INFO balancer.Balancer: 0 over utilized nodes:
13/06/04 14:21:16 INFO balancer.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 7.005833333333333 minutes
---------------balancing--------------------------------------
----------------custom MapReduce sample --------------------------
Compile your own MapReduce job against the Hadoop core jar, package it, and run it:
[root@localhost hadoop-1.0.4]# javac -classpath hadoop-core-1.0.4.jar:lib/commons-cli-1.2.jar -d sample_source/classes/ sample_source/sources/WordCount02.java
[root@localhost hadoop-1.0.4]# jar -cvf sample_source/WordCount.jar -C sample_source/classes .
[root@localhost hadoop-1.0.4]# hadoop jar sample_source/WordCount.jar WordCount02.WordCount02 /usr/root/test/cite.txt /usr/root/test/output/wc02
----------------custom MapReduce sample --------------------------
Useful Hadoop-related sites