hadoop distcp

2016. 10. 28. 11:17BigDATA/Hadoop


$ ./hadoop distcp

usage: distcp OPTIONS [source_path...] <target_path>


 -append                       Reuse existing data in target files and

                               append new data to them if possible

 -async                        Should distcp execution be blocking

 -atomic                       Commit all changes or none

 -bandwidth <arg>              Specify bandwidth per map in MB

 -delete                       Delete from target, files missing in source

 -diff <arg>                   Use snapshot diff report to identify the

                               difference between source and target

 -f <arg>                      List of files that need to be copied

 -filelimit <arg>              (Deprecated!) Limit number of files copied

                               to <= n

 -filters <arg>                The path to a file containing a list of

                               strings for paths to be excluded from the


 -i                            Ignore failures during copy

 -log <arg>                    Folder on DFS where distcp execution logs

                               are saved

 -m <arg>                      Max number of concurrent maps to use for


 -mapredSslConf <arg>          Configuration for ssl config file, to use

                               with hftps://

 -numListstatusThreads <arg>   Number of threads to use for building file

                               listing (max 40).

 -overwrite                    Choose to overwrite target files

                               unconditionally, even if they exist.

 -p <arg>                      preserve status (rbugpcaxt)(replication,

                               block-size, user, group, permission,

                               checksum-type, ACL, XATTR, timestamps). If

                               -p is specified with no <arg>, then

                               preserves replication, block size, user,

                               group, permission, checksum type and

                               timestamps. raw.* xattrs are preserved when

                               both the source and destination paths are

                               in the /.reserved/raw hierarchy (HDFS

                               only). raw.* xattrpreservation is

                               independent of the -p flag. Refer to the

                               DistCp documentation for more details.

 -sizelimit <arg>              (Deprecated!) Limit number of files copied

                               to <= n bytes

 -skipcrccheck                 Whether to skip CRC checks between source

                               and target paths.

 -strategy <arg>               Copy strategy to use. Default is dividing

                               work based on file sizes

 -tmp <arg>                    Intermediate work path to be used for

                               atomic commit

 -update                       Update target, copying only missingfiles or