hadoop distcp
$ ./hadoop distcp
usage: distcp OPTIONS [source_path...] <target_path>
OPTIONS
-append Reuse existing data in target files and
append new data to them if possible
-async Should distcp execution be blocking
-atomic Commit all changes or none
-bandwidth <arg> Specify bandwidth per map in MB
-delete Delete from target, files missing in source
-diff <arg> Use snapshot diff report to identify the
difference between source and target
-f <arg> List of files that need to be copied
-filelimit <arg> (Deprecated!) Limit number of files copied
to <= n
-filters <arg> The path to a file containing a list of
strings for paths to be excluded from the
copy.
-i Ignore failures during copy
-log <arg> Folder on DFS where distcp execution logs
are saved
-m <arg> Max number of concurrent maps to use for
copy
-mapredSslConf <arg> Configuration for ssl config file, to use
with hftps://
-numListstatusThreads <arg> Number of threads to use for building file
listing (max 40).
-overwrite Choose to overwrite target files
unconditionally, even if they exist.
-p <arg> preserve status (rbugpcaxt)(replication,
block-size, user, group, permission,
checksum-type, ACL, XATTR, timestamps). If
-p is specified with no <arg>, then
preserves replication, block size, user,
group, permission, checksum type and
timestamps. raw.* xattrs are preserved when
both the source and destination paths are
in the /.reserved/raw hierarchy (HDFS
only). raw.* xattrpreservation is
independent of the -p flag. Refer to the
DistCp documentation for more details.
-sizelimit <arg> (Deprecated!) Limit number of files copied
to <= n bytes
-skipcrccheck Whether to skip CRC checks between source
and target paths.
-strategy <arg> Copy strategy to use. Default is dividing
work based on file sizes
-tmp <arg> Intermediate work path to be used for
atomic commit
-update Update target, copying only missingfiles or
directories