The Fastest for copy huge data


 
# 8  
Old 09-16-2014
Quote:
Originally Posted by jim mcnamara
If you have an ssh connection and have set up ssh keys for an account that can write to /.
Where /parent is the parent directory component of /parent/path/to/files/

Code:
tar cf - ./path/to/files | ssh special_user@remoteserver ' cd /parent && tar xBf - '

This runs in about half the time of:
Code:
tar cf tarfile.tar ./path/to/files
scp tarfile.tar remoteserver:
ssh remoteserver ' tar xf tarfile.tar'

When copying data via ssh pipe, always add "-e none" to the command in case any characters in the stream match the ssh escape characters:
Code:
tar cf - ./path/to/files | ssh -e none special_user@remoteserver ' cd /parent && tar xBf - '

This is really moot, though, until we get more details from the original poster.
# 9  
Old 09-16-2014
@achenle: the OP specified 3 Mio (3 million) files that are on average 1 KiB in size, so that should be on the order of 3 GiB. With a SATA disk doing 120 (sequential, but small) IOPS at a 1 KiB IO size, that would theoretically be 3,000,000 IOs / 120 IOPS = 25,000 seconds, i.e. around 7 hours for the data alone, limited by either the reading or the writing system (probably the writing side is faster, since its IOs will be more sequential in nature). This excludes the IOPS required for the metadata. If the filesystem can do write combining / prefetching, then perhaps it may be a bit more efficient. If the filesystem has a larger minimum block size, that would not matter much for speed, since the block size would still be smallish.
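As a quick sanity check of that estimate (the 120 IOPS figure is only an assumption for a single SATA disk), the arithmetic can be done in the shell:
Code:
# Back-of-the-envelope only; 120 IOPS is an assumed figure for one SATA disk.
FILES=3000000        # 3 million files of ~1 KiB each
IOPS=120             # assumed small, mostly sequential IOs per second
echo "$(( FILES / IOPS )) seconds"        # 25000 seconds
echo "$(( FILES / IOPS / 3600 )) hours"   # 6 (integer hours), i.e. roughly 7 in practice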

When we take the disk out and put it in the other server, we need another stream of the same size to copy the data onto that server's disk, plus sneaker time.

If we used the network instead, we would probably not need much more time and could do it with a single stream, reading from one computer and writing onto the other (the network would not be a bottleneck here). So that should take on the order of half the time.

If we use any of the block copy methods in my post, there is no need to copy the files individually, nor to do all that metadata manipulation, and we can read large chunks of data with big IOs (for example 1 MiB per IO). That will be significantly faster, probably on the order of 100 MB/s, so it should theoretically take on the order of 30-60 seconds for the data alone, if the network is not a bottleneck.
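One way such a block copy could look, sketched with hypothetical raw-device names and reading 1 MiB per IO, pushed through the same kind of ssh pipe as above:
Code:
# Hypothetical device names; run this only with the filesystem unmounted or quiesced.
dd if=/dev/rdsk/c0t0d0s6 bs=1024k | \
    ssh -e none special_user@remoteserver 'dd of=/dev/rdsk/c1t0d0s6 bs=1024k'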

Of course, if the data is on a large filesystem, then that whole filesystem would need to be copied, unless the method is smart, like filesystem dump methods or ZFS send / receive, which only copy the parts that are in use.
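For a UFS filesystem, a dump-based transfer that copies only the blocks in use might look like this (device, user, and target path are placeholders):
Code:
# Dump level 0 from the raw device and restore it on the remote side.
ufsdump 0f - /dev/rdsk/c0t0d0s6 | \
    ssh -e none special_user@remoteserver 'cd /target && ufsrestore rf -'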

Last edited by Scrutinizer; 09-17-2014 at 01:07 AM..
# 10  
Old 09-16-2014
So that's what "mio" means...

120 IO operations per second from a SATA drive is quite optimistic. A single 7200 rpm SATA disk is realistically more likely to get about 60-70 IO operations per second, because the small reads in this case are not likely to be sequential - they'll effectively be random IO operations. If it's a 5400 rpm disk, the number would be even less.

And if atime modification isn't turned off, every read operation that reads a file will generate a write operation to update the inode data for that file.
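If those atime writes are a concern, access-time updates can be switched off; a sketch with placeholder dataset, device, and mount-point names:
Code:
# ZFS: stop access-time updates for the dataset holding the files (name is a placeholder).
zfs set atime=off tank/data

# UFS: mount (or list in /etc/vfstab) with the noatime option; device and path are placeholders.
mount -F ufs -o noatime /dev/dsk/c0t0d0s6 /data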

So that's probably somewhere between 6 and 9 million IO operations, because metadata has to be read just to find each file. Call it 6 million IO operations and assume the disk can do 60 IO operations per second. That's 100,000 seconds, or more like 28 hours. And that assumes the disk isn't servicing other IO operations.

Why not just share the file system via NFS and let other systems access the files that way?
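On Solaris, the share and the client mount could look like this (paths and hostname are placeholders):
Code:
# On the server: export the tree read-only.
share -F nfs -o ro /export/data

# On each client: mount it and read the files in place instead of copying them.
mount -F nfs fileserver:/export/data /mnt/data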
# 11  
Old 09-16-2014
Well, given our lack of information, there is no real answer.

IOPS are not knowable - our SAN does 12,000 IOPS continuously if required. The SATA disk on my desktop does maybe 70. And if the file systems were ZFS and were on a SAN, then the "copy" time is the time it takes to type four or five zfs commands.
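One reading of those "four or five zfs commands" is a snapshot plus send/receive; the pool, dataset, and host names below are placeholders:
Code:
# Snapshot the source dataset and stream it to the other host.
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | ssh -e none special_user@remoteserver zfs receive tank2/data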

So maybe we are comparing apples to elephants. I do not know.

In any event, when an app (or a user) is allowed to clutter a filesystem as described, there is not a lot of hope for it. A simple find or ls command can take hours to complete on some systems. Copying it as-is does not seem like a best-practices idea to me.
# 12  
Old 09-17-2014
cpio in pass-through mode is generally regarded as much faster than tar.
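A typical pass-through invocation looks like this (source and target directories are placeholders):
Code:
# Walk the source tree and recreate it under /target, preserving directories and mtimes.
cd /source && find . -depth -print | cpio -pdm /target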