Shell Programming and Scripting: Copying numbers by looking them up in a large file
Post 302561687 by rdcwayx, Wednesday 5 October 2011, 12:56 AM
Code:
for txt in *.txt; do
  # build a lookup table from big_file.dat (column 1 -> column 2), then print the
  # matching value for column 1 of each line in $txt, or "0" when there is no match
  awk 'NR==FNR {a[$1]=$2; next} {print (a[$1]=="") ? "0" : a[$1]}' big_file.dat "$txt" > "${txt%txt}num"
done
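
For context, here is a minimal run of the loop above with made-up file names and values (it assumes big_file.dat maps a key in column 1 to a number in column 2, and that each .txt file holds one key per line):

Code:
$ cat big_file.dat
id01 12
id02 7
$ cat sample.txt
id01
id99
id02
$ cat sample.num     # produced by the loop above
12
0
7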


10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Copying a large filesystem

Hi there. In my organisation we have a Solaris network with /home being automounted from /export/home on a central file server (the usual setup); however, the guy who originally set this up only allocated 3 GB to /export/home and now we are really struggling for space. I have a new 18 GB disk installed... (3 Replies)
Discussion started by: hcclnoodles
3 Replies
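
For a migration like the one described above, the classic Solaris approach is to dump the old filesystem into the new one. A rough sketch, assuming the new 18 GB slice has already been newfs'ed and temporarily mounted on /mnt/newhome (the device and mount point are hypothetical):

Code:
# copy /export/home onto the new slice, preserving ownership and permissions
ufsdump 0f - /export/home | (cd /mnt/newhome && ufsrestore rf -)
# afterwards, point the /export/home entry in /etc/vfstab at the new slice and remount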

2. Filesystems, Disks and Memory

Strange difference in file size when copying LARGE file..

Hi, I'm trying to take a database backup. One of the files is 26 GB. I am using cp -pr to create a backup copy of the database. After the copying is complete, if I do du -hrs on the folders I see a difference of 2 GB. The weird fact is that the BACKUP folder was 2 GB more than the original one! ... (1 Reply)
Discussion started by: 0ktalmagik
1 Replies
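
A common cause of the discrepancy described above is sparse files: cp fills in the holes, so the copy occupies more disk blocks than the original even though the byte length is identical. Comparing logical size with allocated blocks shows the difference (the file name here is hypothetical):

Code:
ls -l dbfile.dbf    # logical length in bytes (identical for original and copy)
du -k dbfile.dbf    # kilobytes actually allocated on disk (larger when holes have been filled in)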

3. Shell Programming and Scripting

Script to split a large file into a number of small files

Dear all, could you please help me split a file containing around 240,000,000 lines into 4 roughly equal files? Note that we need to ensure that each file starts at a start flag (MSISDN) and ends at an end flag (End), and also that the number of lines between the... (10 Replies)
Discussion started by: ahmed.gad
10 Replies
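
One hedged sketch for a record-preserving split like the one asked about above: distribute whole records round-robin across four output files, assuming every record begins with a line containing MSISDN (the input file name is hypothetical). This balances record counts rather than byte counts:

Code:
awk '/MSISDN/ { n++ }                                 # a new record starts on this line
     n        { print > ("part" ((n - 1) % 4 + 1)) }  # the whole record goes to one of 4 files
' hugefile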

4. UNIX for Dummies Questions & Answers

Copying large file problem on SVR4 Unix

We have 3 Unix servers, all running SVR4 Unix 1.4. I have no problems copying files to and from 2 of the servers using either the rcp command or ftp, but when I come to transfer large files to the third server the copy gives up part way through and crashes that server. Copying smaller files using RCP... (7 Replies)
Discussion started by: coatesd
7 Replies

5. Shell Programming and Scripting

Copying of large files fail

Hi, I have a process which duplicates files for different environments. As the files arrive, my script (Korn shell) makes copies of them (giving each a unique name) and then renames the original file so that my process won't get triggered again. I don't like it either, but it's what we were told to... (4 Replies)
Discussion started by: GoldenEye4ever
4 Replies

6. UNIX for Dummies Questions & Answers

Copying a Large File

I have a large file that I append entries to the end of every few seconds. It's grown to over 150 MB. It's basically a log file, but a Perl script is writing to it. I need to make a copy of it to a new directory. I realize the latest entries occurring while the copy is taking place will not be recorded... (1 Reply)
Discussion started by: lforum
1 Replies
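
For a live log like the one above, one way to get a self-consistent copy is to record the file's current length and copy only that many bytes, so anything appended mid-copy is simply excluded. A sketch with hypothetical paths, assuming a head that supports -c (GNU or BSD):

Code:
size=$(wc -c < big.log)                      # snapshot the current length in bytes
head -c "$size" big.log > /new/dir/big.log   # copy only the bytes that existed at the snapshot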

7. Shell Programming and Scripting

Find line number of bad data in large file

Hi Forum. I was trying to search for the following scenario on the forum but was not able to. Let's say that I have a very large file that has some bad data in it (for example, 0.0015 in the 12th column) and I would like to find the line number and remove that particular line. What's the easiest... (3 Replies)
Discussion started by: pchang
3 Replies
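
A minimal sketch for the question above, assuming whitespace-separated fields and that the bad value appears literally as 0.0015 (the file name is hypothetical):

Code:
awk '$12 == "0.0015" { print NR ": " $0 }' datafile    # report the line number(s) of the bad rows
awk '$12 != "0.0015"' datafile > datafile.clean        # write a copy with those rows removed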

8. Shell Programming and Scripting

Start copying large file while it's still being restored from tape

Hello, I need to copy a 700 GB tape-image file over a network. I want to start the copy process before the tape image has finished being restored from the tape. The tape restore speed is about 78 Mbps and the file transfer speed over the network is about 45 Mbps. I don't want to use a pipe, since... (7 Replies)
Discussion started by: swamik
7 Replies
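
One hedged way to approach the scenario above is to re-send the growing file repeatedly with rsync, so each pass transfers only the newly restored tail; the host and paths below are hypothetical, and it assumes rsync is installed on both ends:

Code:
# while the tape restore is still running, periodically send the newly arrived bytes;
# --append transfers only the data beyond what the receiver already holds
while :; do
    rsync --append --partial tape-image.img user@remotehost:/backup/
    sleep 300
done
# once the restore finishes, stop the loop and run a final
#   rsync -c tape-image.img user@remotehost:/backup/
# to checksum-verify the whole file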

9. UNIX for Dummies Questions & Answers

Delete large number of columns from file

Hi, I have a data file that contains 61 columns. I want to delete all the columns except columns 3, 6 and 8. The columns are tab-delimited. How would I achieve this on the terminal? Thanks (2 Replies)
Discussion started by: lost.identity
2 Replies
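
For a tab-delimited column extraction like the one above, a minimal sketch (the input and output names are hypothetical):

Code:
cut -f3,6,8 infile > outfile                                    # keep only columns 3, 6 and 8
awk -F'\t' -v OFS='\t' '{ print $3, $6, $8 }' infile > outfile  # same extraction in awk, keeping tabs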

10. UNIX for Advanced & Expert Users

Count number of lines between occurrences of a pattern in a large file

1000CUS E Y4NYRETAIL 10010004HELIOPOLIS 110000500022360591000056XX EG 1101DEBY XXAD ZSSKY TSSROS 1102HANYNNYY@HOTMAIL.COM 210030/05/201301/06/2013AED 3100 OPE 3100 CLO 3100 The 1000CUS E Y NYCORPORATE 10010004HELIOPOLIS 110000500025270504550203XX EG 1101XXXQ FOR CTING AND... (1 Reply)
Discussion started by: john2022
1 Replies
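
The data sample above is truncated, but assuming the records of interest begin with lines matching 1000CUS, here is a sketch that reports how many lines each block spans, from one occurrence of the pattern up to the next (the file name is hypothetical):

Code:
awk '/^1000CUS/ { if (start) print "block at line " start ": " (NR - start) " lines"
                  start = NR }
     END        { if (start) print "block at line " start ": " (NR - start + 1) " lines" }
' datafile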