Uniq code in sorted order
Post 302733227 by Don Cragun on Tuesday 20th of November 2012 02:20:54 AM
Quote:
Originally Posted by irfanmemon
Hi Don,

It's the same file. I want to remove the duplicates from the file and keep the result in the same file, 10_FMS_CRXtoFMS.csv.

Is something wrong, or how should I use it then?
OK. Let's forget about the ssh complications and go back to basics. Essentially, you have the command:
Code:
awk -F, '!c[$3]++' 10_FMS_CRXtoFMS.csv >> 10_FMS_CRXtoFMS.csv

This reads the file 10_FMS_CRXtoFMS.csv and appends to the end of that same file the first line found for each distinct value of the 3rd field. It does not throw away the original contents of the file, so the file ends up with more lines than it started with, not fewer.
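If you want to see the effect on a small scale, here is a sketch using a made-up three-line file in a scratch directory. The name demo.csv and its contents are my assumptions for illustration, not your real data:

```shell
# Hypothetical sample data; demo.csv stands in for the real file.
workdir=${TMPDIR:-/tmp}/append_demo.$$
mkdir -p "$workdir"
f=$workdir/demo.csv
printf '%s\n' 'a,1,x' 'b,2,x' 'c,3,y' > "$f"

# Append the first line seen for each distinct 3rd field to the same file.
awk -F, '!c[$3]++' "$f" >> "$f"

# The three original lines are still there, followed by the two
# appended lines (a,1,x and c,3,y), for five lines in total.
cat "$f"
```

The duplicates you wanted to remove are all still in the file; you have only added more lines after them.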

If you change the command to:
Code:
awk -F, '!c[$3]++' 10_FMS_CRXtoFMS.csv > 10_FMS_CRXtoFMS.csv

the shell will empty the file named 10_FMS_CRXtoFMS.csv before awk even starts, because the > redirection truncates the file as soon as the shell opens it for output. awk then finds no lines to read and writes nothing, so you end up with an empty file.
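The truncation is easy to reproduce on a throwaway file. This is only a sketch; demo.csv and its three lines are made up for the demonstration:

```shell
# Hypothetical sample data in a scratch directory; demo.csv stands in for the real file.
workdir=${TMPDIR:-/tmp}/truncate_demo.$$
mkdir -p "$workdir"
f=$workdir/demo.csv
printf '%s\n' 'a,1,x' 'b,2,x' 'c,3,y' > "$f"

# The shell truncates "$f" for output before awk is started,
# so awk opens an empty file and has nothing to read or print.
awk -F, '!c[$3]++' "$f" > "$f"

# Count the bytes left in the file.
size=$(wc -c < "$f")
echo "bytes left: $((size))"    # prints "bytes left: 0"
```

Run it only on a copy or on scratch data like this; on your real file the contents would be gone.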

Even if it did do what you thought it was doing, you still wouldn't want to do that. If your awk script fails for some reason, you will destroy your input file and have no backup. The safer way to handle something like this is:
Code:
awk -F, '!c[$3]++' 10_FMS_CRXtoFMS.csv > tmp$$.csv && mv tmp$$.csv 10_FMS_CRXtoFMS.csv

This writes the results to a temporary file and then moves the temporary file back to your original file's name if and only if awk completed successfully. If awk fails, you will have the diagnostic messages awk printed, your unchanged input file, and whatever output awk produced before it failed in the temp file, so you can debug the problem and fix it without losing any data. Using $$ in the file name allows you to use the script to process other files concurrently without the runs interfering with each other. In POSIX-conforming shells, $$ expands to the process ID of the shell creating the file.
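If you need this in more than one place, the same pattern can be wrapped in a small shell function. Treat the following as a sketch under my own assumptions: the name dedup_in_place, the optional field-number argument, and the choice to put the temp file next to the original are all mine, not something from your script:

```shell
# dedup_in_place FILE [FIELD]
# Keep the first line for each distinct value of comma-separated field FIELD
# (default 3) and replace FILE only if awk succeeds.
# The function name and the FIELD parameter are illustrative, not from the thread.
dedup_in_place() {
    file=$1
    field=${2:-3}
    tmp=$file.tmp$$    # temp file next to the original; $$ keeps concurrent runs apart

    if awk -F, -v f="$field" '!c[$f]++' "$file" > "$tmp"; then
        # awk succeeded: replace the original with the deduplicated copy.
        mv "$tmp" "$file"
    else
        # awk failed: the original is untouched and the partial output is kept.
        echo "awk failed; $file is unchanged, partial output is in $tmp" >&2
        return 1
    fi
}
```

You would then call it as dedup_in_place 10_FMS_CRXtoFMS.csv. Keeping the temp file in the same directory as the original means the mv is a rename within one file system rather than a copy across file systems.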

There are other issues to consider (and other ways to do this safely) if your input file has multiple hard links, but I'm assuming that isn't an issue for now.
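For what it's worth, here is one possible way to handle the hard link case. It is only a sketch with made-up file names (data.csv and other.csv), and it is not necessarily the best approach for your situation. The point is that mv gives the name a new inode, so any other link keeps the old contents, while cp into the existing file rewrites the inode that all the links share:

```shell
# Scratch directory with hypothetical file names; other.csv is a second hard link to data.csv.
workdir=${TMPDIR:-/tmp}/hardlink_demo.$$
mkdir -p "$workdir"
printf '%s\n' 'a,1,x' 'b,2,x' 'c,3,y' > "$workdir/data.csv"
ln "$workdir/data.csv" "$workdir/other.csv"

# Write the result to a temp file, then copy it over the existing file.
# cp rewrites the contents of the existing inode, so other.csv sees the change;
# mv would instead give data.csv a new inode and leave other.csv with the old lines.
awk -F, '!c[$3]++' "$workdir/data.csv" > "$workdir/tmp$$.csv" &&
    cp "$workdir/tmp$$.csv" "$workdir/data.csv" &&
    rm "$workdir/tmp$$.csv"

# The second link now shows the deduplicated lines: a,1,x and c,3,y.
cat "$workdir/other.csv"
```

The trade-off is that the cp step is not atomic. If it is interrupted, data.csv can be left partly written, although the complete result is still in the temp file because rm only runs after cp succeeds.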
 
