Remove duplicate lines, sort it and save it as file itself


 
# 8  
04-13-2015
As we said before, if it is important to keep the first line in your input file among lines with the same first field, sort -u is not guaranteed to do that. Likewise, sort and sort -k1,1 are not guaranteed to keep lines with the same first field in the same order in the output as they appeared in the input file. So sorting first and then using awk to choose the first line of those with the same first field won't work either.
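
To illustrate the point, here is a minimal sketch that decides which line is "first" in input order and only then sorts. The function name first_per_key_sorted and the sample data in the note after it are made up for illustration, and header handling is left out for brevity.

```shell
# Hypothetical helper: keep the first line seen for each value of field 1
# (in input order), then sort the surviving lines.
first_per_key_sorted() {
    # seen[$1]++ is 0 (false) the first time a key appears, so the
    # negation selects exactly the first line for each key.
    awk -F, '!seen[$1]++' "$1" | sort
}
```

For a file containing the lines b,first then a,only then b,second, calling first_per_key_sorted on it prints a,only followed by b,first. The later b,second line is dropped before sort ever sees it, so it does not matter how sort would have ordered the two b lines.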

Also, you said earlier that you wanted the output stored in your input file, but the code you now say works doesn't do that. Instead, it reads an input file named by the expansion of the shell variable $result and stores the sorted results in a file whose name is the input file's name with .csv appended. And, whether or not processing was successful, it removes the input file.

Assuming that your input file is specified by $result and you want the output stored in that same file if processing is successful (and the original file left unchanged if there is an error), you might try something like:
Code:
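#!/bin/ksh
# Name of the file to be de-duplicated and sorted in place.
result="file"
awk -F, '
# Pass the header line through unsorted.
NR == 1 {
	print
	next
}
# Send only the first line seen for each value of field 1 to sort.
!A[$1]++ {
	print | "sort"
}' "$result" > "$result.$$" && cp "$result.$$" "$result"; rm -f "$result.$$"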

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. Although written and tested using the Korn shell, this script should work with any shell that uses basic Bourne shell syntax.
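
The same replace-only-on-success idea can be wrapped in a function. The following is a sketch under assumptions, not the script above: the name sort_dedupe_in_place is hypothetical, the header is split off with head and tail rather than inside awk, and the /usr/xpg4/bin/awk check only mirrors the Solaris note.

```shell
# Hypothetical helper: de-duplicate on field 1, sort the body, keep the
# header line first, and overwrite the file only if every step succeeded.
sort_dedupe_in_place() {
    file=$1
    tmp="$file.$$"                  # temporary name, as in the script above
    [ -r "$file" ] || return 1      # leave everything untouched if unreadable

    # Prefer the XPG4 awk where it exists (Solaris/SunOS), else plain awk.
    AWK=awk
    if [ -x /usr/xpg4/bin/awk ]; then
        AWK=/usr/xpg4/bin/awk
    fi

    {
        head -n 1 "$file" &&
        tail -n +2 "$file" | "$AWK" -F, '!seen[$1]++' | sort
    } > "$tmp" && cp "$tmp" "$file"
    status=$?

    rm -f "$tmp"                    # the temporary file is removed either way
    return "$status"
}
```

Called as sort_dedupe_in_place "$result", it returns a non-zero status and leaves the original file unchanged if the file cannot be read or the temporary file cannot be written.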
# 9  
04-13-2015
If you want awk to do both the parsing and the sorting, try the script below:
Code:
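awk -F, '{
    if (NR == 1) print                  # pass the header through unsorted
    else if (!f[$1]++) x[++i] = $0      # save the first line seen for each field 1
} END {
    # Bubble sort the saved lines (quadratic, so slow on large files).
    for (j = 1; j < i; j++)
        for (k = 1; k < (i - j + 1); k++)
            if (x[k] > x[k+1]) {
                t = x[k]
                x[k] = x[k+1]
                x[k+1] = t
            }
    for (k = 1; k <= i; k++)
        print x[k]
}' file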
