Remove duplicate lines, sort it and save it as file itself


 
# 1  
Old 04-11-2015

Hi, all

I have a CSV file from which I would like to remove duplicate lines based on the 1st field, and then sort the remaining lines by the 1st field. If more than one line has the same 1st field, I want to keep the first of them and remove the rest. I think I have to use uniq or something similar, but I still have no idea how to do it. Also, when I tried to use head and tail to sort the file, it did not work inside my script, and I don't know why.

Code:
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.00,181.95,63.96
/home/intannf/foto5/2015_0313_090323_155.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.01,181.92,63.82
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.03,181.98,63.73 -->remove this duplicate
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.06,182.09,63.64 -->remove this duplicate
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.14,182.08,63.63 -->remove this duplicate
/home/intannf/foto5/2015_0313_090142_124.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.13,182.06,63.87
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.20,182.08,63.91 -->remove this duplicate
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.19,182.10,63.68
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.76,181.25,182.09,63.36 -->remove this duplicate
/home/intannf/foto5/2015_0313_090628_212.JPG,0.,-7.77223,110.37310,30.72,996.47,148.67,181.09,181.91,63.87
/home/intannf/foto5/2015_0313_085942_087.JPG,0.,-7.77219,110.37317,30.76,996.47,148.71,181.12,182.17,63.78
/home/intannf/foto5/2015_0313_090717_227.JPG,0.,-7.77217,110.37315,30.77,996.48,148.66,181.06,182.21,63.87

Desired output:
Code:
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.00,181.95,63.96
/home/intannf/foto5/2015_0313_085942_087.JPG,0.,-7.77219,110.37317,30.76,996.47,148.71,181.12,182.17,63.78
/home/intannf/foto5/2015_0313_090142_124.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.13,182.06,63.87
/home/intannf/foto5/2015_0313_090323_155.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.01,181.92,63.82
/home/intannf/foto5/2015_0313_090628_212.JPG,0.,-7.77223,110.37310,30.72,996.47,148.67,181.09,181.91,63.87
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.19,182.10,63.68
/home/intannf/foto5/2015_0313_090717_227.JPG,0.,-7.77217,110.37315,30.77,996.48,148.66,181.06,182.21,63.87

Please help me to figure it out. Thanks in advance.

Regards,
Intan
# 2  
Old 04-11-2015
We know even less than you do about why it failed, because we can't see the results you are seeing. Why shouldn't head and tail work for you? Did you run them individually to figure out how they work and how they cooperate to give the results you need?

man sort would show you the -u option to keep only unique key values, although it is not guaranteed that those will be the respective first line that occurred. You'd need a compound statement with awk and sort to get what you want.
# 3  
Old 04-11-2015
Hi RudiC,

When I run that script on its own, as Don Cragun wrote it, it works well. But when I put it into my whole script, like this:
Code:
..... (matching fields script using awk)
}' $first.csv $second.csv > $result.csv
(head -1 $result.csv && tail -n+2 $result.csv | sort) > debug.csv && cp debug.csv result.csv; rm -f debug.csv
.....

Assume that before the code above, I have entered the input file names for $first and $second and defined a file name for $result.
Do you have another way to do it?

What about removing the duplicate lines? I have tried this code, but I think something is missing.
Code:
sort -u -t, -k1 file

Thanks in advance.

Regards,
Intan
# 4  
Old 04-11-2015
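Is the result variable actually set at the point where head ... sort runs? Note that a ( ... ) subshell inherits all shell variables, so no export is needed there; export only matters if that line is run from a separate script or process.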
The key -k1 will use field 1 till the end of line as a key. Try -k1,1.
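
To see the difference between the two key definitions, here is a minimal sketch. The three sample lines and the temporary file are made up for illustration and are not from the poster's data:

```shell
#!/bin/sh
# Hypothetical sample: two lines share field 1 ("a") but differ in field 2.
demo=$(mktemp)
printf '%s\n' 'b,2' 'a,9' 'a,1' > "$demo"

# -k1: the key runs from field 1 to the end of the line, so "a,9" and "a,1"
# have different keys and both survive -u.
whole_line=$(sort -u -t, -k1 "$demo")

# -k1,1: the key is field 1 only, so just one of the "a" lines survives.
field_one=$(sort -u -t, -k1,1 "$demo")

echo "$whole_line" | wc -l    # three lines remain
echo "$field_one" | wc -l     # two lines remain
rm -f "$demo"
```

Which of the two "a" lines survives with -k1,1 depends on the sort implementation, so the sketch only counts lines.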
# 5  
Old 04-11-2015
Note: sort -t, -u -k1,1 will work with some sort implementations, but which of the lines with equal keys is kept is not specified, so not every sort keeps the first one.

An alternative is to use awk with sort without the -u option, like RudiC suggested:
Code:
awk -F, '!A[$1]++' | sort ...
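
As a rough illustration of how that pipeline behaves, here is a sketch on made-up, headerless sample data (the five input lines are hypothetical, and -t, -k1,1 is just one plausible choice for the elided sort options):

```shell
#!/bin/sh
# Hypothetical headerless sample; the key "x" occurs three times.
# awk:  A[$1]++ evaluates to 0 the first time a key is seen, so !A[$1]++ is
#       true only for the first line carrying that key, and only it is printed.
# sort: the keys are unique after awk, so an ordinary sort on field 1 is enough.
deduped=$(printf '%s\n' 'z,1' 'x,2' 'x,3' 'y,4' 'x,5' |
  awk -F, '!A[$1]++' |
  sort -t, -k1,1)
printf '%s\n' "$deduped"
```

Because awk runs before sort, the line that is kept for each key is the first one in the original file order, not the first one in sorted order.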


Last edited by Scrutinizer; 04-12-2015 at 01:58 AM..
# 6  
Old 04-11-2015
In addition to what Scrutinizer and RudiC have already pointed out...

There is a HUGE difference between $result.csv and result.csv unless somewhere in your script you also had a shell assignment statement like:
result=result
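
A small sketch of that difference, assuming a made-up value for the variable (the thread never shows what $result actually holds):

```shell
#!/bin/sh
# "merged" is a hypothetical value chosen only for this illustration.
result=merged
expanded="$result.csv"    # the variable is expanded, giving merged.csv
literal='result.csv'      # no $, so this names a file literally called result.csv
echo "$expanded"
echo "$literal"
```

With these values, cp debug.csv result.csv would write to a different file than the one created by > $result.csv.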

If you want our help debugging your script, you need to show us your script! (Not just bits and pieces that work fine when you run them separately, but don't work in your whole script.) Most of us aren't very good at guessing what:
Code:
..... (matching fields script using awk)

and:
Code:
.....

actually expand to in your script, but it obviously makes a huge difference in what your script will do.

Are you trying to remove all but the 1st line in your file for each unique field one value, and hoping that sort -u will do that for you? Or, are you trying to remove all but the 1st line in your file for each unique field one value and also need to sort the output?

What operating system (including release numbers) and shell are you using? Do you need to be able to run this script only on that operating system, or are you trying to write portable code that will work on any UNIX or Linux system?

If you have a problem to solve, stop and think about what the problem is. Describe the entire problem. Describe your inputs. Describe your desired outputs.

Piecewise refinement is great when you've got a big problem to solve, but if you don't know your end target when you start, a lot of those pieces may be wasted since they won't lead to your final goal.

Please help us help you. Tell us in detail, about your inputs, your outputs, and the code you've tried to get to your goal.
# 7  
Old 04-13-2015
Hi, all

Finally I have figured out how to deal with this problem. I edited Don Cragun's script. This is my version, and it works well within my whole script.
Code:
(head -1 $result && tail -n+2 $result | sort) > $$.csv && cp $$.csv $result.csv; rm -f $$.csv; rm -f $result

After sorting, I remove the lines that duplicate the 1st field, using the script Scrutinizer suggested. Here it is:
Code:
awk -F, '!A[$1]++' $result.csv > $$.csv && cp $$.csv $result.csv; rm -f $$.csv

Both scripts work well within my whole script. Thank you so much for helping me!

But I need one more suggestion. Can I merge these two scripts into one? How would I do that? Thanks in advance.

Regards,
Intan
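
One possible way to merge the two steps, offered as a sketch rather than a definitive answer: keep the header with head, de-duplicate the remaining lines with awk, sort them, and copy the result back over the original file. The sample file below is a hypothetical stand-in for the poster's data, and here $result holds the full file name. Note that this sketch de-duplicates before sorting, so the line kept for each 1st field is the first one in the original file order; sorting whole lines first could change which line comes first.

```shell
#!/bin/sh
# Hypothetical stand-in for the real file produced by the earlier awk step.
result=$(mktemp)
printf '%s\n' 'SourceFile,Yaw' 'b.JPG,1' 'a.JPG,2' 'b.JPG,3' > "$result"

tmp=$(mktemp)
{
  head -n 1 "$result"            # header line passes through untouched
  tail -n +2 "$result" |         # every line after the header
    awk -F, '!A[$1]++' |         # keep the first line for each 1st-field value
    sort -t, -k1,1               # then sort by the 1st field only
} > "$tmp" && cp "$tmp" "$result"
rm -f "$tmp"

cat "$result"
```

Writing to a temporary file first is needed because redirecting straight to "$result" would truncate the file before head and tail could read it.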