Shell Programming and Scripting: post 302941021 by refrain, Saturday 11th of April 2015, 06:13:25 AM
Remove duplicate lines, sort them, and save the result to the same file

Hi, all

I have a CSV file from which I would like to remove duplicate lines based on the 1st field, and then sort the lines by the 1st field. If more than one line has the same 1st field, I want to keep the first of those lines and remove the rest. The header line should stay at the top, as in the desired output below. I think I have to use uniq or something similar, but I still have no idea how to do it. Also, when I tried to use head and tail to sort it, it didn't work in my script, and I don't know why.

Input file:
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.00,181.95,63.96
/home/intannf/foto5/2015_0313_090323_155.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.01,181.92,63.82
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.03,181.98,63.73 -->remove this duplicate
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.06,182.09,63.64 -->remove this duplicate
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.14,182.08,63.63 -->remove this duplicate
/home/intannf/foto5/2015_0313_090142_124.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.13,182.06,63.87
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.20,182.08,63.91 -->remove this duplicate
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.19,182.10,63.68
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.76,181.25,182.09,63.36 -->remove this duplicate
/home/intannf/foto5/2015_0313_090628_212.JPG,0.,-7.77223,110.37310,30.72,996.47,148.67,181.09,181.91,63.87
/home/intannf/foto5/2015_0313_085942_087.JPG,0.,-7.77219,110.37317,30.76,996.47,148.71,181.12,182.17,63.78
/home/intannf/foto5/2015_0313_090717_227.JPG,0.,-7.77217,110.37315,30.77,996.48,148.66,181.06,182.21,63.87

Desired output:
SourceFile,Airspeed,GPSLatitude,GPSLongitude,Temperature,Pressure,Altitude,Roll,Pitch,Yaw
/home/intannf/foto5/2015_0313_085929_083.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.00,181.95,63.96
/home/intannf/foto5/2015_0313_085942_087.JPG,0.,-7.77219,110.37317,30.76,996.47,148.71,181.12,182.17,63.78
/home/intannf/foto5/2015_0313_090142_124.JPG,0.,-7.77224,110.37312,30.73,996.46,148.75,181.13,182.06,63.87
/home/intannf/foto5/2015_0313_090323_155.JPG,0.,-7.77224,110.37312,30.73,996.46,148.76,181.01,181.92,63.82
/home/intannf/foto5/2015_0313_090628_212.JPG,0.,-7.77223,110.37310,30.72,996.47,148.67,181.09,181.91,63.87
/home/intannf/foto5/2015_0313_090651_219.JPG,0.,-7.77223,110.37310,30.75,996.46,148.75,180.94,182.00,63.92
/home/intannf/foto5/2015_0313_090710_225.JPG,0.,-7.77224,110.37312,30.72,996.46,148.75,181.19,182.10,63.68
/home/intannf/foto5/2015_0313_090717_227.JPG,0.,-7.77217,110.37315,30.77,996.48,148.66,181.06,182.21,63.87

Please help me figure it out. Thanks in advance.

Regards,
Intan
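A minimal sketch of one way to do this in POSIX sh, assuming the header is the first line and field 1 never contains a quoted comma. The function name `dedupe_sort_csv` is hypothetical. `uniq` on its own does not fit well here: it only compares adjacent lines, and its -f option skips blank-separated fields, not comma-separated ones. The sketch therefore splits off the header with head, takes the remaining lines with `tail -n +2`, keeps the first line for each key with the common awk idiom `!seen[$1]++`, and sorts on field 1 only:

```shell
#!/bin/sh
# Hypothetical helper: keep the header, keep only the FIRST line for each
# value of field 1, sort the rest by field 1, and overwrite the input file.
# Assumes a plain CSV where field 1 contains no quoted commas.
dedupe_sort_csv() {
  file=$1
  tmp=$(mktemp) || return 1
  {
    head -n 1 "$file"                    # header line stays on top
    tail -n +2 "$file" |
      awk -F, '!seen[$1]++' |            # first occurrence of each key wins
      LC_ALL=C sort -t, -k1,1            # then sort on field 1 only
  } > "$tmp" && cat "$tmp" > "$file"     # write back; keeps file permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage (file name is illustrative):
# dedupe_sort_csv photos.csv
```

Writing to a temporary file first matters: `sort file > file` lets the shell truncate the file before anything reads it. Copying the result back with `cat` rather than `mv` keeps the original file's permissions.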
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove Duplicate Lines in File

I am writing a KSH script to remove duplicate lines in a file. Let's say the file has the format below. FileA 1253-6856 3101-4011 1827-1356 1822-1157 1822-1157 1000-1410 1000-1410 1822-1231 1822-1231 3101-4011 1822-1157 1822-1231 and I want to simplify it with no duplicate lines as file... (5 Replies)
Discussion started by: Teh Tiack Ein

2. UNIX for Dummies Questions & Answers

Remove Duplicate lines from File

I have a log file "logreport" that contains several lines as seen below: 04:20:00 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 06:38:08 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 07:11:05 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but... (18 Replies)
Discussion started by: Nysif Steve

3. UNIX for Dummies Questions & Answers

How to delete or remove duplicate lines in a file

Hi, please help me remove duplicate lines from a file. I have a file with a huge number of lines. I want to remove selected lines in it. Also, if there are duplicate lines, I want to delete the rest and keep just one of them. Please help me with any Unix commands or even Fortran... (7 Replies)
Discussion started by: reva

4. Shell Programming and Scripting

Sort and Remove Duplicate on file

How do we sort and remove duplicates on columns 1 and 2, retaining the record with the maximum date (in field 3), for a file with the following format? aaa|1234|2010-12-31 aaa|1234|2010-11-10 bbb|345|2011-01-01 ccc|346|2011-02-01 bbb|345|2011-03-10 aaa|1234|2010-01-01 Required Output ... (5 Replies)
Discussion started by: mabarif16

5. Shell Programming and Scripting

remove duplicate lines from file linux/sh

Greetings, I'm hoping there is a way to cat a file, remove duplicate lines, and send that output to a new file. The file will always vary but be something similar to this: please keep in mind that the above could be eight occurrences of each hostname or it might simply have another four of an... (2 Replies)
Discussion started by: crimso

6. Shell Programming and Scripting

How do I remove the duplicate lines in this file?

Hey guys, I need some help fixing this script. I am trying to remove all the duplicate lines in this file. I wrote the following script, but it does not work. What is the problem? The output file should only contain five lines: Later! (5 Replies)
Discussion started by: Ernst

7. Shell Programming and Scripting

Remove duplicate lines from a 50 MB file size

Hi, please help me write a command to delete duplicate lines from a file. The file is 50 MB. How do I remove duplicate lines from such a big file? (6 Replies)
Discussion started by: vsachan

8. Shell Programming and Scripting

How to remove blank lines in a file and save the file with same name?

I have a text file which has blank lines. I want them removed before uploading it to the DB using SQL*Loader. Below is the command line I use to remove blank lines. sed '/^ *$/d' /loc/test.txt If I use the below command to replace the file after removing the blank lines, it replaces the... (6 Replies)
Discussion started by: vel4ever

9. Shell Programming and Scripting

Remove duplicate lines based on field and sort

I have a CSV file from which I would like to remove duplicate lines based on field 1, and then sort it. I don't care about any of the other fields, but I still want to keep their data intact. I was thinking I could do something like this, but I have no idea how to print the full line with this. Please show any method... (8 Replies)
Discussion started by: cokedude

10. Shell Programming and Scripting

Remove duplicate lines from a file

Hi, I have a CSV file which contains some millions of lines. The first line (header) repeats at every 50000th line. I want to remove all the duplicate headers from the second occurrence on (the first line should not be removed). I don't want to use any pattern from the header as I have some... (7 Replies)
Discussion started by: sudhakar T
sort5(1)						      General Commands Manual							  sort5(1)

Name
       sort5 - internationalized System 5 sort and/or merge files

Syntax
       sort5 [-cmu] [-ooutput] [-ykmem] [-zrecsz] [-X] [-dfiMnr] [-btx] [+pos1 [-pos2]] [files]

Description
       The sort5 command sorts lines of the named files together and writes the result on the standard output. The standard input is read if a
       hyphen (-) is used as a file name or if no input files are named.
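A small illustration of the hyphen convention, using the POSIX `sort` found on current systems rather than `sort5` itself (an assumption about what is installed); the temporary file only stands in for a named input:

```shell
f=$(mktemp)
printf 'a\n' > "$f"
# "-" stands for standard input, so the piped lines are sorted together with $f.
out=$(printf 'c\nb\n' | LC_ALL=C sort - "$f")
rm -f "$f"
printf '%s\n' "$out"    # a, b, c on separate lines
```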

       Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key, the entire input
       line, and ordering is determined by the collating sequence specified by the LC_COLLATE locale. The LC_COLLATE locale is controlled by the
       settings of either the LANG or LC_COLLATE environment variables. See for more information.
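Since the ordering follows the locale, one common way to get plain byte order for a single command is to set LC_ALL=C, which takes precedence over both LANG and LC_COLLATE. A sketch with the POSIX `sort`:

```shell
# In the C locale, uppercase letters (0x41-0x5A) sort before lowercase (0x61-0x7A).
out=$(printf 'a\nB\n' | LC_ALL=C sort)
printf '%s\n' "$out"    # B, then a
```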

Options
       The following options alter the default behavior:

       -c   Checks that the input file is sorted according to the ordering rules; gives no output unless the file is out of order.

       -m   Merges only; the input files are already sorted.

       -u   Suppresses all but one in each set of lines having equal keys.

       -ooutput
	    Specifies the name of an output file to use instead of the standard output.  The file may be the same as one of  the  inputs.   Blanks
	    between -o and output are optional.

       -ykmem
            Specifies the number of kilobytes of memory to use when sorting a file. If this option is omitted, sort5 begins using a system
            default memory size, and continues to use more space as needed. If kmem is specified, sort5 starts using that number of kilobytes of
            memory. If the administrative minimum or maximum is violated, the value of the corresponding minimum or maximum is used. Thus, -y0
            is guaranteed to start with minimum memory. By convention, -y (with no argument) starts with maximum memory.

       -zrecsz
            Records the size of the longest line read in the sort phase so buffers can be allocated during the merge phase. If the sort phase is
            omitted using either the -c or -m options, a system default size is used. Lines longer than the buffer size cause sort5 to terminate
            abnormally. Supplying the actual number of bytes (or some larger value) in the longest line to be merged prevents abnormal termination.

       -X   Sorts using tags. Upon input each key is converted to a tag value which is sorted efficiently. This option makes international sorting
	    faster but it consumes more memory since both key and tag must be stored.
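The -u and -o options together cover most of the forum question above. In the POSIX `sort`, restricting -u to the first comma-separated field takes -t, (described below) plus -k1,1, the POSIX form of +0 -1, and -o may name the input file itself. A sketch with made-up file contents; POSIX does not say which of several lines with equal keys -u keeps, so the example relies only on one line per key surviving:

```shell
f=$(mktemp)
printf 'b,1\na,2\nb,3\n' > "$f"
# -u keeps one line per distinct key; -k1,1 makes field 1 the only key.
# -o "$f" writes back to the input file. Redirecting with > "$f" instead
# would let the shell truncate the file before sort reads it.
LC_ALL=C sort -t, -k1,1 -u -o "$f" "$f"
out=$(cut -d, -f1 "$f")
rm -f "$f"
printf '%s\n' "$out"    # the surviving keys: a, then b
```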

       The following options override the default ordering rules:

       -d   Specifies Dictionary order.  Only letters, digits and blanks (spaces and tabs) are significant in comparisons.

       -f   Folds lower case letters into upper case.

       -i   Ignores characters outside the ASCII range 040-0176 in non-numeric comparisons.

       -n   Sorts an initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal
            point, by arithmetic value. The -n option implies the -b option, which tells the command to ignore leading blanks when determining
            the starting and ending positions of a restricted sort key.

       -r   Reverses the sense of comparisons.
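The difference between text and numeric comparison is easiest to see side by side. A sketch with the POSIX `sort`, run in the C locale so that the text order is byte order:

```shell
# Text comparison looks at characters: '-' < '1' < '9' in the C locale.
text=$(printf '10\n9\n-3\n' | LC_ALL=C sort)
# -n compares the leading numeric string by value instead.
num=$(printf '10\n9\n-3\n' | LC_ALL=C sort -n)
printf '%s\n' "$text" "---" "$num"
```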

       When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys.
       When attached to a specific sort key (described below), the specified ordering options override all global ordering options for that key.

       The notation +pos1 -pos2 restricts a sort key to one beginning at pos1 and ending at pos2.  The characters at positions pos1 and  pos2  are
       included in the sort key (provided that pos2 does not precede pos1).  A missing -pos2 means the end of the line.
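Current POSIX `sort` implementations use -k instead of the +pos1 -pos2 notation. For the whole-field case (no .n part and no flags), reading the rules below, +m starts at the first character of field m+1 and -n ends at the last character of field n, so the key becomes -k (m+1),n. The helper name `pos_to_k` is hypothetical; this is a sketch of that arithmetic only:

```shell
# Hypothetical helper: convert a whole-field "+M [-N]" key into POSIX -k form.
# Only handles plain field numbers, not ".n" offsets or flags.
pos_to_k() {
  start=$(( $1 + 1 ))                   # +M: first character of field M+1
  if [ $# -ge 2 ]; then
    printf '%s,%s\n' "$start" "$2"      # -N: last character of field N
  else
    printf '%s\n' "$start"              # no -pos2: key runs to end of line
  fi
}
pos_to_k 1 2    # prints 2,2, so "sort5 +1 -2" is roughly "sort -k2,2"
pos_to_k 2 3    # prints 3,3
```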

       Specifying pos1 and pos2 involves the notion of a field, that is a minimal sequence of characters followed by a field separator or a
       newline. By default, the first blank of a sequence of blanks acts as the field separator. The blank can be either a space or a tab. All
       blanks in a sequence of blanks are interpreted as a part of the next field; for example, all blanks at the beginning of a line are
       considered to be part of the first field. The treatment of field separators is altered using the following options:

       -tx  Uses x as the field separator character. Although it may be included in a sort key, x is not considered part of a field. Each
            occurrence of x is significant (for example, xx delimits an empty field).

       -b   Ignores  leading  blanks  when  determining the starting and ending positions of a restricted sort key.  If the -b option is specified
	    before the first +pos1 argument, it is applied to all +pos1 arguments.  Otherwise, the b flag may be attached  independently  to  each
	    +pos1 or -pos2 argument.
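A short illustration of the -t rule that every separator counts, using a comma as in the forum's CSV data and the POSIX `sort` -k syntax; the sample lines are made up:

```shell
# With -t, every comma counts, so "a,,2" has an empty 2nd field and "2" is field 3.
out=$(printf 'a,,2\nb,,1\n' | LC_ALL=C sort -t, -k3,3n)
printf '%s\n' "$out"    # b,,1 then a,,2
```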

       Pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfinr. A starting position specified by +m.n is
       interpreted to mean the n+1st character in the m+1st field. A missing .n means .0, indicating the first character of the m+1st field. If
       the b flag is in effect n is counted from the first non-blank in the m+1st field; +m.0b refers to the first non-blank character in the
       m+1st field.
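As a worked case of the m.n arithmetic: +1.2 means the 3rd character (n+1) of the 2nd field (m+1), which the POSIX `sort` writes as the start position -k2.3. A sketch using -t so that field boundaries are exact; the sample lines are made up:

```shell
# Keys start at the 3rd character of field 2: "cd" for a,zzcd and "ab" for b,aaab.
out=$(printf 'a,zzcd\nb,aaab\n' | LC_ALL=C sort -t, -k2.3)
printf '%s\n' "$out"    # b,aaab then a,zzcd
```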

       A last position specified by -m.n is interpreted to mean the nth character (including separators) after the last character of the mth
       field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect n is counted from the last
       leading blank in the m+1st field; -m.1b refers to the first non-blank in the m+1st field.

       When there are multiple sort keys, later keys are compared only after all earlier keys are found to be equal.  Lines that otherwise compare
       equal are ordered with all bytes significant.
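For example, with the POSIX `sort` -k syntax, a numeric flag attached to the second key applies only to that key, and lines with equal first fields are then ordered by it (the sample data is made up):

```shell
# Field 1 is compared as text first; only ties fall through to field 2,
# which has its own n flag and is compared numerically.
out=$(printf 'b,10\na,2\nb,9\n' | LC_ALL=C sort -t, -k1,1 -k2,2n)
printf '%s\n' "$out"    # a,2 then b,9 then b,10
```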

Examples
       Sort the contents of infile with the second field as the sort key:

	      sort5 +1 -2 infile

       Sort, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the first character of the second
       field as the sort key:

	      sort5 -r -o outfile +1.0 -1.2 infile1 infile2

       Sort, in reverse order, the contents of infile1 and infile2 using the first non-blank character of the second field as the sort key:

	      sort5 -r +1.0b -1.1b infile1 infile2

       Print the password file sorted by the numeric user ID (the third colon-separated field):

	      sort5 -t: +2n -3 /etc/passwd

       Print the lines of the already sorted file infile, suppressing all but the first occurrence of lines having the same third field (the
       options -um with just one input file make the choice of a unique representative from a set of equal lines predictable):

	      sort5 -um +2 -3 infile
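On a system with the POSIX `sort`, the last example would be written roughly as follows; the input lines are made up. POSIX itself does not specify which line of an equal-key set -u keeps, so the comment only claims that one line per key survives:

```shell
# Roughly "sort5 -um +2 -3 infile": merge already sorted input and keep
# one line per distinct 3rd (blank-separated) field.
out=$(printf 'a x 1\nb y 1\nc z 2\n' | LC_ALL=C sort -m -u -k3,3)
printf '%s\n' "$out"    # two lines, with 3rd fields 1 and 2
```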

Diagnostics
       Comments and exits with non-zero status for various trouble conditions (for example, when input lines are too long), and for disorder
       discovered under the -c option.

       When the last line of an input file is missing a new-line character, sort5 appends one, prints a warning message, and continues.

Files
       /usr/tmp/stm???

See Also
       comm(1), join(1), uniq(1), setlocale(3int), strcoll(3int)

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.