07-10-2008
Deleting all occurrences of a duplicate row
Hi,
I need to delete all occurrences of repeated lines from a file and retain only the lines that are not repeated elsewhere in the file. As seen below, the first two lines are the same except for the strings "From BaseLine" and "From SMS". These strings should be ignored when checking for repeated lines. I want to retain only the third line.
From BaseLine - 0T001 000 999999999 00101 20080411000000T1023.27
From SMS - 0T001 000 999999999 00101 20080411000000T1023.27
From BaseLine - 0T001 000 999999999 00101 20080411000000T109.019
My output should be the third line alone.
The file sizes range from 100 MB to 900 MB, so performance should also be considered. Can you please help me out?
Regards,
Ragav.
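One possible approach, offered as a sketch rather than something tested against the real data: read the file twice with awk. The first pass counts how often each line occurs once the "From BaseLine - " or "From SMS - " prefix is stripped; the second pass prints only the lines whose stripped text was seen exactly once. The function name keep_unrepeated is made up for illustration, and the prefix pattern assumes every line starts with exactly one of those two tags followed by " - ". awk keeps one counter per distinct stripped line in memory, which may be a concern at 900 MB if most lines are distinct.

```shell
# Hypothetical helper: print lines whose content (ignoring the leading
# "From BaseLine - " / "From SMS - " tag) occurs exactly once in the file.
# $1 must be a regular file, because awk reads it twice.
keep_unrepeated() {
    awk '
        {
            key = $0
            sub(/^From (BaseLine|SMS) - /, "", key)   # drop the source tag
        }
        NR == FNR { count[key]++; next }   # pass 1: count each stripped line
        count[key] == 1                    # pass 2: print lines seen only once
    ' "$1" "$1"
}
```

It could be called as `keep_unrepeated input.txt > output.txt` (both file names are placeholders). The original line order is preserved, and no sort is needed.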
UNIQ(1) FSF UNIQ(1)
NAME
uniq - remove duplicate lines from a sorted file
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
Mandatory arguments to long options are mandatory for short options too.
-c, --count
prefix lines by the number of occurrences
-d, --repeated
only print duplicate lines
-D, --all-repeated[=delimit-method]
print all duplicate lines. delimit-method={none(default),prepend,separate}. Delimiting is done with blank lines.
-f, --skip-fields=N
avoid comparing the first N fields
-i, --ignore-case
ignore differences in case when comparing
-s, --skip-chars=N
avoid comparing the first N characters
-u, --unique
only print unique lines
-w, --check-chars=N
compare no more than N characters in lines
--help
display this help and exit
--version
output version information and exit
A field is a run of whitespace, then non-whitespace characters. Fields are skipped before chars.
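As an illustration of how -u and -f combine, here is a minimal sketch applied to the sample lines from the question at the top of this thread. It assumes the first two whitespace-separated fields ("From" plus "BaseLine" or "SMS") are the only part to ignore and that fields are separated by single spaces. Because uniq only compares successive lines, the input is first sorted on everything from field 3 onward. The function name unrepeated_by_sort is hypothetical, and the output comes out in sorted order rather than the original order.

```shell
# Hypothetical helper: sort on field 3 to end of line so that lines that
# differ only in their first two fields become adjacent, then have uniq
# skip those two fields (-f 2) and print only unrepeated lines (-u).
# LC_ALL=C makes sort and uniq compare bytes in the same way.
unrepeated_by_sort() {
    LC_ALL=C sort -k3 "$1" | LC_ALL=C uniq -u -f 2
}
```

Sorting a file of several hundred megabytes uses temporary disk space and takes time, so this may or may not be faster than a hash-based approach on a given machine.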
AUTHOR
Written by Richard Stallman and David MacKenzie.
REPORTING BUGS
Report bugs to <bug-coreutils@gnu.org>.
COPYRIGHT
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
SEE ALSO
The full documentation for uniq is maintained as a Texinfo manual. If the info and uniq programs are properly installed at your site, the command
info uniq
should give you access to the complete manual.
uniq (coreutils) 4.5.3 February 2003 UNIQ(1)