REMOVE DUPLICATE IN a ROW AFTER CHECKING THE FIRST SIMILAR NAME


 
# 1  
Old 08-11-2012

Hi all


I have a big file of rows and columns like the one below. From the 2nd column onwards, each column is followed by its description: the 3rd column is the description of the 2nd column, the 5th column is the description of the 4th column, and so on.

All columns are separated by commas.

Code:
CHST3,docetaxel,xyznox,tyurppw,notavailble,docetaxel,xyznox,jfhdkg,notavailable

ESRT4,ghtscjgh,notavailable,Ghjfuti,notavailable,manhfd, kdcvgh,Ghjfuti,not available,manhfd, kdcvgh

I want to remove duplicates. The catch is that it should check whether the entry in column n equals the entry in column n+2; if it does, columns n+2 and n+3 should be removed, otherwise not.

So the expected output is:

Code:
CHST3,docetaxel,xyznox,tyurppw,notavailble,jfhdkg,notavailable

ESRT4,ghtscjgh,notavailable,Ghjfuti,notavailable,manhfd,kdcvgh

# 2  
Old 08-12-2012
If I understand your requirements, this might do the trick:


Code:
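awk '
    {
        n = split( $0, a, "," );

        # Columns 2..n are (name, description) pairs; flag every later
        # pair that repeats an earlier pair exactly.
        for( i = 2; i < n; i += 2 )
        {
            if( !(i in skip) )
                for( j = i+2; j < n; j += 2 )
                    if( a[i] == a[j] && a[i+1] == a[j+1] )
                        skip[j] = skip[j+1] = 1;
        }

        # Print the surviving fields; the separator goes before each field
        # so that removing the last pair leaves no trailing comma.
        sep = "";
        for( i = 1; i <= n; i++ )
            if( !(i in skip) )
            {
                printf( "%s%s", sep, a[i] );
                sep = ",";
            }
        printf( "\n" );

        split( "", skip );    # reset the flags for the next line
    }
' input-file >output-file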
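
The script above removes a pair only when both the name and its description repeat. In the second sample row the repeated Ghjfuti carries a differently spelled description ("not available" instead of "notavailable"), so that pair would be kept, while the expected output drops it. If matching on the name alone is what was intended (the thread title suggests so, but that is an assumption), a variant along these lines might be closer. It is only a sketch: the function name dedupe_names is made up for illustration, and trimming the blanks around each field is also an assumption, based on the expected output showing "manhfd,kdcvgh" without the space.

```shell
# Hypothetical helper: reads comma-separated rows on stdin and keeps only
# the first (name, description) pair for each name within a row.
dedupe_names() {
    awk -F, '
    {
        split( "", seen );          # forget the names of the previous row
        out = $1;                   # column 1 is the row key, always kept
        for( i = 2; i <= NF; i += 2 )
        {
            name = $i;
            desc = $(i+1);          # empty if the row ends on a bare name
            # Assumption: blanks around a field are not significant.
            gsub( /^[ \t]+|[ \t]+$/, "", name );
            gsub( /^[ \t]+|[ \t]+$/, "", desc );
            if( name in seen )
                continue;           # same name seen earlier: drop the pair
            seen[name] = 1;
            if( i < NF )
                out = out "," name "," desc;
            else
                out = out "," name;
        }
        print out;
    }
    '
}

# Demo on the first sample row from the question.
printf '%s\n' 'CHST3,docetaxel,xyznox,tyurppw,notavailble,docetaxel,xyznox,jfhdkg,notavailable' | dedupe_names
```

For a whole file it would be called as dedupe_names <input-file >output-file. Because it tracks the names in an associative array, each row is handled in a single pass instead of comparing every pair with every later pair.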
