Kindly check: remove duplicates with similar data in front of it


 
# 1  
Old 07-28-2012
Kindly check: remove duplicates with similar data in front of it

Hi all,

I have 2 files containing data like this:


Quote:
1 x
1 x
2 y
2 z
3 s
3 s
3 s
4 g
4 h
5 i
6 k
7 y
Quote:
1 x y z
1 x y z
2 y h f
2 z s k
3 s
3 s
3 s
4 g
4 h
5 i
6 k
7 y
If the same entry is repeated in column 1 (like 1, 2, 3, 4), I have to check whether the duplicates have different entries in column 2 (like 2 and 4) or the same entry in column 2 (like 1 and 3).

The output should look like this for the first file:

Quote:
1 x
2 y,z
3 s
4 g,h
5 i
6 k
7 y
Please let me know how to script this.

In the same way for the second file: if the data from column 2 onward differs between duplicate entries, it should be arranged like this:
Quote:
1 x y z
2 y,z h,s f,k
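
For reference, both expected outputs above can be reproduced by a column-wise merge. The following is only a minimal sketch, assuming the columns are separated by whitespace and that values never contain spaces or commas; the function name `merge_columns` is made up for illustration. It compares values as exact strings through array indices, so no value is ever treated as a pattern.

```shell
# merge_columns FILE: print one line per column-1 key; in every other column,
# the distinct values seen for that key are joined with commas (first-seen order).
merge_columns() {
  awk '
  {
    key = $1
    if (!(key in maxnf)) { order[++n] = key; maxnf[key] = 0 }  # remember key order
    if (NF > maxnf[key]) maxnf[key] = NF                       # widest row for this key
    for (c = 2; c <= NF; c++) {
      if (seen[key, c, $c]++) continue       # value already recorded for this key and column
      if ((key, c) in col) col[key, c] = col[key, c] "," $c
      else col[key, c] = $c
    }
  }
  END {
    for (i = 1; i <= n; i++) {
      key = order[i]
      line = key
      for (c = 2; c <= maxnf[key]; c++) line = line " " col[key, c]
      print line
    }
  }
  ' "$1"
}
```

It would be called as `merge_columns file1` or `merge_columns file2`. If the real data is tab-separated with spaces inside fields, the field separator would have to be set accordingly, which this sketch does not do.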

Last edited by manigrover; 07-28-2012 at 04:52 AM..
# 2  
Old 07-28-2012
awk

Hi,

Try this one,
Code:
awk '{t=$0;r=$1" ";sub(r,"",t);if(a[$1]!~t){a[$1]=a[$1]" "t;}else{if(!a[$1]){a[$1]=t;}}}END{for(i in a){print i,a[i];}}' file1

It should work for both files, though I have not tested it yet.
Do you want to combine the two files and then do the rest?
Cheers,
Ranga:-)
# 3  
Old 07-28-2012
Request to check

Hi,

Thanks a lot, Ranga.

It worked with the first file but not with the second file.

I don't have to combine the files; I ran it on each one separately.

With the second file it shows an error like the one below. You might not recognize the data, because the values are not 1, 2, 3 and x, y, z as mentioned in the input, but they follow the same pattern. There seems to be a small error. Kindly check it.

Quote:
bash-3.2$ awk '{t=$0;r=$1" ";sub(r,"",t);if(a[$1]!~t){a[$1]=a[$1]" "t;}else{if(!a[$1]){a[$1]=t;}}}END{for(i in a){print i,a[i];}}' saradrugbankdrug.txt >saradrugbankdrugnewlist.txt
awk: (FILENAME=saradrugbankdrug.txt FNR=132) fatal: Invalid range end: /PDE3B (5r)-6-(4-{[2-(3-Iodobenzyl)-3-Oxocyclohex-1-En-1-Yl]Amino}Phenyl)-5-Methyl-4,5-Dihydropyridazin-3(2h)-One Not Available T2D,CD,T1D/
bash-3.2$
Moderator's Comments:
Please use code tags instead of quote tags.

Last edited by Scrutinizer; 07-28-2012 at 08:27 AM..
# 4  
Old 07-28-2012
awk

Hi,
The input file contains regex metacharacters such as [ and ]. I have not tested the code below; give it a try.
Code:
awk '{$0=gensub(/([\]\[\(\)\{\}])/,"\\\1","g",$0);t=$0;r=$1"";sub(r,"",t);if(a[$1]!~t){a[$1]=a[$1]""t;}else{if(!a[$1]){a[$1]=t;}}}END{for(i in a){print i,a[i];}}' file1

You have to escape special characters before using them in a regex. You can also use the quotemeta function in Perl and then pass the output lines to awk.
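
Another option is to avoid the regex match on the data altogether. This is only a rough sketch, not the code posted above: it assumes whitespace-separated input and that an exact, whole-value comparison of the rest of the line is what is wanted, and the function name `merge_rest` is made up for illustration. Because each value is used as an array index instead of a pattern, characters such as [ ] ( ) { } need no escaping.

```shell
# merge_rest FILE: for each distinct column-1 key, print the key followed by the
# distinct "rest of line" values, comma-separated, in first-seen order.
merge_rest() {
  awk '
  {
    key = $1
    rest = $0
    sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest)   # strip column 1 with a fixed pattern, not one built from data
    if (!(key in out)) {                     # first time this key is seen
      order[++n] = key
      out[key] = rest
      seen[key, rest] = 1
    } else if (!seen[key, rest]++) {         # exact string comparison via array index
      out[key] = out[key] "," rest
    }
  }
  END { for (i = 1; i <= n; i++) print order[i], out[order[i]] }
  ' "$1"
}
```

Usage would be `merge_rest file1 > newlist.txt`. Note that joining with commas becomes ambiguous when the values themselves contain commas, so a different separator may be a better choice for such data.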
Cheers,
Ranga:-)

Last edited by rangarasan; 07-28-2012 at 08:40 AM.. Reason: add perl func name
# 5  
Old 07-28-2012
Request to check

Thank you very much :) :) :) :) :)
I want to write many!!
# 6  
Old 07-28-2012
Please use code tags to wrap your post so that future users will benefit :-)
Cheers,
Ranga:-)