Request to check: remove duplicates only in first column


 
# 1  
Old 07-25-2012

Hi all,

I have an input file like this

Quote:
machr fgf djfh dfhdj
machr fdj hdf hfdshf
machr dfg
nachr fdjk
nachr usd
nachr yeuio


Now I have to remove duplicates only in the first column, and nothing has to change in the second and third columns, so that the output would be:

Quote:
machr fgf djfh dfhdj
fdj hdf hfdshf
dfg
nachr fdjk
usd
yeuio
Please let me know scripting regarding this
# 2  
Old 07-25-2012
Code:
awk '++a[$1] > 1{$1=""}1' inputfile

Do you use a template for creating new threads? These always contain "Request to check" in the title and end with "Please let me know scripting regarding this"...
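For anyone wondering how the one-liner works, here is a spelled-out sketch of the same idea. The temporary sample file and the array name `seen` are only for illustration; the data is the sample from post #1. Each value of `$1` is counted, and from its second occurrence onward `$1` is set to the empty string before the line is printed.

```shell
# Sample data from post #1, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
machr fgf djfh dfhdj
machr fdj hdf hfdshf
machr dfg
nachr fdjk
nachr usd
nachr yeuio
EOF

result=$(awk '
{
    seen[$1]++            # count how often this first field has appeared
    if (seen[$1] > 1)     # second or later occurrence
        $1 = ""           # blank it; awk rebuilds the line using OFS
    print
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because assigning to `$1` makes awk rebuild the line with the output field separator, each de-duplicated line starts with a single space where the first field used to be.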
# 3  
Old 07-25-2012
Quote:
Originally Posted by elixir_sinari
Code:
awk '++a[$1] > 1{$1=""}1' inputfile

Do you use a template for creating new threads? These always contain "Request to check" in the title and end with "Please let me know scripting regarding this"...
Wow, I really didn't know awk was that powerful/flexible.
# 4  
Old 07-25-2012
Hi

I checked, but I am not getting the right result.

For example:

Quote:
the input is
Muscarinic acetylcholine receptor Bethanechol DAP000263 Urinary retention Approved
Muscarinic acetylcholine receptor Trospium DAP000342 Spasm Approved
Muscarinic acetylcholine receptor Oxyphencyclimine DAP000835 Gastrointestinal disorders Approved
Muscarinic acetylcholine receptor Tridihexethyl DAP000836 Acquired nystagmus Approved
Muscarinic acetylcholine receptor Anisotropine Methylbromide DAP000837 Peptic ulcer disease Approved
Muscarinic acetylcholine receptor Hyoscyamine DAP001108 Gastrointestinal disorders Approved
Muscarinic acetylcholine receptor Methantheline DAP001109 Irritable bowel syndrome Approved
Muscarinic acetylcholine receptor Procyclidine DAP001110 Parkinson's disease Approved
Muscarinic acetylcholine receptor Cyclopentolate DAP001111 Pediatric eye examinations Approved
Muscarinic acetylcholine receptor Ipratropium DAP001112 Obstructive lung diseases Approved
Muscarinic acetylcholine receptor Pilocarpine DAP001113 Glaucoma Approved
Muscarinic acetylcholine receptor Flavoxate DAP001114 Muscle Relaxant Approved
Muscarinic acetylcholine receptor Mepenzolate DAP001115 Peptic ulcer disease Approved
Muscarinic acetylcholine receptor Ispaghula DAP001486 Irritable bowel syndrome Approved
Muscarinic acetylcholine receptor Mebeverine DAP001494 Irritable bowel syndrome Approved
Muscarinic acetylcholine receptor Trihexyphenidyl HCl DAP001532 Parkinson's Disease Approved
Muscarinic acetylcholine receptor Aclidinium bromide DCL000677 Chronic obstructive pulmonary disease Phase III
Muscarinic acetylcholine receptor CHF 5407 DCL000750 Chronic obstructive pulmonary disease Phase I
Muscarinic acetylcholine receptor GSK233705 DCL000823 Chronic obstructive pulmonary disease Phase II completed
Muscarinic acetylcholine receptor NVA237 DCL000901 Chronic obstructive pulmonary disease Phase III
Muscarinic acetylcholine receptor Org-23366 DCL000911 Schizophrenia No development reported
Muscarinic acetylcholine receptor OrM3 DCL000913 Chronic obstructive pulmonary disease Phase IIb
Muscarinic acetylcholine receptor M1 Pirenzepine DAP000492 Peptic ulcer disease Approved
Muscarinic acetylcholine receptor M1 Glycopyrrolate DAP001116 Anesthetic Approved
Muscarinic acetylcholine receptor M1 Clidinium DAP001117 Abdominal/stomach pain Approved
Muscarinic acetylcholine receptor M1 Dicyclomine DAP001118 Irritable bowel syndrome Approved
Muscarinic acetylcholine receptor M1 Ethopropazine DAP001119 Parkinson's disease Approved
Muscarinic acetylcholine receptor M1 Cycrimine DAP001120 Parkinson's disease Approved
Muscarinic acetylcholine receptor M1 Benztropine DAP001121 Parkinson's disease Approved
Muscarinic acetylcholine receptor M1 Trihexyphenidyl DAP001122 Parkinson's disease Approved
Muscarinic acetylcholine receptor M1 Propantheline DAP001123 Excessive sweating (hyperhidrosis) Approved
Muscarinic acetylcholine receptor M1 Oxyphenonium DAP001124 Spasm Approved
Muscarinic acetylcholine receptor M1 Biperiden DAP001125 Parkinson's disease Approved
Muscarinic acetylcholine receptor M1 Talsaclidine isomer DCL000268 Alzheimer's disease Discontinued
Muscarinic acetylcholine receptor M1 Sabcomeline hydrochloride DCL000279 Cardiovascular diseases Phase IIa
Muscarinic acetylcholine receptor M1 Talsaclidine fumarate DCL000303 Alzheimer's disease Discontinued
Muscarinic acetylcholine receptor M1 Xanomeline tartrate DCL000328 Alzheimer's disease Phase II
Muscarinic acetylcholine receptor M1 GSK573719 DCL000381 Chronic Obstructive Pulmonary Disease (COPD) Phase II
Muscarinic acetylcholine receptor M1 GSK961081 DCL000397 Chronic Obstructive Pulmonary Disease (COPD) Phase II completed
Muscarinic acetylcholine receptor M1 GSK1034702



the output is

Quote:
Muscarinic acetylcholine receptor Bethanechol DAP000263 Urinary retention Approved
acetylcholine receptor Trospium DAP000342 Spasm Approved
acetylcholine receptor Oxyphencyclimine DAP000835 Gastrointestinal disorders Approved
acetylcholine receptor Tridihexethyl DAP000836 Acquired nystagmus Approved
acetylcholine receptor Anisotropine Methylbromide DAP000837 Peptic ulcer disease Approved
acetylcholine receptor Hyoscyamine DAP001108 Gastrointestinal disorders Approved
acetylcholine receptor Methantheline DAP001109 Irritable bowel syndrome Approved
acetylcholine receptor Procyclidine DAP001110 Parkinson's disease Approved
acetylcholine receptor Cyclopentolate DAP001111 Pediatric eye examinations Approved
acetylcholine receptor Ipratropium DAP001112 Obstructive lung diseases Approved
acetylcholine receptor Pilocarpine DAP001113 Glaucoma Approved
acetylcholine receptor Flavoxate DAP001114 Muscle Relaxant Approved
acetylcholine receptor Mepenzolate DAP001115 Peptic ulcer disease Approved
acetylcholine receptor Ispaghula DAP001486 Irritable bowel syndrome Approved
acetylcholine receptor Mebeverine DAP001494 Irritable bowel syndrome Approved
acetylcholine receptor Trihexyphenidyl HCl DAP001532 Parkinson's Disease Approved
acetylcholine receptor Aclidinium bromide DCL000677 Chronic obstructive pulmonary disease Phase III
acetylcholine receptor CHF 5407 DCL000750 Chronic obstructive pulmonary disease Phase I
acetylcholine receptor GSK233705 DCL000823 Chronic obstructive pulmonary disease Phase II completed
acetylcholine receptor NVA237 DCL000901 Chronic obstructive pulmonary disease Phase III
acetylcholine receptor Org-23366 DCL000911 Schizophrenia No development reported
acetylcholine receptor OrM3 DCL000913 Chronic obstructive pulmonary disease Phase IIb
acetylcholine receptor M1 Pirenzepine DAP000492 Peptic ulcer disease Approved
acetylcholine receptor M1 Glycopyrrolate DAP001116 Anesthetic Approved
acetylcholine receptor M1 Clidinium DAP001117 Abdominal/stomach pain Approved
acetylcholine receptor M1 Dicyclomine DAP001118 Irritable bowel syndrome Approved
acetylcholine receptor M1 Ethopropazine DAP001119 Parkinson's disease Approved

I have to completely remove those entries in the first column which are completely identical to each other.
# 5  
Old 07-25-2012
Expected output? I think you've got what you asked for.
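The output in post #4 follows from awk's default field splitting: fields are separated by runs of blanks, so `$1` is only `Muscarinic`, not the whole receptor name. A minimal sketch, assuming the real file is tab-delimited (the thread does not confirm this), would set the separator explicitly so that the multi-word name counts as one field. The sample rows are shortened from post #4 and the tabs in them are my assumption.

```shell
# Hypothetical tab-delimited sample; the delimiter is an assumption.
tmp=$(mktemp)
printf 'Muscarinic acetylcholine receptor\tBethanechol\tDAP000263\n'       > "$tmp"
printf 'Muscarinic acetylcholine receptor\tTrospium\tDAP000342\n'         >> "$tmp"
printf 'Muscarinic acetylcholine receptor M1\tPirenzepine\tDAP000492\n'   >> "$tmp"
printf 'Muscarinic acetylcholine receptor M1\tGlycopyrrolate\tDAP001116\n' >> "$tmp"

# Split and rejoin on tabs, so the whole name is $1.
# seen[$1]++ is 0 (false) the first time a name appears, non-zero afterwards.
result=$(awk 'BEGIN { FS = OFS = "\t" } seen[$1]++ { $1 = "" } 1' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With a tab as the output separator, the blanked first field still leaves its tab behind, so the remaining columns do not move to the left.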
# 6  
Old 07-25-2012
Quote:
Originally Posted by manigrover
Hi

I checked, but I am not getting the right result.

For example, [input and output as in post #4]

I have to completely remove those entries in the first column which are completely identical to each other.
---------- Post updated at 05:45 AM ---------- Previous update was at 05:43 AM ----------

The expected output is something like this: all other columns stay as they are, and only the duplicate entries in the first column are removed, with no other change at all. Sorry, in the sample below I didn't remove all the entries in the first column, and the other columns' entries are moving to the left, which should not happen in the expected output.

Quote:
Muscarinic acetylcholine receptor Bethanechol DAP000263 Urinary retention Approved
Trospium DAP000342 Spasm Approved
Oxyphencyclimine DAP000835 Gastrointestinal disorders Approved
Tridihexethyl DAP000836 Acquired nystagmus Approved
Anisotropine Methylbromide DAP000837 Peptic ulcer disease Approved
Muscarinic acetylcholine receptor Hyoscyamine DAP001108 Gastrointestinal disorders Approved
Methantheline DAP001109 Irritable bowel syndrome Approved
Procyclidine DAP001110 Parkinson's disease Approved
Cyclopentolate DAP001111 Pediatric eye examinations Approved
Ipratropium DAP001112 Obstructive lung diseases Approved
Pilocarpine DAP001113 Glaucoma Approved
r Flavoxate DAP001114 Muscle Relaxant Approved
Mepenzolate DAP001115 Peptic ulcer disease Approved
Ispaghula DAP001486 Irritable bowel syndrome Approved
Mebeverine DAP001494 Irritable bowel syndrome Approved
Trihexyphenidyl HCl DAP001532 Parkinson's Disease Approved
Aclidinium bromide DCL000677 Chronic obstructive pulmonary disease Phase III
CHF 5407 DCL000750 Chronic obstructive pulmonary disease Phase I
GSK233705 DCL000823 Chronic obstructive pulmonary disease Phase II completed
NVA237 DCL000901 Chronic obstructive pulmonary disease Phase III
Org-23366 DCL000911 Schizophrenia No development reported
OrM3 DCL000913 Chronic obstructive pulmonary disease Phase IIb
M1 Pirenzepine DAP000492 Peptic ulcer disease Approved
Glycopyrrolate DAP001116 Anesthetic Approved
1 Clidinium DAP001117 Abdominal/stomach pain Approved
DAP001118 Irritable bowel syndrome Approved
Ethopropazine DAP001119 Parkinson's disease Approved
Cycrimine DAP001120 Parkinson's disease Approved
Benztropine DAP001121 Parkinson's disease Approved
Trihexyphenidyl DAP001122 Parkinson's disease Approved
Propantheline DAP001123 Excessive sweating (hyperhidrosis) Approved
Oxyphenonium DAP001124 Spasm Approved
Biperiden DAP001125 Parkinson's disease Approved
Talsaclidine isomer DCL000268 Alzheimer's disease Discontinued

Last edited by manigrover; 07-25-2012 at 07:51 AM..
# 7  
Old 07-25-2012
Does this work for you?
Code:
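awk '{
  for (i = 1; i <= NF; i++) {
    # Count each word separately per column position.
    count[$i, i]++
    # The first line is only recorded, never blanked.
    if (FNR == 1)
      continue
    # Stop at the first word not seen before in this position.
    if (count[$i, i] == 1)
      break
    # Otherwise blank the repeated leading word.
    $i = ""
  }
}1' inputfile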
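
A variation on the loop above, sketched under the assumption that fields are separated by single spaces: instead of emptying a repeated word, it overwrites the word with as many spaces as it had characters, so the remaining columns keep their horizontal position. The sample rows are shortened from post #4; the temporary file is only for illustration.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Muscarinic acetylcholine receptor Bethanechol DAP000263
Muscarinic acetylcholine receptor Trospium DAP000342
Muscarinic acetylcholine receptor M1 Pirenzepine DAP000492
Muscarinic acetylcholine receptor M1 Glycopyrrolate DAP001116
EOF

result=$(awk '{
  for (i = 1; i <= NF; i++) {
    count[$i, i]++                 # count each word per column position
    if (FNR == 1) continue         # first line: record only
    if (count[$i, i] == 1) break   # first new word: stop blanking
    # Replace the repeated word with spaces of the same width.
    $i = sprintf("%" length($i) "s", "")
  }
}1' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

This is still a heuristic: a word is blanked whenever it has been seen before in the same position, so a drug name that repeats in the same column position after an identical prefix would be blanked as well.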
