Help with removing duplicate content and keeping only the first occurrence
# 1  
Old 12-20-2010

Input
Code:
data_10 SSA
data_2 TYUE
data_3 PEOCV
data_6 SSAT
data_21 SSA
data_19 TYUEC
data_14 TYUE
data_15 SSA
data_32 PEOCV
.
.

Desired Output
Code:
data_10 SSA
data_2 TYUE
data_3 PEOCV
data_6 SSAT
data_19 TYUEC
.
.

In the data above, when several lines share the same value in column two (e.g. data_10, data_21, and data_15 all have SSA), I want to keep only the line that appears first (e.g. keep data_10 SSA; remove data_21 SSA and data_15 SSA).
Thanks.

Last edited by patrick87; 12-20-2010 at 10:07 AM..
# 2  
Old 12-20-2010
Code:
cat input_file | cut -f2 | uniq | while read line
do
    grep "$line" input_file | head -1 >> output_file
done

R0H0N
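One thing worth checking in a pipeline like the one above is that uniq only collapses adjacent duplicate lines, so repeated keys that are separated by other keys survive. The sketch below uses a made-up three-line key list (not the poster's file) to show the difference between uniq and an order-preserving awk filter; the variable name keys is only for illustration.

```shell
# Hypothetical key list: the repeated value is NOT adjacent.
keys='SSA
TYUE
SSA'

# uniq compares each line only with the previous one, so all three lines remain.
printf '%s\n' "$keys" | uniq

# awk remembers every key it has printed, so the second SSA is dropped
# while the original order is preserved.
printf '%s\n' "$keys" | awk '!seen[$0]++'
```

Sorting before uniq would also remove the repeats, but it would lose the "first appearance" order that the question asks for.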
# 3  
Old 12-20-2010
Code:
awk '{if(!a[$2]) print;a[$2]++;}' inPutfile
This User Gave Thanks to anurag.singh For This Post:
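For readers unfamiliar with the idiom, here is a minimal sketch of the same idea in its shorter common form, run against the sample input from post #1. The function name dedupe_by_col2 and the array name seen are illustrative choices, not part of the original answer.

```shell
# Hypothetical helper: keep only the first line seen for each column-two value.
dedupe_by_col2() {
    # seen[$2]++ evaluates to 0 (false) the first time a key appears and
    # to a positive count afterwards, so !seen[$2]++ is true only once per key.
    # With no action block, awk prints the line when the pattern is true.
    awk '!seen[$2]++' "$@"
}

# Sample input copied from the first post.
dedupe_by_col2 <<'EOF'
data_10 SSA
data_2 TYUE
data_3 PEOCV
data_6 SSAT
data_21 SSA
data_19 TYUEC
data_14 TYUE
data_15 SSA
data_32 PEOCV
EOF
```

This makes a single pass over the file and keeps the input order, at the cost of holding one array entry per distinct column-two value in memory.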
# 4  
Old 12-20-2010
Hi R0H0N,
I just tried it out.
It doesn't seem to produce the desired output?
Thanks.

---------- Post updated at 05:14 AM ---------- Previous update was at 05:05 AM ----------

Thanks for your awk command.
It is able to remove the duplicate lines in column two successfully.
Unfortunately, the column one detail for the duplicated column two data is still kept in the output?
# 5  
Old 12-20-2010
Quote:
Unfortunately, the column one detail for the duplicated column two data is still kept in the output?
I didn't follow. If you are looking for a different output, please post the expected output.
# 6  
Old 12-20-2010
Quote:
Originally Posted by patrick87
Hi R0H0N,
I just tried it out.
It doesn't seem to produce the desired output?
Thanks.

Code:
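# Assumes a single space between columns; plain cut -f2 would split on tabs only.
# awk '!seen[$0]++' drops repeated keys even when they are not adjacent (uniq would not).
cut -d' ' -f2 input_file | awk '!seen[$0]++' | while read -r line
do
   # Anchor on the leading space and end of line so that SSA does not also match SSAT.
   grep " ${line}$" input_file | head -1 >> output_file
done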

R0H0N
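The change from "$line" to " ${line}$" matters because an unanchored grep pattern matches substrings. The sketch below uses a made-up two-line sample in which the longer value comes first (an assumption for illustration; in the poster's file the shorter values happen to appear first) to show how the unanchored pattern can pick the wrong line.

```shell
# Hypothetical sample: SSAT appears before SSA.
sample='data_6 SSAT
data_10 SSA'

# Unanchored: "SSA" is a substring of "SSAT", so the SSAT line is returned first.
printf '%s\n' "$sample" | grep 'SSA' | head -1      # prints: data_6 SSAT

# Anchored on the preceding space and the end of the line:
# only a column-two value that is exactly SSA matches.
printf '%s\n' "$sample" | grep ' SSA$' | head -1    # prints: data_10 SSA
```

The anchored pattern still treats the key as a regular expression, so keys containing characters such as . or * would need escaping or a field comparison in awk instead.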
# 7  
Old 12-20-2010
Hi singh,

I just edited my question.
Hopefully it is clearer now.
Thanks for your advice.
