Help with replace duplicate content


 
# 1  
Old 12-22-2011

Input file:
Code:
CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
ABC-B	data17_input2	1133
ABC-B	data17_input3	2162
ABC-B	data17_input4	2019
HNRNPA2B1	data95_input1	101
HNRNPA2B1	data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
MYL12B	data352_input3	76

Desired output file:
Code:
CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
	data17_input2	1133
	data17_input3	2162
	data17_input4	2019
HNRNPA2B1	data95_input1	101
		data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
	data352_input3	76

The columns are separated by a tab character ("\t").
I would like to replace the duplicate values in column 1 with an empty field.
Thanks for any advice.
# 2  
Old 12-22-2011
Code:
$ nawk '{print $1}' test | sort -u | while read -r a; do grep "^$a[[:space:]]" test | nawk '{if(NR>1){printf("\t%s\t%s\n",$2,$3)}else{print $0}}'; done
ABC-B   data17_input1   3466
        data17_input2   1133
        data17_input3   2162
        data17_input4   2019
CCNI    data564_input1  264
CORO1A  data564_input2  155
HNRNPA2B1       data95_input1   101
        data95_input2   340
IFITM1  data105_input2  291
IFITM2  data105_input1  505
MYL12A  data352_input2  212
MYL12B  data352_input1  131
        data352_input3  76

In the above example, test is the input file.
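
Note that the pipeline above prints the groups in sorted key order (ABC-B first), while the desired output keeps the input order. A minimal single-pass sketch that keeps the original order could look like the following. It assumes a POSIX awk (plain `awk` rather than `nawk`), and the function name `blank_seen_keys` is made up for illustration. It blanks column 1 whenever that value has appeared anywhere earlier in the input, which for the sample data is the same as blanking adjacent repeats.

```shell
# Hypothetical helper: blank column 1 whenever its value has already
# appeared earlier in the input (tab-separated, reads standard input).
blank_seen_keys() {
    awk 'BEGIN { FS = OFS = "\t" }
         seen[$1]++ { $1 = "" }   # true from the second occurrence on
         { print }'
}

# Demo on three lines taken from the sample input
printf 'CCNI\tdata564_input1\t264\nABC-B\tdata17_input1\t3466\nABC-B\tdata17_input2\t1133\n' |
    blank_seen_keys
```

To run it on a file instead, redirect the file into the function, for example `blank_seen_keys < test`.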
This User Gave Thanks to itkamaraj For This Post:
# 3  
Old 12-22-2011
here is your code :-)

Code:
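first=""
while IFS= read -r line
do
        # Column 1 of the current line (fields are tab-separated)
        first2=$( printf '%s\n' "$line" | awk -F'\t' '{print $1}' )
        if [[ "$first" == "$first2" ]]
        then
                # Same key as the previous line: print an empty column 1
                gg=$( printf '%s\n' "$line" | awk -F'\t' '{print $2"\t"$3}' )
                printf '\t%s\n' "$gg"
        else
                printf '%s\n' "$line"
        fi
        first=$first2
done < infile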

This User Gave Thanks to vivek d r For This Post:
# 4  
Old 12-22-2011
try this
Code:
nawk '{y=x;x=$1}x==y{sub($1,"")}1' yourfile
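
One caveat with the one-liner above: `sub($1,"")` treats the content of column 1 as a regular expression, so a key containing characters such as `+`, `.` or `[` may not be removed as intended. A hedged variant that compares against the previous line's key and blanks the field by assignment could look like this. It assumes a POSIX awk and tab-separated input, and the function name `blank_adjacent_repeats` is made up for illustration.

```shell
# Hypothetical helper: blank column 1 when it equals column 1 of the
# previous line (tab-separated, reads standard input).
blank_adjacent_repeats() {
    awk 'BEGIN { FS = OFS = "\t" }
         { key = $1 "" }                    # save the key as a string before editing
         NR > 1 && key == prev { $1 = "" }  # adjacent repeat: empty the field
         { prev = key; print }'
}

# Demo with a key that contains a regex metacharacter
printf 'A+B\tx\t1\nA+B\ty\t2\nC\tz\t3\n' | blank_adjacent_repeats
```

Unlike a lookup table of all keys seen so far, this only blanks a key that repeats on consecutive lines, which matches the behaviour of the original one-liner.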

This User Gave Thanks to ctsgnb For This Post: