Help with replace duplicate content
Posted by cpp_beginner on 12-22-2011, 03:15 AM

Input file:
Code:
CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
ABC-B	data17_input2	1133
ABC-B	data17_input3	2162
ABC-B	data17_input4	2019
HNRNPA2B1	data95_input1	101
HNRNPA2B1	data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
MYL12B	data352_input3	76

Desired output file:
Code:
CCNI	data564_input1	264
CORO1A	data564_input2	155
ABC-B	data17_input1	3466
	data17_input2	1133
	data17_input3	2162
	data17_input4	2019
HNRNPA2B1	data95_input1	101
	data95_input2	340
IFITM1	data105_input2	291
IFITM2	data105_input1	505
MYL12A	data352_input2	212
MYL12B	data352_input1	131
	data352_input3	76

Columns are separated by a tab delimiter ("\t").
I would like to replace the duplicate content in column 1 with an empty string, keeping only the first occurrence of each name.
Thanks for any advice.
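
One way to approach this is with awk: remember the first field of the previous line and blank the field whenever it repeats. This is an untested sketch that assumes the duplicate names always appear on consecutive lines, as in your sample, and that your data is in a file named input.txt (a hypothetical name):
Code:
awk 'BEGIN { FS = OFS = "\t" }  # read and write tab-delimited fields
{
    if ($1 == prev)             # column 1 repeats the previous line
        $1 = ""                 # so blank it out
    else                        # new value in column 1
        prev = $1               # remember it for the next line
    print
}' input.txt

Assigning to $1 makes awk rebuild the record with OFS, so the leading tab is preserved and the remaining columns stay tab-delimited.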
 
