Sponsored Content
Top Forums Shell Programming and Scripting Number of matches in 2 strings Post 302657897 by newbie83 on Monday 18th of June 2012 12:06:01 PM
Old 06-18-2012
Number of matches in 2 strings

Hello all,

I have a file with column header which looks like this.

Code:
C1 C2 C3 
A A G
T T A
G C C

I want to make columnwise (and bitwise) comparison of strings and calculate the number of matches.

So the number of matches between C1 and C2 will be comparing ATG and ATC.
Here there are two matches, A and T.

Similarly C1 and C3 has no matches and C2 and C3 has 1 match C.

The output should look like

Code:
C1 C2 2
C1 C3 0
C2 C3 1

I have 10 million rows and 320 columns, so i`m not being able to deal with such a huge file in Windows.

Can someone please help with code for this. I have access to UNIX bash with Red Hat Enterprise Linux 6.

Thanks
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Select matches between line number and end of file?

Hi Guys/Gals, I have a log file that is updated once every few seconds and I am looking for a way to speed up one of my scripts. Basically what I am trying to do is grep through a text file from start to finish once. Then each subsequent grep starts at the last line of the previous grep to... (4 Replies)
Discussion started by: Jerrad
4 Replies

2. Shell Programming and Scripting

Display LineNo Incase Total Number Of Delimiter Does matches in a given variable

I have many files .dat extension. requirement is to display line no if no of delimiter does not matches in a given variable lets say File: REF_BETOS.dat HCPCS_OR_CPT_CODE~BETOS_CODE~TERMINATION_DATE 0001F~Z2~ 0003T~I4~B20061231 0005F~Z2~~~ 0008T~P8~B20061231... (1 Reply)
Discussion started by: ainuddin
1 Replies

3. Shell Programming and Scripting

Get line number when matches a string

If I have a file something like as shown below, ARM*187878*hjhj BAG*88778*jjjj COD*7777*kkkk BAG*87878*kjjhjk DEF*65656*89989*khjkk I need the line numbers to be added with a colon when it matches the string "BAG". Here in my case, I need something like ARM*187878*hjhj... (4 Replies)
Discussion started by: Muthuraj K
4 Replies

4. Shell Programming and Scripting

grep - match files containing minimum number of pattern matches

I want to search a bunch of files and list only those containing a minimum number of pattern matches. So if I want to identify files containing 3 (or more) instances of the pattern "said:" and I have file1 that contains the lines: He said: She said: and file2 that contains the lines: He... (3 Replies)
Discussion started by: stumpyuk
3 Replies

5. Shell Programming and Scripting

Help in printing n number of lines if a search string matches in a file

Hi I have below script which is used to grep specific errors and if error string matches send an email alert. Script is working fine , however , i wish to print next 10 lines of the string match to get the details of error in the email alert Current code:- #!/bin/bash tail -Fn0 --retry... (2 Replies)
Discussion started by: neha0785
2 Replies

6. Shell Programming and Scripting

Count number of pattern matches per line for all files in directory

I have a directory of files, each with a variable (though small) number of lines. I would like to go through each line in each file, and print the: -file name -line number -number of matches to the pattern /comp/ for each line. Two example files: cat... (4 Replies)
Discussion started by: pathunkathunk
4 Replies

7. Shell Programming and Scripting

Exclude lines in a file with matches with multiple Strings using egrep

Hi I have a txt file and I would like to use egrep without using -v option to exclude the lines which matches with multiple Strings. Let's say I have some text in the txt file. The command should not fetch lines if they have strings something like CAT MAT DAT The command should fetch me... (4 Replies)
Discussion started by: Sathwik
4 Replies

8. Shell Programming and Scripting

Number of matches and matched pattern(s) in awk

input: !@#$%2QW5QWERTAB$%^&* The string above is not separated (or FS=""). For clarity sake one could re-write the string by including a "|" as FS as follow: !|@|#|$|%|2QW|5QWERT|A|B|$|%|^|&|* Here, I am only interested in patterns (their numbers are variable between records) containing... (16 Replies)
Discussion started by: beca123456
16 Replies

9. Shell Programming and Scripting

Print line if values in fields matches number and text

datafile: 2017-03-24 10:26:22.098566|5|'No Route for Sndr:RETEK RMS 00040 /ZZ Appl:PF Func:PD Txn:832 Group Cntr:None ISA CntlNr:None Ver:003050 '|'2'|'PFI'|'-'|'EAI_ED_DeleteAll'|'EAI_ED'|NULL|NULL|NULL|139050594|ActivityLog| 2017-03-27 02:50:02.028706|5|'No Route for... (7 Replies)
Discussion started by: SkySmart
7 Replies

10. Shell Programming and Scripting

Replace all string matches in file with unique random number

Hello Take this file... Test01 Ref test Version 01 Test02 Ref test Version 02 Test66 Ref test Version 66 Test99 Ref test Version 99 I want to substitute every occurrence of Test{2} with a unique random number, so for example, if I was using sed, substitution would be something... (1 Reply)
Discussion started by: funkman
1 Replies
bup-margin(1)						      General Commands Manual						     bup-margin(1)

NAME
bup-margin - figure out your deduplication safety margin SYNOPSIS
bup margin [options...] DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids. For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits. The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits with far fewer objects. If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if you're getting dangerously close to 160 bits. OPTIONS
--predict Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm. --ignore-midx don't use .midx files, use only .idx files. This is only really useful when used with --predict. EXAMPLE
$ bup margin Reading indexes: 100.00% (1612581/1612581), done. 40 40 matching prefix bits 1.94 bits per doubling 120 bits (61.86 doublings) remaining 4.19338e+18 times larger is possible Everyone on earth could have 625878182 data sets like yours, all in one repository, and we would expect 1 object collision. $ bup margin --predict PackIdxList: using 1 index. Reading indexes: 100.00% (1612581/1612581), done. 915 of 1612581 (0.057%) SEE ALSO
bup-midx(1), bup-save(1) BUP
Part of the bup(1) suite. AUTHORS
Avery Pennarun <apenwarr@gmail.com>. Bup unknown- bup-margin(1)
All times are GMT -4. The time now is 07:42 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy