Find out match characters on all lines


 
# 1  
Old 09-24-2012
Find out match characters on all lines

I have a file with 22 lines. Each line contains only 5 distinct characters, no whitespace, and is 3,278,824 characters long. The 5 characters are "-", "A", "B", "C", and "D".
Below is an example showing the first 25 characters of the first four lines of the file.
Code:
-----ABCDA--CD-BBBBB----D
--A--ABCD--DCD-BBBBC-----
A-A--ABCD---CD-BBBB------
--A--ABCDA-D-D-BBBBC----D

My desired output from the above example is:
(1) the number of alphabet characters that match in the same column on all lines: 9. These are "ABCD" at columns 6~9, "D" at column 14, and "BBBB" at columns 16~19, for a total of 9 fully matched characters. Note that "-" does not count.
(2) the fully matched alphabet characters: ABCDDBBBB
(3) for each line, an output file of its alphabet characters that are not fully matched:
line1: ACBD
line2: ADCC
line3: AAC
line4: AADCD

The tools I could use include bash, awk, sed, Python, Perl, R, MySQL, Java, C, etc. I just couldn't find a way to do it.
Please help, thanks in advance!
# 2  
Old 09-24-2012
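I think you can do it with awk.
For (1):
Code:
awk '
/^.........pattern/ {n += 1}   # 9 dots: pattern starts at column 10
/^.....pattern/ {m += 1}       # 5 dots: pattern starts at column 6
END {print n; print m}
' file

Each "." matches exactly one character, so 6 dots would anchor the pattern at the 7th column.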
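
The same fixed-column idea can be sketched in Python. This is only an illustration: the function name `count_lines_with_pattern_at` and the `sample` list are made up for this example, and it assumes the pattern and its column are already known.

```python
def count_lines_with_pattern_at(lines, pattern, column):
    """Count the lines in which `pattern` starts exactly at 1-based `column`."""
    # The awk regex skips (column - 1) characters with dots; here the same
    # offset is passed to str.startswith.
    offset = column - 1
    return sum(1 for line in lines if line.startswith(pattern, offset))


# The four example lines from post #1.
sample = [
    "-----ABCDA--CD-BBBBB----D",
    "--A--ABCD--DCD-BBBBC-----",
    "A-A--ABCD---CD-BBBB------",
    "--A--ABCDA-D-D-BBBBC----D",
]

print(count_lines_with_pattern_at(sample, "ABCD", 6))  # prints 4
```

Like the awk version, this only helps when the position of the pattern is known in advance.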
# 3  
Old 09-24-2012
Quote:
Originally Posted by cwzkevin
Code:
-----ABCDA--CD-BBBBB----D
--A--ABCD--DCD-BBBBC-----
A-A--ABCD---CD-BBBB------
--A--ABCDA-D-D-BBBBC----D

(2) the fully matched alphabet characters: ABCDDBBBB
For your second requirement, assuming every line has the same number of characters, try this:
Code:
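sed -e 's/./& /g' file | awk '
# Store every character, indexed by (line number, column).
{ for (i = 1; i <= NF; i++) a[NR, i] = $i; max = NF; maN = NR }
END {
  for (i = 1; i <= max; i++) {
    k = 0; p = ""
    # k becomes 1 as soon as two lines differ in column i.
    for (j = 1; j <= maN; j++) {
      if (p) { if (p != a[j, i]) k = 1 } else { p = a[j, i] }
    }
    # Print the columns on which all lines agree, ignoring "-".
    if (k != 1 && p != "-") print p
  }
}'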
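
A Python sketch of the same column-wise comparison, which also yields the count for requirement (1). It assumes all lines have the same length and fit in memory; the names `fully_matched` and `sample` are hypothetical and not from the thread.

```python
def fully_matched(lines):
    """Return the letters of every column on which all lines agree, skipping "-"."""
    matched = []
    # zip(*lines) yields one tuple per column:
    # (char of line 1, char of line 2, ...). It stops at the shortest line.
    for column in zip(*lines):
        first = column[0]
        if first != "-" and all(ch == first for ch in column):
            matched.append(first)
    return "".join(matched)


# The four example lines from post #1.
sample = [
    "-----ABCDA--CD-BBBBB----D",
    "--A--ABCD--DCD-BBBBC-----",
    "A-A--ABCD---CD-BBBB------",
    "--A--ABCDA-D-D-BBBBC----D",
]

result = fully_matched(sample)
print(len(result))  # requirement (1): prints 9
print(result)       # requirement (2): prints ABCDDBBBB
```

For the real file, `lines` could be read with something like `[line.rstrip("\n") for line in open(path)]`, where `path` is a placeholder. At 22 lines of 3,278,824 characters each that is roughly 72 MB of text, which is likely lighter than keeping one awk array entry per character.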

# 4  
Old 09-24-2012
Quote:
Originally Posted by delugeag
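I think you can do it with awk.
For (1):
Code:
awk '
/^.........pattern/ {n += 1}   # 9 dots: pattern starts at column 10
/^.....pattern/ {m += 1}       # 5 dots: pattern starts at column 6
END {print n; print m}
' file

Each "." matches exactly one character, so 6 dots would anchor the pattern at the 7th column.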
Thanks!
My bad! The example above is only an illustration. The matches follow no fixed pattern: I don't know where they occur, how many there are, or how long each one is, and each line is 3,278,824 characters long...
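
Since the positions and lengths of the matches are unknown, one option is to let the script report them. The following Python sketch groups adjacent fully matched columns into runs; `matched_runs` and `sample` are hypothetical names, and equal-length lines held in memory are assumed.

```python
from itertools import groupby


def matched_runs(lines):
    """Return (start_column, end_column, text) for each run of matched columns."""
    # flags[i] holds the shared letter of column i + 1,
    # or "" if that column is not fully matched (or is all "-").
    flags = [
        col[0] if col[0] != "-" and len(set(col)) == 1 else ""
        for col in zip(*lines)
    ]
    runs = []
    position = 1  # 1-based column at which the current group starts
    for is_match, group in groupby(flags, key=bool):
        chars = list(group)
        if is_match:
            runs.append((position, position + len(chars) - 1, "".join(chars)))
        position += len(chars)
    return runs


# The four example lines from post #1.
sample = [
    "-----ABCDA--CD-BBBBB----D",
    "--A--ABCD--DCD-BBBBC-----",
    "A-A--ABCD---CD-BBBB------",
    "--A--ABCDA-D-D-BBBBC----D",
]

for start, end, text in matched_runs(sample):
    print(start, end, text)
# prints:
# 6 9 ABCD
# 14 14 D
# 16 19 BBBB
```

No pattern has to be supplied up front; the runs fall out of comparing the lines column by column.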
# 5  
Old 09-24-2012
Quote:
Originally Posted by cwzkevin
My bad! The example above is only an illustration. The matches follow no fixed pattern: I don't know where they occur, how many there are, or how long each one is, and each line is 3,278,824 characters long...
Please provide some extra info about your input file.
# 6  
Old 09-24-2012
Quote:
Originally Posted by pamu
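For your second requirement, assuming every line has the same number of characters, try this:
Code:
sed -e 's/./& /g' file | awk '
# Store every character, indexed by (line number, column).
{ for (i = 1; i <= NF; i++) a[NR, i] = $i; max = NF; maN = NR }
END {
  for (i = 1; i <= max; i++) {
    k = 0; p = ""
    # k becomes 1 as soon as two lines differ in column i.
    for (j = 1; j <= maN; j++) {
      if (p) { if (p != a[j, i]) k = 1 } else { p = a[j, i] }
    }
    # Print the columns on which all lines agree, ignoring "-".
    if (k != 1 && p != "-") print p
  }
}'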

Thanks, your code works on my example, but it prints one matching character per line. I need to append something like
Code:
 | tr -d '\n'

to strip the newlines so that everything is printed on a single line. I still need to test it on my real file, but it should work. Thanks.
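
Requirement (3) is not covered by the replies so far. A minimal Python sketch, assuming equal-length lines held in memory; `unmatched_per_line` and `sample` are hypothetical names. Building each result with `"".join` also gives single-line output without a separate `tr` step.

```python
def unmatched_per_line(lines):
    """For each line, return its letters that are not in a fully matched column."""
    # True for the columns in which every line carries the same letter (not "-").
    is_matched = [
        col[0] != "-" and len(set(col)) == 1
        for col in zip(*lines)
    ]
    return [
        "".join(ch for ch, m in zip(line, is_matched) if ch != "-" and not m)
        for line in lines
    ]


# The four example lines from post #1.
sample = [
    "-----ABCDA--CD-BBBBB----D",
    "--A--ABCD--DCD-BBBBC-----",
    "A-A--ABCD---CD-BBBB------",
    "--A--ABCDA-D-D-BBBBC----D",
]

for number, leftover in enumerate(unmatched_per_line(sample), start=1):
    print(f"line{number}: {leftover}")
# prints:
# line1: ACBD
# line2: ADCC
# line3: AAC
# line4: AADCD
```

To get one output file per line, each `leftover` string could be written to a file of your choosing instead of printed; the thread does not specify file names.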
# 7  
Old 09-24-2012
Quote:
Originally Posted by pamu
Please provide some extra info about your input file.
file.zip
The attachment contains the first 8192 characters of the first 4 lines. I still need to test the script you provided thoroughly.