Find out match characters on all lines


 
# 8  
Old 09-24-2012
Quote:
Originally Posted by cwzkevin
I have a file with 22 lines. Each line has only 5 different chars, no white space, and each line is 3,278,824 chars long. The 5 chars are "-", "A", "B", "C", "D". ...
22 lines of 3,278,824 chars to be compared char by char!? Wouldn't it be much easier if we could transpose that matrix (though I don't know how right now, off the top of my head)?
# 9  
Old 09-24-2012
Quote:
Originally Posted by RudiC
22 lines of 3,278,824 chars to be compared char by char!? Wouldn't it be much easier if we could transpose that matrix (though I don't know how right now, off the top of my head)?
Yes, transposing it and then using awk with NF=22 and NR=3278824 is the direction I had in mind. Actually, pamu's solution does this: it transposes the file with awk and also does the counting with awk. GREAT!
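For reference, a minimal sketch of the transposition step in plain awk. It is only an illustration, not pamu's code: the function name transpose_chars is made up here, and it assumes all input lines have the same length and that the whole file fits in memory.

```shell
# transpose_chars FILE (hypothetical helper name):
# print one output row per input column, with the characters of that
# column separated by single spaces. Assumes equal-length input lines.
transpose_chars() {
    awk '
    {
        n = length($0)
        # append the j-th character of this line to the row for column j
        for (j = 1; j <= n; j++)
            col[j] = (NR == 1 ? "" : col[j] " ") substr($0, j, 1)
        if (n > width) width = n
    }
    END {
        for (j = 1; j <= width; j++)
            print col[j]
    }' "$1"
}
```

Called as `transpose_chars infile > filetransposed`, each output row then has as many fields as the input has lines, so a second awk pass can work with NF=22 on the real data.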

Last edited by cwzkevin; 09-24-2012 at 05:39 PM..
# 10  
Old 09-25-2012
The sample zip file you provided is not a very good example. I managed to transpose it, but at first sight the lines seem identical. Could you provide a sample with 22 lines and, say, a few thousand chars per line?
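If a real sample is hard to share, random test data of the right shape can be generated instead. The sketch below is an assumption about what such a sample could look like (make_sample and its arguments are made-up names); it draws each character uniformly from the five allowed ones.

```shell
# make_sample ROWS COLS [SEED] (hypothetical helper name):
# print ROWS lines of COLS random characters drawn from "-", "A", "B", "C", "D".
make_sample() {
    awk -v rows="$1" -v cols="$2" -v seed="${3:-1}" '
    BEGIN {
        srand(seed)
        chars = "-ABCD"
        for (i = 1; i <= rows; i++) {
            line = ""
            # rand() is in [0,1), so the index below is in 1..5
            for (j = 1; j <= cols; j++)
                line = line substr(chars, int(rand() * 5) + 1, 1)
            print line
        }
    }'
}
```

For example, `make_sample 22 3000 > sample.txt`. Note that with 22 uniformly random rows a column in which all lines agree is extremely unlikely, so such a file mainly exercises the non-matching case; using only a few rows gives more matches.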
# 11  
Old 09-25-2012
Quote:
Originally Posted by RudiC
The sample zip file you provided is not a very good example. I managed to transpose it, but at first sight the lines seem identical. Could you provide a sample with 22 lines and, say, a few thousand chars per line?
Thank you, Rudi.

I think Pamu's solution works. (Thank you Pamu.)
# 12  
Old 09-25-2012
This works (on Linux with bash and GNU tools!) for your four-line, 24-char example from post #1:
Code:
$ cat sedfile
1 {s/\(.\)/\1\n/g;w1.tmp
  }
2 {s/\(.\)/\1\n/g;w2.tmp
  }
3 {s/\(.\)/\1\n/g;w3.tmp
  }
4 {s/\(.\)/\1\n/g;w4.tmp
  }
$ sed -nf sedfile infile
$ paste -d" " ?.tmp >filetransposed
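$ awk  '!NF {next}                    # ignore empty rows, e.g. a blank last row from the ?.tmp files
         {split ($0, b); nf=NF
          L=1                          # L stays 1 only if all fields are equal and none is "-"
          for (i=1;i<NF;i++) {L=L && b[i]==b[i+1] && b[i]!="-"; if (!L) break}
         }
         L {n++; printf "%s", b[1]}    # matching column: count it, print the common character
         # non-matching column: append every non-"-" character to the file of its line
         !L {for (i=1;i<=NF;i++) if (b[i] != "-") printf "%s",b[i] >("line" i)}
         # finish: print the count, then terminate each line file with a newline
         END {print "\t",n; for (i=1;i<=nf;i++) print "" >("line" i)}
        ' filetransposed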
ABCDDBBBB     9
$ cat line?
ACBD
ADCC
AAC
AADCD
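The counting part can also be sketched without the temporary files, by keeping all lines in memory and comparing column by column. This is a different technique from the sed/paste transposition above, not a replacement for it: it assumes the whole file fits in memory (roughly 72 MB for 22 lines of 3,278,824 chars), common_columns is a made-up name, and it covers only the matching columns, not the per-line "line" files.

```shell
# common_columns FILE (hypothetical helper name):
# print the characters of all columns in which every line has the same
# character other than "-", followed by a tab and the number of such columns.
common_columns() {
    awk '
    { line[NR] = $0 }                      # keep every input line
    END {
        width = length(line[1])
        for (j = 1; j <= width; j++) {
            c = substr(line[1], j, 1)
            same = (c != "-")              # a "-" column never counts
            for (i = 2; same && i <= NR; i++)
                same = (substr(line[i], j, 1) == c)
            if (same) { printf "%s", c; n++ }
        }
        printf "\t%d\n", n
    }' "$1"
}
```

Usage would be `common_columns infile`. It works for any number of lines, so the four hard-coded blocks in sedfile are not needed for this part.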

---------- Post updated at 06:04 PM ---------- Previous update was at 05:58 PM ----------

Quote:
Originally Posted by cwzkevin
I think Pamu's solution works. (Thank you Pamu.)
OK, but does it meet your requirement 3)?
# 13  
Old 09-27-2012
Thanks, Rudi. I have all the solutions I need now. Thank you!