Matching numbers of characters in two lines


 
# 1  
Old 03-09-2011

Dear all,

I'm stuck on a problem: I need to count the number of characters in one line and then adjust the number of characters in another line to match.

This was my original input data:

Code:
@HWI-ST471_57:1:1:1231:2079/2
TTGGTTTATATGGTTTCGGTTGCCTTCTATTAGGCTGTGATTGGCTCATGTAATTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
IFF;CGBAFFFFFDBBBD=EAA7DBDFDCDF?A.C:;:7:@4@,8,.787>+AAD>?###########################################
@HWI-ST471_57:1:1:1220:2195/2
TTGGATGGAGCAGAGCAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
<;>?59,<99B@C@<CDDC#################################################################################
@HWI-ST471_57:1:1:1230:2241/2
GGCTGGGTGAGGTTTCCCGTGTTGAGTCAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
GDGG;D6+D=@A6@9BEEE>D=D@<5<?@C######################################################################

Now I'm using this command to delete leading T's and A's from every 4th line, starting at line 2:

Code:
sed -e '2~4 s/T*//' -e '2~4 s/A*//'

This results in
Code:
@HWI-ST471_57:1:1:1231:2079/2
GGTTTATATGGTTTCGGTTGCCTTCTATTAGGCTGTGATTGGCTCATGTAATTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
IFF;CGBAFFFFFDBBBD=EAA7DBDFDCDF?A.C:;:7:@4@,8,.787>+AAD>?###########################################
@HWI-ST471_57:1:1:1220:2195/2
GGATGGAGCAGAGCAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
<;>?59,<99B@C@<CDDC#################################################################################
@HWI-ST471_57:1:1:1230:2241/2
GGCTGGGTGAGGTTTCCCGTGTTGAGTCAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
GDGG;D6+D=@A6@9BEEE>D=D@<5<?@C######################################################################

Now I want the number of characters of every 4th line starting at line 4 to match the number of characters of every 4th line starting at line 2, by removing the same number of characters from its beginning. The desired output is:

Code:
@HWI-ST471_57:1:1:1231:2079/2
GGTTTATATGGTTTCGGTTGCCTTCTATTAGGCTGTGATTGGCTCATGTAATTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
F;CGBAFFFFFDBBBD=EAA7DBDFDCDF?A.C:;:7:@4@,8,.787>+AAD>?###########################################
@HWI-ST471_57:1:1:1220:2195/2
GGATGGAGCAGAGCAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
>?59,<99B@C@<CDDC#################################################################################
@HWI-ST471_57:1:1:1230:2241/2
GGCTGGGTGAGGTTTCCCGTGTTGAGTCAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
GDGG;D6+D=@A6@9BEEE>D=D@<5<?@C######################################################################

Do you have any idea how I can alter my approach or add another command using awk/sed to reach this result?

Thanks!
# 2  
Old 03-09-2011
Hope this helps:

Code:
awk 'NR%4==2{sub("T*|A*","");l=length($0)}NR%4==0{$0=substr($0,length($0)-l+1)}1' file
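
A note for later readers: as far as I can tell, `sub("T*|A*","")` removes either the leading T's or the leading A's, whichever run is longer at the start of the line, while the two sed expressions in the question remove leading T's and then leading A's. For the sample data both give the same result. A commented sketch that mirrors the sed behaviour could look like this; the function name `trim_reads` is only illustrative and not from the thread:

```shell
# Hypothetical helper: strip leading T's, then leading A's, from every
# sequence line (line 2 of each 4-line record) and shorten the line 4
# of the same record to the same length.
trim_reads() {
  awk '
    NR % 4 == 2 {                 # sequence line
      sub(/^T*A*/, "")            # leading run of T followed by a run of A
      len = length($0)            # remember the trimmed length
    }
    NR % 4 == 0 {                 # fourth line of the same record
      $0 = substr($0, length($0) - len + 1)   # keep the last len characters
    }
    { print }
  ' "$@"
}
```

It reads standard input when called without arguments, or the named files otherwise, for example `trim_reads file`.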

# 3  
Old 03-09-2011
Quote:
Originally Posted by DerSeb
Dear all,
Code:
sed -e '2~4 s/T*//' -e '2~4 s/A*//'

What does "2~4" in the above code signify? I am damn sure it is not selecting lines 2 to 4; when I tested it, it processed only the 2nd line.
# 4  
Old 03-10-2011
Quote:
Originally Posted by royalibrahim
What does "2~4" in the above code signify? I am damn sure it is not selecting lines 2 to 4; when I tested it, it processed only the 2nd line.
No, it's not selecting lines 2 to 4.

It means start at line 2 (included) and from there select every fourth line. At least that's what I hope it does.

Therefore it should process all lines that describe the ATGC sequence in my file.
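
For anyone who wants to check this: `first~step` is a GNU sed extension, so other sed implementations may reject it. A quick way to see which lines the address picks is to number some lines with `seq`; the awk line is my own portable equivalent, not something from the thread. A test file with fewer than six lines would show only line 2, which may be what happened in the test mentioned above.

```shell
# GNU sed: the address 2~4 matches lines 2, 6, 10, ...
seq 1 10 | sed -n '2~4p'
# prints 2, 6 and 10, one per line

# Portable equivalent: line number modulo 4 equals 2
seq 1 10 | awk 'NR % 4 == 2'
# prints 2, 6 and 10, one per line
```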

---------- Post updated at 04:21 AM ---------- Previous update was at 04:15 AM ----------

Quote:
Originally Posted by yinyuemi
Hope this could help you,

Code:
awk 'NR%4==2{sub("T*|A*","");l=length($0)}NR%4==0{$0=substr($0,length($0)-l+1)}1' file

Thanks, this works great!

Is it also possible to print only those sets of four lines where more than 15 Ts or As were deleted?
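
This follow-up was not answered in the thread. One possible sketch, assuming each record is exactly four lines and that "deleted" means the number of leading T's and A's stripped from line 2: buffer the record, count how many characters were removed, and print the record only when that count exceeds 15. The function name `trim_and_filter` and the variable `min` are illustrative choices, not from the thread.

```shell
# Hypothetical helper: trim as before, but print only those 4-line records
# from which more than min (here 15) leading T/A characters were removed.
trim_and_filter() {
  awk -v min=15 '
    { rec[NR % 4] = $0 }          # slots 1-3 hold lines 1-3, slot 0 holds line 4
    NR % 4 == 2 {                 # sequence line
      before = length($0)
      sub(/^T*A*/, "")            # strip leading T run, then leading A run
      rec[2] = $0                 # store the trimmed sequence
      removed = before - length($0)
    }
    NR % 4 == 0 && removed > min {
      print rec[1]
      print rec[2]
      print rec[3]
      # keep the last length(rec[2]) characters of line 4
      print substr($0, length($0) - length(rec[2]) + 1)
    }
  ' "$@"
}
```

Changing `min=15` adjusts the threshold; records at or below it are dropped from the output entirely.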
# 5  
Old 03-11-2011
Quote:
Originally Posted by DerSeb
No, it's not selecting lines 2 to 4.

It means start at line 2 (included) and from there select every fourth line. At least that's what I hope it does.
Yes, thanks for the clarification.
 