Urgent Need Help! Merging lines in .txt file


 
# 1  
Old 07-04-2008

I need to write a script that reads an input .txt file and merges consecutive lines whose distance is <=4000: the end value of a line is replaced with the end value of the next line. The header line shown below is not actually in the input. In the example, 3217 is the distance from the end of the first line to the start of the second line, and 14021 is the distance from the end of the previous line (not shown) to the start of the first line. So whenever the script finds a distance <=4000, it needs to replace the end of the previous line with the end of the current line.

Any help would be greatly appreciated! Thanks!

INPUT:

chrm start end block length distance
chr7 27398704 27399096 ENm010Block536 392 14021
chr7 27402314 27402466 ENm010Block537 152 3217
chr7 27412536 27412726 ENm010Block538 190 10069
chr7 27416032 27416424 ENm010Block539 392 3305
chr7 27420022 27420972 ENm010Block540 950 3597

Desired OUTPUT:

chr7 27398704 27402466
chr7 27412536 27420972
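
A quick sanity check on the distance column may help before merging. In the sample above each distance equals start minus the previous end minus 1 (for example 27402314 - 27399096 - 1 = 3217). That formula is inferred from the sample, not stated in the post, and `check_distance` is a hypothetical helper name. A minimal sketch:

```shell
# check_distance: read "chrm start end block length distance" lines on stdin
# and report any line whose distance differs from start - previous end - 1.
# The formula is an assumption inferred from the sample data.
check_distance() {
  awk '
    NR > 1 {
      gap = $2 - prev_end - 1
      if (gap != $NF) {
        printf "line %d: computed %d, found %s\n", NR, gap, $NF
        bad = 1
      }
    }
    { prev_end = $3 }        # remember this end for the next line
    END { exit bad + 0 }     # non-zero exit status if any line disagreed
  '
}

check_distance <<'EOF' && echo "distance column is consistent"
chr7 27398704 27399096 ENm010Block536 392 14021
chr7 27402314 27402466 ENm010Block537 152 3217
chr7 27412536 27412726 ENm010Block538 190 10069
chr7 27416032 27416424 ENm010Block539 392 3305
chr7 27420022 27420972 ENm010Block540 950 3597
EOF
```

The first line is skipped because its distance refers to a line that is not in the input.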

Last edited by awknerd; 07-04-2008 at 04:42 PM..
# 2  
Old 07-04-2008
If I understand correctly the output should be:

Code:
chr7 27398704 27402466
chr7 27412536 27416424

Am I missing something?
# 3  
Old 07-04-2008
No, actually I meant to put the original
27420972

because the next distance is <=4000 as well, so those two get merged too. When several consecutive distances are <=4000, you keep merging until the distance is no longer <=4000.
# 4  
Old 07-04-2008
Could you post a bigger input sample with the desired output?
# 5  
Old 07-04-2008
Sure! This may take me a little while since I'm doing it manually, but it should be up in about 15 minutes. Thanks for your interest!
# 6  
Old 07-04-2008
INPUT: Use distance <=1000 to merge

chr7 27104483 27104633 ENm010Block71 150 0
chr7 27104634 27104812 ENm010Block72 178 0
chr7 27104813 27105154 ENm010Block73 341 0
chr7 27106872 27106977 ENm010Block74 105 1717
chr7 27106978 27107481 ENm010Block75 503 0
chr7 27107482 27108156 ENm010Block76 674 0
chr7 27108157 27108194 ENm010Block77 37 0
chr7 27108422 27108700 ENm010Block78 278 227
chr7 27109258 27109365 ENm010Block79 107 557
chr7 27109366 27109431 ENm010Block80 65 0
chr7 27109432 27110017 ENm010Block81 585 0
chr7 27110018 27110056 ENm010Block82 38 0
chr7 27110057 27110309 ENm010Block83 252 0
chr7 27110310 27110435 ENm010Block84 125 0
chr7 27110436 27110489 ENm010Block85 53 0
chr7 27110490 27110550 ENm010Block86 60 0
chr7 27110551 27110789 ENm010Block87 238 0
chr7 27111956 27112348 ENm010Block88 392 1166
chr7 27112374 27112830 ENm010Block89 456 25
chr7 27114388 27114881 ENm010Block90 493 1557
chr7 27114882 27115338 ENm010Block91 456 0
chr7 27115339 27115870 ENm010Block92 531 0
chr7 27116098 27116173 ENm010Block93 75 227
chr7 27116174 27116705 ENm010Block94 531 0
chr7 27116706 27116755 ENm010Block95 49 0
chr7 27116756 27116781 ENm010Block96 25 0
chr7 27116782 27116945 ENm010Block97 163 0
chr7 27116946 27117276 ENm010Block98 330 0
chr7 27117277 27117960 ENm010Block99 683 0
chr7 27118910 27119137 ENm010Block100 227 949
chr7 27119138 27119213 ENm010Block101 75 0
chr7 27119214 27119365 ENm010Block102 151 0
chr7 27119366 27119783 ENm010Block103 417 0
chr7 27119784 27119822 ENm010Block104 38 0
chr7 27119823 27119948 ENm010Block105 125 0
chr7 27119949 27119985 ENm010Block106 36 0
chr7 27119986 27120353 ENm010Block107 367 0
chr7 27120354 27120430 ENm010Block108 76 0
chr7 27120431 27120734 ENm010Block109 303 0
chr7 27120735 27120784 ENm010Block110 49 0
chr7 27120785 27121113 ENm010Block111 328 0
chr7 27121114 27121886 ENm010Block112 772 0
chr7 27121887 27121912 ENm010Block113 25 0
chr7 27121950 27122139 ENm010Block114 189 37
chr7 27122140 27122368 ENm010Block115 228 0
chr7 27122369 27122596 ENm010Block116 227 0
chr7 27123470 27123811 ENm010Block117 341 873
chr7 27123812 27124306 ENm010Block118 494 0
chr7 27124307 27125180 ENm010Block119 873 0
chr7 27126966 27127320 ENm010Block120 354 1785
chr7 27127612 27127725 ENm010Block121 113 291
chr7 27127726 27128410 ENm010Block122 684 0
chr7 27128411 27129055 ENm010Block123 644 0
chr7 27129056 27129182 ENm010Block124 126 0
chr7 27129183 27129550 ENm010Block125 367 0
chr7 27130006 27130043 ENm010Block126 37 455
chr7 27130044 27130880 ENm010Block127 836 0
chr7 27130881 27131260 ENm010Block128 379 0
chr7 27135440 27135630 ENm010Block129 190 4179
chr7 27136554 27136807 ENm010Block130 253 923
chr7 27136808 27136820 ENm010Block131 12 0
chr7 27136821 27136845 ENm010Block132 24 0
chr7 27136846 27136895 ENm010Block133 49 0
chr7 27136896 27137035 ENm010Block134 139 0
chr7 27137036 27137071 ENm010Block135 35 0
chr7 27137072 27137237 ENm010Block136 165 0
chr7 27137238 27137580 ENm010Block137 342 0
chr7 27137581 27137618 ENm010Block138 37 0
chr7 27137619 27137796 ENm010Block139 177 0


OUTPUT:

chr7 27104483 27105154
chr7 27106872 27110789
chr7 27111956 27112830
chr7 27114388 27125180
chr7 27126966 27131260
chr7 27135440 27137618
chr7 27137619 27137796
# 7  
Old 07-04-2008
Hm,
with this code (use nawk or /usr/xpg4/bin/awk on Solaris):

Code:
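awk 'END { print _, __ }      # flush the last merged block
1 == NR || $NF > 1000 {       # first line, or distance above the threshold: new block
  if (c) print _, __          # print the block that just ended
  _ = $1 FS $2                # remember chrm and start of the new block
  c = 1
  }  
{ __ = $3 }                   # always track the most recent end
' file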

I get this output:

Code:
chr7 27104483 27105154
chr7 27106872 27110789
chr7 27111956 27112830
chr7 27114388 27125180
chr7 27126966 27131260
chr7 27135440 27137796
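
For readers who prefer descriptive names, the same merge logic can be sketched with the threshold passed as a parameter. This is an illustration, not the poster's code: `merge_blocks`, `max`, `start` and `last_end` are names chosen here, and a new block is assumed to start only when the distance is strictly greater than the threshold. It is run on the first sample from post #1 with a threshold of 4000:

```shell
# merge_blocks MAX: read "chrm start end block length distance" lines on stdin
# and print one "chrm start end" line per run of lines whose distance <= MAX.
merge_blocks() {
  awk -v max="$1" '
    NR == 1 || $NF > max {               # first line, or gap too large: new block
      if (NR > 1) print start, last_end  # emit the block that just ended
      start = $1 FS $2
    }
    { last_end = $3 }                    # the current line extends the open block
    END { if (NR) print start, last_end }
  '
}

merge_blocks 4000 <<'EOF'
chr7 27398704 27399096 ENm010Block536 392 14021
chr7 27402314 27402466 ENm010Block537 152 3217
chr7 27412536 27412726 ENm010Block538 190 10069
chr7 27416032 27416424 ENm010Block539 392 3305
chr7 27420022 27420972 ENm010Block540 950 3597
EOF
# prints:
# chr7 27398704 27402466
# chr7 27412536 27420972
```

Called as `merge_blocks 1000 < file` it should behave like the one-liner above on the larger sample.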

Do you really want to treat the last line as in the example output?

It makes the code a bit ugly:

Code:
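awk 'END { print _, ___ RS ____ }   # last block up to the previous line, then the last line alone
1 == NR || $NF > 1000 {             # first line, or distance above the threshold: new block
  if (c) print _, __ 
  _ = $1 FS $2
  c = 1 
  }  
{ ___ = __                          # end of the previous line
  __ = $3                           # end of the current line
  ____ = $1 FS $2 FS $3 }           # the current line, kept in case it is the last one
' file

You may need to check whether the last line has $NF > 1000;
if that matters, I should add more code.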

And ..., don't blame me for choosing such variable names,
if you don't like them, just change them.
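
One way to cover the last-line caveat is to process each line one step late, so the final line never enters a block. The sketch below assumes the intent is that the final input line is always reported on its own, as in the example output; the thread does not confirm this. `merge_keep_last`, `feed` and the variable names are hypothetical. It is run on the tail of the larger sample:

```shell
# merge_keep_last MAX: like the merge above, but the final input line is
# always printed as its own record (an assumption about the desired output).
merge_keep_last() {
  awk -v max="$1" '
    function feed(chrm, begin, stop, dist) {
      if (!started || dist > max) {      # open a new block
        if (started) print head, tail
        head = chrm FS begin
        started = 1
      }
      tail = stop
    }
    NR > 1 { feed(c, s, e, d) }          # the held line is known not to be the last
    { c = $1; s = $2; e = $3; d = $NF }  # hold the current line back
    END {
      if (started) print head, tail      # flush the open block
      if (NR) print c, s, e              # the last line, on its own
    }
  '
}

merge_keep_last 1000 <<'EOF'
chr7 27135440 27135630 ENm010Block129 190 4179
chr7 27136554 27136807 ENm010Block130 253 923
chr7 27136808 27136820 ENm010Block131 12 0
chr7 27136821 27136845 ENm010Block132 24 0
chr7 27136846 27136895 ENm010Block133 49 0
chr7 27136896 27137035 ENm010Block134 139 0
chr7 27137036 27137071 ENm010Block135 35 0
chr7 27137072 27137237 ENm010Block136 165 0
chr7 27137238 27137580 ENm010Block137 342 0
chr7 27137581 27137618 ENm010Block138 37 0
chr7 27137619 27137796 ENm010Block139 177 0
EOF
# prints:
# chr7 27135440 27137618
# chr7 27137619 27137796
```

Because the last line is never fed into a block, a large distance on that line no longer pairs its start with the previous line's end.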

Last edited by radoulov; 07-05-2008 at 03:30 AM.. Reason: modified