how to fetch substring from records into another file


 
# 8  
Old 08-01-2008
I tried
perl -pe 's/(.{70})(?!$)/$1\n/g unless m/^>/' file_test

It gives this error:

perl -pe 's/(.{70})(?file_test)/$1\n/g unless m/^>/' file_test
Sequence (?f...) not recognized in regex; marked by <-- HERE in m/(.{70})(?f <-- HERE ile_test)/ at -e line 1.
# 9  
Old 08-01-2008
It's giving an error

I tried this

perl -pe 's/(.{70})(?!$)/$1\n/g unless m/^>/' file_test

and it gave the error

Sequence (?f...) not recognized in regex; marked by <-- HERE in m/(.{70})(?f <-- HERE ile_test)/ at -e line 1.

What exactly is (?!$) doing?
# 10  
Old 08-01-2008
Quote:
Originally Posted by smriti_shridhar
I ran that script on files that need longer substrings extracted, and the format of the output file is getting disturbed. I tried some printf options but couldn't correct it. Please help me out.

===FILE1====
>bi|37779709|geb|AAP20876.1| kinetic [ ternata]
MASKLLLFLLPAILGLIIPRPAVAVGTNYLLSGETLDTDGHLKNGDFDFIMQEDCNAVLYNGNWQSNTAN
KGRDCKLTLTDRGELVINNGEGSAVWRSGSQSAKGNYAAVLHPEGKLVIYGPSVFKINPWVPGLNSLRLG
NVPFTCNMLFSGQVLYGDGKITARNHMLVMQGDCNLVLYGGKCDWQSNTHGNGEHCFLRLNHKGELIIKD
DDFKSIWSSQSSSKQGDYVFILQDNGYGVIYGPAIWATSSKRSVAAQETMIGMVTEKVN
>bi|146403769|geb|ABQ32294.1| pure [an eg]
MAKLLLFLLPAILGLLIPRSAVALGTNYLLSGQTLNTDGHLKNGDFDLVMQNDCNLVLYNGNWQSNTANN
GRDCKLTLTDYGELVIKNGDGSTVWRSRAKSVKGNYAAVLHPDGRLVVFGPSVFKIDPWVPGLNSLRFRN
IPFTDNLLFSGQVLYGDGRLTAKNHQLVMQGDCNLVLYGGKYGWQSNTHGNGEHCFLRLNHKGELIIKDD
DFRPSGAAVPAPSR

===FILE2====
bi|37779709|geb|AAP20876.1| 28 264
bi|146403769|geb|ABQ32294.1| 27 224

===OUTPUTFILE===
>gi|37779709|gb|AAP20876.1| lectin [Pinellia ternata] (28-264)
NYLLSGETLDTDGHLKNGDFDFIMQEDCNAVLYNGNWQSNTANKGRDCKLTLTDRGELVINNGEGSAVWRSGSQSAKGNYAAVLHPEGKLVIYGPSVFKI NPWVPGLNSLRLGNVPFTCNMLFSGQVLYGDGKITARNHMLVMQGDCNLVLYGGKCDWQSNTHGNGEHCFLRLNHKGELIIKDDDFKSIWSSQSSSKQGD YVFILQDNGYGVIYGPAIWATSSKRSVAAQETMIGM
>gi|146403769|gb|ABQ32294.1| lectin [Colocasia esculenta] (27-224)
NYLLSGQTLNTDGHLKNGDFDLVMQNDCNLVLYNGNWQSNTANNGRDCKLTLTDYGELVIKNGDGSTVWRSRAKSVKGNYAAVLHPDGRLVVFGPSVFKI DPWVPGLNSLRFRNIPFTDNLLFSGQVLYGDGRLTAKNHQLVMQGDCNLVLYGGKYGWQSNTHGNGEHCFLRLNHKGELIIKDDDFRPSGAAVPAPS


whereas the output should look like this:
>gi|37779709|gb|AAP20876.1| lectin [Pinellia ternata] (28-264)
NYLLSGETLDTDGHLKNGDFDFIMQEDCNAVLYNGNWQSNTANKGRDCKLTLTDRGELVINNGEGSAVWR
SGSQSAKGNYAAVLHPEGKLVIYGPSVFKINPWVPGLNSLRLGNVPFTCNMLFSGQVLYGDGKITARNHML
VMQGDCNLVLYGGKCDWQSNTHGNGEHCFLRLNHKGELIIKDDDFKSIWSSQSSSKQGDYVFILQDNGY
GVIYGPAIWATSSKRSVAAQETMIGM
>gi|146403769|gb|ABQ32294.1| lectin [Colocasia esculenta] (27-224)
NYLLSGQTLNTDGHLKNGDFDLVMQNDCNLVLYNGNWQSNTANNGRDCKLTLTDYGELVIKNGDGSTVWR
SRAKSVKGNYAAVLHPDGRLVVFGPSVFKIDPWVPGLNSLRFRNIPFTDNLLFSGQVLYGDGRLTAKNHQLV
MQGDCNLVLYGGKYGWQSNTHGNGEHCFLRLNHKGELIIKDDDFRPSGAAVPAPS

where each line after the header line should have no more than 70 characters.

I will be thankful to you.
-smriti
Replace the print_selected function with:
Code:
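# The names after the extra spaces in the parameter list (i, k, p, str)
# are local variables, following the usual awk convention.
function print_selected(    i,k,p,str) {
   if (selected) {
      k = substr(Header, 1, Key_len);
      for (i=1; i<=Keys[k]; i++) {
         printf("%s (%s-%s)\n", Header, From[k,i], To[k,i]);
         str = substr(Alphabets, From[k,i], Len[k,i]);
         # Print str in chunks of at most 70 characters per line
         p = 1
         while (p<=Len[k,i]) {
            print substr(str, p, 70);
            p += 70;
         }
      }
   }
}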

jean-Pierre.
# 11  
Old 08-01-2008
Do you have a really old version of Perl? In Perl 5 regular expressions, (?!...) is a negative lookahead assertion; it checks that the text just ahead of the match does not match the given regular expression (in this case, $, the end of the line). I used that to avoid the problem I had with the sed script, which caused the first line to be wrapped one character earlier than subsequent ones.

You could try this as well:

Code:
perl -pe 'unless (m/^>/) { chomp if (length() % 70 == 1); s/(.{70})/$1\n/g; }'

As aigles suggests, if the output from the awk script is wrong, fixing the awk script is probably better, though.

Last edited by era; 08-01-2008 at 09:55 AM.. Reason: Rewrote script; previous version had a bug
# 12  
Old 08-01-2008
Thanks Jean and era

Your code is giving the desired output. Thanks.

Yes, I'd prefer to change the awk script because that makes my task easier.

I hope one day I'll learn to code better, like you guys.

-smriti
# 13  
Old 08-01-2008
Can you help me one more time?

The newer version is not working when the substring is shorter than 70 characters.

Actually, FILE2 is the output of some other program, and the substring can be as short as 10 characters and as long as 300 characters or more.

for the same FILE1, FILE2 can be--
bi|37779709|geb|AAP20876.1| 28 264
bi|146403769|geb|ABQ32294.1| 27 224
bi|146403769|geb|ABQ32294.1| 20 34

in which the third row needs to extract only 14 characters.

-smriti
# 14  
Old 08-01-2008
I got the solution

I tried it again and it's running with the following FILE2:

bi|37779709|geb|AAP20876.1| 28 264
bi|146403769|geb|ABQ32294.1| 27 224
bi|146403769|geb|ABQ32294.1| 20 34

But if a different ID lies between two identical IDs, like this:
bi|146403769|geb|ABQ32294.1| 20 34
bi|37779709|geb|AAP20876.1| 28 264
bi|146403769|geb|ABQ32294.1| 27 224

then it is skipped, and I get the result only for
bi|146403769|geb|ABQ32294.1| 20 34
bi|146403769|geb|ABQ32294.1| 27 224

and not for
bi|37779709|geb|AAP20876.1| 28 264

But I think I can solve this by sorting FILE2 first and then using it.

Thanks
-smriti
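
An alternative to sorting FILE2 is to load all of its ranges into arrays keyed by ID before reading FILE1, so the order of the FILE2 rows no longer matters. The following is only a sketch, not the script used earlier in this thread (which is not shown in full here): the function name extract_ranges and all variable names are hypothetical. It assumes FILE2 rows are "ID FROM TO", that the ID is the first field of the FASTA header without the leading ">", and that the length is TO minus FROM, which is what the sample output and the "20 34 gives 14 characters" remark suggest.

```shell
# extract_ranges is a hypothetical name. Usage: extract_ranges FILE2 FILE1
extract_ranges() {
    awk '
    # Print every range requested for the record collected so far,
    # wrapping the extracted substring at 70 characters per line.
    function emit_ranges(    n, i, from, len, s, p) {
        if (id == "" || !(id in count)) return
        n = count[id]
        for (i = 1; i <= n; i++) {
            from = start[id, i]
            len  = stop[id, i] - from      # assumption: length is TO - FROM
            printf "%s (%s-%s)\n", header, from, stop[id, i]
            s = substr(seq, from, len)
            for (p = 1; p <= length(s); p += 70)
                print substr(s, p, 70)
        }
    }
    # First file (FILE2): remember every range per ID, in any order.
    NR == FNR {
        if (NF >= 3) { k = ++count[$1]; start[$1, k] = $2; stop[$1, k] = $3 }
        next
    }
    # Second file (FILE1): a header line closes the previous record.
    /^>/ {
        emit_ranges()
        header = $0
        id = substr($1, 2)                 # first field minus the leading ">"
        seq = ""
        next
    }
    { seq = seq $0 }                       # join the sequence lines
    END { emit_ranges() }
    ' "$1" "$2"
}
```

Called as `extract_ranges FILE2 FILE1 > OUTPUTFILE`, this prints records in FILE1 order, with the ranges for each ID in the order they appear in FILE2. Substrings shorter than 70 characters come out as a single line.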