how to fetch substring from records into another file
Post 302220690 by aigles on Friday 1st of August 2008 11:57:08 AM
The order of the output corresponds to the order of the headers in file 1.
The logic is:
For each record in file 1, perform the extractions that file 2 specifies for that record.
The script:
Code:
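awk '

# First file (sm2.dat): store the extraction ranges for each key.
NR==FNR {
   # The key width is taken from the first line only,
   # so every key in sm2.dat must have the same length.
   if (NR==1)
      Key_len = length($1) + 1;      # +1 for the leading ">" of a header
   k = ">" substr($0,1, Key_len-1);  # key as it appears at the start of a header
   n = ++Keys[k];                    # number of ranges stored for this key
   From[k,n] = $2;
     To[k,n] = $3;
    Len[k,n] = $3 - $2;              # characters From .. To-1
   next;
}

# Print every range requested for the current record,
# wrapped at 70 characters per line.
function print_selected(    i,k,p,str) {
   if (selected) {
      k = substr(Header, 1, Key_len);
      for (i=1; i<=Keys[k]; i++) {
         printf("%s (%s-%s)\n", Header, From[k,i], To[k,i]);
         str = substr(Alphabets, From[k,i], Len[k,i]);
         p = 1
         while (p<=Len[k,i]) {
            print substr(str, p, 70);
            p += 70;
         }
      }
   }
}

# Second file (sm1.dat), header line: flush the previous record,
# then start a new one.
/^>/ {
   print_selected();
   selected  = (substr($0, 1, Key_len) in Keys);
   Header    = $0;
   Alphabets = "";
   next;
}

# Sequence line of a selected record: append it to the full sequence.
selected {
   Alphabets = Alphabets $0;
}

# Flush the last record.
END {
   print_selected();
}
' sm2.dat sm1.dat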
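
The script reads sm2.dat first and relies on the NR==FNR test, which is true only while awk is still reading the first file named on the command line (provided that file is not empty). Here is a minimal sketch of that two-file pattern on its own. The common_lines helper and the two small files are made up for illustration and are not part of the solution above:

```shell
# Hypothetical helper: print the lines of file 2 that also appear in file 1.
common_lines() {
  awk 'NR==FNR { seen[$0]; next }   # first file: remember every line
       $0 in seen                   # second file: print remembered lines
  ' "$1" "$2"
}

# Illustrative data only, not the files from this thread
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first.txt"
printf 'b\nc\n' > "$tmp/second.txt"
common_lines "$tmp/first.txt" "$tmp/second.txt"   # prints: b
rm -r "$tmp"
```

In the script above, the first block plays the role of the "remember" step: it stores ranges per key instead of whole lines.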

Input file 1 (sm1.dat):
Code:
>bi|2138271|geb|AAC15885.1|precursor [Sambucus nigra]
MRVIAAAMLYLYIVVLAICSVGIQGIDYPSVSFNLAGAKSATWDFLRMPHDLVGEDNKYNDGEPITGNII
GRDGLCVDVRNGYDTDGTPLQLWPCGTQRNQQWTFYTDDTIRSMGKCMTANGLSNGSNIMIFNCSTAVEN
AIKWEVTIDGSIINPSSG
>bi|21083|em|CAA26939.1| precursor [Ricinus communis]
MKPGGNTIVIWMYAVATWLCFGSTSGWSFTLEDNNIFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTG
ADVRHEIPVLPNRVGLPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQEDAEAIT
HLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISALYYYSTGGTQLPTL
>bi|19526601|geb|AAL87006.1| chain A [Viscum album]
YERLRLRVTHQTTGEEYFRFITLLRDYVSSGSFSNEIPLLRQSTIPVSDAQRFVLVELTNEGGDSITAAI
DVTNLYVVAYQAGDQSYFLRDAPRGAETHLFTGTTRSSLPFNGSYPDLERYAGHRDQIPLGIDQLIQSVT
ALRFPGGNTRTQARSILILIQMISEAARFNPILWRARQYINSGASFLPDVY

Input file 2 (sm2.dat):
Code:
bi|2138271|geb|AAC15885 92      110
bi|19526601|geb|AAL8700 74      92
bi|2138271|geb|AAC15885 20      132
bi|21083|em|CAA26939.1| 19      37
bi|21083|em|CAA26939.1| 52      70
bi|2138271|geb|AAC15885 26      38

Output:
Code:
>bi|2138271|geb|AAC15885.1|precursor [Sambucus nigra] (92-110)
LWPCGTQRNQQWTFYTDD
>bi|2138271|geb|AAC15885.1|precursor [Sambucus nigra] (20-132)
SVGIQGIDYPSVSFNLAGAKSATWDFLRMPHDLVGEDNKYNDGEPITGNIIGRDGLCVDVRNGYDTDGTP
LQLWPCGTQRNQQWTFYTDDTIRSMGKCMTANGLSNGSNIMI
>bi|2138271|geb|AAC15885.1|precursor [Sambucus nigra] (26-38)
IDYPSVSFNLAG
>bi|21083|em|CAA26939.1| precursor [Ricinus communis] (19-37)
LCFGSTSGWSFTLEDNNI
>bi|21083|em|CAA26939.1| precursor [Ricinus communis] (52-70)
TVQSYTNFIRAVRGRLTT
>bi|19526601|geb|AAL87006.1| chain A [Viscum album] (74-92)
NLYVVAYQAGDQSYFLRD
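
Note that the ranges are end-exclusive: Len is computed as $3 - $2, so the range 92-110 yields 18 characters (positions 92 through 109). Here is a minimal sketch of the same substr() arithmetic on a made-up string. The extract helper and the test string are hypothetical and only illustrate the behaviour:

```shell
# Hypothetical helper: extract characters from..to-1 of a string,
# using the same length arithmetic as the script (to - from).
extract() {
  printf '%s\n' "$1" | awk -v from="$2" -v to="$3" '{ print substr($0, from, to - from) }'
}

extract "ABCDEFGHIJ" 3 7   # prints: CDEF
```

If position To should be included as well, Len would have to be $3 - $2 + 1. Which one is wanted depends on how the ranges in file 2 are defined.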

Jean-Pierre.
 
