Shell Programming and Scripting: Post 302943862 by redse171 on Wednesday 13th of May 2015 11:11:30 AM
Find key pattern and print selected lines for each record

Hi,

I need help with a complicated file that I am working on. I want to extract the important information from a very large, space-delimited file that holds hundreds of thousands of records. An example of the input file is shown below:

Code:
##
ID    Ser402             Old;         23 mins .
ACC   P669GM;
DAT   MAY-2014, the old episode.
TOS   Japanes Anime. one piece
TMA   Pirates; animation; cartoon.
POT   DownloadID=5445;
HEW   StreamID=792; watchop (eu).
HEW   AnotherOnlineID=823; narutowire (same).
COM   -@- Simple Comment: Ace died and Luffy is miserable. 
COM      None of his nakama was with him {SOV:000250}.
COM   -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM      streaming {SOV:000305}.
COM   -@- Another Comment: Belongs to the same server.
COM      {SOV:000305}.
COM  -----------------------------------------------------------------------
COM   Can be watched online, see http://www.watchop.eu
DOR   Data; packet; -; Unknown; Anime.
DOR   TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR   TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.
PPE   Torrent unapplicable;
KAW   Complete episode; Early release; Host channel;
KAW   Repeat; subchannel; subchannel host.
FEA   link          1    20         unavailable
FEA                                /F3184.
FEA   TOP_CHAN      1      1       unavailable (will be determined).
FEA   SUBCHAN       2      18      at 9 (confirmed!).
FEA   TOP_CHAN      19     117     unavailable (No info).
FEA   SUBCHAN       118    138     at 10 (confirmed!).
FEA   TOP_CHAN      139    145     unavailable (will be determined).
FEA   SUBCHAN       146    166     at 12 (confirmed!).
FEA   TOP_CHAN      167    269     unavailable (the source is unknown).
FEA   REP           1      146     A.
FEA   CAD           75     75      by host.
FEA                                {undetermined}.
SYN   synopsis for this episode is unavailable.
##
ID    MOV10               NewMov;         90 mins.
ACC   PPDFB1;
TOS   Japanes Anime. Naruto shippuden
TMA   Ninja; shinobi, konoha; hokage; Pain.
CC    Distributed under the Creative License
CC   -----------------------------------------------------------------------
DOR   Data; packet; -; Unknown; Anime movie.
DOR   movie; new movie; 90 mins only
DOR   MOVID; 299; -.
DOR   MOV3D; -; 1.
PPE   10; torrent
KAW   new movie; Complete movie.
FEA   Null         1    683        Unknown
FEA                                /F82.
FEA   mov       62    124       (SOV:005).
FEA   mov      155    259       (SOV:005).
FEA   mov      346    376       (SOV:025).
SYN   In this episode, Dresrossa has been surrounded by a cage known as birdcage by doflamingo.
      Luffy is moving towards the palace to defeat Doflamingo. 
##

All the records in this file are separated by “##”. I need output that shows only the required information for records whose KAW lines match the pattern “subchannel” or “subchannel host”. In the example input, only the first record matches, so the output should look like this:

Code:
##
ID    Ser402
ACC   P669GM
TOS   Japanes Anime. one piece
TMA   Pirates; animation; cartoon.
COM   -@- Full Comment: Host channel {SOV:000305}; Multi-chanel
COM      streaming {SOV:000305}.
DOR   TDP; TDP:0034; PPQ:host for sub channel; ASA:Subchannel.
DOR   TDP; TDP:0021; PPQ:internal channel; ASA:Unknown.
KAW   Complete episode; Early release; Host channel;
KAW   Repeat; subchannel; subchannel host.
FEA   link          1    20         unavailable
FEA                                /F3184.
FEA   TOP_CHAN      1      1       unavailable (will be determined).
FEA   SUBCHAN       2      18      at 9 (confirmed!).
FEA   TOP_CHAN      19     117     unavailable (No info).
FEA   SUBCHAN       118    138     at 10 (confirmed!).
FEA   TOP_CHAN      139    145     unavailable (will be determined).
FEA   SUBCHAN       146    166     at 12 (confirmed!).
FEA   TOP_CHAN      167    269     unavailable (the source is unknown).
FEA   REP           1      146     A.
FEA   CAD           75     75      by host.
FEA                                {undetermined}.
TT    3
##

As shown above, for lines starting with COM, I want only the “-@- Full Comment” line and any COM continuation lines that follow it. For lines starting with DOR, I need only those whose first value is TDP. Finally, a new last line named “TT” should be added to the record, and its value is the total number of occurrences of the pattern “FEA SUBCHAN” in that record.

I have no idea how to print only the selected lines. I used the code below to find the key pattern, but it prints every line of each matched record. I need only the selected lines shown in the sample output above.

Code:
awk '/##/{if(l)print s;l=0;s=$0;next}/subchannel/{l=1}{s=s RS $0}END{if(l)print s}' inputfile
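
For reference, the one-liner can be read more easily when it is spread over several lines. The block below is only a commented restatement that is meant to behave the same way; the wrapper name `print_matching_records` and the variable names `hit` and `rec` are made up for illustration and are not part of the original command.

```shell
# Expanded, commented form of the one-liner. print_matching_records is a
# hypothetical wrapper name; the awk logic is meant to be unchanged.
print_matching_records() {
    awk '
        /##/ {                     # separator line: the previous record is complete
            if (hit) print rec     # flush the buffered record if it matched
            hit = 0                # reset the match flag for the next record
            rec = $0               # start a new buffer with the "##" line
            next
        }
        /subchannel/ { hit = 1 }   # matches on ANY line, not only on KAW lines
        { rec = rec RS $0 }        # every line is appended, so nothing is filtered
        END { if (hit) print rec } # flush a last record that has no closing "##"
    ' "$@"
}
```

Written this way, it is easier to see why whole records come out: every line is appended to the buffer unconditionally, and the buffer is printed as-is once the record matches.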

I would appreciate your kind help. Thanks.
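
One possible approach, sketched under assumptions rather than offered as a definitive answer: buffer only the wanted lines of each record, and print the buffer at the next “##” if a KAW line in that record contained “subchannel”. The function name `extract_subchannel_records` is hypothetical. The set of tags kept (ID and ACC trimmed to their first value, TOS, TMA, KAW and FEA kept whole, everything else dropped) is inferred from the sample output, not from a stated rule. The sketch also assumes that a COM line made only of dashes ends a comment block, that the column spacing of the input is acceptable in the output, and that one closing “##” after the last printed record is enough.

```shell
# Sketch only: extract_subchannel_records is a hypothetical name, and the
# tags kept or dropped are inferred from the sample output in the post.
extract_subchannel_records() {
    awk '
        # Print the buffered record if it matched, then reset all state.
        function emit() {
            if (keep) {
                printf "##\n%s", buf
                printf "%-5s %d\n", "TT", nsub
                printed = 1
            }
            buf = ""; keep = 0; nsub = 0; in_full = 0
        }
        /^##/ { emit(); next }                    # record separator
        $1 == "ID" {                              # keep only the identifier
            buf = buf sprintf("%-5s %s\n", $1, $2); next
        }
        $1 == "ACC" {                             # drop the trailing semicolon
            v = $2; sub(/;$/, "", v)
            buf = buf sprintf("%-5s %s\n", $1, v); next
        }
        $1 == "TOS" || $1 == "TMA" { buf = buf $0 "\n"; next }
        $1 == "COM" {
            if ($2 == "-@-")                      # a new comment block starts
                in_full = ($3 == "Full" && $4 == "Comment:")
            else if ($2 ~ /^-+$/)                 # assumed: dashed rule ends a block
                in_full = 0
            if (in_full) buf = buf $0 "\n"        # header and continuation lines
            next
        }
        $1 == "DOR" && $2 == "TDP;" { buf = buf $0 "\n"; next }
        $1 == "KAW" {
            if ($0 ~ /subchannel/) keep = 1       # also covers "subchannel host"
            buf = buf $0 "\n"; next
        }
        $1 == "FEA" {
            if ($2 == "SUBCHAN") nsub++           # counted for the TT line
            buf = buf $0 "\n"; next
        }
        END { emit(); if (printed) print "##" }   # closing separator
    ' "$@"
}
```

It would be called as `extract_subchannel_records inputfile`. The match on “subchannel” is case-sensitive and limited to KAW lines, so a value such as `ASA:Subchannel` in a DOR line does not select a record.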
 
