Blocks of text in a file - extract when matches...


 
# 1  
Old 05-26-2014
Blocks of text in a file - extract when matches...

I sat down yesterday to write this script and have just realised that my methodology is broken.

In essence I have:
Code:
-----------------------------------------------------------------  (This line really is in the file)               
                     Service ID: 12345                
                        Event ID: 67890               
                      start_date: 0xdde8 21:00:00 (Sat May 31 22:00:00 2014)                
                        duration:  01:00:00                
                             name:     Any old name info could be in here                
                                text:        Extended information description......        
                 Content Type:  Specific set of 10 different flag types            
                    Event CRID:  /XXYYZZ 
----------------------------------------------------------------              
                      Service ID: 54321                
                         Event ID: 09876               
                       start_date: 0xdde9 20:00:00 (Sun Jun 1 21:00:00 2014)                
                         duration:  02:00:00                
                              name:     Any old name info could be in here                
                                 text:        Extended information description......        
                  Content Type:  Specific set of 10 different flag types
                     Event CRID:  /YYZZXX 
---------------------------------------------------------------------------------

Note: other new fields can be introduced by the source without notice, and I have no control over their naming.

And so on, repeated up to a couple of thousand times.

What I want to do is the following:

Match the Service ID against specific values (held in variables, obtained from a different external source) and partially match the name against specific words.

So I had coded this as:
Code:
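# Redirect the file into the loop instead of piping from cat, so variables set inside survive the loop
while read -r LINEVAR ; do

    if [[ $LINEVAR == "Service ID:"* ]] && [[ $LINEVAR == *"$VARIABLE1"* || $LINEVAR == *"$VARIABLE2"* ]]; then
        echo "Found a match........"
        # do some stuff
    fi

done < /imputfile.txt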

I've just realised the flaw in my thinking: to complete my data processing I also need the data from these lines:

start_date
duration
name
text
Event CRID

Of course, as I'm reading the file line by line, at the point where the Service ID line matches I haven't yet seen the subsequent fields of that block, so I can't pick them out....basically I'm a twit!!

So I have to come up with a different method, but I'm having a brain f*rt and can't think, partly because I don't do a lot of bash, so I always have to look the syntax up again.

Ideas?

Moderator's Comments:
Mod Comment Please use code tags next time for your code and data. Thanks

Last edited by vbe; 05-26-2014 at 11:02 AM.. Reason: code tags
# 2  
Old 05-26-2014
Using IFS=":" (after saving the old IFS value, of course)
you could read each line into two variables, say VAR1 and VAR2.
You can then test whether
[ "$VAR1" = "Service ID" ], then save VAR2, etc.
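
A minimal sketch of that idea, assuming the block layout shown in post #1. Two details are additions rather than part of the suggestion: IFS is set only for the read command, so there is no old value to restore, and the key is trimmed, because with IFS=":" read no longer strips the leading padding. The names trim, flush_block and extract_blocks, and the "|"-separated output, are made up for illustration.

```shell
#!/bin/bash
# Sketch: read "key: value" lines, collect one block at a time, and decide
# whether to print it once the dashed separator line is reached.
# All function and variable names here are illustrative assumptions.

trim() {                        # strip leading/trailing whitespace; result in $trimmed
    trimmed=$1
    trimmed=${trimmed#"${trimmed%%[![:space:]]*}"}
    trimmed=${trimmed%"${trimmed##*[![:space:]]}"}
}

flush_block() {                 # print the collected block if its Service ID is wanted
    case " $wanted " in
        *" $sid "*) [ -n "$sid" ] && printf '%s|%s|%s|%s\n' "$sid" "$start" "$dur" "$name" ;;
    esac
    sid='' start='' dur='' name=''
}

extract_blocks() {              # usage: extract_blocks FILE ID [ID...]
    file=$1; shift
    wanted=$*                   # space-separated list of wanted Service IDs
    sid='' start='' dur='' name=''
    # Only the first ":" splits the line; the rest (e.g. 21:00:00) stays in $value.
    while IFS=: read -r key value || [ -n "$key" ]; do
        trim "$key";   key=$trimmed
        trim "$value"; value=$trimmed
        case $key in
            ---*)         flush_block ;;     # separator line ends the current block
            "Service ID") sid=$value ;;
            start_date)   start=$value ;;
            duration)     dur=$value ;;
            name)         name=$value ;;
        esac                                 # unknown or new fields are simply ignored
    done < "$file"
    flush_block                 # in case the file does not end with a separator line
}
```

It could be called as extract_blocks /imputfile.txt "$VARIABLE1" "$VARIABLE2". Because the decision is made at the separator line, every field of the block has already been seen by then, whatever order the source writes them in.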
# 3  
Old 05-26-2014
Try this and adapt/extend:
Code:
awk -vRS="-------" '/Service ID: (54321|12345)/ {match ($0, /start_date[^\n]*/); print substr ($0, RSTART, RLENGTH)} ' file
start_date: 0xdde8 21:00:00 (Sat May 31 22:00:00 2014)                
start_date: 0xdde9 20:00:00 (Sun Jun 1 21:00:00 2014)

# 4  
Old 05-29-2014
This is almost there, but how do I do it if the Service ID values are held in variables? I also want to extract multiple lines in a single awk statement; would the following work?

Code:
awk -vRS="-------" '/Service ID: ($VARIABLE1|$VARIABLE2)/ {match ($0, /start_date[^\n]*/); print substr ($0, RSTART, RLENGTH)} {match ($0,/duration[^\n]*/); print substr ($0, RSTART, RLENGTH)}' input.file

I've tried various combinations of single and double quotes to get the $VARIABLEx values to expand properly, but my lack of familiarity with awk's syntax means I'm going round in circles a little: when I get the $VARIABLEx values right, it causes syntax problems later in the line.
# 5  
Old 05-29-2014
You can pass the shell variables in using -v.

Like:
Code:
$ export SERVICEID=54321; awk -vRS="-------" -vVAR1="$SERVICEID" '$0 ~ "Service ID: ("VAR1")" {match ($0, /start_date[^\n]*/); print substr ($0, RSTART, RLENGTH)} ' input.txt
start_date: 0xdde9 20:00:00 (Sun Jun 1 21:00:00 2014)

# 6  
Old 05-29-2014
Carlo

I don't think you've fully taken on board what I'm trying to do....

I'm searching for the text phrase "Service ID:" AND then the $VARIABLEx value (not Service ID=value).

"Service ID:" effectively marks the start of a block that ends with ---------; the $VARIABLEx value (more than one value is valid for $VARIABLEx) then determines whether it's a valid match.

If it is, I want to extract start_date, duration and other lines from that text block......that's why I gave the example in my reply and asked how I can get multiple extractions in one awk line when I get a match.

Hope that makes it clearer.
# 7  
Old 05-30-2014
It's not supposed to be a perfect solution - it's an example of how to use shell variables in awk scripts.

If you want to use more than one variable then you need to pass them in individually and extend the regex.
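
As a rough sketch of what that could look like for two Service IDs and several fields per block: this assumes gawk or mawk, where a multi-character RS is treated as a regular expression (POSIX awk leaves that unspecified). The names extract_fields, id1, id2 and the list of fields are illustrative, not something from the posts above.

```shell
#!/bin/bash
# Sketch only. Assumes gawk or mawk (multi-character RS).
# extract_fields, id1/id2 and the field list are illustrative names.

extract_fields() {
    # $1 = input file, $2 and $3 = the two Service IDs to accept
    awk -v RS='-------' -v id1="$2" -v id2="$3" '
        # Build the regex from the awk variables; [^0-9] stops 12345 matching 123456.
        $0 ~ ("Service ID: *(" id1 "|" id2 ")[^0-9]") {
            n = split("start_date duration name", want, " ")
            for (i = 1; i <= n; i++) {
                # One match() per wanted field, all inside the same action block.
                if (match($0, want[i] ":[^\n]*")) {
                    line = substr($0, RSTART, RLENGTH)
                    sub(/[ \t]+$/, "", line)      # drop trailing padding
                    print line
                }
            }
        }
    ' "$1"
}
```

Called as extract_fields input.file "$VARIABLE1" "$VARIABLE2". The shell variables are expanded by the shell in the -v assignments, outside the single-quoted awk program, which is why no quote juggling is needed inside the program itself. All the match() calls sit inside the one action guarded by the Service ID pattern, so nothing is printed for blocks that do not match.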