The UNIX and Linux Forums  
#1 asandy1234, 4 Weeks Ago
Perl or Awk script to copy a part of text file.

Hi Gurus,
I'm a total newbie to Perl and Awk scripting. Let me explain the scenario: there is a DB2 table with 5 columns, one of which is a CLOB column containing XML. We need the other 4 columns in full, but only a portion of the string from the XML column.
We decided to export the DB2 table to a .del file and process it with a Perl or Awk script. I need a script that processes the .del file so that the output has column 1, column 2 and column 3 as they are; for column 4, which is the XML, only the strings between <text> and </text> (there may be multiple occurrences, so they can be separated by a number); and then column 5.
I know it will be a piece of cake for the experts.

Thanks,
#2 frans, 4 Weeks Ago
using grep

Try this:
Code:
I=0
while IFS= read -r LINE
do
    # Keep only the <text>...</text> part of the line, then strip the tags
    TEXT[$I]=$(echo "$LINE" | grep -o '<text>.*</text>' | sed -e 's/<text>//' -e 's/<\/text>//')
    (( I++ ))
done < "$FILENAME"
That creates an array in which each item is the text found between <text> and </text> on the corresponding line of the file.
Hope this helps.
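A note on the pattern in the loop above: '.*' is greedy, so on a line with several <text> elements grep -o returns a single span running from the first <text> to the last </text>, and the sed commands strip only the first occurrence of each tag. A minimal sketch of one way around that follows. It assumes a grep that supports -o (GNU grep does) and that the enclosed text never contains a '<' character; extract_texts is a hypothetical helper name, not something from the thread.

```shell
#!/bin/bash
# Sketch: print every <text>...</text> value in a file, one per line.
# Assumptions: grep supports -o, and the enclosed text contains no '<'.
extract_texts() {
    # $1 is the input file chosen by the caller.
    # [^<]* stops at the next tag, so each element is matched separately.
    grep -o '<text>[^<]*</text>' "$1" |
        sed -e 's/^<text>//' -e 's/<\/text>$//'
}

# Usage idea (bash only): load the values into an array, one per match.
#   TEXT=()
#   while IFS= read -r item; do TEXT+=("$item"); done < <(extract_texts "$FILENAME")
```

Because each match is printed on its own line, the array ends up with one item per <text> element instead of one item per input line.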
#3 asandy1234, 4 Weeks Ago
Thanks, Frans, for your prompt reply, but I have a question that may sound stupid:
1) What extension should I save the file with? Is it .awk or .ksh?
2) Where do I put the input and output file names in the code?

Thanks,
#4 frans, 4 Weeks Ago
It's bash scripting, so the first line of the script should be #!/bin/bash (if your shell is at /bin/bash, of course).
The full code:
Code:
#!/bin/bash
while IFS= read -r LINE
do
    # Print only the text between <text> and </text> for each input line
    echo "$LINE" | grep -o '<text>.*</text>' | sed -e 's/<text>//' -e 's/<\/text>//'
done < INPUTFILE > OUTPUTFILE
You could use variables for the input and output files if they have to be reused later; otherwise just replace 'INPUTFILE' and 'OUTPUTFILE' with your own file names.

P.S. The extension doesn't matter. Make the script executable with chmod +x and run it.
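For the variable-based variant mentioned above, here is a small sketch. The function name extract_to_file and the file names in the usage comment are made up for illustration, and it still assumes a grep that supports -o.

```shell
#!/bin/bash
# Sketch: the same loop, with the file names passed in as arguments.
extract_to_file() {
    input=$1     # file to read (hypothetical name supplied by the caller)
    output=$2    # file to write
    while IFS= read -r line
    do
        printf '%s\n' "$line" |
            grep -o '<text>.*</text>' |
            sed -e 's/<text>//' -e 's/<\/text>//'
    done < "$input" > "$output"
}

# Usage idea:
#   extract_to_file export.del extracted.txt
```

Quoting "$input" and "$output" keeps the redirections working when a file name contains spaces.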

#5 asandy1234, 3 Weeks Ago
Thank you very much Frans, I'll test and let you know.

---------- Post updated at 12:46 PM ---------- Previous update was at 12:05 PM ----------

Hi Frans,
There is no /bin/bash on the Unix box, but there is /usr/bin/bsh. Are these two the same? I tried to run the script, but I get the error below:

grep: Not a recognized flag: o
Usage: grep [-r] [-R] [-H] [-L] [-E|-F] [-c|-l|-q] [-insvxbhwy] [-p[parasep]] -e pattern_list...
[-f pattern_file...] [file...]
Please clarify.

Thanks.
#6 frans, 3 Weeks Ago
bsh is the Bourne shell and bash is the Bourne Again shell, so they are not the same. The options of your grep command don't seem to be the same either: your grep doesn't accept -o.
The -o option tells grep to output only the matching part of the line, which is helpful here.
Tell me what the output is when you use grep without that option.
I believe there's only one occurrence of <text>....</text> in each line.
To test faster, don't redirect to the output file ( > OUTPUTFILE ), so you see directly what happens.
One possibility is to extract the text like this:
Code:
LINE="lbla bla bla jcjfd<text>what i want</text>flh%(j blablabla" # for testing
LINE=$(echo "$LINE" | grep '<text>.*</text>') # Keeps the line only if it contains the pattern
echo "$LINE"
LINE=${LINE##*<text>} # Removes everything up to and including the last <text>
echo "$LINE"
LINE=${LINE%%</text>*} # Removes everything from the first remaining </text> onward
echo "$LINE"
and see what happens.
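Where grep has no -o option, awk can do the extraction on its own. The sketch below relies only on awk's index() and substr() functions, so it should not depend on GNU tools, though it has not been tried on the poster's system. extract_texts_awk is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print every <text>...</text> value, one per output line,
# without grep -o. Handles several occurrences on the same line.
extract_texts_awk() {
    awk '
    {
        rest = $0
        # Repeat while an opening tag remains in the unprocessed part.
        while ((start = index(rest, "<text>")) > 0) {
            rest = substr(rest, start + 6)      # skip past "<text>" (6 chars)
            stop = index(rest, "</text>")
            if (stop == 0) break                # no closing tag on this line
            print substr(rest, 1, stop - 1)     # the enclosed text
            rest = substr(rest, stop + 7)       # skip past "</text>" (7 chars)
        }
    }' "$1"
}

# Usage idea:
#   extract_texts_awk INPUTFILE > OUTPUTFILE
```

The function uses no bash-specific syntax, which matters if the script has to run under a shell other than bash.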
#7 asandy1234, 3 Weeks Ago
No, there are multiple occurrences of this pattern in each record.
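For records with several occurrences, a rough sketch of the numbered output asked for in the first post is given below. It rests on assumptions that are not confirmed anywhere in the thread: each exported record sits on a single line, the XML column is the only one containing '<' or '>', and a "1:first 2:second" layout is an acceptable way to separate the values by number. number_texts is a hypothetical name.

```shell
#!/bin/sh
# Sketch: replace the XML part of each record with its numbered
# <text> values and leave the surrounding columns untouched.
number_texts() {
    awk '
    {
        first = index($0, "<")                   # start of the XML
        last = 0
        for (i = length($0); i > 0; i--)         # position of the final ">"
            if (substr($0, i, 1) == ">") { last = i; break }
        if (first == 0 || last < first) { print; next }   # no XML: pass through
        head = substr($0, 1, first - 1)          # columns before the XML
        tail = substr($0, last + 1)              # columns after the XML
        rest = substr($0, first, last - first + 1)
        out = ""; sep = ""; n = 0
        while ((start = index(rest, "<text>")) > 0) {
            rest = substr(rest, start + 6)
            stop = index(rest, "</text>")
            if (stop == 0) break
            n++
            out = out sep n ":" substr(rest, 1, stop - 1)
            sep = " "
            rest = substr(rest, stop + 7)
        }
        print head out tail
    }' "$1"
}
```

If the XML can span several lines, or another column can contain angle brackets, the record boundaries would need proper parsing before a loop like this is applied.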