Parsing blocked text


 
# 1  
Old 05-02-2014

I have a flat text file that is divided into blocks. Each block is delimited by a line containing only '='. I would like to extract certain numbers and words from each block.

This is the format of the file. It contains thousands of such blocks:

Code:
>A B 1, 100
TTTT	100 95
>C D 1, 95
GHJKL
=
>A B 1, 72
GHUJKLO 72 84
>C D 1, 84
HJKLIGN
=

For each block I would like to print the following values on one line: the block number (starting from 1), the 4th value in the first line, the 1st word in the second line, the number of letters in that word, the 4th value in the third line, the 1st word in the fourth line, and the number of letters in that word.

For the sample above, the expected output is:

Code:
1 100 TTTT 4 95 GHJKL 5
2 72 GHUJKLO 7 84 HJKLIGN 7

It would be great if you could help me to format this data using either awk or sed.
# 2  
Old 05-02-2014
Quote:
Originally Posted by Kanja
I would like to parse each value within each blocks:
block number (starts from 1, 2....), grep the 4th value in the first line, grep the 1st word in the second line, count of the letters in that word, grep the 4th value in the third line, grep the 1st word in the 4th line, count of the letters in the word.

The following is a solution in shell. It is not optimized for speed, but it is probably the easiest to understand. Once you understand the logic and how things work, you should try to optimize it yourself. You can even port the logic to "awk" (sed will have a hard time incrementing a counter, although it is possible):

Code:
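#! /bin/ksh
typeset -i iCnt=1
typeset    ach1stLine[1]=""
typeset    ach1stLine[2]=""
typeset    ach1stLine[3]=""
typeset    ach1stLine[4]=""

typeset    ach2ndLine[1]=""

typeset    ach3rdLine[1]=""
typeset    ach3rdLine[2]=""
typeset    ach3rdLine[3]=""
typeset    ach3rdLine[4]=""

typeset    ach4thLine[1]=""

# read the 1st line of a block as the loop condition, so that EOF after
# the last separator does not produce an empty extra output line
while read ach1stLine[1] ach1stLine[2] ach1stLine[3] ach1stLine[4] junk ; do
     # read the remaining 3 lines, split into words
     read ach2ndLine[1] junk
     read ach3rdLine[1] ach3rdLine[2] ach3rdLine[3] ach3rdLine[4] junk
     read ach4thLine[1] junk

     # block number, 4th value of line 1, word and length from line 2,
     # 4th value of line 3, word and length from line 4
     print - "$iCnt ${ach1stLine[4]} ${ach2ndLine[1]} ${#ach2ndLine[1]} ${ach3rdLine[4]} ${ach4thLine[1]} ${#ach4thLine[1]}"
     (( iCnt += 1 ))

     if ! read junk ; then       # read the fifth line with the separator, exit on EOF
          break
     fi
done < /path/to/inputfile
exit 0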
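
For anyone who wants to try the port to awk mentioned above, here is a minimal sketch of the same line-by-line logic. The function name `parse_blocks` and the variable names are made up for illustration. It assumes every block has exactly four lines followed by a line containing only '=', as in the sample.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): the "read four lines per block"
# logic expressed in awk. Assumes four data lines per block, then a lone "=".
parse_blocks() {
    awk '
        /^=$/  { n = 0; next }     # separator: reset the position counter
               { n++ }             # position of this line inside the block
        n == 1 { v1 = $4 }         # 4th value of the 1st line
        n == 2 { w2 = $1 }         # 1st word of the 2nd line
        n == 3 { v3 = $4 }         # 4th value of the 3rd line
        n == 4 { print ++blk, v1, w2, length(w2), v3, $1, length($1) }
    ' "$@"
}
```

Called as `parse_blocks /path/to/inputfile` (or with the data on standard input), this should print one line per block in the requested format.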

I hope this helps.

bakunin

Last edited by bakunin; 05-02-2014 at 04:25 PM..
# 3  
Old 05-02-2014
my awk solution:
Code:
mute@thedoctor:~$ ./script input
1 100 TTTT 4 95 GHJKL 5
2 72 GHUJKLO 7 84 HJKLIGN 7

Code:
mute@thedoctor:~$ cat script
#!/usr/bin/awk -f

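# separator line: end the current output line and count the finished block
/^=$/ { ++blk;c=0;printf RS;next }
# header line: print the block number (first header of a block only) and the 4th value
/^>/ { printf "%s %d", c?"":1+blk, $4;c=1; next }
# data line: print the 1st word and its length
{ printf " %s %d", $1, length($1) }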

# 4  
Old 05-02-2014
try also:
Code:
awk '$1=$1 {print NR, $4, $5, length($5), $7, $12, length($12)}' RS="=" in
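
To see where field numbers such as $5, $7 and $12 come from, it may help to list the fields of one record. With RS set to "=", each block becomes a single record, and the default field splitting treats newlines like any other whitespace. The sketch below uses a made-up helper name, `show_fields`, and prints the numbered fields of the first record only.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list the fields awk sees in
# the first block when "=" is the record separator.
show_fields() {
    awk -v RS='=' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }' "$@"
}
```

On the sample data this lists twelve fields. Field 7 is the third word of the second line, and field 11 is the 4th value of the third line. Both are 95 in the first sample block, so the one-liner gives the requested output there. If the two can differ in the real file, $11 is the field that matches the description in the first post.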
