

AWK Multi-Line Records Processing


 
# 1  
Old 10-11-2007

I am an awk newbie and cannot wrap my brain around my problem:

Given multi-line records of varying lengths, separated by a blank line, I need to skip the first two lines
of every record and extract every other line of the record, unless the first line of the record contains "(CONT)";
in that case, skip the second line as well and append that record's every-other lines to the previous record's every-other lines.

I hope that makes some sort of sense. I tried the following awk to get every
other line, but it doesn't come out right, so I haven't even begun to figure out the rest of my problem.
awk '(((NR % 2) == 0) && ( NR > 2 )) {print}' ~/Desktop/datafile

Any programming help would be appreciated! I have provided the following example input data of four records:

CHARGER R M 1972 9 3 3 1 $7,060 1570 INDY 13 $27,717 MICKEY E.& OLGA B.SMITH,VIENNA,NC.
LA72TAUR FORD CHEVY GMC 1.57.00Q DAVID R.MILLER,ALDEN,NY.
MD TEST 0321 1371 OFF OFF OFF SONNY SMITH, KASHMAN, DAVE
FEB 20-98 VLY 1041 2094 8 8 8 8 8 8 NB SMITH GW : - : - :LAPS : 8 : 40 : - : - : 1
GD TEST 0311 1354 3H 4H 2H 0304 VIC, YO HERSHEY, CHARGER
JAN 7-98 VLY 1030 2064 6 3 3 4 4 3 2071 NB SMITH GW : - : - :LAPS : 6 : 40 : - : - : 1
$2,000 MD NW2L5CD 0303 1343 1Q 2 2T 0314 WILD MICK, CHARGER, TEKLA
MAR 9-98 VLY $500 1024 2060 5 2 3 4 2 2 2063 1900 SMITH GW : - : - :LAPS : 8 : 36 : - : - : 1
$2,700 GD OPENRUN 0292 1312 2 H 1 0303 CHARGER, HAL THE BARBER, WOLFMAN
MAR 13-98 VLY $1,350 1004 2022 2 3 2 2 1 1 2022 1130 SARAMA GW : - : - :LAPS : 7 : 31 : - : - : 2
$2,700 FT NW2L5CD 0294 1320 Q Q H 0300 WHEELS WIN, CHARGER, ROCK
MAR 27-98 VLY $675 1013 2020 4 1 1 1 1 2 2020 *1750 SMITH GW : - : - :LAPS : 8 : 60 : - : - : 2
$3,000 FT OPEN 0291 1301 1 1 3Q 0293 CHARGER, OVERRUN, ROCK
MAY 1-98 VLY $1,500 0594 1594 2 1 1 1 1 1 1594 *9500 SMITH GW : - : - :LAPS : 7 : 70 : - : - : 9
$4,000 FT OPEN 0263 1280 2Q 1Q T 0283 CHARGER, GUARDIAN, TORRE
MAY 9-98 HILL $2,000 0581 1570 1 6 5 4 2 1 1570 *2200 SMITH GW : - : - :LAPS : 7 : 60 : - : - : 8
$4,400 FT WO4000LT 0292 1320 OFF OFF OFF TORRE, ROCK, TY ZOLAK
MAY 15-98 HILL 1003 2011 7 8 8 8 8 8 *265 SMITH GW : - : - :LAPS : 8 : 75 : - : - : 9
$8,000 FT TM1500CND 0290 1294 OFF OFF BOSTON BEEMER, THE CANNON, ZURICH TOYOTA
MAY 21-98 HILL 0593 2010 7 8 8 8 8 8 9550 SMITH GW : - : - :LAPS : 9 : 46 : - : - : 2

SPARKPLUG BLK M 1964 2 5 5 4 $10,534 2001 HILL 5 $14,926 JOHN DOE,TARPORT,DE.
N764CHVY FORD CHEVY GMC 2.00.10F ELMER SMITH,NY,NY.
$2,700 FT NW4L5CD 0294 1320 Q Q H 0300 WHEELS WIN, CHARGER, ROCK
FEB 22-98 VLY $675 1013 2020 4 1 1 1 1 2 2020 *1750 SMITH GW : - : - :LAPS : 8 : 60 : - : - : 3
$2,700 FT NW4L5CD 0291 1311 1H T LT 0294 HAL THE BARBER, CHARGER, MAC
APR 3-98 VLY $675 1001 2011 3 2 2 3 3 2 2011 *1550 SMITH GW : - : - :LAPS : 6 : 45 : - : - : 5

SPARKPLUG (CONT)
N764CHVY
$2,000 MD NW4L5CD 0303 1343 1Q 2 2T 0314 WILD MICK, CHARGER, TEKLA
MAR 8-99 VLY $500 1024 2060 5 2 3 4 2 2 2063 1900 SMITH GW : - : - :LAPS : 8 : 36 : - : - : 10
$2,700 GD OPENRUN 0292 1312 2 H 1 0303 CHARGER, HAL THE BARBER, WOLFMAN
MAR 13-99 VLY $1,350 1004 2022 2 3 2 2 1 1 2022 1130 SMITH GW : - : - :LAPS : 7 : 31 : - : - : 7
$2,700 FT NW4L5CD 0294 1320 Q Q H 0300 WHEELS WIN, CHARGER, ROCK

DUTCHESS W F 82 21 3 2 4 $10,834 2003 VLY 3 $10,858 TARP INC,VALLEY CITY,CA.
PN82TRCK FORD CHEVY GMC 2.00.30M RICK SMITH,RED CEDAR,ND.
$2,800 MD F-NW2CND 0284 1311 7 8Q CARD SHARK, PHP GIRL, BREEZY BREE
AUG 25-98 RIDC 1011 2011 9 4 3< 3< 7 7 2024 820 MILLER TF : - : - :MILE : 9 : 69 : - : - : 6
# 2  
Old 10-11-2007
I'm not sure which lines you want to print, but there is a trick with awk: you can reset the value of NR any time you want.

So a program like this

awk 'NR > 2 && (NR % 2 == 0) { print }   # lines 4, 6, 8, ... of each record
/^$/ { NR = 0 }' datafile                # /^$/ matches the blank line and restarts the count

got me output that looked like this:

FEB 20-98 VLY 1041 2094 8 8 8 8 8 8 NB SMITH GW : - : - :LAPS : 8 : 40 : - : - : 1
JAN 7-98 VLY 1030 2064 6 3 3 4 4 3 2071 NB SMITH GW : - : - :LAPS : 6 : 40 : - : - : 1
MAR 9-98 VLY $500 1024 2060 5 2 3 4 2 2 2063 1900 SMITH GW : - : - :LAPS : 8 : 36 : - : - : 1
MAR 13-98 VLY $1,350 1004 2022 2 3 2 2 1 1 2022 1130 SARAMA GW : - : - :LAPS : 7 : 31 : - : - : 2
MAR 27-98 VLY $675 1013 2020 4 1 1 1 1 2 2020 *1750 SMITH GW : - : - :LAPS : 8 : 60 : - : - : 2
MAY 1-98 VLY $1,500 0594 1594 2 1 1 1 1 1 1594 *9500 SMITH GW : - : - :LAPS : 7 : 70 : - : - : 9
MAY 9-98 HILL $2,000 0581 1570 1 6 5 4 2 1 1570 *2200 SMITH GW : - : - :LAPS : 7 : 60 : - : - : 8
MAY 15-98 HILL 1003 2011 7 8 8 8 8 8 *265 SMITH GW : - : - :LAPS : 8 : 75 : - : - : 9
MAY 21-98 HILL 0593 2010 7 8 8 8 8 8 9550 SMITH GW : - : - :LAPS : 9 : 46 : - : - : 2
FEB 22-98 VLY $675 1013 2020 4 1 1 1 1 2 2020 *1750 SMITH GW : - : - :LAPS : 8 : 60 : - : - : 3
APR 3-98 VLY $675 1001 2011 3 2 2 3 3 2 2011 *1550 SMITH GW : - : - :LAPS : 6 : 45 : - : - : 5
MAR 8-99 VLY $500 1024 2060 5 2 3 4 2 2 2063 1900 SMITH GW : - : - :LAPS : 8 : 36 : - : - : 10
MAR 13-99 VLY $1,350 1004 2022 2 3 2 2 1 1 2022 1130 SMITH GW : - : - :LAPS : 7 : 31 : - : - : 7

AUG 25-98 RIDC 1011 2011 9 4 3< 3< 7 7 2024 820 MILLER TF : - : - :MILE : 9 : 69 : - : - : 6
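
A note on the blank line in that output: the third record has seven lines, so the blank separator after it becomes line 8 of that record and passes the NR test. A minimal sketch of a variant that keeps its own per-record counter instead of assigning to NR, and never prints the separator, might look like this (the function name every_other and the variable n are made up for illustration):

```shell
every_other() {
  awk '
    !NF { n = 0; next }      # blank line: restart the per-record count, print nothing
    { n++ }                  # n is the line number within the current record
    n > 2 && n % 2 == 0      # lines 4, 6, 8, ... of each record (default action: print)
  ' "$@"
}
```

It would be called as every_other ~/Desktop/datafile. Leaving NR alone also means NR can still be used to report the real line number in the file.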
# 3  
Old 10-11-2007
Thanks for the reply! Your solution got exactly the lines I wanted to pick out!

I believe I should be able to solve the rest of my problem on my own, using some type of regex for "(CONT)" on the first line and an if-else statement.

If I can't figure it out, I'll be back with another question :)

Thanks again for your help. That trick of resetting NR is a good one to know, as I was clueless and kept fiddling with the settings for FS and RS, which was getting me nowhere fast. Your solution is simply elegant.
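
For the "(CONT)" part, here is one possible sketch, not necessarily what the poster ended up with. It uses awk's paragraph mode (RS set to the empty string) with FS set to a newline, so each blank-line-separated record arrives whole and each of its lines is one field. It assumes that "appending to the previous record" means printing the continuation's lines directly after the previous record's lines, with a blank line only between unrelated records. The function name merge_cont is made up, and index() is used instead of a regex so the parentheses in "(CONT)" need no escaping.

```shell
merge_cont() {
  awk '
    BEGIN { RS = ""; FS = "\n" }      # paragraph mode: one record per block, one field per line
    {
      # $1 is the first line of the record; a "(CONT)" record joins the previous one
      if (NR > 1 && index($1, "(CONT)") == 0)
        print ""                      # blank line between unrelated records
      for (i = 4; i <= NF; i += 2)    # skip lines 1-3, then take every other line
        print $i
    }
  ' "$@"
}
```

Note that paragraph mode only treats truly empty lines as separators, so a separator line containing stray spaces would need cleaning up first.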
# 4  
Old 10-18-2007
Is there any way to get the info lined up in columns using printf? I've tried a few things but it never seems to come out right; maybe the data is just too funky to line up?

So, given INPUT like:
FEB 20-98 VLY 1041 2094 8 8 8 8 8 8 NB SMITH GW : - : - :LAPS : 8 : 40 : - : - : 1
JAN 7-98 VLY 1030 2064 6 3 3 4 4 3 2071 NB SMITH GW : - : - :LAPS : 6 : 40 : - : - : 1
MAR 9-98 VLY $500 1024 2060 5 2 3 4 2 2 2063 1900 SMITH GW : - : - :LAPS : 8 : 36 : - : - : 1
MAR 13-98 VLY $1,350 1004 2022 2 3 2 2 1 1 2022 1130 SARAMA GW : - : - :LAPS : 7 : 31 : - : - : 2

Can I get OUTPUT like:
Code:
FEB 20-98	VLY			1041	2094	8  8  8  8  8  8		NB    SMITH GW	: - : - :LAPS : 8 : 40 : - : - : 1
JAN 7-98	VLY			1030	2064	6  3  3  4  4  3	2071	NB    SMITH GW	: - : - :LAPS : 6 : 40 : - : - : 1
MAR 9-98	VLY		$500	1024	2060	5  2  3  4  2  2	2063	1900  SMITH GW	: - : - :LAPS : 8 : 36 : - : - : 1
MAR 13-98	VLY		$1,350	1004	2022	2  3  2  2  1  1	2022	1130  SARAMA GW	: - : - :LAPS : 7 : 31 : - : - : 2
MAR 27-98	VLY		$675	1013	2020	4  1  1  1  1  2	2020	*1750 SMITH GW	: - : - :LAPS : 8 : 60 : - : - : 2

# 5  
Old 10-18-2007
Instead of print, use printf. It lets you specify a format string (a mask), followed by the data to print. For example,


printf("%-30s", "MY NAME");

will left-justify the value in a 30-character column; without the "-" (as in "%30s") the value is right-justified. If you are a C programmer, it follows that printf convention. I suggest looking up the online (free and in pdf) version of "Effective AWK Programming" by Arnold Robbins for more information.
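
A quick way to see what the "-" flag does is to print the same string with and without it. This is just a small demonstration; the function name printf_demo and the brackets around each field are only there to make the padding visible.

```shell
printf_demo() {
  awk 'BEGIN {
    printf "[%-10s]\n", "MY NAME"   # "-" flag: left-justified, padded on the right
    printf "[%10s]\n",  "MY NAME"   # no flag: right-justified, padded on the left
  }'
}
```

The first line comes out with the spaces after the text, the second with the spaces before it.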
# 6  
Old 10-18-2007
I've been reading about and testing printf options for a while now, and am stuck on how to handle the above situation, where one of the fields in a column is blank (whitespace). I tried using printf in the above code, specifically the following line, which uses only the first six fields (I want to format the rest of the fields too, but for testing purposes I tried only the first six to show my problem):
Code:
NR > 2 && (NR % 2 == 0 ) {printf "%-5s%-8s: %-10s: %-15s: %-10s: %-5s:\n",$1,$2,$3,$4,$5,$6} /^$/{NR=0}

I GET RETURNED:
FEB  20-98   : VLY       : 1041           : 2094      : 8    :
JAN  7-98    : VLY       : 1030           : 2064      : 6    :
MAR  9-98    : VLY       : $500           : 1024      : 2060 :
MAR  13-98   : VLY       : $1,350         : 1004      : 2022 :
MAR  27-98   : VLY       : $675           : 1013      : 2020 :
MAY  1-98    : VLY       : $1,500         : 0594      : 1594 :
MAY  9-98    : HILL      : $2,000         : 0581      : 1570 :
MAY  15-98   : HILL      : 1003           : 2011      : 7    :
MAY  21-98   : HILL      : 0593           : 2010      : 7    :
FEB  22-98   : VLY       : $675           : 1013      : 2020 :
APR  3-98    : VLY       : $675           : 1001      : 2011 :
MAR  8-99    : VLY       : $500           : 1024      : 2060 :
MAR  13-99   : VLY       : $1,350         : 1004      : 2022 :
             :           :                :           :      :
AUG  25-98   : RIDC      : 1011           : 2011      : 9    :

Which messes up which columns go where. So, how can I handle formatting a field that is whitespace?

It should be:
Code:
FEB  20-98   : VLY       :                : 1041      : 2094      : 8    :
JAN  7-98    : VLY       :                : 1030      : 2064      : 6    :
MAR  9-98    : VLY       : $500           : 1024      : 2060 :
MAR  13-98   : VLY       : $1,350         : 1004      : 2022 :
MAR  27-98   : VLY       : $675           : 1013      : 2020 :
MAY  1-98    : VLY       : $1,500         : 0594      : 1594 :
MAY  9-98    : HILL      : $2,000         : 0581      : 1570 :
MAY  15-98   : HILL      :                : 1003      : 2011      : 7    :
MAY  21-98   : HILL      :                : 0593      : 2010      : 7    :
FEB  22-98   : VLY       : $675           : 1013      : 2020 :
APR  3-98    : VLY       : $675           : 1001      : 2011 :
MAR  8-99    : VLY       : $500           : 1024      : 2060 :
MAR  13-99   : VLY       : $1,350         : 1004      : 2022 :

AUG  25-98   : RIDC      :                : 1011      : 2011      : 9    :

# 7  
Old 10-18-2007
Yeah, but it is going to get complicated (believe it or not, this has been pretty straightforward).

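The problem comes from awk not being able to recognize a whitespace-only column: with the default field splitting, any run of blanks counts as a single separator, so an empty column simply disappears. If you used tab separators, you could pass -F to split on the tabs, but if it is simply spaces, you have to make a programmatic decision.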
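
To make the difference concrete, here is a small sketch that counts fields both ways. The sample lines and the function name count_fields are invented for the demonstration.

```shell
count_fields() {
  # default splitting: the run of blanks is one separator, so the empty column vanishes
  printf 'VLY  1041\n' | awk '{ print NF }'
  # tab-separated input: -F"\t" treats every tab as a separator and keeps the empty column
  printf 'VLY\t\t1041\n' | awk -F'\t' '{ print NF }'
}
```

The first command reports 2 fields and the second reports 3, which is why tab-delimited data would not need any guessing.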

For instance, it looks like the third column of your output is a $ amount (that is awk field $4, since the date takes up $1 and $2). If that is always true, you can check whether the field contains a $ and print it in the right column; if not, you know everything has slid down one.

So you could have two printf statements:
if ($4 ~ /\$/)    # amount present
    printf "%-5s%-8s: %-10s: %-15s: %-10s:\n", $1, $2, $3, $4, $5
else              # amount missing: pad its column with an empty string
    printf "%-5s%-8s: %-10s: %-15s: %-10s:\n", $1, $2, $3, "", $4

The $ is escaped in the regex because an unescaped $ is the end-of-line anchor; /\$/ matches a literal dollar sign. It is still worth a quick trial run on your data to make sure the test catches every amount.
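
Put together as a complete command, the idea could be sketched as below. This is only an illustration under the assumption that the $ amount, when present, is always awk field $4 (the date occupies $1 and $2, the track $3). The function name align_fields and the variables amount and k are made up, and the other optional columns further to the right are not handled.

```shell
align_fields() {
  awk '{
    amount = ""; k = 4                        # k: index of the field right after the amount column
    if ($4 ~ /\$/) { amount = $4; k = 5 }     # amount present, so the later fields start at $5
    printf "%-5s%-8s: %-10s: %-15s: %-10s: %-10s:\n", $1, $2, $3, amount, $k, $(k + 1)
  }' "$@"
}
```

Using one printf with a shifting index instead of two printf statements keeps the format string in a single place, so the column widths cannot drift apart between the two cases.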
