How to extract entire stanza using awk?


 
# 1  
Old 09-10-2018

Hello friends,

I have a text file with a lot of stanzas, each starting with "[Event ", and I need to extract the stanzas that contain the string O-O-O.

Sample file:
Code:
[Event "EU/C2016/ct02"]
[Site "ICCF"]
[Date "2016.03.15"]
[Round "?"]
[White "Tinture, Laurent"]
[Black "Sommerbauer, Dr. Norbert"]
[Result "1/2-1/2"]
[WhiteElo "2452"]
[BlackElo "2420"]
[PlyCount "54"]
[EventDate "2016.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3 6. bxc3 Bg7 7. Nf3 c5 8.
Rb1 O-O 9. Be2 cxd4 10. cxd4 Qa5+ 11. Bd2 Qxa2 12. O-O Bg4 13. Bg5 h6 14. Be3
Nc6 15. d5 Bxf3 16. gxf3 Nd4 17. Bd3 a5 18. f4 b5 19. Bxd4 Bxd4 20. Bxb5 Bc5
21. Qd3 Bb4 22. Rfd1 Rfc8 23. Bc6 Rab8 24. Rbc1 Qa3 25. Qg3 Qb2 26. Qd3 Qa3 27.
Qg3 Qb2 1/2-1/2

[Event "UKR/C28/final (UKR)"]
[Site "ICCF"]
[Date "2017.03.10"]
[Round "?"]
[White "Rudenko, Vitaly"]
[Black "Begliy, Mikhail"]
[Result "1/2-1/2"]
[WhiteElo "2398"]
[BlackElo "2427"]
[PlyCount "37"]
[EventDate "2017.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3 6. bxc3 Bg7 7. Bc4 c5 8.
Ne2 Nc6 9. Be3 O-O 10. O-O b6 11. Rc1 Bb7 12. Qd2 Rc8 13. Rfd1 e6 14. f3 Na5
15. Bb5 cxd4 16. cxd4 Rxc1 17. Rxc1 a6 18. Bd3 Nc6 19. Bc4 1/2-1/2

[Event "LIPEAD40/f (PER)"]
[Site "ICCF"]
[Date "2016.09.30"]
[Round "?"]
[White "Rost, Detlef"]
[Black "Rawlings, Alan J. C"]
[Result "1/2-1/2"]
[WhiteElo "2451"]
[BlackElo "2368"]
[PlyCount "39"]
[EventDate "2016.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. Bd2 Bg7 6. e4 Nxc3 7. Bxc3 O-O 8.
Qd2 Nc6 9. Nf3 Bg4 10. d5 Bxf3 11. gxf3 Ne5 12. O-O-O c6 13. Qd4 Qd6 14. Kb1
Qf6 15. dxc6 Nxc6 16. Qxf6 Bxf6 17. Bxf6 exf6 18. Bb5 Rfd8 19. Bxc6 bxc6 20.
Kc2 1/2-1/2

[Event "GER/CM/04-A (GER)"]
[Site "ICCF"]
[Date "2017.06.26"]
[Round "?"]
[White "Felkel, Siegfried"]
[Black "Schulz, Günter"]
[Result "1/2-1/2"]
[WhiteElo "2394"]
[BlackElo "2403"]
[PlyCount "51"]
[EventDate "2017.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3 6. bxc3 Bg7 7. Nf3 c5 8.
Be3 Qa5 9. Qd2 Nc6 10. Rb1 a6 11. Rc1 cxd4 12. cxd4 Qxd2+ 13. Kxd2 e6 14. Bd3
O-O 15. Rc4 Bd7 16. Rhc1 Rfd8 17. Ke2 h6 18. Bf4 Rac8 19. h4 b5 20. R4c2 Nxd4+
21. Nxd4 Bxd4 22. Bxh6 Rxc2+ 23. Rxc2 Rc8 24. Rxc8+ Bxc8 25. Be3 Bxe3 26. Kxe3
1/2-1/2

[Event "CT20/pr41"]
[Site "ICCF"]
[Date "2013.11.30"]
[Round "?"]
[White "Pachnicke, Harald"]
[Black "Oppermann, Peter"]
[Result "0-1"]
[WhiteElo "2076"]
[BlackElo "2277"]
[PlyCount "82"]
[EventDate "2013.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. Nf3 Bg7 5. Bg5 Ne4 6. cxd5 Nxg5 7. Nxg5 e6 8.
Qa4+ c6 9. dxc6 Nxc6 10. Nf3 Bd7 11. O-O-O O-O 12. Qa3 b5 13. Nxb5 Rb8 14. e4
Qb6 15. Kb1 Na5 16. Nd6 Ba4 17. Rd2 Bh6 18. Re2 Rfc8 19. Nxc8 Rxc8 20. Re3 Bc2+
21. Ka1 Bf8 22. Rc3 Rd8 23. Qxf8+ Kxf8 24. Rxc2 Nc6 25. Be2 Nxd4 26. Nxd4 Rxd4
27. Bf3 Kg7 28. g3 Qd8 29. Rf1 Rd3 30. Be2 Rd2 31. Rxd2 Qxd2 32. Bf3 Qd3 33.
Bg2 Qe2 34. Kb1 e5 35. a4 a5 36. Ka2 f5 37. exf5 gxf5 38. h4 Qc2 39. Ka3 e4 40.
b3 h5 41. Bh1 Kf6 0-1

Expected output:
Code:
[Event "LIPEAD40/f (PER)"]
[Site "ICCF"]
[Date "2016.09.30"]
[Round "?"]
[White "Rost, Detlef"]
[Black "Rawlings, Alan J. C"]
[Result "1/2-1/2"]
[WhiteElo "2451"]
[BlackElo "2368"]
[PlyCount "39"]
[EventDate "2016.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. Bd2 Bg7 6. e4 Nxc3 7. Bxc3 O-O 8.
Qd2 Nc6 9. Nf3 Bg4 10. d5 Bxf3 11. gxf3 Ne5 12. O-O-O c6 13. Qd4 Qd6 14. Kb1
Qf6 15. dxc6 Nxc6 16. Qxf6 Bxf6 17. Bxf6 exf6 18. Bb5 Rfd8 19. Bxc6 bxc6 20.
Kc2 1/2-1/2

[Event "CT20/pr41"]
[Site "ICCF"]
[Date "2013.11.30"]
[Round "?"]
[White "Pachnicke, Harald"]
[Black "Oppermann, Peter"]
[Result "0-1"]
[WhiteElo "2076"]
[BlackElo "2277"]
[PlyCount "82"]
[EventDate "2013.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. Nf3 Bg7 5. Bg5 Ne4 6. cxd5 Nxg5 7. Nxg5 e6 8.
Qa4+ c6 9. dxc6 Nxc6 10. Nf3 Bd7 11. O-O-O O-O 12. Qa3 b5 13. Nxb5 Rb8 14. e4
Qb6 15. Kb1 Na5 16. Nd6 Ba4 17. Rd2 Bh6 18. Re2 Rfc8 19. Nxc8 Rxc8 20. Re3 Bc2+
21. Ka1 Bf8 22. Rc3 Rd8 23. Qxf8+ Kxf8 24. Rxc2 Nc6 25. Be2 Nxd4 26. Nxd4 Rxd4
27. Bf3 Kg7 28. g3 Qd8 29. Rf1 Rd3 30. Be2 Rd2 31. Rxd2 Qxd2 32. Bf3 Qd3 33.
Bg2 Qe2 34. Kb1 e5 35. a4 a5 36. Ka2 f5 37. exf5 gxf5 38. h4 Qc2 39. Ka3 e4 40.
b3 h5 41. Bh1 Kf6 0-1

What I tried:
Code:
awk '/^\[Event/{flag=1;if(flag && non_flag){print val};val=flag=non_flag=""} /O-O-O/{non_flag=1} {val=val?val ORS $0:$0}'  test_file

The above command prints the following. It shows only the first occurrence of the search pattern rather than all of them, and even that stanza is missing a few lines:
Code:
[EventDate "2016.??.??"]
[Source "ICCF"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. Bd2 Bg7 6. e4 Nxc3 7. Bxc3 O-O 8.
Qd2 Nc6 9. Nf3 Bg4 10. d5 Bxf3 11. gxf3 Ne5 12. O-O-O c6 13. Qd4 Qd6 14. Kb1
Qf6 15. dxc6 Nxc6 16. Qxf6 Bxf6 17. Bxf6 exf6 18. Bb5 Rfd8 19. Bxc6 bxc6 20.
Kc2 1/2-1/2


Please advise, thanks!
# 2  
Old 09-10-2018
How about:
Code:
awk '/^[[]Event/ {e=$0;next} /O-O-O/ {print e ORS $0}' RS= ORS='\n\n' myFile
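
For readers new to awk, the one-liner above can be unpacked roughly as follows. This is only a sketch: the two-game sample and the names `sample`, `header` and `result` are made up for illustration, and it assumes every tag block is followed by exactly one move block with no blank lines inside it.

```shell
#!/bin/sh
# Hypothetical two-game sample (not real games); only game "B" contains O-O-O.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Event "A"]
[Site "ICCF"]

1. d4 Nf6 2. c4 g6 3. O-O O-O 1/2-1/2

[Event "B"]
[Site "ICCF"]

1. d4 Nf6 2. c4 g6 3. O-O-O c6 0-1
EOF

# RS= (empty) puts awk in paragraph mode: every block delimited by blank
# lines is one record, so the tag block and the move block of a game
# arrive as two consecutive records.
# ORS='\n\n' ends each printed record with a blank line.
result=$(awk '
    /^[[]Event/ { header = $0; next }     # tag block: remember it, print nothing
    /O-O-O/     { print header ORS $0 }   # move block with O-O-O: print tags, then moves
' RS= ORS='\n\n' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Running it prints only the tag block and moves of game "B". By contrast, the line-by-line attempt in post #1 prints a stanza only when the next "[Event" line arrives, so without an END rule the last stanza in the file is never printed.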

This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 09-10-2018
Many thanks vgersh99.

I'm unable to add the "solved" tag; could an admin or mod please do that for me?

I find awk extremely hard to learn and highly confusing. Could anyone please suggest a book or link that explains awk in the easiest way? Thanks!
# 4  
Old 09-10-2018
There are many awk resources out there, including manuals and tutorials.
This is one I bookmarked (among others) a while back, but I cannot recall whether it was good or not.
See if it helps.
# 5  
Old 09-23-2018
Apologies for bumping this thread, but I have problems with large input files (~15 MB, 400K lines). The solution offered by vgersh99 worked for the sample provided and also for a few small input files, but for large files it outputs the entire input. I'm not sure what's wrong with it.

Please advise, thanks!
# 6  
Old 09-23-2018
It should not differentiate between small and large files. Do you have structural differences in the large files? Mayhap DOS line terminators (^M = <CR> = \r = 0x0D)? How are those files created?
This User Gave Thanks to RudiC For This Post:
# 7  
Old 09-23-2018
Thanks RudiC for correctly pointing out the DOS format. I had to run dos2unix on the files, which solved the issue.

Last edited by prvnrk; 09-23-2018 at 04:23 PM..
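
Why DOS line terminators matter here: with CRLF endings the "blank" lines between blocks still contain a carriage return, so awk's paragraph mode does not see them as empty and no longer splits the file into individual blocks. The sketch below illustrates this by counting records before and after the CRs are stripped. The tiny sample (not real games) and the variable names are made up for illustration, `tr -d '\r'` stands in for dos2unix, and `-v` is used for the assignments because the input comes from a pipe.

```shell
#!/bin/sh
# Hypothetical CRLF-terminated sample; printf writes the \r\n pairs explicitly.
crlf=$(mktemp)
printf '[Event "A"]\r\n\r\n1. d4 d5 1/2-1/2\r\n\r\n[Event "B"]\r\n\r\n1. d4 d5 2. O-O-O 0-1\r\n' > "$crlf"

# Paragraph-mode record count with the CRs left in place:
# no line is truly empty, so the whole file is a single record.
records_crlf=$(awk -v RS= 'END { print NR }' "$crlf")

# Same count after deleting every CR: two tag blocks plus two move blocks.
records_unix=$(tr -d '\r' < "$crlf" | awk -v RS= 'END { print NR }')

# The extraction from post #2, run on the cleaned stream.
result=$(tr -d '\r' < "$crlf" |
    awk -v RS= -v 'ORS=\n\n' '/^[[]Event/ { e = $0; next } /O-O-O/ { print e ORS $0 }')

echo "records with CRLF: $records_crlf"
echo "records after stripping CR: $records_unix"
printf '%s\n' "$result"
rm -f "$crlf"
```

On this sample the first count is 1 and the second is 4, and the extraction prints only game "B". What exactly gets printed for an unconverted file depends on its contents, but once the records are no longer single blocks the pattern tests apply to far more text than intended.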