Awk - Handling different types of newlines


 
# 1  
Old 03-21-2011

Hi. We have some data that's generated from a webpage. Part of it is pretty well-formatted, but part of it preserves newlines in a way that breaks record separation in awk. Here are 2 records, filtered through cat -e:
Code:
Jones,Bob,20,Q: What is your favorite ice cream?$
A: Butter Pecan$
Q: Do you like sprinkles?$
A: But of course$
cone^M$
Smith,Jane,18,Q: What is your favorite ice cream?$
A: Rocky Road$
Q: Do you like sprinkles?$
A: Yuck, no$
bowl^M$

So, you can see that there are different newline types here-- just the straight-up $ and the ^M$. How do I essentially get awk to ignore the first and use only the second as a record separator? Many thanks in advance.
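One way to get what the question asks for, offered as a sketch rather than anything posted in the thread: set RS to the two-character sequence CR LF, so that a bare LF stays inside the record. This assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves the behavior unspecified). The function name crlf_records is made up for illustration.

```shell
# Hypothetical helper: treat CR LF as the record separator and
# flatten the bare LFs left inside each record to spaces.
# Assumes an awk with multi-character RS support (gawk, mawk).
crlf_records() {
    awk 'BEGIN { RS = "\r\n" } { gsub(/\n/, " "); print }' "$@"
}

# Example: two records, each spread over several LF-terminated lines.
printf 'Jones,Bob,20,Q: Sprinkles?\nA: But of course\ncone\r\nSmith,Jane,18,Q: Sprinkles?\nA: Yuck, no\nbowl\r\n' | crlf_records
```

Each record comes out on one line, with the inner newlines turned into single spaces.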
# 2  
Old 03-21-2011
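You have to 'preprocess' the file with dos2unix (also called dos2ux on some boxes). Feed it the file on standard input, because some versions convert a named file in place and print nothing:
Code:
dos2unix < somefile | awk '{awk program here}'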
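Where dos2unix isn't installed, deleting every carriage return with tr should give the same result for a file like this one. This is a small sketch; strip_cr is a made-up name, and unlike dos2unix it also removes any CR that is not followed by a newline.

```shell
# Hypothetical helper: drop every CR byte from standard input.
strip_cr() {
    tr -d '\r'
}

# Example: number the lines after stripping the CRs.
printf 'cone\r\nbowl\r\n' | strip_cr | awk '{ print NR ": " $0 }'
```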

# 3  
Old 03-21-2011
Thanks for the reply. That does normalize the situation somewhat-- all of the newline characters are consistent-- but unfortunately doesn't make processing much simpler. Following the use of dos2unix, here's a single record from the human's perspective:
Code:
Jones,Bob,20,Q: What is your favorite ice cream?$
A: Butter Pecan$
Q: Do you like sprinkles?$
A: But of course$
cone$

From awk's perspective, though, that's 5 different records. My naive approach is to try to turn the above into:

Code:
Jones,Bob,20,Q: What is your favorite ice cream?  A: Butter Pecan, Q: Do you like sprinkles? A: But of course, cone$

That seems to require using ^M$ as a record separator, and just discarding $ (UNIX newline, as opposed to ^M$) so that the records join together. Any suggestions on how that might be done? This is a strangely sticky problem, but just researching it has been very informative. Anyway, again, thanks for the reply.
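A possible way to do the joining with plain POSIX awk, sketched under the assumption that it runs on the original file. Once dos2unix has stripped the CRs, nothing is left to mark the real record ends. The default RS is kept, lines are collected in a buffer, and the buffer is printed whenever a line ends in CR. The names join_until_cr and buf are made up for illustration.

```shell
# Hypothetical helper: glue lines together until one ends in CR,
# which marks the real end of a record. Uses only POSIX awk features.
join_until_cr() {
    awk '
        {
            if (sub(/\r$/, "")) {      # line ended in CR: record is complete
                print buf $0
                buf = ""
            } else {                   # bare LF: keep collecting
                buf = buf $0 " "
            }
        }
        END { if (buf != "") print buf }   # flush a trailing partial record
    ' "$@"
}

# Example run on two short records.
printf 'Jones,Bob,20,Q: Sprinkles?\nA: But of course\ncone\r\nSmith,Jane,18,Q: Sprinkles?\nbowl\r\n' | join_until_cr
```

This joins the pieces with single spaces only. Adding the commas shown in the wanted output would need a rule for which lines get one.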
# 4  
Old 03-21-2011
Hi treesloth,

Please try this. It runs against the cat -e output you posted, where $ and ^M are literal characters:
Code:
awk -F"\$" '{gsub(/\^M/,"\n",$0); for(i=1;i<NF;i++) printf "%s ", $i }' inputfile
Jones,Bob,20,Q: What is your favorite ice cream? A: Butter Pecan Q: Do you like sprinkles? A: But of course cone
 Smith,Jane,18,Q: What is your favorite ice cream? A: Rocky Road Q: Do you like sprinkles? A: Yuck, no bowl

Regards.
# 5  
Old 03-21-2011
Code:
dos2unix < file | awk '{printf "%s%s", $0, ($0 ~ /[AQ]:/ ? FS : RS)}'

# 6  
Old 03-21-2011
treesloth,

This is an improved version of my first code. I was missing the conversion from DOS to Unix format, and the commas in the output.
This time those details are included:

Code:
awk '{gsub(/\r$/,"");gsub(/\$$/,",$");gsub(/\?,\$/,"?$");gsub(/\^M,\$/,"\n");gsub(/\$/," ");printf "%s", $0}' inputfile
Jones,Bob,20,Q: What is your favorite ice cream? A: Butter Pecan, Q: Do you like sprinkles? A: But of course, cone
Smith,Jane,18,Q: What is your favorite ice cream? A: Rocky Road, Q: Do you like sprinkles? A: Yuck, no, bowl

Hope it helps,

Regards
# 7  
Old 03-22-2011
This should do it, as long as the data itself contains no | or $ characters:
Code:
tr '\r\n' '|' <in | sed 's/||/$/g' | tr '$' '\n'

You can pipe the result to awk (for example, awk -F"|" '{print $0}', as shown below) to confirm that the field separator is now "|" and the record separator is "\n".

stefangr$ cat in

Jones,Bob,20
Q: What is your favorite ice cream?
A: Butter Pecan
Q: Do you like sprinkles?
A: But of course
cone
Smith,Jane,18
Q: What is your favorite ice
cream?
A: Rocky Road
Q: Do you like sprinkles?
A: Yuck, no
bowl

stefangr$ tr '\r\n' '|' <in | sed 's/||/$/g' | tr '$' '\n' | awk -F"|" '{print$0}'
Jones,Bob,20|Q: What is your favorite ice cream?|A: Butter Pecan|Q: Do you like sprinkles?|A: But of course|cone
Smith,Jane,18|Q: What is your favorite ice |cream?|A: Rocky Road|Q: Do you like sprinkles?|A: Yuck, no|bowl
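The same pipe-separated layout can probably be produced by awk alone, without the tr/sed round trip. This is a sketch, not something posted in the thread. With RS set to CR LF and FS set to LF, every inner line becomes a field, and assigning $1 to itself rebuilds the record with OFS. It assumes an awk with multi-character RS support (gawk, mawk); crlf_to_pipes is a made-up name.

```shell
# Hypothetical helper: CR LF ends a record, each inner LF ends a field.
# Assumes an awk with multi-character RS support (gawk, mawk).
crlf_to_pipes() {
    awk 'BEGIN { RS = "\r\n"; FS = "\n"; OFS = "|" } { $1 = $1; print }' "$@"
}

# Example: inner lines become |-separated fields of one record.
printf 'Jones,Bob,20\nQ: Do you like sprinkles?\nA: But of course\ncone\r\nSmith,Jane,18\nbowl\r\n' | crlf_to_pipes
```

Because no placeholder characters are swapped in and out, data that happens to contain a $ survives this version.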
 