awk - need to remove unwanted newlines on match

08-18-2009


Context:
I need to remove unwanted newlines from a data file listing books and associated data. Here is a sample listing ( line numbers included ):

Code:

1 360762| Skip-beat! 14 /| 9781421517544| nb        | 2008.| Nakamura, Yoshiki.| NAKAMUR | Kyoko Mogami followed 
2 her true love Sho to Tokyo to support him while he made it big as an idol. But he's casting her out now that he's famous. 
Kyoko won't suffer in silence--she's going to get her sweet revenge by beating Sho in show biz.
3 361018| Angel numbers 101 : the meaning of 111, 123, 444, and other number sequences /| 1401920012| b         | 2008.| 
Virtue, Doreen, 1958-| 133.3359 VIRTUE |

I am using the following, found in these forums, for removing unwanted newlines:

Code:

awk 'NR==1{s=$0;next} /^[a-zA-Z]|^;/{s=s$0;next} {print s;s=$0} END{if(s)print s}' "$RAW_DATA" > "$UNSPLIT"

However, it is inexact and leaves some lines with punctuation and dates unresolved.

It needs to:
Find lines in which the first field DOES NOT contain precisely 6 digits and append them to the line above.

Thanks ~

Bub

Bubnoff
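
For reference, the requirement stated above (append any line whose first field is not exactly six digits to the line above it) could be sketched in awk roughly as follows. This is only a sketch under assumptions: the function name `join_records` is made up for illustration, fields are taken to be separated by `|`, blanks around the first field are ignored, and continuation lines are joined with a single space. The six-digit test is written as six bracket expressions so it does not rely on interval-expression support.

```shell
# Hypothetical helper: reads the raw file named in $1, writes joined records to stdout.
join_records() {
  awk -F'|' '
    {
      id = $1
      gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim blanks around the first field
    }
    # A new record starts when the first field is exactly six digits.
    id ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ {
      if (have) print rec               # flush the record built so far
      rec = $0; have = 1
      next
    }
    # Anything else is a continuation: append it to the record being built.
    {
      if (have) {
        sub(/[ \t]+$/, "", rec)         # avoid doubled blanks at the join
        rec = rec " " $0
      } else {
        rec = $0; have = 1              # continuation before any record: keep it
      }
    }
    END { if (have) print rec }
  ' "$1"
}
```

It would be called as `join_records "$RAW_DATA" > "$UNSPLIT"`, reusing the variable names from the post above.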


When you say "( line numbers included ):", do you mean you added them for readability? If you just post what the output should look like, it will be easier.

edidataguy


Code:

# awk -F\| '{if(NR==1){printf}else{if($1*1){printf "\n%s",$0}else{printf " %s",$0}}}' file

danmero


I was using gvim and the line numbers didn't copy over so I added them. I mentioned that to let people know it wasn't part of the data.

Sorry for the confusion.

---------- Post updated at 10:16 AM ---------- Previous update was at 09:59 AM ----------

Thanks, danmero, but I get this error:

awk: (FILENAME=All_Items.out FNR=1) fatal: printf: no arguments
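
The error above comes from the bare `printf`: as the message shows, gawk refuses `printf` without a format string, although other awk implementations may behave differently. One possible repair, offered as a sketch rather than as danmero's intended code, is to give every `printf` an explicit format. The function name `join_on_numeric_id` and the trailing newline printed in `END` are additions made for illustration.

```shell
# Sketch of the one-liner with the bare printf replaced by an explicit format.
join_on_numeric_id() {
  awk -F'|' '
    NR == 1 { printf "%s", $0; next }     # first line: print it, no newline yet
    $1 * 1  { printf "\n%s", $0; next }   # first field is a non-zero number: start a new line
            { printf " %s", $0 }          # otherwise: continue the current line
    END     { if (NR) print "" }          # terminate the last line (an addition)
  ' "$1"
}
```

Note that `$1*1` is true for any first field that begins with a non-zero number, so a continuation line that happens to start with digits (a date, for example) would be treated as a new record. It is looser than an exact six-digit test.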

---------- Post updated at 10:58 AM ---------- Previous update was at 10:16 AM ----------

Thanks for the link to the other post, danmero. That actually turned out to
be what I was looking for. I adjusted it to my situation as follows:

Code:

awk -F\| --posix '{if(/^[0-9]{6}/){if(NR>1){printf "%s\n",$0}else{printf}}}' All_Items.out > tester

I'm not sure I understand how your example on this thread was supposed to work though.

As a bit of an aside:
Is there a better way to describe the regex above ...i.e. without the --posix
option?

Bub

Bubnoff


Use GNU awk (gawk), New awk (nawk) or POSIX awk (/usr/xpg4/bin/awk) on Solaris.
Works for me

danmero


Quote:

Originally Posted by Bubnoff

As a bit of an aside:
Is there a better way to describe the regex above ...i.e. without the --posix
option?

Bub

Use:

Code:

if ($1 >= 100000 && $1 < 1000000)

instead of:

Code:

if(/^[0-9]{6}/)

Regards

Franklin52
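
The two tests are not quite equivalent, which a small comparison can show. The sketch below is an illustration only: the helper name `compare_tests` and the sample values are made up, and the regex is anchored at both ends (an assumption, to match "precisely 6 digits") and written without `{6}` so that no `--posix` option is needed. In gawk 4.0 and later, interval expressions such as `{6}` should also work without that option.

```shell
# Hypothetical helper: prints each candidate first field with the verdict of both tests.
compare_tests() {
  printf '%s\n' "$@" | awk '
    {
      regex = ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/) ? "yes" : "no"
      range = ($1 >= 100000 && $1 < 1000000) ? "yes" : "no"
      printf "%s regex=%s range=%s\n", $1, regex, range
    }'
}

compare_tests 360762 012345 1234567 Virtue
# 360762 regex=yes range=yes
# 012345 regex=yes range=no
# 1234567 regex=no range=no
# Virtue regex=no range=no
```

The numeric range treats a six-digit ID with a leading zero as a smaller number and rejects it, so the regex form may be the safer choice if such IDs can occur in the data.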


Thanks!

Bub

Bubnoff
