awk - need to remove unwanted newlines on match


 
# 1  
Old 08-18-2009

Context:
I need to remove unwanted newlines from a data file listing books and associated data. Here is a sample listing ( line numbers included ):
Code:
1 360762| Skip-beat! 14 /| 9781421517544| nb        | 2008.| Nakamura, Yoshiki.| NAKAMUR | Kyoko Mogami followed 
2 her true love Sho to Tokyo to support him while he made it big as an idol. But he's casting her out now that he's famous. 
Kyoko won't suffer in silence--she's going to get her sweet revenge by beating Sho in show biz.
3 361018| Angel numbers 101 : the meaning of 111, 123, 444, and other number sequences /| 1401920012| b         | 2008.| 
Virtue, Doreen, 1958-| 133.3359 VIRTUE |


I am using the following, found in these forums, for removing unwanted newlines:

Code:
awk 'NR==1{s=$0;next} /^[a-zA-Z]|^;/{s=s$0;next} {print s;s=$0} END{if(s)print s}' $RAW_DATA > $UNSPLIT

However, it is inexact: it only joins continuation lines that begin with a letter or a semicolon, so continuation lines that begin with punctuation or a date are left unjoined.

It needs to:
Find lines in which the first field DOES NOT contain precisely 6 digits and append them to the line above.

Thanks ~

Bub
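
The requirement stated above (append any line whose first field is not exactly six digits to the line before it) could be sketched roughly as follows. This is only a minimal sketch, not the solution the thread settled on: the function name `join_records` and the abbreviated sample records are made up for illustration, and the six digits are spelled out as six bracket expressions so that no interval expression (and therefore no `--posix`) is needed.

```shell
# join_records is a hypothetical helper name, not something from the thread.
join_records() {
  awk -F'|' '
    # Record start: the first |-separated field is exactly six digits.
    $1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ {
      if (NR > 1) print buf   # flush the previously buffered record
      buf = $0
      next
    }
    # Anything else is a continuation line: glue it onto the buffer.
    { buf = buf $0 }
    # Flush the last record (only if there was any input at all).
    END { if (NR) print buf }
  ' "$@"
}

# Abbreviated, made-up sample in the same shape as the listing above.
joined=$(printf '%s\n' \
  '360762| Skip-beat! 14 /| 2008.| Kyoko Mogami followed ' \
  'her true love Sho to Tokyo.' \
  '361018| Angel numbers 101 /| 2008.| ' \
  'Virtue, Doreen, 1958-| 133.3359 VIRTUE |' | join_records)

printf '%s\n' "$joined"
```

With a real file it would be called as `join_records "$RAW_DATA" > "$UNSPLIT"`. Lines are joined without adding a separator, which relies on the broken lines already ending in a space, as they do in the sample.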
# 2  
Old 08-18-2009
When you say "( line numbers included ):", do you mean you added them for readability? If you just post what the output should look like, it will be easier.
# 3  
Old 08-18-2009
Code:
# awk -F\| '{if(NR==1){printf}else{if($1*1){printf "\n%s",$0}else{printf " %s",$0}}}' file

See also a similar problem: "to get two almost identical rows into one" (The UNIX and Linux Forums)
# 4  
Old 08-18-2009
I was using gvim and the line numbers didn't copy over so I added them. I mentioned that to let people know it wasn't part of the data.

Sorry for the confusion.

---------- Post updated at 10:16 AM ---------- Previous update was at 09:59 AM ----------

Thanks, Danmero, but I get this error:

awk: (FILENAME=All_Items.out FNR=1) fatal: printf: no arguments
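
For reference, the fatal error points at the bare `printf` in the `NR==1` branch: the awk reporting it insists on a format argument. A minimal sketch of the same one-liner with an explicit format string might look like this. The function name `join_numeric` and the three sample lines are assumptions made for illustration, and the `END` rule that terminates the last line is an addition that the original one-liner does not have.

```shell
# join_numeric is a hypothetical name; the logic mirrors the one-liner in post #3.
join_numeric() {
  awk -F'|' '
    NR == 1 { printf "%s", $0; next }    # explicit format instead of a bare printf
    $1 * 1  { printf "\n%s", $0; next }  # first field is a non-zero number: start a new line
            { printf " %s", $0 }         # otherwise append to the current output line
    END     { if (NR) print "" }         # added here: terminate the final line
  ' "$@"
}

# Made-up sample input: one broken record followed by one intact record.
fixed=$(printf '%s\n' \
  '360762| A|' \
  'continued text' \
  '361018| B|' | join_numeric)

printf '%s\n' "$fixed"
```

Note that the `$1 * 1` test treats any line whose first field starts with a non-zero number as a new record, so a continuation line that happens to begin with a digit would not be joined.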

---------- Post updated at 10:58 AM ---------- Previous update was at 10:16 AM ----------

Thanks for the link to the other post, Danmero. That actually turned out to be what I was looking for. I adjusted it to my situation as follows:

Code:
awk -F\| --posix '{if(/^[0-9]{6}/){if(NR>1){printf "%s\n",$0}else{printf}}}' All_Items.out > tester

I'm not sure I understand how your example on this thread was supposed to work though.

As a bit of an aside:
Is there a better way to describe the regex above ...i.e. without the --posix
option?

Bub
# 5  
Old 08-18-2009
  1. Use GNU awk (gawk), New awk (nawk) or POSIX awk (/usr/xpg4/bin/awk) on Solaris.
  2. Works for me
  3. To keep the forums high quality for all users, please take the time to format your posts correctly.
    1. Use Code Tags when you post any code or data samples so others can easily read your code.
      You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)
    2. Avoid adding color or different fonts and font size to your posts.
      Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.
    3. Be careful when you cut and paste: edit any odd characters and make sure all links are working properly.

    Thank You.

    The UNIX and Linux Forums
# 6  
Old 08-18-2009
Quote:
Originally Posted by Bubnoff
As a bit of an aside:
Is there a better way to describe the regex above ...i.e. without the --posix
option?

Bub
Use:

Code:
if ($1 >= 100000 && $1 < 1000000)

instead of:

Code:
if(/^[0-9]{6}/)

Regards
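
The two tests can be compared side by side on a few lines. The sketch below is illustrative only: the function name `classify` and the sample lines are made up, and the regex variant writes out six bracket expressions (anchored at both ends) as one assumed way of avoiding the `{6}` interval expression.

```shell
# classify is a hypothetical helper: for each input line it prints
# "<range test result> <regex test result>".
classify() {
  awk -F'|' '{
    # Numeric range test from this post: true for 100000 through 999999.
    by_range = ($1 >= 100000 && $1 < 1000000) ? "yes" : "no"
    # Interval-free regex: exactly six digits in the first field.
    by_regex = ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/) ? "yes" : "no"
    print by_range, by_regex
  }' "$@"
}

# Made-up sample lines: a record start, a continuation, and a five-digit id.
result=$(printf '%s\n' \
  '360762| Skip-beat! 14 /' \
  'Virtue, Doreen, 1958-| 133.3359 VIRTUE |' \
  '12345| short id' | classify)

printf '%s\n' "$result"
```

One design point to keep in mind: the range test is numeric only when the first field looks like a number. Otherwise awk falls back to a string comparison, so the regex is the stricter of the two checks.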
# 7  
Old 08-18-2009
Quote:
Originally Posted by Franklin52
Use:

Code:
if ($1 >= 100000 && $1 < 1000000)

instead of:

Code:
if(/^[0-9]{6}/)

Regards

Thanks!

Bub