sed to remove newline chars based on pattern mis-match

07-07-2016

Greetings Experts,
I am on AIX and have a file that was generated by awk after processing the input files. Now I need to replace or remove the new-line characters on all lines that do not have a ; as the last character on the line. I tried sed 's/\n/ /g'. After checking the forums I learned that this will not work, because sed strips the new-line character while reading each line (my assumption). Following the Stack Overflow post "How can I replace a newline (\n) using sed?", I tried

Code:

sed -e ':a' -e 'N' -e '$!ba' -e ' /;/! s/\n/ /g'

and ended up with compatibility issues. As per my understanding of sed, sed '/;/! s#\n# #g' might solve it, but that also hit compatibility issues. After some searching I replaced ! with b, as in sed '/;/b s#\n# #g', and this script faced compatibility issues too.

Sample file contents:

Code:

Table1@Table2@SELECT COL1,
COL2,COL3, 
COL4,COL5 FROM
TABLE1 INNER JOIN TABLE2 ON
COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1,
COL2, COL3,
............

Expected output:

Code:

Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I need the new-lines to be eliminated on all rows that don't contain a ; so that I can have each statement in a single cell of the Excel sheet, then split it on the @ delimiter and do a lookup to get the third column.

Code:

cat output_file > excel_lookup_ready.xlsx

I have done it through awk as follows (the newline is removed, not replaced with a space):

Code:

awk -F "@" '{ if ($0 ~ /;/) { print $0 > "output_file_awk.txt" } else { printf "%s", $0 > "output_file_awk.txt" } }' output_file.txt
cat output_file_awk.txt > excel_lookup_ready.xlsx

and the final file has the contents I need.
I am not able to achieve this through sed. Can you please help me?
Thank you for your time.

chill3chee

07-07-2016

Why do you insist on sed and aren't happy with the awk solution?

Try

Code:

sed ':L; /;$/bX; N; bL; :X; s/\n/ /g' file
Table1@Table2@SELECT COL1, COL2,COL3,  COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2, COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;

RudiC

07-07-2016

I am learning awk and sed through the valuable posts in the forum, and was just curious how to achieve this through sed. Thank you RudiC; could you please explain the code?

chill3chee

07-07-2016

It appends the next line (N) and branches back to label L until the pattern space ends with a ; , then branches to X , replaces each \n with a space and prints the resulting line.
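
As a rough sketch, the same loop can be written one command per line with a comment on each step. The sample data and the variable names sample and joined are made up for illustration; putting each label on a line of its own is only a guess at what a stricter sed might prefer, not something tested on AIX.

```shell
# Hypothetical sample data, modelled on the input posted above.
sample='Table1@Table2@SELECT COL1,
COL2,COL3,
COL4 FROM TABLE1;
Table3@Table4@SELECT COL1,
COL2 FROM TABLE3;'

# Same logic as the one-liner, one command per line.
joined=$(printf '%s\n' "$sample" | sed '
:L
# pattern space already ends with ";": statement complete, jump to X
/;$/bX
# otherwise append the next input line (joined by an embedded newline)
N
# and test again
bL
:X
# turn every embedded newline into a space; the line is then auto-printed
s/\n/ /g
')

printf '%s\n' "$joined"
```

Each statement comes out as one line. Note that this sketch assumes the input ends with a ; line; what N does when there is no next line differs between sed implementations.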

RudiC

07-07-2016

Your sed needs to substitute every \n where the preceding character is not a ;

Code:

sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^;]\)\n/\1/g' file

This reads the whole file into the pattern space, so the file must fit into memory.
The following does an early replacement; only one output line must fit into memory:

Code:

sed -e ':a' -e '/;$/b' -e '$b' -e 'N; s/\n//; ba' file
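
For illustration, here is a commented sketch of that second command, with two deliberate differences: the newline is replaced by a space instead of being removed, and the made-up sample ends with a line that has no ; so the effect of the $b guard is visible. The names sample and joined are hypothetical.

```shell
# Hypothetical input whose last statement lacks the closing ";".
sample='SELECT A,
B FROM T1;
SELECT C,
D FROM T2'

joined=$(printf '%s\n' "$sample" | sed '
:a
# statement complete: branch to the end of the script, which prints it
/;$/b
# last input line reached without a ";": print what has been collected
$b
# append the next line, replace the newline just added, and loop
N
s/\n/ /
ba
')

printf '%s\n' "$joined"
```

Without the $b guard, N would be attempted on the last line with nothing left to read, and sed implementations differ on whether the pending text is still printed in that case.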

MadeInGermany

07-07-2016

You might - especially as a beginner - want to make life easier for yourself and write real sed programs instead of one-liners only experts can decipher. Starting with one-liners is like trying to learn the double toe-loop when you are skating for the first time in your life.

sed is basically a rule-based language: you describe types of lines (through regexps) and the actions which should be done once such a line is encountered. Now, start describing what types of lines you have and what you want to do once you encounter such a line. Your description probably looks like this:

1) If a line doesn't end with a ";" char store the line and continue with the next.
2) If the line does end with a ";" char change the newlines in the lines stored so far to spaces (thus making one big line of the collected lines), output that and clear the storage, then continue with the next line.

Now, just in case the last line doesn't end with a ";" (probably it should, but safe is better than sorry) there might be a third rule:

3) If the line is the last line, treat it like rule 2, but instead of continuing just exit.

Now let us start with the implementation: we start with rule 2, because this way we will immediately see some effect. In the following i will always use your sample input, with an added ";" as the last character in the last line:

Code:

sed -n '/;$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We might want to allow for trailing white space at the end of ";" lines so we slightly modify ("<b>" and "<t>" are literal blank and tab characters):

Code:

sed -n '/;[<b><t>]*$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We didn't "recall" anything yet because we have nothing stored, but we will do that now as we implement rule 1. sed has a usable buffer called the "hold space", as opposed to the "pattern space", which holds what you have read in and are working on. You cannot modify the hold space directly, but you can append the current pattern space to it (H), overwrite it with the pattern space (h), copy or append its contents to the pattern space (g, G), or exchange the two (x).
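
To get a feeling for the hold space before using it on the real data, a minimal sketch on three made-up input lines may help. It only shows H and x at work; the variable name collected is hypothetical.

```shell
collected=$(printf '%s\n' one two three | sed -n '
# H: append a newline plus the pattern space to the hold space
H
# on the last line only:
${
  # x: swap the two buffers; pattern space is now "\none\ntwo\nthree"
  x
  # drop the leading newline that the first H put there
  s/^\n//
  # join the collected lines with spaces
  s/\n/ /g
  p
}
')

printf '%s\n' "$collected"
```

This prints "one two three". The point to take away is that H always puts a newline in front of what it appends, so text collected this way starts with a newline.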

We are going to implement rule 1 by adding the content of the pattern space to the hold space. In our handling of the ";"-ending lines (rule 2) we will pull that hold space back into the pattern space. See the sed man page for the commands:

Code:

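sed -n '/;[<b><t>]*$/ !{
             H
             d
          }
     /;[<b><t>]*$/ {
             H
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             p
          }' /path/to/file
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

The sequence "H -> s/.*// -> x" is a little trick: add the line to the hold space, then empty the pattern space, finally exchange hold space and pattern space, thus having back everything in the pattern space and clearing the hold space at the same time. A plain d cannot be used for the middle step, because d ends the current cycle immediately and the commands after it would never run. Since H always puts a newline in front of what it appends, the collected text starts with a newline, which s/^\n// removes before the remaining newlines are turned into spaces.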

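Now, the last line (rule 3). You probably can do that yourself already. Only now we drop the final ";" I added to the sample input and restore it to what you originally posted. Three details matter here. In rule 1 the d becomes $!d, so that a last line without ";" is not thrown away but falls through to rule 3. Rule 2 ends with a d, so that a last line which does end in ";" is not handled a second time by rule 3. And rule 3 needs no H of its own, because rule 1 has already stored the line. (The pattern space is emptied with s/.*// before the x; a d at that point would end the cycle immediately.) The last transformation means: replace an optional ";" (along with any trailing white space) with a single ";". This way we remove trailing white space and at the same time make sure there is a ";" at the end. I have also added the trailing white space removal to the lines ending in ";":

Code:

sed -n '/;[<b><t>]*$/ !{
             H
             $!d
          }
     /;[<b><t>]*$/ {
             H
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             s/[<b><t>]*$//
             p
             d
          }
     $ {
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             s/;*[<b><t>]*$/;/
             p
          }' /path/to/file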
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I hope this helps.

bakunin

Last edited by bakunin; 07-07-2016 at 06:05 PM..


07-08-2016

An alternative in Perl:

Code:

perl -ple 'BEGIN{$\=$/=";\n"} s/\n/ /g' chill3chee.file

Aia
