sed to remove newline chars based on pattern mis-match


 
# 1  
Old 07-07-2016

Greetings Experts,
I am on AIX; I have a file generated by awk after processing the input files. Now I need to replace or remove the newline characters on all lines that don't have a ; as the last character on the line. I tried sed 's/\n/ /g'. After checking the forums I learned that this will not work, because sed removes the newline character while reading the line (my assumption). Following the post "unix - How can I replace a newline (\n) using sed?" on Stack Overflow, I tried
Code:
sed -e ':a' -e 'N' -e '$!ba' -e ' /;/! s/\n/ /g'

and ended up with compatibility issues. As I understand sed, sed '/;/! s#\n# #g' might solve it, but I am facing compatibility issues; after some searching I replaced ! with b, as in sed '/;/b s#\n# #g', and this script also ran into compatibility issues.
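The assumption above can be checked with a minimal sketch (the two-line input is made up for illustration): sed loads each line into the pattern space without its trailing newline, so \n has nothing to match until N has appended a second line.

```shell
# One line at a time: the pattern space holds no newline, so nothing changes.
printf 'a\nb;\n' | sed 's/\n/ /g'

# After N appends the next line, the embedded newline can be substituted.
printf 'a\nb;\n' | sed 'N; s/\n/ /g'
```

The first command prints the two lines unchanged; the second prints them joined by a space.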

Sample file contents:
Code:
Table1@Table2@SELECT COL1,
COL2,COL3, 
COL4,COL5 FROM
TABLE1 INNER JOIN TABLE2 ON
COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1,
COL2, COL3,
............

Expected output:
Code:
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I need the newlines eliminated on all rows that don't contain ; so that I can have each statement in a single cell of the Excel sheet, then split it on the @ delimiter and do a lookup to get the third column.
Code:
cat output_file > excel_lookup_ready.xlsx

I have done it with awk (removing the newline rather than replacing it with a space):
Code:
awk -F "@" '{ if ($0 ~ /;/) { print $0 > "output_file_awk.txt" } else { printf "%s", $0 > "output_file_awk.txt" } }' output_file.txt
cat output_file_awk.txt > excel_lookup_ready.xlsx

and the final file has the contents I need.
I am not able to achieve it with sed. Can you please help me?
Thank you for your time.
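For comparison, the awk approach can be condensed into a single printf. This is only a sketch under two assumptions: lines should be joined with a space (as in the expected output above), and ; is the very last character of a terminating line. The function name join_sql is hypothetical.

```shell
join_sql() {
    # Print every line followed by a newline if it ends in ";",
    # otherwise by a space, so the next line continues the same record.
    awk '{ printf "%s%s", $0, (/;$/ ? "\n" : " ") }' "$@"
}

printf 'SELECT COL1,\nCOL2 FROM T;\nSELECT X\nFROM Y;\n' | join_sql
```

If the last input line does not end in ; the output ends with a space and no final newline.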
# 2  
Old 07-07-2016
Why do you insist on sed and aren't happy with the awk solution?

Try
Code:
sed ':L; /;$/bX; N; bL; :X; s/\n/ /g' file
Table1@Table2@SELECT COL1, COL2,COL3,  COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2, COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;

This User Gave Thanks to RudiC For This Post:
# 3  
Old 07-07-2016
I am learning awk and sed through the valuable posts in this forum and was just curious how to achieve this with sed. Thank you, RudiC; could you please explain the code?
# 4  
Old 07-07-2016
It appends the Next line (N) to the pattern space and branches back to label L until the pattern space ends with a ; , then branches to label X , replaces every \n with a space and prints the resulting line.
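The same loop can be written with one -e per command. This is a sketch, not the original poster's command: some non-GNU seds may read everything after a label name up to the end of the line as part of the label, so keeping ";" away from labels is the cautious choice. The function name join_until_semicolon and the extra $bX (stop at the last line even if it lacks a ;) are additions made here.

```shell
# Hypothetical helper. Steps: L = loop start; jump to X when the pattern
# space ends in ";" or the last line has been read; otherwise append the
# next line and loop; at X turn the embedded newlines into spaces.
join_until_semicolon() {
    sed -e ':L' \
        -e '/;$/bX' \
        -e '$bX' \
        -e 'N' \
        -e 'bL' \
        -e ':X' \
        -e 's/\n/ /g' "$@"
}

printf 'SELECT COL1,\nCOL2 FROM T;\nSELECT X\n' | join_until_semicolon
```

It reads standard input when called without arguments and passes any file names through to sed.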
This User Gave Thanks to RudiC For This Post:
# 5  
Old 07-07-2016
Your sed needs to substitute every \n whose preceding character is not a ;
Code:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^;]\)\n/\1/g' file

The file must fit into memory.
The following variant joins lines as it goes, so only each output line must fit into memory:
Code:
sed -e ':a' -e '/;$/b' -e '$b' -e 'N; s/\n//; ba' file
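
A quick way to convince yourself that the two commands agree is to run both on a small input and compare. The four-line sample is made up for illustration, and the second command is split into separate -e arguments, which should not change its behavior.

```shell
# Hypothetical sample: two records of two lines each.
sample='A,
B;
C,
D;'

# Variant 1: read the whole input first, then delete newlines not preceded by ";".
slurp=$(printf '%s\n' "$sample" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^;]\)\n/\1/g')

# Variant 2: join line by line and print as soon as a ";" ends the pattern space.
stream=$(printf '%s\n' "$sample" | sed -e ':a' -e '/;$/b' -e '$b' -e 'N' -e 's/\n//' -e 'ba')

if [ "$slurp" = "$stream" ]; then
    echo "both variants agree"
else
    echo "variants differ"
fi
```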

This User Gave Thanks to MadeInGermany For This Post:
# 6  
Old 07-07-2016
You might - especially as a beginner - want to make life easier for yourself and write real sed programs instead of one-liners only experts can decipher. It is like starting to learn the double toe-loop when you are skating for the first time in your life.

sed is basically a rule-based language: you describe types of lines (through regexps) and the actions which should be done once such a line is encountered. Now, start describing what types of lines you have and what you want to do once you encounter such a line. Your description probably looks like this:

1) If a line doesn't end with a ";" char store the line and continue with the next.
2) If the line does end with a ";" char change the newlines in the lines stored so far to spaces (thus making one big line of the collected lines), output that and clear the storage, then continue with the next line.


Now, just in case the last line doesn't end with a ";" (probably it should, but safe is better than sorry) there might be a third rule:

3) If the line is the last line, treat it as in rule 2, but instead of continuing just exit.

Now let us start with the implementation: we start with rule 2, because this way we will immediately see some effect. In the following I will always use your sample input, with an added ";" as the last character in the last line:

Code:
sed -n '/;$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We might want to allow for trailing white space after the ";" on such lines, so we modify the pattern slightly ("<b>" and "<t>" stand for literal blank and tab characters):

Code:
sed -n '/;[<b><t>]*$/ {
             s/\n/ /g
             p
          }' /path/to/file
COL1=COL21 AND COL2=COL22;
............;

We didn't "recall" anything yet because we have nothing stored, but we will do that now as we implement rule 1. There is a usable buffer in sed, which is called "hold space", as opposed to the "pattern space", which is what you have read in and are working on. You cannot directly modify the hold space, but you can add the current pattern space to it, replace it with the pattern space or add its contents to the pattern space.
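
A minimal sketch of the hold space at work, on a made-up three-line input: "H" appends a newline plus the current line to the hold space, and "x" swaps the two buffers on the last line so the collected text can be edited and printed.

```shell
# Collect every line in the hold space, then print them comma-separated.
printf 'one\ntwo\nthree\n' | sed -n '
H
$ {
    x
    s/\n/,/g
    p
}'
```

The result starts with a comma, because "H" puts a newline in front of the very first line as well; that leading newline has to be dealt with whenever "H" is used to collect lines.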

We are going to implement rule 1 by adding the content of the pattern space to the hold space. In our handling of the lines ending in ";" (rule 2) we will pull that hold space back into the pattern space. See the sed man page for the commands:

Code:
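sed -n '/;[<b><t>]*$/ !{
             H
             d
          }
     /;[<b><t>]*$/ {
             H
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             p
          }' /path/to/file
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

The sequence "H->s/.*//->x" is a little trick: add the line to the hold space, then empty the pattern space, finally exchange hold space and pattern space, thus having everything back in the pattern space and clearing the hold space at the same time. (A "d" would not do for the middle step: it ends the current cycle immediately, so the commands after it would never run.) Because "H" always puts a newline in front of what it appends, the collected text starts with a newline, which "s/^\n//" strips before the remaining newlines become spaces. The "-n" switch suppresses sed's automatic printing, so only the explicit "p" produces output.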

Now, the last line (rule 3). You probably can do that yourself already. Only now we drop the final ";" I added to the sample input and restore it to the original as you posted it. Rule 3 has to come first in the script: the "d" in rule 1 ends the cycle, so a last line without ";" would never reach a rule placed after it, and the closing "q" keeps a last line with ";" from being handled a second time by rule 2. The last transformation means: replace an optional ";" (along with any trailing white space) with a single ";". This way we remove trailing white space and at the same time make sure there is a ";" at the end. I have also added the trailing-white-space removal to the lines ending in ";":

Code:
sed -n '$ {
             H
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             s/;*[<b><t>]*$/;/
             p
             q
          }
     /;[<b><t>]*$/ !{
             H
             d
          }
     /;[<b><t>]*$/ {
             H
             s/.*//
             x
             s/^\n//
             s/\n/ /g
             s/[<b><t>]*$//
             p
          }' /path/to/file
Table1@Table2@SELECT COL1, COL2,COL3, COL4,COL5 FROM TABLE1 INNER JOIN TABLE2 ON COL1=COL21 AND COL2=COL22;
Table3@Table4@SELECT COL1, COL2,COL3, ......;

I hope this helps.

bakunin

Last edited by bakunin; 07-07-2016 at 06:05 PM..
These 2 Users Gave Thanks to bakunin For This Post:
# 7  
Old 07-08-2016
An alternative in Perl:
Code:
perl -ple 'BEGIN{$\=$/=";\n"} s/\n/ /g' chill3chee.file

This User Gave Thanks to Aia For This Post: