Questions on removing unexpected line breaks


 
# 8  
Old 09-05-2012
Actually, it's not very robust. The command will most likely hang (loop infinitely) if the last (fourth) field has a null value.
Code:
awk -F\|  'NF{while(gsub(FS,"&")!=3 || $0 ~ /[|]$/){if(getline p) $0=$0 p;else break}}1' file

This will at least break out of the loop, but it will not give the desired result in such a case.

Last edited by elixir_sinari; 09-05-2012 at 04:34 AM..
# 9  
Old 09-05-2012
Yes, a little bit.

I know my solution below also looks a little lengthy, but I think it might resolve this issue:

Code:
sed -e 's/^|//g' -e 's/|$//g' file | awk -F\| 'NF<4 || $4 == ""{getline p; $0=$0 FS p}1'

# 10  
Old 09-05-2012
Quote:
Originally Posted by Nekki Basara
I tried to use this command but the result is not as expected

Code:
a|b|c|d
e|f|g||h
i|j|k|l
m|n|o||p
1|2|3|4
5|6|7|
8|a|b|c|d
e|f|g|h

Ah yes. This should work better:
Code:
awk -F\| '$4==x{getline p; $0=$0 p}1' infile

# 11  
Old 09-06-2012
Data

Code:
awk '{while(gsub(/[|]/,"&")!=3 || $0 ~ /[|]$/){getline p;$0=$0 p}}1' file

This code seems to work great on the sample!
Could someone kindly explain what it is doing? The regular expression here is kind of difficult for me...
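
For anyone with the same question, the one-liner above can be spelled out as below. This is only a sketch: the function name `join_lines` is made up for illustration, and the end-of-input guard is borrowed from post #8 rather than being part of the quoted command. It assumes every complete record has exactly 3 delimiters, as in the small sample.

```shell
# Hypothetical helper: join_lines FILE
join_lines() {
    awk '
    {
        # gsub(/[|]/, "&") replaces every "|" with itself, so the record is
        # unchanged; its return value is the number of "|" in the record.
        # [|] is a bracket expression: a literal pipe, not regex alternation.
        # $0 ~ /[|]$/ is true when the record ends with a pipe.
        while (gsub(/[|]/, "&") != 3 || $0 ~ /[|]$/) {
            # read the next input line into p; stop at end of input
            if ((getline p) > 0)
                $0 = $0 p      # append the next line to the current record
            else
                break
        }
    }
    1   # always-true pattern with no action: print the record
    ' "$1"
}
```

Without the guard, `getline` returns 0 at end of input, `p` keeps its old value, and the loop can keep appending forever, which is the hang mentioned in post #8.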



Code:
awk -F\|  'NF{while(gsub(FS,"&")!=3 || $0 ~ /[|]$/){getline p;$0=$0 p}}1' file

Besides, I get an error when I run this code.
I ran it on the sample file called "test1.txt" and got the following error.

Code:
awk: run time error: regular expression compile failed (missing operand)
|
FILENAME="test1.txt" FNR=1 NR=1

Have I missed something? I have already put in the filename.
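
A likely explanation, offered as an assumption since the awk implementation is not stated (the message format looks like mawk's): `-F\|` sets FS to the single character `|`, which awk treats literally when splitting fields. In `gsub(FS,"&")`, however, the string "|" is used as a regular expression, where `|` is the alternation operator with nothing on either side, and some awk implementations refuse to compile that. The bracket expression from the first version avoids the problem. The helper name `count_pipes` below is hypothetical.

```shell
# Hypothetical helper: count_pipes STRING
count_pipes() {
    # a regex literal with a bracket expression matches a literal "|",
    # so gsub returns the number of pipes in the line
    printf '%s\n' "$1" | awk '{ print gsub(/[|]/, "&") }'
}

count_pipes 'a|b|c|d'   # prints 3
```

So replacing `gsub(FS,"&")` with `gsub(/[|]/,"&")` in the failing command should get past the compile error.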

---------- Post updated at 09:15 AM ---------- Previous update was at 08:58 AM ----------

Quote:
Originally Posted by Scrutinizer
Ah yes. This should work better:
Code:
awk -F\| '$4==x{getline p; $0=$0 p}1' infile

This one also works!
Could you please kindly explain what's the meaning of the code?
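
As a rough reading of that one-liner: `x` is a variable that is never set, so it compares equal to the empty string, and `$4==x` is true when the fourth field is empty or missing. In that case the next line is read into `p` and glued onto the current record; the trailing `1` prints every record. The sketch below says the same thing with an explicit `""` and adds an end-of-input check that the original does not have. The function name is made up for illustration.

```shell
# Hypothetical helper: merge_if_empty_4th FILE
merge_if_empty_4th() {
    awk -F'|' '
    # true when field 4 is empty or the line has fewer than 4 fields
    $4 == "" {
        if ((getline p) > 0)   # read the following line into p
            $0 = $0 p          # glue it onto the current record
    }
    1                          # print the (possibly merged) record
    ' "$1"
}
```

Note that it merges at most one following line per record, so a record broken into three pieces would not be fully repaired. Also, because an unset variable is both "" and 0, `$4==x` can be true for a field containing 0 as well; comparing against `""` forces a string comparison.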

---------- Post updated at 09:50 AM ---------- Previous update was at 09:29 AM ----------

Sorry, all... I am confused.
There are so many solutions, but each one yields different results!

Here is the real data format for my case:
Code:
12345|123456|999|D|1|123|1.2345|12.345|23.4567|||||||
987654|123456|999|O|12|99|2.3456|123.4567|345.6789|||||||Y
987654|123456|999|O|12|99|3.4567|123.4567|345.6789|||||||Y
987654|123456|999|O|12|99|4.5678|12.345|23.4567|||||||Y
987654|123456|999|O|12|99|5.6789|123.4567|345.6789|||||||Y
987654|123456|999|O|12|99|6.7890|123.4567|345.6789|||||||Y
987654|123456|999|H|1|1|34.5678|56.7890|67.8901||
|||||Y
987654|123456|999|E|1|1|2.3456|2.3456|2.34|||||||Y
.
.
.

In total I have 614293 lines in my data.

I tried the following code, which yielded 595647 lines in the result. (This one gives me the error "new-line character seen in unquoted field" when I process it further using another script.)
Code:
sed -e 's/^|//g' -e 's/|$//g' source_data.csv | awk -F\| 'NF<16 || $16 == ""{getline p; $0=$0 FS p}1'>test1.csv



Then I tried the following code, which yielded 595433 lines in the result:
Code:
awk '{while(gsub(/[|]/,"&")!=15 || $0 ~ /[|]$/){getline p;$0=$0 p}}1' source_data.csv>test2.csv



Then I tried the following, which yielded 595647 lines. (This one also gives me the error "new-line character seen in unquoted field" when I process it further using another script.)
Code:
awk -F\| '$16==x{getline p; $0=$0 p}1' A27.csv >test3.csv




I am totally confused about what causes the differences noted, and which code I should use.
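
One way to narrow down why the results differ is to tabulate how many delimiters each line has, on the source file and on each output file. The helper below is a sketch with a made-up name (`delim_histogram`); it assumes `|` is the only delimiter and never appears inside a field.

```shell
# Hypothetical helper: delim_histogram FILE
# Prints "<number of lines> <delimiters per line>" for each distinct count.
delim_histogram() {
    # with FS set to a single "|", NF - 1 is the number of pipes on the line
    awk -F'|' '{ print NF - 1 }' "$1" | sort -n | uniq -c
}
```

On a fully repaired file every line should fall into a single bucket; any other bucket points at lines that were merged too little or too much.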

Last edited by Nekki Basara; 09-06-2012 at 11:41 PM..
# 12  
Old 09-07-2012
Quote:
Originally Posted by Nekki Basara
I am totally confused about what causes the differences noted, and which code I should use.

That's why it is advisable to provide real data.

Please provide input and desired output.
# 13  
Old 09-07-2012
I think the problem lies in the last field. In your sample, if it is empty or there are fewer than X fields, then that means that the lines should be merged. In your actual data that is not always the case; sometimes the lines do not need to be merged even if the last field contains no value.

Perhaps merging needs to occur only if a line has fewer than X fields OR ( it has X fields AND the last field is empty AND the next line has fewer than X fields ) ?
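
The rule suggested above needs a one-line lookahead, so the current record has to be held back until the next line has been seen. A minimal awk sketch of that idea follows; the function name and the variable names (`cur`, `npipes`) are invented for illustration, and it has not been run against the real data.

```shell
# Hypothetical helper: merge_by_rule X FILE   (X = expected number of fields)
merge_by_rule() {
    awk -F'|' -v X="$1" '
    # number of "|" in s (s is a local copy, so the caller string is untouched)
    function npipes(s) { return gsub(/[|]/, "&", s) }

    NR == 1 { cur = $0; next }      # hold the first line back
    {
        n = npipes(cur) + 1         # field count of the held record
        # merge when the held record is short, or when it has X fields,
        # ends in an empty field, and the line just read is itself short
        if (n < X || (n == X && cur ~ /[|]$/ && NF < X))
            cur = cur $0
        else {
            print cur
            cur = $0
        }
    }
    END { if (NR > 0) print cur }   # flush the last held record
    ' "$2"
}
```

For the data in post #11 it would presumably be called as `merge_by_rule 16 source_data.csv`, since the user's own attempts test `$16` and a delimiter count of 15.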

Last edited by Scrutinizer; 09-07-2012 at 04:38 AM..
# 14  
Old 09-07-2012
If I have counted correctly you have 15 fields in each line. This means there have to be 14 delimiters: if there are fewer, merge the next line into this one, otherwise leave it alone.

The following should do what you want:

Code:
sed -n ':start
        /[^|]*\(|[^|]*\)\{14\}/ !{
              N
              s/\n//
              b start
        }
        p' /path/to/infile > /path/to/outfile

This will even connect lines broken into several pieces, but consecutive lines will have to add up to correct ones, otherwise the script will fail to produce correct results.

That is, if a line with 14 fields is followed by a line with 16 fields, it will produce one line with 29 fields (the last field of the first line is glued to the first field of the second), not two with 15 fields each.

If I have miscounted the fields or your file format changes, you can correct this in the counter "\{n\}", which repeats the previous expression "\(|[^|]*\)" (delimiter, followed by optional non-delimiter) n times.
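
For comparison with the earlier one-liners, the same "keep appending until there are at least n delimiters" idea can be sketched in awk instead of sed, with the count passed as a parameter. The function name `join_min_delims` is hypothetical. Going by the sample lines in post #11, which each contain 15 pipes (16 fields), n would be 15 for that data.

```shell
# Hypothetical helper: join_min_delims N FILE   (N = delimiters per full record)
join_min_delims() {
    awk -v n="$1" '
    {
        # gsub returns the number of "|" in the record without changing it
        while (gsub(/[|]/, "&") < n) {
            if ((getline nxt) > 0)
                $0 = $0 nxt     # append the next physical line
            else
                break           # end of input: print what we have
        }
    }
    1' "$2"
}
```

Like the sed version, it only checks for "at least n" delimiters, so a short line followed by a complete one is glued into a single over-long record.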

I hope this helps.

bakunin