Remove newline character or join the broken record


 
# 1  
Old 04-18-2012

Hi,
I have a very large file, around 1 GB of data.
I want to remove every newline character in the file except those preceded by the original end-of-record delimiter {}.
Sample data looks like this:
Code:
1[|]2[|]3[|]4[|]5[|]6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]3[|]4[|]sss[|]ss
as6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]dsad3[|]dad
4[|]sdad5[|]6[|]7
[|]a[|]b[|dsad]c[|]dsadd{}

The output should look like this:
Code:
1[|]2[|]3[|]4[|]5[|]6[|]7[|]a[|]b[|]c[|]d{}
1[|]2[|]3[|]4[|]sss[|]ssas6[|]7[|]a[|]b[|]c[|]d{}
1[|]2[|]dsad3[|]dad4[|]sdad5[|]6[|]7[|]a[|]b[|dsad]c[|]dsadd{}

I tried the Perl command below, but because the file is so large it throws an error:
Code:
perl -0lne 's/\n//g;print "$1\n" while /(.*?{})/g' sourcefile

Error:
Code:
Substitution loop at -e line 1, <> chunk 1.

A sed command would be better: one that removes every '\n' except the one in '{}\n'.
Please help me with any clues.
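
For the sed route asked about here, one possible sketch (not taken from the replies below) is to keep appending lines with N until the pattern space ends in {}. The variable names sample and joined are only for this demo; on the real file you would pass the file name to sed instead of piping.

```shell
# Sample data from this post, held in a variable for the demo.
sample='1[|]2[|]3[|]4[|]5[|]6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]3[|]4[|]sss[|]ss
as6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]dsad3[|]dad
4[|]sdad5[|]6[|]7
[|]a[|]b[|dsad]c[|]dsadd{}'

# :a            label to loop back to
# /{}$/!{ ... } run the group only when the pattern space does NOT end in {}
#   N           append the next input line
#   s/\n//      delete the newline that N put between the two lines
#   ba          jump back to :a and test again
joined=$(printf '%s\n' "$sample" |
  sed -e ':a' -e '/{}$/!{' -e 'N' -e 's/\n//' -e 'ba' -e '}')

printf '%s\n' "$joined"
```

sed holds only one logical record in memory at a time, so the file size should not matter. If the file ends in a partial record with no closing {}, what happens when N runs out of input differs between sed implementations, so that case is worth checking separately.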
# 2  
Old 04-18-2012
Hi

Code:
awk '/{}$/{print;next}{printf $0;}' file

Guru.
# 3  
Old 04-18-2012
Code:
awk ' 
{ a = a $0 }
/\{\}$/ { print a; a = ""; next }' FILE
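
The accumulate-and-flush idea above can be tried on the sample data from the first post. This is a commented sketch of the same approach; the names sample, joined and buf are arbitrary, and the END rule is an addition (an assumption that a trailing record without {} should still be printed).

```shell
# Sample data from the first post, held in a variable for the demo.
sample='1[|]2[|]3[|]4[|]5[|]6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]3[|]4[|]sss[|]ss
as6[|]7
[|]a[|]b[|]c[|]d{}
1[|]2[|]dsad3[|]dad
4[|]sdad5[|]6[|]7
[|]a[|]b[|dsad]c[|]dsadd{}'

joined=$(printf '%s\n' "$sample" | awk '
  { buf = buf $0 }                     # append every physical line to the buffer
  /[{][}]$/ { print buf; buf = "" }    # line ends in {}: emit the record, reset
  END { if (buf != "") print buf }     # assumption: also emit a trailing partial record
')

printf '%s\n' "$joined"
```

Only the current record is buffered, so memory use depends on the longest record rather than on the size of the file.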

# 4  
Old 04-18-2012
Thanks Yazu... It Worked...

---------- Post updated at 03:22 AM ---------- Previous update was at 03:20 AM ----------

@guruprasadpr
It worked up to a certain line and then threw an error like the one below:
Code:
awk: cmd. line:1: (FILENAME=xaa FNR=16518) fatal: not enough arguments to satisfy format string
        `Some record{}'
                                                                                                                                                                                                                                                                                                                                                   ^ ran out for this one

# 5  
Old 04-18-2012
Quote:
Originally Posted by ratheeshjulk
Thanks Yazu... It Worked...

---------- Post updated at 03:22 AM ---------- Previous update was at 03:20 AM ----------

@guruprasadpr
It worked up to a certain line and then threw an error like the one below:
............... ^ ran out for this one
Try it like this:
Code:
# awk '/{}$/{print;next}{printf "%s",$0;}' file

and a second solution:
Code:
# awk -vFx='{}' '{if(substr($0,length-1,length)!=Fx)a=a $0;else {a=a $0;print a;a=""}}' file

regards
ygemici
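
Why the "%s" version helps: in printf $0 the record itself is the format string, so any % in the data is read as a conversion specifier that has no argument to go with it. The record below is invented for illustration (the real failing record is not shown in the thread), and the variable names are arbitrary.

```shell
# A made-up record containing a percent sign, split over two physical lines.
data='1[|]50%d off
[|]end{}'

# Safe form: the format string is the constant "%s"; the data is only an argument.
safe=$(printf '%s\n' "$data" |
  awk '/[{][}]$/ { print; next } { printf "%s", $0 }')
printf '%s\n' "$safe"

# Unsafe form: $0 becomes the format string, so the "%d" in the data asks for an
# argument that does not exist. gawk stops with a fatal error here; other awk
# implementations may behave differently. "|| true" keeps the demo running.
unsafe=$(printf '%s\n' "$data" |
  awk '/[{][}]$/ { print; next } { printf $0 }' 2>/dev/null) || true
```

As a general habit, data should be passed to printf as an argument and never as the format.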
# 6  
Old 04-18-2012
I am trying to understand this code. Please help.

I removed the next statement to see how the code works. Below is how I understand it; please correct me if I am wrong.

Code:
bash-3.00# awk '/{}$/{print}{printf "%s ",$0;}' test.txt

1[|]2[|]3[|]4[|]5[|]6[|]7 [|]a[|]b[|]c[|]d{}
[|]a[|]b[|]c[|]d{} 1[|]2[|]3[|]4[|]sss[|]ss as6[|]7 [|]a[|]b[|]c[|]d{}
[|]a[|]b[|]c[|]d{} 1[|]2[|]dsad3[|]dad 4[|]sdad5[|]6[|]7 [|]a[|]b[|dsad]c[|]dsadd{}
[|]a[|]b[|dsad]c[|]dsadd{}


So here the pattern is /{}$/
and its action is {print} (this print runs only if the pattern matches; otherwise it is not executed).
{printf "%s ",$0;} (this statement runs for every line of input; it is not under any condition).

Code:
line 1 : 1[|]2[|]3[|]4[|]5[|]6[|]7

The pattern {}$ doesn't match this line, so the {print} statement doesn't execute.
But {printf "%s ",$0;} executes and prints line 1, as that is the record currently being processed.

Now the first line has been read and awk moves on to the next line.

Code:
line 2 : [|]a[|]b[|]c[|]d{}

The pattern {}$ does match, and the {print} statement gets executed.

Now awk should print line 2 again, since there is {printf "%s ",$0;}, right? How does it move to the next record?
# 7  
Old 04-18-2012
Quote:
Originally Posted by chidori
Now awk should print line 2 again, since there is {printf "%s ",$0;}, right? How does it move to the next record?
Code:
# awk '/{}$/{print}{printf "%s ",$0;}'

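first execute [1.LINE]
/{}$/ [ you can think there is an `if` ] if the pattern matches, print the line (with default ORS="\n")
result --> nothing (because line 1 does not match '{}$')
{printf "%s ",$0;} has no pattern, so it runs for every line and prints without a newline
result --> "1[|]2[|]3[|]4[|]5[|]6[|]7 "

second execute [2.LINE]
/{}$/ the pattern matches, so print
result --> [|]a[|]b[|]c[|]d{} followed by a newline
then printf runs as well, because there is no `next` to skip it
result --> "[|]a[|]b[|]c[|]d{} " at the start of the next output line

output so far
1[|]2[|]3[|]4[|]5[|]6[|]7 [|]a[|]b[|]c[|]d{}
[|]a[|]b[|]c[|]d{} 
.............

So yes, line 2 is printed twice. awk reads the next record by itself once all the rules have run for the current one; `next` only skips the remaining rules for the current record.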

regards
ygemici
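
The effect of next can be seen side by side on a tiny input. This is a sketch with made-up data (a, b{}, c, d{}) and arbitrary variable names, not output taken from the thread.

```shell
# Four physical lines forming two records that end in {}.
data='a
b{}
c
d{}'

# With next: a matching line is printed once, then the second rule is skipped.
with_next=$(printf '%s\n' "$data" |
  awk '/[{][}]$/ { print; next } { printf "%s ", $0 }')

# Without next: a matching line is printed by both rules, so it shows up twice.
without_next=$(printf '%s\n' "$data" |
  awk '/[{][}]$/ { print } { printf "%s ", $0 }')

printf 'with next:\n%s\n\nwithout next:\n%s\n' "$with_next" "$without_next"
```

With next the output is "a b{}" and "c d{}" on separate lines; without it, each {} line is repeated at the start of the following output line, which is the pattern seen in the test.txt output above.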