delete newline character between html tags


 
# 1  
Old 02-26-2008

Hi,
I learned some Unix commands a while back, but I'm not sure how to combine them for a specific task, especially with sed. Here is my situation: I have an XML file with several tags. Most tags start and end on the same line, but the data for some tags spans multiple lines. I would like to bring each such tag back onto one line by removing all newlines inside it.

Here is an example:
<Report>
<Project>
<Proj_Name>ABC Enhancement</Proj_Name>
<Proj_Type>Mechanical</Proj_Type>
<Proj_Description>Project started on 01/03/2006.
However, it is running behind due to unavailable
Resources</Proj_Description>
<Proj_Hours>123.00</Proj_Hours>
</Project>
</Report>

The above is sample data. I want to remove newline characters only from tags whose content spans multiple lines. Here is how it should appear after removing the newlines:

<Report>
<Project>
<Proj_Name>ABC Enhancement</Proj_Name>
<Proj_Type>Mechanical</Proj_Type>
<Proj_Description>Project started on so and so date.... </Proj_Description>
<Proj_Hours>123.00</Proj_Hours>
</Project>
</Report>

Any help is highly appreciated.

Thanks,
Rao
# 2  
Old 02-26-2008
This may need a little tweaking:
Code:
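awk '
NR>1 && /^</ { printf "\n" }   # before each line starting with "<" (except the first), end the previous output line
{ printf "%s ", $0 }           # print every input line followed by a space, without a newline
END { printf "\n" }            # terminate the last output line
' "$FILE"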

# 3  
Old 02-26-2008
Hi,
Thanks for your quick reply. First, I noticed that this script takes a fairly long time to process my file (200 MB). I have about 10 to 20 files to process like this, and I am afraid it might take several minutes. Second, I thought that if we remove extra newline characters, the result file should be no larger than the original, but the resulting file is larger than the original. Third, could you please explain what each part of your command does?

Thanks
Rao
# 4  
Old 02-26-2008
Also, I noticed that the script above, although it runs without errors, does not remove the newline characters between a pair of html tags.
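
A note on the points raised in posts #3 and #4. The awk script in post #2 prints a space after every input line (the "%s " format), so every line that is not joined grows by one byte, which would account for the larger output file. As for the joining not working, one guess (not confirmed anywhere in the thread) is that the file has Windows CRLF line endings, so a CR stays in the middle of each joined line. The following is only a sketch under those assumptions; join_tags is a made-up name.

```shell
#!/bin/sh
# join_tags is a hypothetical helper name, not something from the thread.
# It starts a new output line only when an input line begins with "<";
# every other line is appended to the current output line.
join_tags() {
    awk '
    { sub(/\r$/, "") }                # drop a trailing CR, if any (Windows line endings)
    NR > 1 && /^</  { printf "\n" }   # a line starting with "<" ends the previous output line
    NR > 1 && !/^</ { printf " " }    # a continuation line is separated by one space
    { printf "%s", $0 }               # print the line itself without a newline
    END { if (NR > 0) printf "\n" }   # terminate the last output line
    ' "$@"
}

# Example usage (file names are placeholders):
# join_tags input.xml > output.xml
```

Because a space is added only where two lines are actually joined, and each join removes one newline, the output should not be larger than the input. It still reads the file once, line by line, so I would not expect it to be noticeably faster than the original.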
# 5  
Old 02-27-2008
The following assumes you want a new line after each > character; otherwise it keeps printing on the same line until it finds one.

I'm calling it tst.pl
===================
#!/usr/bin/env perl
while (<>) {
    chomp;
    if ( /.*\>$/ ) {
        print "$_\n";
    } else {
        print "$_";
    }
}
====================

Input file tst.txt
=================
as;dkl>
asjkf
qoweiu
askldfj>
asdf>
asdf> s
askld
>
=================
produces

as;dkl>
asjkfqoweiuaskldfj>
asdf>
asdf> saskld>


Flavour according to taste.
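
For comparison with the awk attempt earlier in the thread, the same rule (end the output line only when the input line ends with >) can be sketched in awk as well. This is just an illustration, not the poster's script; the name join_until_gt is made up, and unlike the earlier awk script it adds no space when joining, matching the Perl behaviour.

```shell
#!/bin/sh
# join_until_gt is a hypothetical helper name, not something from the thread.
join_until_gt() {
    awk '
    />$/ { print; next }     # line ends with ">": print it and finish the output line
    { printf "%s", $0 }      # otherwise print it and stay on the same output line
    ' "$@"
}

# Example usage with the sample file from this post:
# join_until_gt tst.txt
```

As with the Perl version, if the last line of the input does not end with >, the output has no final newline.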
# 6  
Old 02-27-2008
Hi,
As I am new to Perl scripting, I apologize for my basic questions. Please help me understand the script so that I can use it. I added a line to open a file (I read about this command somewhere on the net), but I am not sure where the reformatted lines are written back. Here is how it looks now:

#!/usr/local/bin/perl
open (MYFILE, "/ABC/XYZ/A123/XmlFldr/test1.xml") or die("Unable to open File");
while (<MYFILE>) {
    chomp;
    if ( /.*\>$/ ) {
        print "$_\n";
    } else {
        print "$_";
    }
}

Also, I noticed the following in my input data, where the data for a particular tag spans multiple lines. For example:

<Proj_Name>ABC - Mechanical fix</Proj_Name>
<Proj_Description> ABC - Mechanical fix
to a generator x1234m. </Proj_Description>
<Proj_Comment>Project started on so and so date.
It is now running behind schedule due to
unavailable resources</Proj_Comment>

In the above case, two tags have data split across multiple lines. This data file comes from a Windows environment, so I am not sure whether the lines are broken by both a carriage return (CR) and a newline.

With the awk script above, these lines were not converted into one-line tags. However, it appears to handle the remaining lines fine, leaving them as one-line tags. I would like to get the multi-line tags onto a single line.

Any help is appreciated.
Thanks,
Rao
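
On the question of whether the Windows file carries CR characters: a quick check and cleanup can be sketched in shell. The helper names has_cr and strip_cr are hypothetical. This matters for the Perl script because chomp removes only the trailing newline by default, so with CRLF input a line ending in > still has a CR as its last character and the /.*\>$/ test does not match.

```shell
#!/bin/sh
# has_cr and strip_cr are hypothetical helper names, not from the thread.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Writes the file to standard output with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Example usage (test1.xml is the file name from the post above;
# the output name is an assumption):
# if has_cr test1.xml; then strip_cr test1.xml > test1.unix.xml; fi
```

Note that strip_cr removes every CR in the file, not only those at line ends, which is usually what you want for a text file that came from Windows.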
# 7  
Old 02-28-2008
You don't need to add anything to the script. Running
tst.pl tst.txt
should produce the output shown if you use my original script and text file.

I thought I edited my original post to include the command line, but I must not have saved it.

#!/usr/bin/env perl
# Treat the command-line parameters as filenames and open them for reading
# (in this case you should only process one file at a time).
while (<>) {
    chomp;    # remove the last character if it is a linefeed
    # Look for any character (the .), repeated as often as possible (the *),
    # followed by a '>' character (the \>), but only at the end of the line
    # (the $). So 'zbc> ' does not match, since the > character is not at
    # the end of the line.
    if ( /.*\>$/ ) {
        print "$_\n";
    } else {
        print "$_";
    }
}

Last edited by scott1256ca; 02-28-2008 at 05:10 AM..
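
To try the pattern explained above outside Perl, here is a small shell sketch using grep. The helper name ends_with_gt is made up. Two things to keep in mind: the leading .* does not change which lines match, so >$ alone selects the same lines; and the backslash is dropped here on purpose, because GNU grep treats \> as an end-of-word anchor rather than a literal > character.

```shell
#!/bin/sh
# ends_with_gt is a hypothetical helper name, not from the thread.
# Succeeds only if the last character of the given line is ">".
ends_with_gt() {
    printf '%s\n' "$1" | grep -q '>$'
}

# Try the three cases discussed in the post: a trailing ">",
# a ">" followed by a space, and a ">" in the middle of the line.
for line in 'abc>' 'zbc> ' '<a>text'; do
    if ends_with_gt "$line"; then
        echo "match:    [$line]"
    else
        echo "no match: [$line]"
    fi
done
```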
 