Old 02-26-2008
delete newline character between html tags

I learned some Unix commands a while back, but I am not sure how to use them when I need something specific, especially sed. Here is my situation: I have an XML file with several tags. Most of the tags start and end on the same line. However, the data for some tags spans multiple lines. I would like to bring each such tag back onto one line by removing all the newlines inside it.

Here is an example:
<Proj_Name>ABC Enhancement</Proj_Name>
<Proj_Description>Project started on 01/03/2006.
However, it is running behind due to unavailable

The above is sample data. I want to remove newline characters only from tags that span multiple lines. Here is how it should appear after removing the newlines:

<Proj_Name>ABC Enhancement</Proj_Name>
<Proj_Description>Project started on so and so date.... </Proj_Description>

Any help is highly appreciated.

Old 02-26-2008
This may need a little tweaking:
awk '
NR>1 && /^</ { printf "\n" }
{ printf "%s ", $0 }
END { printf "\n" }
' "$FILE"
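
Note that the command above emits a space after every input line plus a newline before every line starting with "<", so its output can come out slightly larger than its input. Below is a commented sketch of the same idea that only puts a space between lines it actually joins. It assumes, like the original, that every tag starts at the beginning of a line; join_tags is a made-up helper name, not something from this thread.

```shell
#!/bin/sh
# join_tags: hypothetical helper name used only for this sketch.
# Reads the named files (or standard input) and writes the joined
# text to standard output.
join_tags() {
    awk '
        # Line begins with "<": finish the previous output line, then
        # start a new one with this line.
        /^</ { if (NR > 1) printf "\n"; printf "%s", $0; next }
        # Continuation line: append it after a single space.
        { printf "%s%s", (NR > 1 ? " " : ""), $0 }
        # Terminate the last output line.
        END { if (NR > 0) printf "\n" }
    ' "$@"
}
```

It could be called as join_tags "$FILE" > "$FILE.joined", where the output name is only an illustration.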

Old 02-26-2008
Thanks for your quick reply. First, I noticed that this script takes a rather long time to process my file (about 200 MB). I have about 10 to 20 files to process like this, and I am afraid it might take several minutes. Second, I thought that if we remove extra newline characters, the resulting file should be no larger than the original, but the result is larger than the original. Third, can you please explain what exactly each part of your command is doing?

Old 02-26-2008
Also, I noticed that the script above, although it runs without errors, does not remove the newline characters between a pair of HTML tags.
Old 02-27-2008
The following assumes you want a new line after each '>' character at the end of a line; otherwise it keeps printing on the same line until it finds one.

I'm calling it
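#!/usr/bin/env perl
while (<>) {
    chomp;              # strip the trailing newline so lines can be joined
    if ( /.*\>$/ ) {    # line ends with '>': finish the output line
        print "$_\n";
    } else {            # otherwise keep printing on the same line
        print "$_";
    }
}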

in file
asdf> s

asdf> saskld>

Flavour according to taste.
Old 02-27-2008
As I am new to Perl scripting, I apologize for my basic questions. Please help me understand the script so that I can use it. I added a line to open a file, based on a command I read somewhere on the net, but I am not sure where the reformatted lines are written back. Here is how it looks now.

open (MYFILE, "/ABC/XYZ/A123/XmlFldr/test1.xml") or die("Unable to open File");
while (<MYFILE>) {
    chomp;    # strip the trailing newline before testing the line
    if ( /.*\>$/ ) {
        print "$_\n";
    } else {
        print "$_";
    }
}
close(MYFILE);

Also, in my input data the content of some tags spans multiple lines, for example:

<Proj_Name>ABC - Mechanical fix</Proj_Name>
<Proj_Description> ABC - Mechanical fix
to a generator x1234m. </Proj_Description>
<Proj_Comment>Project started on so and so date.
It is now running behind schedule due to
unavailable resources</Proj_Comment>

In the above case there are two tags whose data is split across multiple lines. This data file comes from a Windows environment, so I am not sure whether the lines are terminated by both a carriage return (CR) and a newline.
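
One way to check for carriage returns, and to strip them before joining lines, might look like the sketch below. It matters because a line that ends in ">" followed by a CR no longer ends in ">" as far as a pattern like />$/ is concerned. count_crlf and strip_cr are made-up helper names for illustration only.

```shell
#!/bin/sh
# count_crlf: hypothetical helper; prints how many lines end in a
# carriage return (0 means the file has plain Unix line endings).
count_crlf() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# strip_cr: hypothetical helper; removes one trailing carriage return
# from each line and writes the result to standard output.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}
```

For example, strip_cr test1.xml > test1.unix.xml would write a cleaned copy (the output name is an assumption) and leave the original file untouched.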

The Awk script above did not convert these multi-line tags into single lines, although it appears to handle the remaining, single-line tags fine. I would like to get each multi-line tag onto one single line.

Any help is appreciated.
Old 02-28-2008
You do not need to add anything to the script. Running it on the file tst.txt should produce the output, if you use my original script and text file.

I thought I edited my original post to include the command line, but I must not have saved it.

#!/usr/bin/env perl
# Treat parameters as filenames and open them for reading (process only one file at a time).
while (<>) {
    chomp;    # remove the last character if it is a linefeed
    # Match any character (the .), repeated as often as possible (the *), followed by
    # a '>' character (the \>), but only at the end of the line (the $).
    # So 'zbc> ' does not match, since the > character is not at the end of the line.
    if ( /.*\>$/ ) {
        print "$_\n";
    } else {
        print "$_";
    }
}

