Sort html based on .jar, .war file names and still keep text within three groups.


 
# 1  
Old 03-12-2016

The zipdiff GNU EAR comparison tool produces HTML output divided into three sections: "Added", "Removed", and "Changed". I want the entries in each section to be sorted by jar or war file name.


Code:
<html>
<body>
<table>
<tr>
<td class="diffs" colspan="2">Added </td>
</tr>
<tr><td>
<ul>
<li>jar1.jar/com/aaa/bbbb/cc/file1.class</li>
<li>jar1.jar/com/aaa/bbbb/cc/file3.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/dd/filename.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/ee/af/fileblala.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">Removed </td>
</tr>
<tr><td>
<ul>
<li>jar3.war/com/aaa/bbbb/cc/file5.class</li>
<li>jarbla.jar/com/aaa/bbbb/cc/file6.class</li>
<li>jarblabla.war/com/aaa/bbbb/cc/ee/fa/afd/filenamefa.class</li>
<li>jar3.war/com/aaa/bbbb/cc/affa/faf/wrw/filenaa.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">chagned </td>
</tr>
<ul>
<tr><td>
<li>jar4.jar/com/aaa/bbbb/cc/filefsaf.class</li>
<li>jarfsadf.war/com/aaa/bbbb/cc/filedfasf.class</li>
<li>jar4.jar/com/aaa/bbbb/cc/file11.class</li>
<li>jardfasdf.war/com/aaa/bbbb/cc/rr/ryy/filedfasf.class</li>
</ul>
</td>
</tr>
</table>

The expected output has the entries sorted by .jar and .war file name within each of the sections
"Added", "Removed", and "Changed". I think awk or sed can do this with in-place editing.

Code:
<html>
<body>
<table>
<tr>
<td class="diffs" colspan="2">Added </td>
</tr>
<tr><td>
<ul>
<li>jar1.jar/com/aaa/bbbb/cc/file1.class</li>
<li>jar1.jar/com/aaa/bbbb/cc/file3.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/dd/filename.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/ee/af/fileblala.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">Removed </td>
</tr>
<tr><td>
<ul>
<li>jar3.war/com/aaa/bbbb/cc/file5.class</li>
<li>jar3.war/com/aaa/bbbb/cc/affa/faf/wrw/filenaa.class</li>
<li>jarbla.jar/com/aaa/bbbb/cc/file6.class</li>
<li>jarblabla.war/com/aaa/bbbb/cc/ee/fa/afd/filenamefa.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">chagned </td>
</tr>
<ul>
<tr><td>
<li>jar4.jar/com/aaa/bbbb/cc/filefsaf.class</li>
<li>jar4.jar/com/aaa/bbbb/cc/file11.class</li>
<li>jarfsadf.war/com/aaa/bbbb/cc/filedfasf.class</li>
<li>jardfasdf.war/com/aaa/bbbb/cc/rr/ryy/filedfasf.class</li>
</ul>
</td>
</tr>
</table>


Last edited by kchinnam; 03-13-2016 at 07:21 PM.. Reason: corrected input
# 2  
Old 03-13-2016
Making a few wild assumptions:
  1. The pathnames of the files in all three of the lists have exactly six components where the 1st component is of the form jardigits.suffix where digits is a string of one or more decimal digits and suffix is either jar or war; the second, third, fourth, and fifth components are the same in all of the pathnames; and the sixth component is of the form filedigits2.class where digits2 is another string of one or more decimal digits.
  2. The first component in the pathnames will not have the same digits string for both a .jar directory name and a .war directory name.
  3. Even though the <ul> HTML tag might be misplaced in some groups, an </ul> HTML tag will always appear on the first line after the lines containing the <li> HTML tags for each table.
  4. And, the <li> HTML tag always appears at the start of a line.
the following works with your provided sample input:
Code:
awk '
BEGIN {	cmd = "sort -t/ -k1.8,1n -k6.5,6n"
}
/<li>/ {print | cmd
	next
}
/<\/ul>/ {
	close(cmd)
}
1' file.html

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
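
For anyone puzzling over the sort keys, here is a small sketch of what -k1.8,1n -k6.5,6n does. The three <li> lines are made up for illustration (they follow the shape described in assumption 1), and sorted is just a hypothetical variable name. With / as the field separator, the first key starts at character 8 of field 1 (the digits right after "<li>jar") and the second key starts at character 5 of field 6 (the digits right after "file"); both keys are compared numerically.

```shell
# Made-up sample lines in the shape described in assumption 1.
sorted=$(printf '%s\n' \
    '<li>jar10.jar/com/aaa/bbbb/cc/file2.class</li>' \
    '<li>jar2.jar/com/aaa/bbbb/cc/file10.class</li>' \
    '<li>jar2.jar/com/aaa/bbbb/cc/file9.class</li>' |
    sort -t/ -k1.8,1n -k6.5,6n)

# Show the result of the two numeric keys.
printf '%s\n' "$sorted"
```

A plain text sort would put jar10 before jar2 and file10 before file9; with the numeric keys the order is jar2/file9, jar2/file10, jar10/file2.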
# 3  
Old 03-13-2016
Don,
I have corrected my input. I only need to sort on the first field (i.e. the .jar/.war names);
sort -t/ -k1 is working fine.

This code prints only the lines that start with "<li>". How can I keep the HTML around them?
The output is divided into three groups: Added/Removed/Changed. Could you try to keep these?
# 4  
Old 03-13-2016
Quote:
Originally Posted by kchinnam
Don,
I have corrected my input. I only need to sort on the first field (i.e. the .jar/.war names);
sort -t/ -k1 is working fine.

This code prints only the lines that start with "<li>". How can I keep the HTML around them?
The output is divided into three groups: Added/Removed/Changed. Could you try to keep these?
In the code I suggested:
Code:
awk '
BEGIN {	cmd = "sort -t/ -k1.8,1n -k6.5,6n"
}
/<li>/ {print | cmd
	next
}
/<\/ul>/ {
	close(cmd)
}
1' file.html

the 1 on the last line of the awk code (immediately before the closing single quote) prints the lines that you say are not being printed. I can only assume that you did not copy that part of my suggestion into the code you used.

With your new requirements, the line in my suggestion:
Code:
BEGIN {	cmd = "sort -t/ -k1.8,1n -k6.5,6n"

can be simplified to just:
Code:
BEGIN {	cmd = "sort"

With this change to my suggestion and your new sample input, the output produced is:
Code:
<html>
<body>
<table>
<tr>
<td class="diffs" colspan="2">Added </td>
</tr>
<tr><td>
<ul>
<li>jar1.jar/com/aaa/bbbb/cc/file1.class</li>
<li>jar1.jar/com/aaa/bbbb/cc/file3.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/dd/filename.class</li>
<li>jarname.jar/com/aaa/bbbb/cc/ee/af/fileblala.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">Removed </td>
</tr>
<tr><td>
<ul>
<li>jar3.war/com/aaa/bbbb/cc/affa/faf/wrw/filenaa.class</li>
<li>jar3.war/com/aaa/bbbb/cc/file5.class</li>
<li>jarbla.jar/com/aaa/bbbb/cc/file6.class</li>
<li>jarblabla.war/com/aaa/bbbb/cc/ee/fa/afd/filenamefa.class</li>
</ul>
<tr>
<td class="diffs" colspan="2">chagned </td>
</tr>
<ul>
<tr><td>
<li>jar4.jar/com/aaa/bbbb/cc/file11.class</li>
<li>jar4.jar/com/aaa/bbbb/cc/filefsaf.class</li>
<li>jardfasdf.war/com/aaa/bbbb/cc/rr/ryy/filedfasf.class</li>
<li>jarfsadf.war/com/aaa/bbbb/cc/filedfasf.class</li>
</ul>
</td>
</tr>
</table>

which seems to meet your requirements.
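
If the entries should be ordered by archive name only, keeping the original order of the files within each archive (as in the expected output in post #1), one possible variant restricts the key to the first /-separated field and asks for a stable sort. This is only a sketch under a few assumptions: sort_li_by_archive is a hypothetical wrapper name, -s is an extension (available in GNU and BSD sort, not required by POSIX), LC_ALL=C is my choice to get byte-order collation, and the fflush() call is a defensive addition.

```shell
# sort_li_by_archive FILE  (hypothetical helper name)
# Sort each group of <li> lines by the archive name only; lines with the
# same archive name keep their original relative order.
# Usage: sort_li_by_archive file.html > sorted.html
sort_li_by_archive() {
    awk '
    BEGIN {
        # -k1,1 limits the key to the first field; -s makes the sort stable.
        cmd = "LC_ALL=C sort -s -t/ -k1,1"
    }
    /<li>/ {
        print | cmd
        next
    }
    /<\/ul>/ {
        fflush()     # Defensive: write out lines awk itself has buffered.
        close(cmd)   # Let sort print the group that was just collected.
    }
    1' "$1"
}
```

The field used as the key still begins with "<li>", but since that prefix is the same on every line it does not change the order.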
# 5  
Old 03-13-2016
Thanks, Don. My mistake: I did not include the 1. It's working like magic. What do close(cmd) and 1 do? Do you mind explaining how this works?
# 6  
Old 03-13-2016
Note that awk statements take the form:
Code:
condition {action}

where condition is evaluated for each input line and, if it yields TRUE, a non-zero numeric value, or a non-empty string value, the actions specified by action are performed for that input line. If no condition is specified, the given action is performed for every line. If a condition is specified and no action is given, the input line is printed whenever the condition evaluates to true.
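
As a small illustration of these three forms (the three input words are made up for the example, and input is just a hypothetical variable name):

```shell
# Three lines of made-up sample input.
input='alpha
beta
gamma'

# Condition and action: print a labelled copy of lines matching /beta/.
printf '%s\n' "$input" | awk '/beta/ { print "matched: " $0 }'

# Action without a condition: the action runs for every input line.
printf '%s\n' "$input" | awk '{ print NR ": " $0 }'

# Condition without an action: lines for which the condition is true
# (here, every line after the first) are printed unchanged.
printf '%s\n' "$input" | awk 'NR > 1'
```

The last form is the same mechanism as the lone 1 in the script: a condition that is always true, with the default action of printing the line.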

Code:
awk '			# Use awk to interpret the following script...

BEGIN {	cmd = "sort"	# Before any lines are read from the input file, define
}			# cmd to be the command through which lines containing
			# "<li>" will be piped.

/<li>/ {print | cmd	# Send any line containing "<li>" to the sort command.
	next		# Restart processing with the next input line (skipping
			# later steps in this script for the current input
			# line).
}
/<\/ul>/ {		# When a line is found containing "</ul>", close the
	close(cmd)	# pipe to the sort command (forcing sort to print the
}			# lines that have been written to it in sorted order).

1			# The condition 1 is always true. Since no action is
			# specified and the default action is to print the
			# current input line, print the current input line.

' file.html		# End the awk script and specify the input file(s) to be
			# processed.

Does this answer your questions?
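
To see what close(cmd) contributes, here is a reduced sketch that is not part of the original script: the input is two made-up groups of letters, each ended by a line reading "end" (standing in for "</ul>"), and no_close and with_close are hypothetical variable names. The fflush() call is a defensive addition so that lines awk has buffered itself are written before sort prints.

```shell
# Two made-up groups, each terminated by an "end" marker line.
input='b
a
end
d
c
end'

# Without close(): a single sort process receives all four data lines and
# prints them in one batch when awk exits, so the two groups are merged.
no_close=$(printf '%s\n' "$input" |
    awk 'BEGIN { cmd = "sort" } !/end/ { print | cmd; next } 1')

# With close(): each "end" marker finishes the current sort, so the sorted
# group is printed before its marker and the next group gets a fresh sort.
with_close=$(printf '%s\n' "$input" |
    awk 'BEGIN { cmd = "sort" }
         !/end/ { print | cmd; next }
         /end/  { fflush(); close(cmd) }
         1')

printf 'without close:\n%s\n\nwith close:\n%s\n' "$no_close" "$with_close"
```

With close(), the output keeps the groups separate: a, b, end, c, d, end.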

Last edited by Don Cragun; 03-13-2016 at 10:35 PM.. Reason: Fix typo: s|/li>|/ul|