Deleting table cells in a script


 
# 8  
Old 12-05-2008
Oddly, on my Linux box, I cannot make your code work...

Code:
echo $LINE |sed -n '/<tr>/,/<\/tr> {
					    s/.*<tr>//
					    s/<\/tr>.*//
					    p
					    }' >> newfile.html

unless I add a trailing "/" before the "{"...

Code:
echo $LINE |sed -n '/<tr>/,/<\/tr>/ {
					    s/.*<tr>//
					    s/<\/tr>.*//
					    p
					    }' >> newfile.html

Just an FYI.
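
For anyone comparing the two versions above: the second address of a sed range needs its own closing "/" before the "{", otherwise the regex is never terminated. A minimal sketch follows; the sample line is made up for illustration and is not taken from the poster's status.html.

```shell
#!/bin/sh
# Made-up single line, fed to sed the same way the posts do with echo.
LINE='junk<tr><td>cell</td></tr>junk'

# Both addresses are closed with "/" before the "{".
# The closing address is only tested on lines after the one that opened
# the range, so with one-line input the range simply runs to end of input.
result=$(printf '%s\n' "$LINE" | sed -n '/<tr>/,/<\/tr>/ {
    s/.*<tr>//
    s/<\/tr>.*//
    p
}')

echo "$result"
```

With this sample the two substitutions strip everything outside the row, leaving only the cell markup.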

Also

Code:
echo $LINE

yields very different results than its "quoted" counterpart...

Code:
echo "$LINE"
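
A small sketch of the difference, assuming a made-up line with runs of spaces: unquoted, the shell splits the value into words and echo rejoins them with single spaces (and any glob characters could be expanded); quoted, the value is passed through as one argument.

```shell
#!/bin/sh
# Hypothetical sample line, not from the original file.
LINE='<td>  two   spaces  </td>'

unquoted=$(echo $LINE)      # word splitting collapses the runs of spaces
quoted=$(echo "$LINE")      # the value is kept exactly as stored

echo "unquoted: [$unquoted]"
echo "quoted:   [$quoted]"
```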

# 9  
Old 12-05-2008
OK, I have a few issues/questions at this point. When I run my script it errors out:

Code:
[: !=: unexpected operator
[: !=: unexpected operator


It seems that part of ddreggors' code is breaking in the test command. I'm going to read "man test" to see if I can dig up something there, here, or elsewhere.

Secondly, and please bear with me on this as I'm still learning: what can I do to tell the script to 'do nothing and keep going' instead of echo "blah" in my loop? I feel like I'm just filling in the blanks here; I'm stumped, since I'm sure that if I leave it out, it'll break. Would the solution be to just echo to /dev/null?

Third, ddreggors, I'm still looking around, but if I'm going to use your sed example I'll need an expression a little more complex than yours, since the range of characters goes beyond just [x-z]. I think what I need is [a-zA-Z0-9]. It also needs to include "(|)|:|.|,|/" (parentheses, colons, periods, commas, slashes, if I noted that right). I'll try with my own sed example first, then explore later if need be.


Code:
#!/bin/sh

#TD=0
CT=0
cat status.html |while read LINE
do
    # Check to see if the LINE is non-empty, and has a <td> tag in it.
    if [ -n "$LINE" -a `echo $LINE |grep "<td>"` != "" ] ; then
        # Increase the TD counter by 1
        CT=`echo "$CT+1" |bc`
        
        # Check to see if the TD counter is at 6 (we are at 7th TD as the counter starts at 0 not 1)
        if [ "$CT" -eq 6 ] ; then
            # Use sed to remove this TD tag AND everything in between
            echo $LINE |sed -n '/<tr>/,/<\/tr> {
                                            s/.*<tr>//
                                            s/<\/tr>.*//
                                            p
                                            }' >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
        else 
            echo $LINE >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
        fi
    else
        echo $LINE >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
    fi
    
    # If we are leaving a table row then we need to reset the TD counter!
    if [ -n "$LINE" -a `echo $LINE |grep "</tr>"` != "" ] ; then
        CT=0
    else
        echo "No reset"
    fi

    if [ -n "$LINE" -a `echo $LINE |grep "</html>"` != "" ] ; then
        mv ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3 status.html
    else
        echo "Not done yet, keep going"
    fi

done
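
On the 'do nothing and keep going' question above: in sh an if does not need an else branch at all, and where a branch has to exist, the built-in ":" is a no-op. A minimal sketch, with a made-up sample line and counter value:

```shell
#!/bin/sh
CT=5
LINE='<td>last cell</td>'      # hypothetical sample line with no </tr> in it

# 1. An if without an else simply falls through when the test fails.
if echo "$LINE" | grep -q '</tr>'; then
    CT=0
fi

# 2. If a branch has to be there, ":" does nothing and returns success.
if echo "$LINE" | grep -q '</tr>'; then
    CT=0
else
    :   # do nothing and keep going
fi

echo "CT is still $CT"
```

Either form avoids printing filler text from inside the loop.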

# 10  
Old 12-05-2008
Quote:
[: !=: unexpected operator
[: !=: unexpected operator
OK, change it a bit to do this:

Code:
    # If we are leaving a table row then we need to reset the TD counter!
    TEST=`echo "$LINE" |grep '</tr>'`
    if [ -n "$TEST" ] ; then
        CT=0
    else
        echo "No reset"
    fi

    TEST=`echo "$LINE" |grep '</html>'`
    if [ -n "$TEST" ] ; then
        mv ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3 status.html
    else
        echo "Not done yet, keep going"
    fi

That should fix that, and as for the sed expression, I was sure you WOULD have to change that as I am not sure we have ever seen the exact pattern you are looking for. If you did post that pattern I missed it, sorry.
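
A likely reason for the "[: !=: unexpected operator" message: when grep finds nothing, the unquoted backticks expand to nothing at all, so test sees [ -n "$LINE" -a != "" ] and has no left-hand operand for !=. As a sketch of another way around it, grep's exit status can drive the if directly. The helper name has_tag and the sample line are made up for illustration.

```shell
#!/bin/sh
# has_tag is a hypothetical helper: it succeeds when line $1 contains
# the literal text $2 (case-insensitive, no regex interpretation).
has_tag() {
    printf '%s\n' "$1" | grep -q -i -F -e "$2"
}

LINE='<td>last cell</td></tr>'    # made-up sample line
CT=6

if has_tag "$LINE" '</tr>'; then
    CT=0
    echo "row ended, counter reset"
fi

if ! has_tag "$LINE" '</html>'; then
    echo "not the last line yet"
fi
```

Because nothing is captured into a variable, there is no empty expansion for test to trip over.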
# 11  
Old 12-05-2008
The script no longer errors, but the sed isn't clearing the cells. Here is what I found when I ran it manually on the file:


Code:
# cp status.html teststatus.html
# OUT=`cat teststatus.html` && echo $OUT |sed 's/<td>[a-zA-Z0-9|(|)]<\/td>//'
OUT=<?xml: Command not found.
# grep -n "<?xml" teststatus.html
1:<?xml version="1.0" encoding="iso-8859-1"?>
#

Within the script that brings the page into my box I added:

Code:
sed -e '1d' <status.html > ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3 ;
mv ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3 sentinalstatus.html

I tested again and now it complains about something else:

Code:
# OUT=`cat teststatus.html` && echo $OUT |sed 's/<td>[a-zA-Z0-9|(|)]<\/td>//'
OUT=<!DOCTYPE: Command not found.
#

I'm probably missing quotes somewhere I figure. Tried adding them to the var but it doesn't work. Below is an update of what I have so far.


Code:
#!/bin/sh

#TD=0
CT=0
cat status.html |while read LINE
do
    # Check to see if the LINE has a closing </td> tag in it.
    TD=`echo $LINE |grep '</td>'`
    if [ -n "$TD" ] ; then
        # Increase the TD counter by 1
        CT=`echo "$CT+1" |bc`

        # Check to see if the TD counter is at 7 (we are at the 7th TD in this row)
        if [ "$CT" -eq 7 ] ; then
            # Use sed to remove this TD tag AND everything in between
            echo $LINE |sed 's/<td>[a-zA-Z0-9|(|)]<\/td>//' >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
        else
            echo $LINE >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
        fi
    else
        echo $LINE >> ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3
    fi

    # If we are leaving a table row then we need to reset the TD counter!
    TR=`echo $LINE |grep '</tr>'`
    if [ -n "$TR" ] ; then
        CT=0
    else
        echo "" > /dev/null
    fi

    HTML=`echo $LINE |grep '</html>'`
    if [ -n "$HTML" ] ; then
        mv ztmp.Ps23zp2s.2-Fpps3-wmmm0dss3 status.html
    else
        echo "" > /dev/null
    fi

done

# 12  
Old 12-06-2008
On the command line, give this a try:

Code:
# export OUT=`cat teststatus.html`
# echo "$OUT" |sed 's/<td>[a-zA-Z0-9|(|)]<\/td>//'

Notice the quotes around the variable in the echo line (echo "$OUT")
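
Two further things may be worth checking here, both offered as guesses. The "Command not found." wording looks like csh or tcsh, where NAME=value is not assignment syntax; pointing sed at the file sidesteps the variable either way. Separately, the bracket expression [a-zA-Z0-9|(|)] matches exactly one character (inside brackets, "|" is just a literal), so a cell holding more than one character never matches. A minimal sketch with a made-up one-line file standing in for teststatus.html:

```shell
#!/bin/sh
# Hypothetical sample file; the real teststatus.html is not reproduced here.
tmp=$(mktemp)
printf '%s\n' '<tr><td>Data (1)</td><td>keep</td></tr>' > "$tmp"

# Original class: matches a single character, so this cell is left alone.
one_char=$(sed 's/<td>[a-zA-Z0-9|(|)]<\/td>//' "$tmp")

# [^<]* matches any run of characters up to the next tag,
# so the first cell on the line is removed.
any_run=$(sed 's/<td>[^<]*<\/td>//' "$tmp")

echo "$one_char"
echo "$any_run"
rm -f "$tmp"
```

This only handles a plain lower-case <td> cell that opens and closes on one line and contains no nested tags.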
# 13  
Old 12-06-2008
Tried it, but it didn't work. Assuming the block below were in a file by itself, if I can get sed to empty it out then I should be OK.

Code:
<TD ALIGN=CENTER>
<A HREF=addcomment.pl?type=li&serv_ip=1.30.33.2 onclick="NewWindow(this.href,'name','500','300','yes');return false;"><I>(Curtis Blow)</I>: CASE IN QUEUE - RAID REBOOT<BR>
<A HREF=/server/singleserveruptime.pl?server_ip=1.30.33.2&time_period=1&days=&start=&end=&submit=Submit><font size=1><i>Click To See Uptime/Assign History</i></font></A></A>
</TD>

As you can see I'm dealing with characters like ? < > , . = & ' ; ( ) / _ etc.
# 14  
Old 12-09-2008
This should do it...

Code:
#!/bin/sh

IN=0    # 1 while we are inside a td element
CT=0    # number of td elements closed so far in the current row
OUTFILE="TestHTML.out"
: > "$OUTFILE" # Always start with an empty file

cat TestHTML.htm |while read -r LINE
do
    # If we are entering a table row then we need to reset the TD counter
    TR=`echo "$LINE" |grep -i '<tr'`
    if [ -n "$TR" ]
    then
        CT=0
    fi

    # Check to see if the LINE has an opening td tag in it.
    TD=`echo "$LINE" |grep -i '<td'`
    if [ -n "$TD" ]
    then
        # We are inside a td tag.
        IN=1
    fi

    # Check to see if the LINE has a closing td tag in it.
    ENDTD=`echo "$LINE" |grep -i '/td>'`
    if [ -n "$ENDTD" ]
    then
        # We are leaving a td tag.
        IN=0
        # Increase the TD counter by 1
        CT=$((CT + 1))
    fi

    if [ "$IN" -eq 1 ] && [ "$CT" -eq 6 ] && [ -z "$ENDTD" ]
    then
        # Inside the 7th cell and not yet at its closing tag: drop the line
        :
    elif [ "$IN" -eq 0 ] && [ "$CT" -eq 7 ]
    then
        # We may (or may not) have an opening and closing td tag in 1 line.
        # Note: unlike the grep -i tests, these sed patterns only match upper-case TD.
        TMP=`echo "$LINE" |sed 's/<TD.*//'`
        echo "$TMP" |sed 's/.*\/TD>//' >> "$OUTFILE"
    else
        echo "$LINE" >> "$OUTFILE"
    fi
done

This was tested against this file (TestHTML.htm):

Code:
<HTML>
<BODY>

<TABLE>
<TR>
<TD>Table Data1</TD>
<TD>Table Data2</TD>
<TD>Table Data3</TD>
<TD>Table Data4</TD>
<TD>Table Data5</TD>
<TD>Table Data6</TD>
<TD ALIGN=CENTER>
<A HREF=addcomment.pl?type=li&serv_ip=1.30.33.2 onclick="NewWindow(this.href,'name','500','300','yes');return false;"><I>(Curtis Blow)</I>: CASE IN QUEUE - RAID REBOOT<BR>
<A HREF=/server/singleserveruptime.pl?server_ip=1.30.33.2&time_period=1&days=&start=&end=&submit=Submit><font size=1><i>Click To See Uptime/Assign History</i></font></A></A>
</TD>
</TR>
</TABLE>

<!-- COMMENT -->

<TABLE>
<TR>
<TD>Table Data1</TD>
<TD>Table Data2</TD>
<TD>Table Data3</TD>
<TD>Table Data4</TD>
<TD>Table Data5</TD>
<TD>Table Data6</TD>
<TD ALIGN=CENTER>
<A HREF=addcomment.pl?type=li&serv_ip=1.30.33.2 onclick="NewWindow(this.href,'name','500','300','yes');return false;"><I>(Curtis Blow)</I>: CASE IN QUEUE - RAID REBOOT<BR>
<A HREF=/server/singleserveruptime.pl?server_ip=1.30.33.2&time_period=1&days=&start=&end=&submit=Submit><font size=1><i>Click To See Uptime/Assign History</i></font></A></A>
</TD>
</TR>
</TABLE>

<!-- COMMENT -->

</BODY>
</HTML>

and the resulting file (TestHTML.out):

Code:
<HTML>
<BODY>

<TABLE>
<TR>
<TD>Table Data1</TD>
<TD>Table Data2</TD>
<TD>Table Data3</TD>
<TD>Table Data4</TD>
<TD>Table Data5</TD>
<TD>Table Data6</TD>

</TR>
</TABLE>

<!-- COMMENT -->

<TABLE>
<TR>
<TD>Table Data1</TD>
<TD>Table Data2</TD>
<TD>Table Data3</TD>
<TD>Table Data4</TD>
<TD>Table Data5</TD>
<TD>Table Data6</TD>

</TR>
</TABLE>

<!-- COMMENT -->

</BODY>
</HTML>
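
As an alternative sketch (not the method posted above), the same counting idea can be written as one awk pass. It assumes, like the test file, that the cell to drop does not share a line with any other cell; drop_cell is a made-up function name, and the cell number is passed as an argument.

```shell
#!/bin/sh
# drop_cell N: copy stdin to stdout, omitting every line that belongs
# to the Nth td element of each table row (hypothetical helper).
drop_cell() {
    awk -v drop="$1" '
    {
        l = tolower($0)                        # match tags case-insensitively
        if (l ~ /<tr/)    ct = 0               # new row: reset the cell counter
        if (l ~ /<td/)    { ct++; incell = 1 } # entering cell number ct
        skip = (incell && ct == drop)          # is this line part of the target cell?
        if (l ~ /<\/td>/) incell = 0           # leaving the cell
        if (!skip) print
    }'
}

# Usage sketch with the file names from the post above:
#   drop_cell 7 < TestHTML.htm > TestHTML.out
```

Unlike the script above, this version leaves no blank line where the cell used to be.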
