Deleting table cells in a script


 
# 1  
Old 12-05-2008

I'd like to use sed or awk to do this, but I'm weak on both, as well as on regular expressions. I'm looking for a way with sed or awk to count to the 7th table data cell within a table row and, if that condition is met, to delete "<td>and everything in between </td>". Since the table header starts on the same line each time, I can delete that easily with sed.

I'm stumped on how to get rid of the rest of the data in that column. The table the script retrieves may also vary in length, which is why I'd like it scripted as I've described. If you have any better ideas, I'm open to them.


For those who like GUIs, here's a simple diagram:

Code:
qrstuvwxyz
qrstuvwxyz
qrstuvwxyz

Code:
qrstuvwyz
qrstuvwyz
qrstuvwyz

# 2  
Old 12-05-2008
Quote:
Originally Posted by phpfreak
Looking for a way [...] to count for the 7th table data within a table row
Do you mean *every* 7th occurrence of <td> within *every* <tr> in a file?
# 3  
Old 12-05-2008
Almost there...

Code:
sed -n '/<tr>/,/<\/tr>/ {
           s/.*<tr>//
           s/<\/tr>.*//
           p
           }' /path/to/my/file

That's correct. If that condition is met I need it to do the above. Thanks.
# 4  
Old 12-05-2008
This is the simplest answer: if the cell content is always the same, and it appears ONLY in that 7th table data cell and nowhere else, you could use this:

Code:
 
sed 's|<td>x</td>||g' myfile

NOTE
I am using "|" as the delimiter instead of "/" so I do not have to escape the "/" in "</td>".


However, if other table data fields contain the same text and you wish to keep those cells, you may want to use a loop to parse the tokens, count the "<td>" occurrences, and test whether this cell is the one you are looking for before handing it to sed.

SIMPLE EXAMPLE OF LOOP:

Code:
OUTFILE=/some/file
TD=0
CT=0
while IFS= read -r LINE
do
    # Check to see if the LINE is non-empty, and has a <td> tag in it.
    if [ -n "$LINE" ] && echo "$LINE" | grep -q "<td>"
    then
        # Increase the TD counter by 1
        CT=$((CT + 1))

        # Check to see if the TD counter is at 6 (we are at 7th TD as the counter starts at 0 not 1)
        if [ "$CT" -eq 6 ]
        then
            # Use sed to remove this TD tag
            echo "$LINE" | sed 's|<td>x</td>||' >> "$OUTFILE"
        else
            echo "$LINE" >> "$OUTFILE"
        fi
    else
        echo "$LINE" >> "$OUTFILE"
    fi

    # If we are leaving a table row then we need to reset the TD counter!
    if [ -n "$LINE" ] && echo "$LINE" | grep -q "</tr>"
    then
        CT=0
    fi
done < myfile

Note that this does NOT account for multiple "<td>" tags on one line.
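
One way around that limitation is to let awk walk each line tag by tag instead of line by line. The following is only a rough sketch, not something from this thread: the function name strip_nth_cell is made up, and it assumes plain lowercase "<tr>" and "<td>" tags with no attributes and no nested tables.

```shell
#!/bin/sh
# strip_nth_cell N: hypothetical helper (name invented for this sketch).
# Filters stdin to stdout, dropping the Nth <td>...</td> of every <tr>.
# Assumes plain lowercase tags without attributes and no nested tables.
strip_nth_cell() {
    awk -v n="$1" '
    {
        line = $0
        out = ""
        dropped = skipping      # still inside a cell opened on an earlier line?
        # Consume the line one row/cell tag at a time.
        while (match(line, /<tr>|<\/tr>|<td>|<\/td>/)) {
            tag  = substr(line, RSTART, RLENGTH)
            text = substr(line, 1, RSTART - 1)
            line = substr(line, RSTART + RLENGTH)
            if (!skipping) out = out text
            if (tag == "<tr>" || tag == "</tr>") {
                count = 0; skipping = 0     # row boundary: restart the cell count
                out = out tag
            } else if (tag == "<td>") {
                count++
                if (count == n) { skipping = 1; dropped = 1 }
                else out = out tag
            } else if (skipping) {
                skipping = 0                # closing tag of the dropped cell
            } else {
                out = out tag
            }
        }
        if (!skipping) out = out line
        # Suppress lines that held nothing but the dropped cell.
        if (!(dropped && out ~ /^[ \t]*$/)) print out
    }'
}

# Usage: remove the second cell from a row written on a single line.
printf '%s\n' '<tr><td>a</td><td>b</td><td>c</td></tr>' | strip_nth_cell 2
```

For the original question this would be called as "strip_nth_cell 7 < oldfile.html > newfile.html". Because the counter is reset at every "<tr>" and "</tr>", it should not matter whether the cells of a row sit on one line or on several.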
# 5  
Old 12-05-2008
OK, I think I follow, and I'm going to combine it with my sed script to test. Please review it below, as I've made some revisions. Remember, my goal is to remove everything between the table data tags, and the content within them will vary; it will never be the same.


Code:
#!/bin/sh

#TD=0
CT=0
while IFS= read -r LINE
do
    # Check to see if the LINE is non-empty, and has a <td> tag in it.
    if [ -n "$LINE" ] && echo "$LINE" | grep -q "<td>"
    then
        # Increase the TD counter by 1
        CT=$((CT + 1))

        # Check to see if the TD counter is at 6 (we are at 7th TD as the counter starts at 0 not 1)
        if [ "$CT" -eq 6 ]
        then
            # Use sed to remove this TD tag AND everything in between
            echo "$LINE" | sed -n '/<tr>/,/<\/tr>/ {
                s/.*<tr>//
                s/<\/tr>.*//
                p
                }' >> newfile.html
        else
            echo "$LINE" >> newfile.html
        fi
    else
        echo "$LINE" >> newfile.html
    fi

    # If we are leaving a table row then we need to reset the TD counter!
    if [ -n "$LINE" ] && echo "$LINE" | grep -q "</tr>"
    then
        CT=0
    fi
done < oldfile.html

But I have two questions regarding your script. What is the variable TD for? Notice I've commented it out for now. Lastly, since you set CT to 0, on the very first count it will be 0, not 1, correct?

Last edited by phpfreak; 12-05-2008 at 07:45 AM..
# 6  
Old 12-05-2008
OK I created a file named testHTM.txt with the following

Code:
$ cat testHTM.txt
<table>
<tr>
<td>x</td>
<td>y</td>
<td>z</td>
</tr>
</table>

then I ran this against that file

Code:
$ OUT=`cat testHTM.txt` && echo $OUT |sed 's/<td>[x-z]<\/td>//1'
<table> <tr>  <td>y</td> <td>z</td> </tr> </table>

$ OUT=`cat testHTM.txt` && echo $OUT |sed 's/<td>[x-z]<\/td>//2'
<table> <tr> <td>x</td>  <td>z</td> </tr> </table>

$ OUT=`cat testHTM.txt` && echo $OUT |sed 's/<td>[x-z]<\/td>//3'
<table> <tr> <td>x</td> <td>y</td>  </tr> </table>

Notice the occurrence number at the end of the sed expression: setting it to 1 removes "<td>x</td>", setting it to 2 removes "<td>y</td>", and so on.

Placing this code inside a loop that checks whether you are between "<tr>" and "</tr>" tags, and setting it to handle occurrence 7, would do it.
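
As a rough sketch of that idea done entirely in sed (my own assumption of how it could look, not tested against the real table): join each row into one pattern space, then apply the numbered substitution to it. The function name drop_cell_sed is made up, and the pattern "[^<]*" assumes the cells contain plain text with no nested tags.

```shell
#!/bin/sh
# drop_cell_sed N: hypothetical helper (name invented for this sketch).
# Joins each <tr>...</tr> into one pattern space, then deletes the Nth
# <td>...</td> in it. Assumes cells hold plain text (no nested tags)
# and that every <tr> starts on its own line.
drop_cell_sed() {
    sed '/<tr>/{
:join
# Keep appending lines until the row is complete or input ends.
/<\/tr>/!{
$!{
N
b join
}
}
s|<td>[^<]*</td>||'"$1"'
}'
}

# Usage: remove the third cell from a row written on a single line.
printf '%s\n' '<tr><td>a</td><td>b</td><td>c</td></tr>' | drop_cell_sed 3
```

When the cells are on separate lines, the line that held the removed cell is left behind as an empty line, which is harmless in HTML.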
# 7  
Old 12-05-2008
Oops, yes, TD was an extra variable that I never used, so it's safe to comment out or remove.

As for the CT question:

Code:
    if [ -n "$LINE" ] && echo "$LINE" | grep -q "<td>"
    then
        # Increase the TD counter by 1
        CT=$((CT + 1))

        # Check to see if the TD counter is at 6 (we are at 7th TD as the counter starts at 0 not 1)
        if [ "$CT" -eq 6 ]

You should change "-eq 6" to "-eq 7", since the counter is incremented before it is tested. Oops again.
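
Putting the corrections together, the loop might end up looking something like the sketch below. The function name drop_cell_loop and its argument are made up for illustration, and it still assumes at most one "<td>...</td>" per line. The sed pattern uses ".*" so the cell is removed whatever its content is.

```shell
#!/bin/sh
# drop_cell_loop N: hypothetical helper (name invented for this sketch).
# Reads HTML on stdin and drops the Nth cell of each table row.
# Assumes at most one <td>...</td> per input line.
drop_cell_loop() {
    n=$1
    ct=0
    while IFS= read -r line
    do
        case $line in
        *"<td>"*)
            # Increment first, then test, so the Nth cell has ct equal to N.
            ct=$((ct + 1))
            if [ "$ct" -eq "$n" ]
            then
                # Remove the cell whatever its content is.
                line=$(printf '%s\n' "$line" | sed 's|<td>.*</td>||')
                # Skip the line if the cell was all it contained.
                [ -n "$line" ] || continue
            fi
            ;;
        esac
        printf '%s\n' "$line"
        # Leaving a table row: reset the cell counter.
        case $line in
        *"</tr>"*) ct=0 ;;
        esac
    done
}

# Usage: remove the second cell of a small row.
printf '%s\n' '<tr>' '<td>x</td>' '<td>y</td>' '</tr>' | drop_cell_loop 2
```

For the original file that would be "drop_cell_loop 7 < oldfile.html > newfile.html".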