Remove 5th & 6th <td> from file


 
# 1  
Old 02-24-2011

I have a page whose rows each contain six <td> cells. I need to strip the 5th and 6th <td></td> from every row. The data in the 4th <td> varies, so I can't match a pattern on it. Instead, I tried counting the <td> occurrences so that I could remove the 5th and 6th, but I'm not making any progress.
Is there an easy way to do this, or at least an easier one than what I've been trying?

Test data:
Code:
<tr>
  <td>Date</td>
  <td>Good Data 1 </td>
  <td>Good Data 2</td>
  <td>Good Data 3</td>
  <td>Needs Removed</td>
  <td>Needs Removed</td>
</tr>
<tr>
  <td>Date</td>
  <td>Good Data 1 </td>
  <td>Good Data 2</td>
  <td>Good Data 3</td>
  <td>Needs Removed</td>
  <td>Needs Removed</td>
</tr>

Code:
file
CT=0
while read LINE
do
    # Check to see if the LINE is non-empty, and has a <td> tag in it.
    if [ `echo $LINE |grep -c "<td>"` != ""  ] 
      then
       # Increase the TD counter by 1
        CT=$((CT+1))
        # Check to see if the TD counter is at 3
            if [ "$CT" -eq 3 ]
                then
                echo $CT
            fi    
     fi
done <test.html

# 2  
Old 02-24-2011
Processing HTML isn't trivial. You're lucky that the things you want are each on a line of their own...

There may be easier ways but I have no idea if your system has them. What is your system? What is your shell?
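
If the cells were not already one per line, a pre-pass could put each tag on its own line before any counting is done. This is only a rough sketch, assuming plain cells with no nested tags; the one-line row and the file names `oneline.html` and `split.html` are made up for illustration:

```shell
# Hypothetical input: a whole row squeezed onto a single line.
printf '%s\n' '<tr><td>Date</td><td>A</td><td>B</td><td>C</td><td>X</td><td>Y</td></tr>' > oneline.html

# Break the line wherever one tag ends and the next one begins.
awk '{ gsub(/>[ \t]*</, ">\n<"); print }' oneline.html > split.html

cat split.html
```

After that, a line-based counter sees one <td> per line, as in the test data.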
# 3  
Old 02-24-2011
Try:
Code:
awk '{if ($0~"</tr>"){n=0}if ($0~"<td>"){n++}}$0!~"<td>" || n<5' file
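
Spelled out with comments, the one-liner above reads roughly as follows. This is just an expanded sketch of the same logic, run against one row of the test data from post #1; the output file name `trimmed.html` is made up:

```shell
# One row of the sample data from the first post.
cat > test.html <<'EOF'
<tr>
  <td>Date</td>
  <td>Good Data 1 </td>
  <td>Good Data 2</td>
  <td>Good Data 3</td>
  <td>Needs Removed</td>
  <td>Needs Removed</td>
</tr>
EOF

awk '
  /<\/tr>/ { n = 0 }   # end of a row: reset the cell counter
  /<td>/   { n++ }     # count each cell line within the row
  !/<td>/ || n < 5     # print non-cell lines and the first four cells only
' test.html > trimmed.html

cat trimmed.html
```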


Last edited by bartus11; 02-24-2011 at 11:44 AM..
# 4  
Old 02-24-2011
Code:
nawk '($0~/<\/*tr>/){n=!n}!n{x=0}/<td>/{++x}(x!=5&&x!=6)' infile
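
One difference from the previous answer: this version drops exactly cells 5 and 6, so a 7th cell, if a row had one, would be kept, whereas the `n<5` test drops everything from the 5th cell on. A small sketch with a made-up 7-cell row (plain `awk` stands in for `nawk` here, on the assumption that both behave the same for this program; `seven.html` and `kept.html` are hypothetical names):

```shell
# Hypothetical row with seven cells.
cat > seven.html <<'EOF'
<tr>
  <td>1</td>
  <td>2</td>
  <td>3</td>
  <td>4</td>
  <td>5</td>
  <td>6</td>
  <td>7</td>
</tr>
EOF

awk '
  /<\/*tr>/ { inrow = !inrow }   # flips on <tr> and again on </tr>
  !inrow    { cell = 0 }         # outside a row: reset the cell counter
  /<td>/    { ++cell }           # count cell lines
  cell != 5 && cell != 6         # print everything except cells 5 and 6
' seven.html > kept.html

cat kept.html
```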

# 5  
Old 02-24-2011
Now that is truly COOOL. TY
# 6  
Old 02-24-2011
Code:
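#!/bin/bash

CT=0
while IFS= read -r LINE
do
    # A new row starts: reset the cell counter.
    [[ "$LINE" =~ "<tr>" ]] && CT=0
    # Count cell lines and skip the 5th and later ones.
    if [[ "$LINE" =~ "<td>" ]]
    then
        (( CT++ ))
        (( CT > 4 )) && continue
    fi
    # Everything else, including <tr> and </tr>, passes through.
    printf '%s\n' "$LINE"
done < infile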

# 7  
Old 02-24-2011
Code:
# Needs an awk that accepts a multi-character RS, such as gawk or mawk.
awk 'BEGIN{RS="</tr>\n";ORS="\n</tr>\n";FS=OFS="\n"} {print $1,$2,$3,$4,$5}' infile
