Deleting Repeating lines from a txt file via script

05-31-2014

I'm using an online JavaScript-based Linux emulator: http://bellard.org/jslinux/

Gautham

05-31-2014

Scrutinizer's code makes assumptions about what the first line of a header looks like (based on your sample input) that might not hold for your real data, and it only requires that the third line of a header contain From= somewhere on the line. It will also delete a 3-line header if it is identical to the previous 3-line header, even when the two headers are not adjacent.

The following code makes no assumptions about what the first two lines of a 3-line header look like; it only assumes that the third line starts with From= and also contains ", To=". It removes a 3-line header only if the next three lines are identical to it. I asked questions about the header format (all three lines) and got no response, so I can't be sure the following code does what you want, but I believe it meets all of the stated requirements:

Code:

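awk '
{	# keep the last 6 input lines in a ring buffer indexed by NR % 6
	b[(l = NR) % 6] = $0
	# if 6 lines are buffered, the current line is a From=..., To= line,
	# and the last 3 lines repeat the 3 before them, drop the older copy
	if(++bc == 6 && $0 ~ /^From=.*, To=/ && b[(l - 3) % 6] == $0 &&
		b[(l - 1) % 6] == b[(l - 4) % 6] &&
		b[(l - 2) % 6] == b[(l - 5) % 6])
		bc = 3
	# buffer full: print the oldest line and free its slot
	if(bc == 6) {
		print b[(l - 5) % 6]
		bc--
	}
}
END {	# flush the lines still buffered, oldest first
	while(bc)
		print b[(l - --bc) % 6]
}' File.txt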

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

With the following input:

Code:

20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
2
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

This script produces the output:

Code:

20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
2
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

while Scrutinizer's script produces:

Code:

20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
2
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

If you're unwilling to show us the output you're getting from these scripts with your sample input, there isn't much we can do to help.
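The 6-line window in the awk script above is really two copies of a 3-line header held side by side. As a sketch only, the same ring-buffer idea can take the header length as a parameter, using a window of 2*n lines. The function name dedup_adjacent and the -v n= parameter are hypothetical additions, not part of the original script; it keeps the original assumption that a header's last line starts with From= and contains ", To=", and it reads standard input.

```sh
# dedup_adjacent: hypothetical helper generalizing the 6-line window to
# headers of n lines (window of 2*n). Assumes, as the original does, that
# the last header line starts with From= and contains ", To=".
dedup_adjacent() {
  awk -v n="${1:-3}" '
  BEGIN { w = 2 * n }
  {
    b[(l = NR) % w] = $0                  # ring buffer of the last w lines
    if (++bc == w && $0 ~ /^From=.*, To=/) {
      dup = 1                             # do the last n lines repeat the n before them?
      for (i = 0; i < n; i++)
        if (b[(l - i) % w] != b[(l - i - n) % w]) { dup = 0; break }
      if (dup) bc = n                     # drop the older copy, keep the newer one
    }
    if (bc == w) {                        # window full: emit the oldest line
      print b[(l - w + 1) % w]
      bc--
    }
  }
  END { while (bc) print b[(l - --bc) % w] }  # flush, oldest first
  '
}
```

With n=3 the comparisons are the same as in the script above (line l against l-3, l-1 against l-4, l-2 against l-5), so `dedup_adjacent 3 < File.txt` should behave like the original.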

Don Cragun

05-31-2014

@Gautham:

FWIW I tried it with the emulator link you provided:

Code:

/var/root # cat test.sh
awk -F, '
  NF==4 && $1~/^[0-9]+$/ {                          # If there are 4 fields and the first field is numerical, then found a header
    is_header=1
  }
  is_header {
    current=current $0 ORS                          # Add lines to current header
    if(/From=/){                                    # Last line of header
      if(current!=previous) printf "%s", current
      previous=current
      current=""
      is_header=0
    }
    next
  }
  1
' File.txt

Code:

/var/root # sh ./test.sh
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
2
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
.
.
30
......
....
......
...

Scrutinizer

05-31-2014

Thanks a lot, Scrutinizer and Don. I moved to my Unix machine and it works as needed.

Gautham

05-31-2014

Quote:

Originally Posted by Don Cragun

Good catch about the adjacency, Don. Quick fix:

Code:

awk -F, '
  NF==4 && $1~/^[0-9]+$/ {                          # If there are 4 fields and the first field is numerical, then found a header
    is_header=1
    previous=current
    current=""
  }
  is_header {
    current=current $0 ORS                          # Add lines to current header
    if(/From=/){                                    # Last line of header
      if(current!=previous) printf "%s", current
      is_header=0
    }
    next
  }
  {
    current=""
  }
  1
' file

Further robustness checks would still need to be added, though, and every header would still have to contain "From=" on its last line...
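One hedged way to drop the dependence on "From=" is to count header lines instead of searching for a terminating pattern. This sketch assumes, without having checked it against the real data, that every header is exactly three lines long and begins with a line of four comma-separated fields whose first field is numeric; dedup_headers is a hypothetical name. As in the fix above, a header is dropped only when it is identical to the header immediately before it.

```sh
# dedup_headers: hypothetical helper. Assumes every header is exactly three
# lines long and starts with a line of four comma-separated fields whose
# first field is numeric; the third line no longer has to contain From=.
dedup_headers() {
  awk -F, '
    left == 0 && NF == 4 && $1 ~ /^[0-9]+$/ {
      left = 3                  # start of a header: 3 lines to collect
      current = ""
    }
    left > 0 {
      current = current $0 ORS  # accumulate the header
      if (--left == 0) {        # header complete
        if (current != previous) printf "%s", current
        previous = current      # remember it for the next comparison
      }
      next
    }
    {
      previous = ""             # any other line breaks adjacency
      print
    }
    END { if (left > 0) printf "%s", current }  # flush a truncated header
  ' "$@"
}
```

Usage would be `dedup_headers file`. The trade-off is that a header of any other length would be mis-split, so this only fits if the 3-line format is guaranteed.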

Scrutinizer
