Deleting Repeating lines from a txt file via script


 
# 15  
Old 05-31-2014
I'm using an online JavaScript-based Linux emulator: http://bellard.org/jslinux/
# 16  
Old 05-31-2014
Scrutinizer's code makes assumptions about what the 1st line of a header looks like (based on your sample input) that might not be true in your real data and only requires that the 3rd line of a header contain From= somewhere on the line. It will also delete a 3 line header if the next 3 line header is the same as the previous 3 line header even if those headers are not adjacent.

The following code makes no assumptions about what the 1st two lines of a 3 line header look like, assumes that the 3rd line must start with From= and must also contain , To=. It will only remove a 3 line header if the next three lines match that header. I asked questions about the header format (all three lines) and got no response, so I still have no idea if the following code will do what is desired. I believe it meets all stated requirements:
Code:
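awk '
# b[] is a circular buffer of the last 6 lines; bc counts buffered lines not yet printed.
{	b[(l = NR) % 6] = $0
	# Six lines pending, the current line ends a header, and the newest
	# three lines repeat the three before them: drop one copy.
	if(++bc == 6 && $0 ~ /^From=.*, To=/ && b[(l - 3) % 6] == $0 &&
		b[(l - 1) % 6] == b[(l - 4) % 6] &&
		b[(l - 2) % 6] == b[(l - 5) % 6])
		bc = 3
	# Buffer still full: the oldest line can no longer be part of a repeat, so print it.
	if(bc == 6) {
		print b[(l - 5) % 6]
		bc--
	}
}
# At end of input, flush the remaining buffered lines in order.
END {	while(bc)
		print b[(l - --bc) % 6]
}' File.txt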

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
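
As a convenience, the awk selection described above could be wrapped in a small helper. This is only a sketch: the function name pick_awk is made up for illustration, and the search order simply follows the list given above before falling back to plain awk.

```shell
# pick_awk: hypothetical helper (not from the thread).
# Prints the first usable awk from the candidates named above.
pick_awk() {
  for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk nawk awk; do
    # command -v succeeds for an executable path or a command found in PATH
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

It could then be used as AWK=$(pick_awk) followed by "$AWK" '...' File.txt.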

With the following input:
Code:
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
2
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

this script produces the output:
Code:
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
2
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

while Scrutinizer's script produces:
Code:
20140522121432,0,12,ram
Loc=India
From=ram@xxx.com, To=ravi@yyy.com,,
1
2
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
H1
H2
From=dwc@home, To=dwc@work
3
4
.
.
30
20140523121432,0,12,roger
Loc=redmont
From=roger@xxx.com, To=david@yyy.com,,
1
2
3
l7
l6
l5
l4
l3
l2
l1

If you're unwilling to show us the output you're getting from these scripts with your sample input, there isn't much we can do to help.
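
For readers who find the modulo arithmetic in the six-line buffer hard to follow, the same idea can be sketched with a plain shifting window. This is an illustrative rewrite, not the code posted above: the function name dedup_adjacent_headers and the array name win are made up, and it reads standard input instead of File.txt. It keeps the same assumption that the 3rd header line starts with From= and contains ", To=".

```shell
# dedup_adjacent_headers: hypothetical helper name (not from the thread).
# Reads stdin and drops a 3-line header when the next 3 lines repeat it exactly.
dedup_adjacent_headers() {
  awk '
  {
    # Append the current line to a pending window of at most 6 lines.
    win[++n] = $0
    # A full window whose last line looks like the end of a header and whose
    # second half equals its first half is an adjacent repeat.
    if (n == 6 && $0 ~ /^From=.*, To=/ &&
        win[1] == win[4] && win[2] == win[5] && win[3] == win[6]) {
      n = 3                      # keep lines 1-3, forget the identical lines 4-6
    }
    if (n == 6) {
      print win[1]               # oldest line can no longer be part of a repeat
      for (i = 1; i < 6; i++) win[i] = win[i + 1]
      n = 5
    }
  }
  END {
    for (i = 1; i <= n; i++) print win[i]   # flush whatever is still pending
  }'
}
```

Usage would be dedup_adjacent_headers < File.txt. Shifting the window costs a few extra assignments per line compared with the circular buffer, which should not matter for files of ordinary size.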
# 17  
Old 05-31-2014
@Gautham:

FWIW I tried it with the emulator link you provided:
Code:
/var/root # cat test.sh
awk -F, '
  NF==4 && $1~/^[0-9]+$/ {                          # If there are 4 fields and the first field is numerical, then found a header
    is_header=1
  }
  is_header {
    current=current $0 ORS                          # Add lines to current header
    if(/From=/){                                    # Last line of header
      if(current!=previous) printf "%s", current
      previous=current
      current=""
      is_header=0
    }
    next
  }
  1
' File.txt

Code:
/var/root # sh ./test.sh                                                        
20140522121432,0,12,ram                                                         
Loc=India                                                                       
From=ram@xxx.com, To=ravi@yyy.com,,                                             
1                                                                               
2                                                                               
3                                                                               
4                                                                               
.                                                                               
.                                                                               
30                                                                              
20140523121432,0,12,roger                                                       
Loc=redmont                                                                     
From=roger@xxx.com, To=david@yyy.com,,                                          
1                                                                               
2                                                                               
3                                                                               
.                                                                               
.                                                                               
30                                                                              
......                                                                          
....                                                                            
......                                                                          
...

# 18  
Old 05-31-2014
Thanks a lot, Scrutinizer and Don. I moved to my Unix machine and it works as needed.
# 19  
Old 05-31-2014
Quote:
Originally Posted by Don Cragun
Scrutinizer's code makes assumptions about what the 1st line of a header looks like (based on your sample input) that might not be true in your real data and only requires that the 3rd line of a header contain From= somewhere on the line. It will also delete a 3 line header if the next 3 line header is the same as the previous 3 line header even if those headers are not adjacent.
[..]
Good catch about the adjacency, Don. Quick fix:


Code:
awk -F, '
  NF==4 && $1~/^[0-9]+$/ {                          # If there are 4 fields and the first field is numerical, then found a header
    is_header=1
    previous=current
    current=""
  }
  is_header {
    current=current $0 ORS                          # Add lines to current header
    if(/From=/){                                    # Last line of header
      if(current!=previous) printf "%s", current
      is_header=0
    }
    next
  }
  {
    current=""
  }
  1
' file

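But further robustness would need to be added, and "From=" would always need to be present: if a header had no such line, everything up to the next line containing "From=" would be treated as part of that header...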
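
One way that robustness might be added is to cap a header at three lines, so that a missing "From=" line cannot swallow the lines that follow. This is only a sketch under assumptions that go beyond the quick fix above: headers are taken to be exactly 3 lines long, the last line is required to start with From=, and the names dedup_headers_capped, in_header and count are made up for illustration.

```shell
# dedup_headers_capped: hypothetical helper name (not from the thread).
# Reads stdin; removes a header only when it directly repeats the previous one.
dedup_headers_capped() {
  awk -F, '
  # Header start as in the thread: 4 comma-separated fields, numeric first field.
  !in_header && NF == 4 && $1 ~ /^[0-9]+$/ {
    in_header = 1; count = 0
    previous = current; current = ""
  }
  in_header {
    current = current $0 ORS             # collect the lines of this header
    count++
    if ($0 ~ /^From=/) {                 # normal end of header
      if (current != previous) printf "%s", current
      in_header = 0
    } else if (count == 3) {             # assumed 3-line header, From= missing
      printf "%s", current               # give up: emit the lines unchanged
      current = ""; in_header = 0
    }
    next
  }
  { current = "" }                       # any ordinary line breaks adjacency
  1
  END { if (in_header) printf "%s", current }   # incomplete header at end of file
  '
}
```

Usage would be dedup_headers_capped < file. Emitting a malformed header unchanged is a design choice: it seems safer to keep suspicious lines than to drop them silently.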