Removing duplicate terms in a file


 
# 1  
Old 11-08-2012

Hi everybody,
I have a .txt file that contains some assembly code; to optimize it, I need to remove some replicated parts.
For example, I have:
Code:
e_li r0,-1 
e_li r25,-1  
e_lis r25,0000  
 
add r31, r31 ,r0 
       
e_li r28,-1  
e_lis r28,0000  
 
add r31, r31 ,r0 
       
e_li r28,-1  
e_lis r28,0000  
 
add r31, r31 ,r0 
       
e_li r2,-1  
e_lis r2,0000  
 
add r31, r31 ,r0 
       
e_li r9,-1  
e_lis r9,0000  
 
add r31, r31 ,r0 
       
e_li r24,-1  
e_lis r24,0000  
 
add r31, r31 ,r0 
       
e_li r21,-1  
e_lis r21,0000  
 
add r31, r31 ,r0 
       
e_li r28,-1  
e_lis r28,0000  
 
add r31, r31 ,r0

If I could somehow remove the replicated parts, the final code would look like this:
Code:
e_li r0,-1 
e_li r25,-1  
e_lis r25,0000  
 
add r31, r31 ,r0 
       
e_li r28,-1  
e_lis r28,0000  
 
add r31, r31 ,r0 
              
e_li r2,-1  
e_lis r2,0000  
 
add r31, r31 ,r0 
       
e_li r9,-1  
e_lis r9,0000  
 
add r31, r31 ,r0 
       
e_li r24,-1  
e_lis r24,0000  
 
add r31, r31 ,r0 
       
e_li r21,-1  
e_lis r21,0000  
 
add r31, r31 ,r0

Thanks for your help
# 2  
Old 11-08-2012
try:
Code:
awk '
# Trim leading/trailing blanks, then append each line to the current
# block, joined with ":" (assumes the code itself contains no ":").
{ sub(/ *$/, ""); sub(/^ */, ""); l = l ":" $0 }
# A line containing "add" ends a block: record it only if not seen before.
/add/ { if (!b[l]) { a[c++] = l; b[l] = 1 }; l = "" }
END {
  for (i = 0; i < c; i++) {
    sub(/^:/, "", a[i]);       # drop the leading separator
    gsub(/:/, "\n", a[i]);     # restore the line breaks
    printf "%s", a[i];         # "%s" avoids treating the data as a format string
    print "";
  }
}
' a.txt
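To see the idea on a minimal input (a compressed demo of the same approach; `demo.txt` and its contents are made up, not from the thread):

```shell
# Two identical blocks: only the first copy survives.
printf 'e_li r28,-1\ne_lis r28,0000\n\nadd r31, r31 ,r0\ne_li r28,-1\ne_lis r28,0000\n\nadd r31, r31 ,r0\n' > demo.txt
# Join each block (up to its "add" line) into one ":"-separated key,
# print it with newlines restored only the first time the key appears.
awk '{l = l ":" $0}
     /add/ {if (!b[l]++) {s = l; sub(/^:/, "", s); gsub(/:/, "\n", s); print s}; l = ""}' demo.txt
```

Note that the duplicate check happens on the still-joined key (`l`) before the separators are turned back into newlines, which is why the printable copy is built in `s`.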

# 3  
Old 11-09-2012
Thanks rdrtx1, it seems to work!
# 4  
Old 11-09-2012
Alternatively (just for fun):
Code:
awk '{getline p} !A[$0,p]++{print $0 ORS p}' RS= ORS='\n\n' infile

But this is probably not practical, since it would be sensitive to extra spaces in the input file.
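One possible workaround for that sensitivity (a sketch, not from the thread): keep the same pairwise paragraph dedup, but key it on a whitespace-normalized copy of each block while printing the blocks as they originally appeared.

```shell
awk '
function norm(s) {
  gsub(/[ \t]+/, " ", s)        # collapse runs of blanks
  gsub(/ \n|\n /, "\n", s)      # drop blanks around line breaks
  sub(/^ /, "", s); sub(/ $/, "", s)
  return s
}
{ getline p }                   # pair each block with the one after it
!A[norm($0), norm(p)]++ { print $0 ORS p }
' RS= ORS='\n\n' infile
```

Because `norm()` works on a copy, the output keeps the original spacing; only the duplicate test ignores it.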