[uniq + awk?] How to remove duplicate blocks of lines in files?


 
# 15  
Old 09-21-2011
Quote:
Originally Posted by raidzero
This command is nice; it works very well to put all items between <array> and </array> on their own line, making processing easier. However, it does not remove duplicate definitions.
Have you tried what I'd posted with 'nawk'?
# 16  
Old 09-21-2011
Quote:
Originally Posted by vgersh99
Have you tried what I'd posted with 'nawk'?
Yes I did, it produced the same output as the first nawk suggestion.

Having each array and its elements on one line might make this easier. Is there a way to delete lines based on a comparison of the contents between the first < and > characters?
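A minimal sketch of that idea, assuming each array already sits on a single line and that the text of the first <...> tag is a good enough key. The function name dedupe_by_first_tag is made up for illustration and is not from this thread; the awk inside uses only POSIX features (match, RSTART, RLENGTH).

```shell
# Hypothetical helper: keep only the first line for each distinct opening tag.
# Reads the files given as arguments, or standard input if none are given.
dedupe_by_first_tag() {
    awk '
        # match() sets RSTART and RLENGTH to the first <...> on the line
        match($0, /<[^>]*>/) {
            key = substr($0, RSTART, RLENGTH)
            if (seen[key]++) next    # this tag was seen before: drop the line
        }
        { print }                    # first occurrence, or a line with no tag
    ' "$@"
}
```

It could be run as `dedupe_by_first_tag arrays.xml > arrays.dedup.xml` (both file names are placeholders). Two lines whose first tags differ in any attribute count as different, so this only removes definitions whose opening tag is identical.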
# 17  
Old 09-21-2011
Quote:
Originally Posted by raidzero
This command is nice; it works very well to put all items between <array> and </array> on their own line, making processing easier. However, it does not remove duplicate definitions.
Or even cuter (though it only works with the original posted data):
Code:
gawk 'BEGIN{ORS=RS="\n</string-array>\n"}!a[$0]++' infile
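
The one-liner above relies on gawk accepting a multi-character RS: each record is everything up to a closing tag line, and !a[$0]++ prints a record only the first time its full text is seen. A rough portable sketch of the same idea, assuming a plain POSIX awk and a closing tag passed as a parameter, could collect lines until the closing tag shows up. The name dedupe_blocks and its argument order are assumptions made for this example.

```shell
# Hypothetical helper: $1 is the tag name (for example string-array),
# the remaining arguments are input files.
dedupe_blocks() {
    tag=$1
    shift
    awk -v close_tag="</$tag>" '
        { block = block $0 "\n" }           # collect the current block
        index($0, close_tag) {              # closing tag found: block is complete
            if (!seen[block]++) printf "%s", block
            block = ""
        }
        END { printf "%s", block }          # trailing lines after the last closing tag
    ' "$@"
}
```

Called as `dedupe_blocks string-array infile`, it prints each distinct block once. Like the gawk version, it treats everything between two closing tags as one block, so lines of another kind sitting between two arrays become part of the following block and can prevent a match.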

# 18  
Old 09-21-2011
Quote:
Originally Posted by raidzero
Yes I did, it produced the same output as the first nawk suggestion.

Having each array and its elements on one line might make this easier. Is there a way to delete lines based on a comparison of the contents between the first < and > characters?
Given your sample file myFile:
Code:
<string-array name="stringArray1">
<item>Element1|Element2<item>
<item>@android:color/black<item>
</string-array>
<style name="style1" parent="@android:style/mainStyle"></style>
<style name="style1" parent="@android:style/mainStyle">
<item name="android:textColor">@drawable/black</item>
<item name="android:typeface">sans</item>
<item name="android:textStyle">bold</item>
</style>
<string-array name="stringArray1">
<item>Element1|Element2<item>
<item>@android:color/black<item>
</string-array>
<string-array name="stringArray2"></string-array>

and the code:
Code:
nawk -F'"' '$1 ~ "=$" && NF>1 {f=($(NF-1) in a)?0:1;if(f)a[$(NF-1)]} f' myFile

I get:
Code:
<string-array name="stringArray1">
<item>Element1|Element2<item>
<item>@android:color/black<item>
</string-array>
<style name="style1" parent="@android:style/mainStyle"></style>
<item name="android:textColor">@drawable/black</item>
<item name="android:typeface">sans</item>
<item name="android:textStyle">bold</item>
</style>
<string-array name="stringArray2"></string-array>

Looks fine to me. Can you identify anything wrong?
# 19  
Old 09-21-2011
Quote:
Originally Posted by binlib
Or even cuter (though it only works with the original posted data):
Code:
gawk 'BEGIN{ORS=RS="\n</string-array>\n"}!a[$0]++' infile

Based on the OP's previous explanation, one cannot hard-wire the array "names" as they differ.
# 20  
Old 09-21-2011
Quote:
Originally Posted by vgersh99
Based on the OP's previous explanation, one cannot hard-wire the array "names" as they differ.

I have it wrapped in a function that takes the array name as an argument, and the function is run once per name. It takes several input files (one each for string-arrays, styles, plurals, dimens, strings, colors, drawables, and so on, covering all the Android resources) and produces two final XML files: one for strings and colors, with each item on one line, and one called arrays.xml, which is what I am working with now. I hope that clears it up.

---------- Post updated at 02:59 PM ---------- Previous update was at 02:47 PM ----------

I figured it out...

Here is my final function:

Code:
dupArrayDelete() {
	echo "removing duplicate"
	arrayName=$1
	echo "$arrayName"
	# get each array and its contents on its own line
	# first awk prints the file without newlines, putting the whole file on one line
	# (printf "%s" keeps any % characters in the data from being read as a format)
	# sed inserts a newline after each closing </style> or </string-array>, etc
	# (\n in the replacement is understood by GNU sed)
	# second awk keeps only the first line for each distinct column 2
	awk '{printf "%s", $0}' "$2" | sed 's#</'"$arrayName"'>#&\n#g' | awk '!A[$2]++' >> "$3"
}

$1 is the array name, $2 is the source file, and $3 is the destination.

Thanks, everyone!
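
For systems where sed does not accept \n in the replacement text, here is a rough single-awk sketch of the same three steps: join the file, cut it after each closing tag, and keep the first line for each distinct second field. The name dupArrayDeletePortable is made up for this example; the arguments mirror the function above. It assumes the tag name contains no regular-expression metacharacters, because split() treats its separator as a regex.

```shell
# Hypothetical variant: $1 tag name, $2 source file, $3 destination file.
dupArrayDeletePortable() {
    awk -v close_tag="</$1>" '
        { buf = buf $0 }                          # join the file into one string
        END {
            n = split(buf, blocks, close_tag)     # cut at every closing tag
            for (i = 1; i <= n; i++) {
                line = blocks[i] (i < n ? close_tag : "")
                if (line == "") continue          # nothing left after the last tag
                split(line, f, " ")               # f[2] plays the role of $2
                if (!seen[f[2]]++) print line
            }
        }
    ' "$2" >> "$3"
}
```

As in the pipeline, the comparison key is the second whitespace-separated field of the joined line. For a tag with a single attribute, that field runs on into the contents up to the next space.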