Removing duplicate items from an array


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Removing duplicate items from an array
# 1  
Old 11-15-2017
Removing duplicate items from an array

Hello Experts,

I am trying to write a shell script to find duplicate items in an array, this is what i have tried :
Code:
#!/bin/bash
echo "This is another sample Progg to remove duplicate items from an array"
echo "How many number do you want to insert"
read n
for (( i=0; i<$n; i++ ))
do
   echo "Enter Number"
   read num
   arr[i]=$num
done
 echo "Insertion is done, Below are the element of the array"
 echo ${arr[@]}

for (( i=0; i<$n; i++ ))
do
    echo "------------i here is ${arr[i]}-------" 
    for (( j=0; j<$n; j++ ))
  do
       echo "j here is ${arr[j]}"
      if [[ $i -eq $j ]]
        then
           echo "Skip this iteration"
            continue
        else
           if [[ arr[i] -eq arr[j] ]]
             then
                 echo "There is a duplicate!!"
                 arr=(${arr[@]:0:$j} ${arr[@]:(( $j+1))})
                 break
           fi
      fi
  done
done
echo ${arr[@]}

The script is working properly and removing the duplicate element as expected, the problem is since i am using a counter (variable n) equivalent to number of items in array to loop, the number of iteration is always constant even though I am popping out the duplicate element as soon as it is encountered.

Any help on how to optimise the script to remove the iteration that are not required?
# 2  
Old 11-16-2017
I don't think your code works as well as you think it does. Try running your script again telling it that you want it to process 5 numbers and then enter the same number 5 times. Do you get the results you wanted? Then try running your script again telling it that you want it to process 5 numbers and then enter 1, 2, 1, 3, and 1. Would you expect the final result to be 2 3 1 or 1 2 3?

See if this works better for you:
Code:
#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
for (( i=0; i<$n; i++ ))
do
	echo "Enter Number"
	read num
	arr[i]=$num
done
echo "Insertion is done, Below are the elements of the array"
echo ${arr[@]}

for (( i=0; i<$((n-1)); i++ ))
do
	echo "------------arr[i] here is ${arr[i]}-------" 
	for (( j=$((i+1)); j<$n; j++ ))
	do
		echo "arr[j] here is ${arr[j]}"
		if [[ arr[i] -eq arr[j] ]]
		then
			echo "There is a duplicate!!"
			arr=(${arr[@]:0:$j} ${arr[@]:$((j+1))})
			((j--))
			((n--))
		fi
	done
done
echo ${arr[@]}

This seems to work OK with both bash version 3.2.57(1)-release (x86_64-apple-darwin16) and with ksh version (AT&T Research) 93u+ 2012-08-01. I assume it will also work with more recent releases of both of these shells.
These 3 Users Gave Thanks to Don Cragun For This Post:
# 3  
Old 11-16-2017
Hi Don,

Thanks for your reply, the script is now working perfectly.

Can you please help to explain below part, the script seems to be working fine without decreasing the counter as well.
Code:
((j--))

# 4  
Old 11-16-2017
Quote:
Originally Posted by mukulverma2408
Hi Don,

Thanks for your reply, the script is now working perfectly.

Can you please help to explain below part, the script seems to be working fine without decreasing the counter as well.
Code:
((j--))

The script is decreasing the counters.

After removing an element from the array of n items, the arithmetic command ((n--)) decreases the value of n. Note that n-1 is also the upper limit on the outer loop (indexed by i) and n is the upper limit on the inner loop (indexed by j).

When we remove an element from the array (i.e. remove ${array[$j]} by moving higher elements of the array down one position), the ((j--)) forces the next iteration of the inner loop to look at the same element (after moving down higher elements). Without the ((j--)) the first element that was moved down won't be examined to see if it is also a duplicate of the element indexed by the outer loop.

P.S.: Note also that your script compared every element of the array to every element of the array (skipping the cases where it was comparing an item to itself). My script compares element 1 in the array to elements 2 through n, then compares element 2 to elements 3 through n, and so on until it compares element n-1 to element n on the last iteration through the nested loops. Doing it this way eliminates the need to check for comparing an element to itself and gets rid of the duplicated comparisons with pairs of elements that have already been compared.

Last edited by Don Cragun; 11-16-2017 at 10:37 PM.. Reason: Add postscript.
# 5  
Old 11-17-2017
Don, your solution was 100% correct, as always!
As a note, within $(( )) and (( )) evaluation you do not need further $ evaluations. So the following works as well
Code:
for (( i=0; i<n; i++ ))

Code:
for (( i=0; i<(n-1); i++ ))

Last but not least, in bash-4 you can use an associative array to automatically sort out duplicates:
Code:
#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
declare -A hash
for (( i=0; i<n; i++ ))
do
        echo "Enter Number"
        read num
        # we address the array by $num, and do not need a value
        hash[$num]=""
done
echo "Insertion is done, Below are the elements of the array"
# dump the keys
printf "%s\n" "${!hash[@]}"
echo "now loop through the keys"
for i in "${!hash[@]}"
do
   echo "do something with $i"
done

This User Gave Thanks to MadeInGermany For This Post:
# 6  
Old 11-17-2017
Quote:
Originally Posted by MadeInGermany
Don, your solution was 100% correct, as always!
As a note, within $(( )) and (( )) evaluation you do not need further $ evaluations. So the following works as well
Code:
for (( i=0; i<n; i++ ))

Code:
for (( i=0; i<(n-1); i++ ))

Last but not least, in bash-4 you can use an associative array to automatically sort out duplicates:
Code:
#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
declare -A hash
for (( i=0; i<n; i++ ))
do
        echo "Enter Number"
        read num
        # we address the array by $num, and do not need a value
        hash[$num]=""
done
echo "Insertion is done, Below are the elements of the array"
# dump the keys
printf "%s\n" "${!hash[@]}"
echo "now loop through the keys"
for i in "${!hash[@]}"
do
   echo "do something with $i"
done

I avoided associative arrays in my suggestion, because I only have access to bash version 3.2.57(1)-release (x86_64-apple-darwin16) and it doesn't have associative arrays. I normally use ksh instead of bash (and it has had associative arrays since 1993). Unfortunately, bash and ksh declare associative arrays incompatibly. As you have shown above, bash declares an associative array with:
Code:
declare -A arrayname...

while ksh uses:
Code:
typeset -A arrayname...

with that one line change to your code, it will work with a 1993 or later ksh instead of a version 4 or later bash.

I agree that using an associative array is the easy way to avoid duplicated values when creating an array. But note that you when you say that it "automatically sort out duplicates", the sort order produced by ${!arrayname[@]} is an alphanumeric sort of the indices; not a numeric sort. So if the values provided as the array indices are 5, 20, and 300; the output from:
Code:
echo ${!arrayname[@]}

will be:
Code:
20 300 5

not the:
Code:
5 20 300

that would be produced by a numeric sort.

The OP for this thread never said whether or not maintaining the order of the input values was desirable or important; nor whether sorted output would be a nicety or a nuisance. If numerically sorted output is important, the user entering values would either need to supply leading zeros so all numeric value are the same width, or the list of values will have to be sorted after duplicates are removed (either with an associative array or with the indexed array used in the original post in this thread).
Login or Register to Ask a Question

Previous Thread | Next Thread

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pass an array to awk to sequentially look for a list of items in a file

Hello, I need to collect some statistical results from a series of files that are being generated by other software. The files are tab delimited. There are 4 different sets of statistics in each file where there is a line indicating what the statistic set is, followed by 5 lines of values. It... (8 Replies)
Discussion started by: LMHmedchem
8 Replies

2. Shell Programming and Scripting

sed to delete items in an array from a file

I need to create a shell script to delete multiple items (Strings) at a time from a file. I need to iterate through a list of strings. My plan is to create an array and then iterate through the array. My code is not working #!/bin/bash -x declare -a array=(one, two, three, four)... (5 Replies)
Discussion started by: bash_in_my_head
5 Replies

3. Shell Programming and Scripting

Add items to ksh array

Hi I need a code help put below items in to array and display in order like this Thanks, Ashan 0AD4 R2 03A1 000198701333 0AD4 R1 270B 000198701616 ------ 0AD4 R2 03A1 000198701333 0AD4 R1 270B 000198701616 (1 Reply)
Discussion started by: ashanabey
1 Replies

4. Shell Programming and Scripting

Removing Duplicate Variables : SED?

I have a file that needs to index and remove all duplicate variables (keeping the first of the duplicated). Then in another file, from the index I need to find and replace all those duplicated variables we deleted with their primary variable (the one we kept). I think this is a SED question, but if... (13 Replies)
Discussion started by: Blue Solo
13 Replies

5. Programming

Removing Items In A ListView

Hi everyone! So I have a listView on my Form named "officeView" I already have the code to add and update info into it, but Im having troubles deleting items out of it. :/ Now I know how to delete an Item from the listView, but I want the item before the deleted item to become automatically... (0 Replies)
Discussion started by: romeo5577
0 Replies

6. Shell Programming and Scripting

removing items from a file with batch

Please assist with awk scirpts: I need to remove items from a file in a batch: The file that I will remove from has the following format: abc00tef:10.81.12.3 abc01tef:10.81.12.3 abc02tef:10.81.12.3 abc03tef:10.81.12.3 abc04tef:10.81.12.3 abc05tef:10.81.12.3 I have a file which... (5 Replies)
Discussion started by: amir07
5 Replies

7. Shell Programming and Scripting

removing items with repeated first 3 character

AWK help: I have a file with following format. I need to remove any entries which are repeated based on first 3 characters. So from the following files I need to remove any entries start with "mas". mas01bct mas02bct mas03bct mas01bct mas01bct mas01bct mas11bct mas01bct mas01bct... (11 Replies)
Discussion started by: amir07
11 Replies

8. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Hi, I am trying to remove duplicate lines from a file. For example the contents of example.txt is: this is a test 2342 this is a test 34343 this is a test 43434 and i want to remove the "this is a test" lines only and end up with the numbers in the file, that is, end up with: 2342... (4 Replies)
Discussion started by: ocelot
4 Replies
Login or Register to Ask a Question