Removing duplicate items from an array

11-15-2017

Registered User

82, 2

Join Date: Jul 2012

Last Activity: 10 February 2020, 11:44 AM EST

Posts: 82

Thanks Given: 33

Thanked 2 Times in 2 Posts

Removing duplicate items from an array

Hello Experts,

I am trying to write a shell script to find duplicate items in an array, this is what i have tried :

Code:

#!/bin/bash
echo "This is another sample Progg to remove duplicate items from an array"
echo "How many number do you want to insert"
read n
for (( i=0; i<$n; i++ ))
do
   echo "Enter Number"
   read num
   arr[i]=$num
done
 echo "Insertion is done, Below are the element of the array"
 echo ${arr[@]}

for (( i=0; i<$n; i++ ))
do
    echo "------------i here is ${arr[i]}-------" 
    for (( j=0; j<$n; j++ ))
  do
       echo "j here is ${arr[j]}"
      if [[ $i -eq $j ]]
        then
           echo "Skip this iteration"
            continue
        else
           if [[ arr[i] -eq arr[j] ]]
             then
                 echo "There is a duplicate!!"
                 arr=(${arr[@]:0:$j} ${arr[@]:(( $j+1))})
                 break
           fi
      fi
  done
done
echo ${arr[@]}

The script is working properly and removing the duplicate element as expected, the problem is since i am using a counter (variable n) equivalent to number of items in array to loop, the number of iteration is always constant even though I am popping out the duplicate element as soon as it is encountered.

Any help on how to optimise the script to remove the iteration that are not required?

mukulverma2408

View Public Profile for mukulverma2408

Find all posts by mukulverma2408

11-16-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

I don't think your code works as well as you think it does. Try running your script again telling it that you want it to process 5 numbers and then enter the same number 5 times. Do you get the results you wanted? Then try running your script again telling it that you want it to process 5 numbers and then enter 1, 2, 1, 3, and 1. Would you expect the final result to be 2 3 1 or 1 2 3?

See if this works better for you:

Code:

#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
for (( i=0; i<$n; i++ ))
do
	echo "Enter Number"
	read num
	arr[i]=$num
done
echo "Insertion is done, Below are the elements of the array"
echo ${arr[@]}

for (( i=0; i<$((n-1)); i++ ))
do
	echo "------------arr[i] here is ${arr[i]}-------" 
	for (( j=$((i+1)); j<$n; j++ ))
	do
		echo "arr[j] here is ${arr[j]}"
		if [[ arr[i] -eq arr[j] ]]
		then
			echo "There is a duplicate!!"
			arr=(${arr[@]:0:$j} ${arr[@]:$((j+1))})
			((j--))
			((n--))
		fi
	done
done
echo ${arr[@]}

This seems to work OK with both bash version 3.2.57(1)-release (x86_64-apple-darwin16) and with ksh version (AT&T Research) 93u+ 2012-08-01. I assume it will also work with more recent releases of both of these shells.

These 3 Users Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

11-16-2017

Registered User

82, 2

Join Date: Jul 2012

Last Activity: 10 February 2020, 11:44 AM EST

Posts: 82

Thanks Given: 33

Thanked 2 Times in 2 Posts

Hi Don,

Thanks for your reply, the script is now working perfectly.

Can you please help to explain below part, the script seems to be working fine without decreasing the counter as well.

Code:

((j--))

mukulverma2408

View Public Profile for mukulverma2408

Find all posts by mukulverma2408

11-16-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by mukulverma2408

Hi Don,

Thanks for your reply, the script is now working perfectly.

Can you please help to explain below part, the script seems to be working fine without decreasing the counter as well.

Code:

((j--))

The script is decreasing the counters.

After removing an element from the array of n items, the arithmetic command ((n--)) decreases the value of n. Note that n-1 is also the upper limit on the outer loop (indexed by i) and n is the upper limit on the inner loop (indexed by j).

When we remove an element from the array (i.e. remove ${array[$j]} by moving higher elements of the array down one position), the ((j--)) forces the next iteration of the inner loop to look at the same element (after moving down higher elements). Without the ((j--)) the first element that was moved down won't be examined to see if it is also a duplicate of the element indexed by the outer loop.

P.S.: Note also that your script compared every element of the array to every element of the array (skipping the cases where it was comparing an item to itself). My script compares element 1 in the array to elements 2 through n, then compares element 2 to elements 3 through n, and so on until it compares element n-1 to element n on the last iteration through the nested loops. Doing it this way eliminates the need to check for comparing an element to itself and gets rid of the duplicated comparisons with pairs of elements that have already been compared.

Last edited by Don Cragun; 11-16-2017 at 10:37 PM.. Reason: Add postscript.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

11-17-2017

Registered User

5,091, 1,931

Join Date: May 2012

Last Activity: 15 July 2020, 4:46 AM EDT

Location: Simplicity

Posts: 5,091

Thanks Given: 565

Thanked 1,931 Times in 1,668 Posts

Don, your solution was 100% correct, as always!
As a note, within $(( )) and (( )) evaluation you do not need further $ evaluations. So the following works as well

Code:

for (( i=0; i<n; i++ ))

Code:

for (( i=0; i<(n-1); i++ ))

Last but not least, in bash-4 you can use an associative array to automatically sort out duplicates:

Code:

#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
declare -A hash
for (( i=0; i<n; i++ ))
do
        echo "Enter Number"
        read num
        # we address the array by $num, and do not need a value
        hash[$num]=""
done
echo "Insertion is done, Below are the elements of the array"
# dump the keys
printf "%s\n" "${!hash[@]}"
echo "now loop through the keys"
for i in "${!hash[@]}"
do
   echo "do something with $i"
done

This User Gave Thanks to MadeInGermany For This Post:

MadeInGermany

View Public Profile for MadeInGermany

Find all posts by MadeInGermany

11-17-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by MadeInGermany

Don, your solution was 100% correct, as always!
As a note, within $(( )) and (( )) evaluation you do not need further $ evaluations. So the following works as well

Code:

for (( i=0; i<n; i++ ))

Code:

for (( i=0; i<(n-1); i++ ))

Last but not least, in bash-4 you can use an associative array to automatically sort out duplicates:

Code:

#!/bin/bash
echo "This is a sample Prog to remove duplicate integer items from an array"
echo "How many numbers do you want to insert"
read n
declare -A hash
for (( i=0; i<n; i++ ))
do
        echo "Enter Number"
        read num
        # we address the array by $num, and do not need a value
        hash[$num]=""
done
echo "Insertion is done, Below are the elements of the array"
# dump the keys
printf "%s\n" "${!hash[@]}"
echo "now loop through the keys"
for i in "${!hash[@]}"
do
   echo "do something with $i"
done

I avoided associative arrays in my suggestion, because I only have access to bash version 3.2.57(1)-release (x86_64-apple-darwin16) and it doesn't have associative arrays. I normally use ksh instead of bash (and it has had associative arrays since 1993). Unfortunately, bash and ksh declare associative arrays incompatibly. As you have shown above, bash declares an associative array with:

Code:

declare -A arrayname...

while ksh uses:

Code:

typeset -A arrayname...

with that one line change to your code, it will work with a 1993 or later ksh instead of a version 4 or later bash.

I agree that using an associative array is the easy way to avoid duplicated values when creating an array. But note that you when you say that it "automatically sort out duplicates", the sort order produced by ${!arrayname[@]} is an alphanumeric sort of the indices; not a numeric sort. So if the values provided as the array indices are 5, 20, and 300; the output from:

Code:

echo ${!arrayname[@]}

will be:

Code:

20 300 5

not the:

Code:

5 20 300

that would be produced by a numeric sort.

The OP for this thread never said whether or not maintaining the order of the input values was desirable or important; nor whether sorted output would be a nicety or a nuisance. If numerically sorted output is important, the user entering values would either need to supply leading zeros so all numeric value are the same width, or the list of values will have to be sorted after duplicates are removed (either with an associative array or with the indexed array used in the original post in this thread).

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

Shell Programming and Scripting

Removing duplicate items from an array

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pass an array to awk to sequentially look for a list of items in a file

Discussion started by: LMHmedchem

2. Shell Programming and Scripting

sed to delete items in an array from a file

Discussion started by: bash_in_my_head

3. Shell Programming and Scripting

Add items to ksh array

Discussion started by: ashanabey

4. Shell Programming and Scripting

Removing Duplicate Variables : SED?

Discussion started by: Blue Solo

5. Programming

Removing Items In A ListView

Discussion started by: romeo5577

6. Shell Programming and Scripting

removing items from a file with batch

Discussion started by: amir07

7. Shell Programming and Scripting

removing items with repeated first 3 character

Discussion started by: amir07

8. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Discussion started by: ocelot