[Solved] Removing duplicates from the file and saving as new file

11-22-2012


Dear All

I have 200 data files, and each file has many duplicate lines.

I am looking for an automated awk script that checks each file, removes the duplicates, and saves the result as a new file, for all 200 files in their respective folders.

For example, my data looks like this:

Code:

HETATM 4427 LPA    1     1       5.210   9.727   8.104  1.00  0.00      PROB
HETATM 4428 LPA    1     1       5.151   9.153   8.365  1.00  0.00      PROB
HETATM 4429 LPA    1     1       2.339   7.349   5.955  1.00  0.00      PROB
HETATM 4430 LPA    1     1       2.144   8.104   5.275  1.00  0.00      PROB
HETATM 4431 LPA    1     1       2.473   8.896   5.218  1.00  0.00      PROB
HETATM 4498 LPA    1     1       1.679   7.107   7.511  1.00  0.00      PROB
HETATM 4506 LPA    1     1       2.001   8.185   5.346  1.00  0.00      PROB
HETATM 4507 LPA    1     1       2.363   7.711   4.485  1.00  0.00      PROB
HETATM 4427 LPA    1     1       5.210   9.727   8.104  1.00  0.00      PROB

I need to remove the repeated "4427" line (it appears twice) and save the result as a new file.
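A minimal sketch for a single file, assuming that a "duplicate" means a line whose second column (the atom serial number, 4427 in the example) has already been seen, and that the columns are always separated by whitespace as in the sample. The function name `dedup_by_serial` is hypothetical.

```shell
# Hypothetical helper: dedup_by_serial INFILE OUTFILE
# Assumption: column 2 holds the atom serial (e.g. 4427) and the columns are
# whitespace-separated, as in the sample data.
dedup_by_serial() {
    infile=$1
    outfile=$2
    # seen[$2]++ yields 0 the first time a serial appears, so !seen[$2]++ is
    # true only for that first line; awk's default action then prints it.
    awk '!seen[$2]++' "$infile" > "$outfile"
}

# Example (file names are placeholders):
# dedup_by_serial data.pdb data_new.pdb
```

Keying on the serial drops a later line with the same serial even if its coordinates differ. To drop only lines that are identical character for character, key on `$0` instead of `$2`. If the files follow the fixed-column PDB layout, a five-digit serial can run straight into "HETATM" with no space, and `$2` would then no longer be the serial.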

Kindly advise.

Many Thanks
Balaji

Last edited by Corona688; 11-22-2012 at 02:39 PM..

bala06


11-22-2012


Does your data actually have the < and /> or is that an artifact of posting it here?

Corona688


11-22-2012


Hi

My data doesn't have the < /> symbols.

I tried using the code tags and ended up with those symbols.

Many Thanks
Balaji


bala06


11-22-2012


I am removing the tags before checking for duplicates, so you will probably have to put the tags back after the duplicates are removed:

Code:

sed 's/< //g; s/\/>//g' filename | awk '!arr[$0]++'
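For readers new to the `!arr[$0]++` idiom: `arr[$0]++` evaluates to the number of times the current line has been seen before (0 on its first appearance), `!` turns that 0 into true, and a pattern with no action prints the line. The sketch below spells out the same logic; `dedup_lines` is a hypothetical name, and it reads standard input when no file is given. It skips the `sed` step, assuming the data has no tags.

```shell
# Hypothetical helper: dedup_lines [FILE...]  (reads stdin if no FILE)
# Spelled-out equivalent of: awk '!arr[$0]++'
dedup_lines() {
    awk '{
        if (seen[$0] == 0)   # this exact line has not appeared before
            print            # so print it: the first occurrence wins
        seen[$0]++           # count this occurrence
    }' "$@"
}

# Example (placeholder names):
# dedup_lines filename > filename_new
```

Because only the first copy is printed and the rest are skipped, the original line order is preserved, unlike `sort -u`.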

Yoda


11-22-2012


Hi

Thanks for the code.

Is there any way to automate this? I have quite a number of files in multiple folders, and entering each filename manually takes far too much time.

Kindly advise.

Many Thanks
Balaji

bala06


11-22-2012


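Use a loop:

Code:

find . -type f -name "*" ! -name "*_new" | while IFS= read -r file
do
   # write the de-duplicated copy next to the original as <file>_new;
   # "*_new" is excluded above so the outputs are not processed again
   awk '!arr[$0]++' "$file" > "${file}_new"
done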

Note: Replace * with the file pattern that matches your data files.
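An alternative sketch that keeps the same `<file>_new` naming: instead of piping `find` into `read` (which starts one `awk` per file), `find ... -exec ... {} +` hands many files to a single `awk`, which empties its table at the start of each file and writes to `FILENAME "_new"`. The helper name `dedup_tree` and the `'*.pdb'` pattern in the example call are assumptions. Since file names are passed as arguments rather than read line by line, names containing spaces are handled too.

```shell
# Hypothetical helper: dedup_tree DIR PATTERN
# Writes <file>_new next to every file under DIR whose name matches PATTERN,
# keeping only the first copy of each line. One awk process handles a whole
# batch of files passed by find, instead of one awk per file.
dedup_tree() {
    find "$1" -type f -name "$2" ! -name '*_new' -exec awk '
        FNR == 1 {                      # first line of a new input file
            if (out != "") close(out)   # finish the previous output file
            out = FILENAME "_new"       # e.g. ./dir/data.pdb -> ./dir/data.pdb_new
            split("", seen)             # portable way to empty the array
        }
        !seen[$0]++ { print > out }     # first occurrence of the line only
    ' {} +
}

# Example (the pattern is an assumption; use the one matching your files):
# dedup_tree . '*.pdb'
```

One difference from the loop: an empty input file never triggers `FNR == 1`, so no `_new` file is created for it.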

Yoda


11-22-2012


Hi

I am sorry.

I just found that the code is not removing the duplicates.

The output is the same as the input.

After running the script, I checked for the number "4427", and it's the same count as before.
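The thread does not show what went wrong, so the following is only a guess. `!arr[$0]++` compares whole lines character for character, so two lines that look identical but differ in spacing, tabs, or a trailing carriage return (DOS line endings) are not treated as duplicates. It is also worth confirming that the count was taken on the `_new` files rather than the originals. `od -c file` (or `cat -A file` with GNU coreutils) shows such hidden characters. The hypothetical `dedup_normalized` below compares a whitespace-normalized copy of each line but prints the original line unchanged.

```shell
# Hypothetical helper: dedup_normalized [FILE...]  (reads stdin if no FILE)
# Lines count as duplicates if they differ only in blanks/tabs or a trailing CR.
dedup_normalized() {
    awk '{
        key = $0
        sub(/\r$/, "", key)              # ignore a DOS line ending
        n = split(key, f)                # split on runs of blanks and tabs
        key = ""
        for (i = 1; i <= n; i++)         # rejoin the fields with single spaces
            key = key (i > 1 ? " " : "") f[i]
        if (!seen[key]++)
            print                        # first occurrence: print original line
    }' "$@"
}

# Example (placeholder names):
# dedup_normalized data.pdb > data.pdb_new
```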

Kindly advise.

Many Thanks
Balaji

bala06

