Sort and remove duplicates in directory based on first 5 columns:

 
# 1  
Old 01-23-2018
Sort and remove duplicates in directory based on first 5 columns:

I have a /tmp directory with filenames like:

Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker

I want to sort these files based on the first 5 columns and then remove the duplicates based on those same first 5 columns.

I tried the code below:
Code:
ls | sort -k1,2,3,4,5

Later on I felt there was no need to sort my files, just remove the duplicates, since I only need unique names and order doesn't matter, so I tried this:

Code:
ls | awk -F[_-] '!seen[$1,$2,$3,$4,$5]++'

I got:

Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker

If you look closely, I am missing one file, i.e.
Code:
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker

Please note the field separators in the first 5 columns: some names use _ where others use -, which is why that file should be treated as distinct.

So my desired output should be:
Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker

Please help me out with this. Also, I want to run a for loop on the desired result set, so should I delete the duplicate filenames, or store the unique filenames in some other directory and then run the for loop? I need some advice.

TIA
# 2  
Old 01-23-2018
Sometimes it pays off to follow older threads to their end. Try
Code:
ls *.marker | awk  -F'[_-]' '{T = $0; sub (FS $6 ".*$", "", T)} !seen[T]++'
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
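
The key here is that T keeps the original text of the first five fields, separators included, and only everything from field 6 onward is stripped, so 010020001_S-FOR... and 010020001-S-FOR... stay distinct. If you then want to run a loop over that result set without deleting or moving anything, one option (just a sketch, assuming the file names contain no whitespace, as in your examples) is to pipe the unique names into a while read loop:

Code:
ls *.marker | awk -F'[_-]' '{T = $0; sub (FS $6 ".*$", "", T)} !seen[T]++' |
while read -r file
do
    printf 'processing %s\n' "$file"    # replace with the real per-file work
done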

# 3  
Old 01-23-2018
Thanks RudiC!
# 4  
Old 01-23-2018
Code:
for file in *.marker
do
   base_name="${file//_[0-9][0-9]*_[0-9][0-9]*[.]*/}"        # name with the trailing _date_time.marker part stripped
   [[ "$last_base_name" = "$base_name" ]] || echo "$file"    # print only the first file of each group (the sorted glob keeps equal prefixes adjacent)
   last_base_name="$base_name"
done
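
If you would rather store copies of the unique files in another directory and run your loop there, as you asked, a variation on the same idea could look like this (a sketch; /tmp/unique is just an assumed destination, and last_base_name starts out unset):

Code:
mkdir -p /tmp/unique                                          # assumed target directory
for file in *.marker
do
   base_name="${file//_[0-9][0-9]*_[0-9][0-9]*[.]*/}"         # same key as above: name minus _date_time.marker
   [[ "$last_base_name" = "$base_name" ]] || cp -- "$file" /tmp/unique/   # copy only the first file of each group
   last_base_name="$base_name"
done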


# 5  
Old 02-09-2018
Use the extended regex option:

Code:
ls | sed -E '$!N; /^(.*\.marker)\n\1$/!P; D'

