Sort and remove duplicates in directory based on first 5 columns:

 
# 1  
Old 01-23-2018
Sort and remove duplicates in directory based on first 5 columns:

I have a /tmp directory with filenames like:

Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker

I want to sort these files based on the first 5 columns and then remove the duplicates based on those same first 5 columns.

I tried the code below:
Code:
ls | sort -k1,2,3,4,5

Later on I felt there was no need to sort my files, just remove the duplicates, since I only need unique names and order doesn't matter, so I tried this:

Code:
ls | awk -F[_-] '!seen[$1,$2,$3,$4,$5]++'

I got:

Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker

If you look closely, I am missing one file, i.e.
Code:
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker

Please note the field separators in the first 5 columns: some names use _ where others use -, which is why that file should be treated as distinct.

So my desired output should be:
Code:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker

Please help me out with this. Also, I want to run a for loop on the desired result set, so should I delete the duplicate filenames, or store the unique filenames in some other directory and then run the for loop? I need some advice.

TIA
# 2  
Old 01-23-2018
Sometimes it pays off to follow older threads to their end. Try
Code:
ls *.marker | awk  -F'[_-]' '{T = $0; sub (FS $6 ".*$", "", T)} !seen[T]++'
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001-S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker
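
The key here is that T keeps the original text of the first five fields, separators included, and only everything from field 6 onward is stripped, so 010020001_S-FOR... and 010020001-S-FOR... stay distinct. If you then want to run a loop over that result set without deleting or moving anything, one option (just a sketch, assuming the file names contain no whitespace, as in your examples) is to pipe the unique names into a while read loop:

Code:
ls *.marker | awk -F'[_-]' '{T = $0; sub (FS $6 ".*$", "", T)} !seen[T]++' |
while read -r file
do
    printf 'processing %s\n' "$file"    # replace with the real per-file work
done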

# 3  
Old 01-23-2018
Thanks RudiC!
# 4  
Old 01-23-2018
Code:
for file in *.marker
do
   base_name="${file//_[0-9][0-9]*_[0-9][0-9]*[.]*/}"        # name with the trailing _date_time.marker part stripped
   [[ "$last_base_name" = "$base_name" ]] || echo "$file"    # print only the first file of each group (the sorted glob keeps equal prefixes adjacent)
   last_base_name="$base_name"
done
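
If you would rather store copies of the unique files in another directory and run your loop there, as you asked, a variation on the same idea could look like this (a sketch; /tmp/unique is just an assumed destination, and last_base_name starts out unset):

Code:
mkdir -p /tmp/unique                                          # assumed target directory
for file in *.marker
do
   base_name="${file//_[0-9][0-9]*_[0-9][0-9]*[.]*/}"         # same key as above: name minus _date_time.marker
   [[ "$last_base_name" = "$base_name" ]] || cp -- "$file" /tmp/unique/   # copy only the first file of each group
   last_base_name="$base_name"
done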


# 5  
Old 02-09-2018
Use the extended regex option:

Code:
ls | sed -E '$!N; /^(.*\.marker)\n\1$/!P; D'

