Sponsored Content
Top Forums Shell Programming and Scripting Remove duplicates based on the two key columns Post 302464845 by kmsekhar on Thursday 21st of October 2010 05:32:43 AM
Old 10-21-2010
earlier i am deleting all occurences on key column1 and stored into seperate file duplicates & uniq records, for that below

sort -t\| -k1 input1.txt|awk '{
x[$1]++
y[NR] = $0
} END {
for(i=1; i<=NR; i++)
{
tmp = y[i]
split(tmp,z)
print tmp> ((x[z[1]]>1) ? "output.txt" : "output2.txt")
}
}' SUBSEP="|" FS="|"
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

removing duplicates based on key

HI I am having a file like this 1234 12345678 1234567890123 4321 43215678 432156789028433435 I want to get ouput as 1234567890123 432156789028433435 based on key position 1-4 I am using ksh can anyone give me an idea Thanks pukars (1 Reply)
Discussion started by: pukars4u
1 Replies

2. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Hi, I am unable to search the duplicates in a file based on the 1st,2nd,4th,5th columns in a file and also remove the duplicates in the same file. Source filename: Filename.csv "1","ccc","information","5000","temp","concept","new" "1","ddd","information","6000","temp","concept","new"... (2 Replies)
Discussion started by: onesuri
2 Replies

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

4. UNIX for Dummies Questions & Answers

Removing duplicates based on key

Hi, I have the input file with the below data: 12345|12|34 12345|13|23 3456|12|90 15670|12|13 12345|10|14 3456|12|13 I need to remove the duplicates based on the first field only. I need the output like: 12345|12|34 3456|12|90 15670|12|13 The first field needs to be unique . (4 Replies)
Discussion started by: pandeesh
4 Replies

5. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have 20 columns csv files. i want to find the duplicates in that file based on the column1 column10 column4 column6 coulnn8 coulunm2 . if those columns have same values . then it should be a duplicate record. can one help me on finding the duplicates, Thanks in advance. ... (2 Replies)
Discussion started by: baskivs
2 Replies

6. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All , I have a requirement where I need to remove duplicates from a fixed width file which has multiple key columns .Also , need to capture the duplicate records into another file . File has 8 columns. Key columns are col1 and col2. Col1 has the length of 8 col 2 has the length of 3. ... (5 Replies)
Discussion started by: saj
5 Replies

7. Shell Programming and Scripting

Remove duplicates based on a field's value

Hi All, I have a text file with three columns. I would like a simple script that removes lines in which column 1 has duplicate entries, but use the largest value in column 3 to decide which one to keep. For example: Input file: 12345a rerere.rerere len=23 11111c fsdfdf.dfsdfdsf len=33 ... (3 Replies)
Discussion started by: anniecarv
3 Replies

8. Shell Programming and Scripting

Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column

Hi Experts , we have a CDC file where we need to get the latest record of the Key columns Key Columns will be CDC_FLAG and SRC_PMTN_I and fetch the latest record from the CDC_PRCS_TS Can we do it with a single awk command. Please help.... (3 Replies)
Discussion started by: vijaykodukula
3 Replies

9. Shell Programming and Scripting

Removing duplicates from delimited file based on 2 columns

Hi guys,Got a bit of a bind I'm in. I'm looking to remove duplicates from a pipe delimited file, but do so based on 2 columns. Sounds easy enough, but here's the kicker... Column #1 is a simple ID, which is used to identify the duplicate. Once dups are identified, I need to only keep the one... (2 Replies)
Discussion started by: kevinprood
2 Replies

10. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have /tmp dir with filename as: 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker 010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker 010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker... (4 Replies)
Discussion started by: gnnsprapa
4 Replies
PARALLEL-SLURP(1)														 PARALLEL-SLURP(1)

NAME
parallel-slurp - copy files from listed hosts SYNOPSIS
parallel-slurp [OPTIONS] -h hosts.txt -L destdir remote local DESCRIPTION
pssh provides a number of commands for executing against a group of computers, using SSH. It's most useful for operating on clusters of homogenously-configured hosts. parallel-slurp gathers specified files from hosts you listed. OPTIONS
-r --recursive recusively copy directories (OPTIONAL) -L --localdir output directory for remote file copies -h --hosts hosts file (each line "host[:port] [user]") -l --user username (OPTIONAL) -p --par max number of parallel threads (OPTIONAL) -o --outdir output directory for stdout files (OPTIONAL) -e --errdir output directory for stderr files (OPTIONAL) -t --timeout timeout (secs) (-1 = no timeout) per host (OPTIONAL) -O --options SSH options (OPTIONAL) -v --verbose turn on warning and diagnostic messages (OPTIONAL) EXAMPLE
An example to copy /home/irb2/foo.txt from each host. Files gathered will be stored in /tmp/outdir/hostname/foo.txt. # prallel-slurp -h hosts.txt -L /tmp/outdir -l irb2 /home/irb2/foo.txt foo.txt ENVIRONMENT
All four programs take similar sets of options. All of these options can be set using the following environment variables: o PSSH_HOSTS o PSSH_USER o PSSH_PAR o PSSH_OUTDIR o PSSH_VERBOSE o PSSH_OPTIONS SEE ALSO
parallel-ssh(1), parallel-scp(1), parallel-nuke(1), parallel-rsync(1), ssh(1) AUTHOR
Brent N. Chun <bnc@theether.org> COPYING
Copyright: 2003, 2004, 2005, 2006, 2007 Brent N. Chun NOTES
1. bnc@theether.org mailto:bnc@theether.org 03/30/2009 PARALLEL-SLURP(1)
All times are GMT -4. The time now is 03:30 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy