10-21-2010
Earlier I was removing all occurrences of a duplicated key in column 1 and writing the duplicate and unique records to separate files. For that I used the code below:
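sort -t'|' -k1,1 input1.txt | awk -F'|' '
{
    count[$1]++      # number of records seen for each key (column 1)
    line[NR] = $0    # keep every record, in sorted order
}
END {
    for (i = 1; i <= NR; i++) {
        split(line[i], f)    # split on FS ("|") to recover the key
        # keys seen more than once go to output.txt, unique keys to output2.txt
        print line[i] > ((count[f[1]] > 1) ? "output.txt" : "output2.txt")
    }
}'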
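The script above holds every record in memory until END. As a rough sketch of an alternative (not what I originally ran), the file can be read twice instead: the first pass counts the keys, the second routes each record. The function name split_by_key and its three arguments are made up for illustration, and it assumes the input is a regular file rather than a pipe, since awk has to open it twice.

```shell
# Hypothetical helper: split_by_key INPUT DUP_OUT UNIQ_OUT
# Writes records whose column-1 key occurs more than once to DUP_OUT
# and all other records to UNIQ_OUT, keeping the input order.
split_by_key() {
    : > "$2"; : > "$3"    # create both outputs, even if one stays empty
    awk -F'|' -v dup="$2" -v uniq="$3" '
        NR == FNR { count[$1]++; next }              # pass 1: count each key
        { print > ((count[$1] > 1) ? dup : uniq) }   # pass 2: route each record
    ' "$1" "$1"
}
```

It would be called as: split_by_key input1.txt output.txt output2.txt. No sort is needed here unless the output itself has to be ordered by key.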
LEARN ABOUT DEBIAN
parallel-slurp
PARALLEL-SLURP(1) PARALLEL-SLURP(1)
NAME
parallel-slurp - copy files from listed hosts
SYNOPSIS
parallel-slurp [OPTIONS] -h hosts.txt -L destdir remote local
DESCRIPTION
pssh provides a number of commands for executing against a group of computers, using SSH. It's most useful for operating on clusters of
homogeneously configured hosts.
parallel-slurp gathers the specified files from the hosts you list.
OPTIONS
-r --recursive
recursively copy directories (OPTIONAL)
-L --localdir
output directory for remote file copies
-h --hosts
hosts file (each line "host[:port] [user]")
-l --user
username (OPTIONAL)
-p --par
max number of parallel threads (OPTIONAL)
-o --outdir
output directory for stdout files (OPTIONAL)
-e --errdir
output directory for stderr files (OPTIONAL)
-t --timeout
timeout (secs) (-1 = no timeout) per host (OPTIONAL)
-O --options
SSH options (OPTIONAL)
-v --verbose
turn on warning and diagnostic messages (OPTIONAL)
EXAMPLE
An example to copy /home/irb2/foo.txt from each host. Files gathered will be stored in /tmp/outdir/hostname/foo.txt.
# parallel-slurp -h hosts.txt -L /tmp/outdir -l irb2 /home/irb2/foo.txt foo.txt
ENVIRONMENT
All four programs take similar sets of options. All of these options can be set using the following environment variables:
o PSSH_HOSTS
o PSSH_USER
o PSSH_PAR
o PSSH_OUTDIR
o PSSH_VERBOSE
o PSSH_OPTIONS
SEE ALSO
parallel-ssh(1), parallel-scp(1), parallel-nuke(1), parallel-rsync(1), ssh(1)
AUTHOR
Brent N. Chun <bnc@theether.org>
COPYING
Copyright: 2003, 2004, 2005, 2006, 2007 Brent N. Chun
NOTES
1. bnc@theether.org
mailto:bnc@theether.org
03/30/2009 PARALLEL-SLURP(1)