Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column
Hi Experts ,
We have a CDC file from which we need the latest record for each combination of key columns.
The key columns are CDC_FLAG and SRC_PMTN_I,
and the latest record is the one with the highest CDC_PRCS_TS.
Can this be done with a single awk command?
Please help.
Code:
CDC_PRCS_TS|CDC_SEQ_I|CDC_FLAG|CDC_UPDT_USER|SRC_PMTN_I|TGT_PMTN_I|PMTN_N|PMTN_DESC_T
2013-03-27 10:32:30|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:31|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:32|0|I|NOT SET |124|215|PRMO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:31|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:32|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|I|NOT SET |126|217|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |127|768|MDS PROJECT|UAT PHASE
2013-03-27 10:32:30|0|U|NOT SET |128|454|MDS PROJECT|UAT PHASE
2013-03-27 10:32:30|0|U|NOT SET |129|234|PROJECT|UAT PHASE
2013-03-27 10:32:30|0|D|NOT SET |130|123|PROMO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|D|NOT SET |131|212|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|D|NOT SET |132|213|PROO-2:PMI PROJECT|INITIAL PHASE
Last edited by Franklin52; 04-26-2013 at 03:27 AM..
Reason: Please use code tags
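One possible approach, as a minimal sketch rather than a definitive answer: keep, for each CDC_FLAG/SRC_PMTN_I pair, the row whose CDC_PRCS_TS is highest. It assumes the first line is a header, that the timestamps always use the YYYY-MM-DD HH:MM:SS layout shown above (so they can be compared as plain strings), and that when two rows of the same key share a timestamp the later line in the file wins. The file names cdc_sample.txt and cdc_latest.txt are hypothetical, and the sample is a reduced subset of the rows posted above.

```shell
#!/bin/sh
# Hypothetical sample file: a reduced subset of the rows from the post.
cat > cdc_sample.txt <<'EOF'
CDC_PRCS_TS|CDC_SEQ_I|CDC_FLAG|CDC_UPDT_USER|SRC_PMTN_I|TGT_PMTN_I|PMTN_N|PMTN_DESC_T
2013-03-27 10:32:30|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|I|NOT SET |124|215|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:33|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|U|NOT SET |125|216|PROO-2:PMI PROJECT|INITIAL PHASE
2013-03-27 10:32:30|0|D|NOT SET |130|123|PROMO-2:PMI PROJECT|INITIAL PHASE
EOF

awk -F'|' '
NR == 1 { print; next }                   # pass the header through unchanged
{
    key = $3 FS $5                        # CDC_FLAG plus SRC_PMTN_I
    if (!(key in ts)) order[++n] = key    # remember first-seen order of keys
    if (!(key in ts) || $1 >= ts[key]) {  # string comparison of timestamps
        ts[key]  = $1                     # highest timestamp seen for this key
        row[key] = $0                     # full record carrying that timestamp
    }
}
END { for (i = 1; i <= n; i++) print row[order[i]] }
' cdc_sample.txt > cdc_latest.txt

cat cdc_latest.txt
```

The whole file is read in one pass and only one row per key is held in memory, so the input does not need to be sorted first. If the earliest row should win on a timestamp tie instead, changing >= to > would do that.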
Hi All,
I need to fetch unique records based on a key column (i.e., column 1), and I also need the records that have the max value in column 2, in sorted order. The duplicates have to be stored in another output file.
Input :
Input.txt
1234,0,x
1234,1,y
5678,10,z
9999,10,k... (7 Replies)
Hi,
I am unable to find the duplicates in a file based on the 1st, 2nd, 4th and 5th columns, and to remove those duplicates from the same file.
Source filename: Filename.csv
"1","ccc","information","5000","temp","concept","new"
"1","ddd","information","6000","temp","concept","new"... (2 Replies)
Given a file such as this I need to remove the duplicates.
00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt
00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt
0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt
0624-01 RUT CORPORATION ... (13 Replies)
Hi team,
I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8 and column2. If those columns have the same values, the record should be treated as a duplicate.
Can anyone help me find the duplicates?
Thanks in advance.
... (2 Replies)
Hi All ,
I have a requirement where I need to remove duplicates from a fixed-width file that has multiple key columns. I also need to capture the duplicate records in another file.
File has 8 columns.
Key columns are col1 and col2.
Col1 has a length of 8; col2 has a length of 3.
... (5 Replies)
Hi,
I have file named file1.txt with below contents
cat file1.txt
1/29/2014 0:00,706886
1/30/2014 0:00,791265
1/31/2014 0:00,987087
2/1/2014 0:00,1098572
2/2/2014 0:00,572477
2/3/2014 0:00,701715
I want to display as below
1/29/2014,706886
1/30/2014,791265
1/31/2014,987087... (5 Replies)
I have /tmp dir with filename as:
010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker
010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker
010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker
010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker... (4 Replies)
Hi Experts,
Please bear with me, I need help.
I am learning awk and am stuck on one issue.
First point: I want to sum the values in columns 7, 9, 11, 13 and 15 for rows that have duplicate values in column 5. No action is to be taken for rows where the value in column 5 is unique.
Second point : For... (1 Reply)
Hello all,
I need to filter a dataframe composed of several columns of data to remove the duplicates according to one of the columns. I did it with pandas. At the same time, I need the last column, which contains all different (non-redundant) data, to be kept in the output like this:
A ... (5 Replies)
Discussion started by: pedro88
5 Replies