Removing duplicates in fixed width file which has multiple key columns


 
# 1  
Old 12-16-2012

Hi all,

I need to remove duplicates from a fixed-width file that has multiple key columns. I also need to capture the duplicate records in a separate file.

The file has 8 columns.
The key columns are col1 and col2.
Col1 is 8 characters long and col2 is 3 characters long.

Please help...
# 2  
Old 12-16-2012
Please give a sample input file (showing field contents and separators), and provide the outputs that you expect to get from that input. Please use code tags when you post the input and output files.
# 3  
Old 12-16-2012
Here is a sample input file:
Code:
abc12345567hiabckd
abc12345567njipele
bcd23456890mkpele

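Col1 is the first 8 characters of each record (e.g. abc12345) and col2 is the next 3 characters (e.g. 567). In the original post col1 was marked in red and col2 in blue.

Sample output:
Duplicate file: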
Code:
abc12345567njipele

file with out Duplicate:

Code:
abc12345567hiabckd
bcd23456890mkpele

Please let me know if I need to provide any more details.

Last edited by Franklin52; 12-16-2012 at 06:00 PM.. Reason: Please use code tags for data and code samples
# 4  
Old 12-16-2012
Assuming your input file is named Input, the following awk script will create a file named Output containing what you described as "file with out Duplicate" and a file named Duplicates that will contain what you described as "Duplicate file":
Code:
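awk -v df=Duplicates -v of=Output '
# Key is characters 1-11: col1 (8 wide) followed directly by col2 (3 wide).
substr($0, 1, 11) in key {
        print > df      # key seen before: write the record to the duplicates file
        next
}
{       key[substr($0, 1, 11)]  # first occurrence: remember the key
        print > of
}' Input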
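
For anyone who wants to try this end to end, here is a minimal sketch that recreates the three sample records from post #3 and applies the same first-occurrence-wins logic. The file names Input, Output and Duplicates follow the post above; the array name `seen`, the variable `k`, and the `printf` line that builds the test file are my own choices for illustration.

```shell
# Recreate the three sample records from post #3.
printf '%s\n' abc12345567hiabckd abc12345567njipele bcd23456890mkpele > Input

# Key = characters 1-11 (col1 is 8 wide, col2 is 3 wide, and they are adjacent).
awk -v df=Duplicates -v of=Output '
{
        k = substr($0, 1, 11)
        if (k in seen) {        # key already seen: this record is a duplicate
                print > df
        } else {                # first occurrence: remember the key, keep the record
                seen[k]
                print > of
        }
}' Input

cat Output
cat Duplicates
```

With the sample data, Output should hold the first and third records and Duplicates the second. Note that awk only creates an output file when something is first written to it, so if the input has no duplicates the Duplicates file will not exist afterwards.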

# 5  
Old 12-17-2012
Thanks Don, it works. I have one more scenario, where the key columns are not contiguous.
Example:
Code:
abc12345567hiabckd
abc12345567hiaipele
bcd23456890mkpele

In this case the key columns are col1 and col3 (highlighted in red in the original post). Can you please help me solve this?

Last edited by Franklin52; 12-17-2012 at 07:00 AM.. Reason: code tags
# 6  
Old 12-17-2012
Code:
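awk -v df=Duplicates2 -v of=Output2 '
# Key is col1 (characters 1-8) plus the 3 characters starting at position 12.
(substr($0, 1, 8),substr($0, 12, 3)) in key {
        print > df      # key seen before: write the record to the duplicates file
        next
}
{       key[substr($0, 1, 8),substr($0, 12, 3)]        # first occurrence: remember the key
        print > of
}' Input2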
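
If the key positions are likely to change again, one option is to pass them in rather than hard-coding each substr call. The following is only a sketch under that assumption: the `spec` variable and its "start,length" pair format are hypothetical names I made up for illustration, not something from the posts above. The key pieces are joined with SUBSEP, which is the same separator awk uses for the comma-separated subscripts in the script above.

```shell
# Recreate the three sample records from post #5.
printf '%s\n' abc12345567hiabckd abc12345567hiaipele bcd23456890mkpele > Input2

# spec is a space-separated list of start,length pairs (hypothetical format).
awk -v df=Duplicates2 -v of=Output2 -v spec='1,8 12,3' '
BEGIN {
        n = split(spec, part, " ")              # part[i] looks like "start,length"
        for (i = 1; i <= n; i++) {
                split(part[i], p, ",")
                start[i] = p[1]
                len[i] = p[2]
        }
}
{
        k = ""
        for (i = 1; i <= n; i++)                # build the key from every listed piece
                k = k substr($0, start[i], len[i]) SUBSEP
        if (k in seen) {                        # key already seen: duplicate record
                print > df
        } else {                                # first occurrence: remember and keep
                seen[k]
                print > of
        }
}' Input2

cat Output2
cat Duplicates2
```

With the sample data from post #5, the first two records share the key pieces abc12345 and hia, so the second one goes to Duplicates2 and the other two go to Output2. Changing the key then only means changing `spec`, for example `spec='1,11'` for the contiguous case in post #1.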
