UNIX for Dummies Questions & Answers: Remove duplicates based on a column in fixed width file
Post 302437447 by jim mcnamara on Thursday, 15 July 2010, 06:37:30 AM
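You want the duplicates in a second file and the rest in another.
This creates two files, single.txt and duplicate.txt: the first record seen for each key (the 11 characters starting at column 2) goes to single.txt, and every later record with the same key goes to duplicate.txt.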
Code:
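# Key = 11 characters starting at column 2; first record per key -> single.txt, later repeats -> duplicate.txt
awk '{ if (seen[substr($0, 2, 11)]++) print > "duplicate.txt"; else print > "single.txt" }' inputfile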
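To see what the one-liner does, here is a small self-contained demo. The input records are made up, and the layout (an 11-character key in columns 2-12) is only inferred from the substr($0,2,11) call; adjust the offset and length to match your own file.

```shell
#!/bin/sh
# Demo on a tiny, made-up fixed-width file.
# Assumed layout (hypothetical): the key is the 11 characters in columns 2-12.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three records; records 1 and 3 share the key 00000000001.
printf '%s\n' \
    'A00000000001 first' \
    'B00000000002 second' \
    'C00000000001 third' > inputfile

# seen[key]++ is 0 (false) the first time a key appears, non-zero afterwards:
# first record per key -> single.txt, every later repeat -> duplicate.txt
awk '{ if (seen[substr($0, 2, 11)]++) print > "duplicate.txt"; else print > "single.txt" }' inputfile

echo "== single.txt";    cat single.txt
echo "== duplicate.txt"; cat duplicate.txt
```

Because seen[key]++ returns the count before incrementing, single.txt ends up with one record per key, including the first record of keys that do repeat; it is not a list of keys that occur only once. Getting true singletons would need a second pass over the file (count all keys first, then split).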

 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Changing one column of delimited file column to fixed width column

Hi, I am new to Unix. I have one input file. Input file: ID1~Name1~Place1 ID2~Name2~Place2 ID3~Name3~Place3 I need output in which only the first column is changed to a fixed width column 15 characters long. Output file: ID1<<12 spaces>>Name1~Place1 ID2<<12... (5 Replies)
Discussion started by: manneni prakash

2. Shell Programming and Scripting

How to split a fixed width text file into several ones based on a column value?

Hi, I have a fixed width text file without any header row. One of the columns contains a date in YYYYMMDD format. If the original file contains 3 dates, I want my shell script to split the file into 3 small files with data for each date. I am a newbie and need help doing this. (14 Replies)
Discussion started by: bhanja_trinanja

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this, I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a

4. Shell Programming and Scripting

remove duplicates based on single column

Hello, I am new to shell scripting. I have a huge file with multiple columns; for example, I have 5 columns below. HWUSI-EAS000_29:1:105 + chr5 76654650 AATTGGAA HHHHG HWUSI-EAS000_29:1:106 + chr5 76654650 AATTGGAA B@HYL HWUSI-EAS000_29:1:108 + ... (4 Replies)
Discussion started by: Diya123

5. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All, I have a requirement where I need to remove duplicates from a fixed width file which has multiple key columns. I also need to capture the duplicate records in another file. The file has 8 columns. The key columns are col1 and col2; col1 has a length of 8 and col2 has a length of 3. ... (5 Replies)
Discussion started by: saj

6. Shell Programming and Scripting

Fixed width file search based on position value

Hi, I am unable to find the right option to extract data from a fixed width file. Sample data: abcd1234xgyhsyshijfkfk hujk9876 io xgla loki8787eljuwoejroiweo dkfj9098 dja Search based on position 8-9="xg" and print the entire row output ... (4 Replies)
Discussion started by: onesuri

7. Shell Programming and Scripting

To replace the value of the column in a fixed width file

I have a fixed width file whose header and trailer records have the same length as the detail records. The detail record length of this file is 24; the header and trailer records are padded with spaces to match the record length of the file. Currently I am adding 3 spaces in header... (14 Replies)
Discussion started by: ginrkf

8. Shell Programming and Scripting

UNIX command -Filter rows in fixed width file based on column values

Hi All, I am trying to select the rows in a fixed width file based on values in the columns. I want to select only the rows where column positions 3-4 have the value AB. I am using the cut command to get the column values. Is it possible to check if cut -c3-4 = AB is true then select only that... (2 Replies)
Discussion started by: ashok.k

9. Shell Programming and Scripting

Search and replace value based on certain conditions in a fixed width file

Hi Forum. I tried searching the internet for a solution, but I haven't been able to find one for what I'm trying to accomplish. I have a fixed width column file where I need to search for any occurrences of "D0" in col pos. #1-2, 10-11, 20-21 and replace them with "XD". ... (2 Replies)
Discussion started by: pchang
EXIM_DBMBUILD(8)        System Manager's Manual        EXIM_DBMBUILD(8)

NAME
exim_dbmbuild - Build a DBM file.

SYNOPSIS
exim_dbmbuild [-nolc] [-nozero] [-noduperr] [-nowarn] inputfile|- outputfile

DESCRIPTION
The exim_dbmbuild program reads an input file containing keys and data in the format used by the lsearch lookup (see section 9.1). It writes a DBM file using the lower-cased alias names as keys and the remainder of the information as data. The lower-casing can be prevented by calling the program with the -nolc option.

A terminating zero is included as part of the key string. This is expected by the dbm lookup type. However, if the option -nozero is given, exim_dbmbuild creates files without terminating zeroes in either the key strings or the data strings. The dbmnz lookup type can be used with such files.

The program requires two arguments: the name of the input file (which can be a single hyphen to indicate the standard input), and the name of the output file. It creates the output under a temporary name, and then renames it if all went well.

If the native DB interface is in use (USE_DB is set in a compile-time configuration file; this is common in free versions of Unix), the two file names must be different, because in this mode the Berkeley DB functions create a single output file using exactly the name given. For example,

    exim_dbmbuild /etc/aliases /etc/aliases.db

reads the system alias file and creates a DBM version of it in /etc/aliases.db.

In systems that use the ndbm routines (mostly proprietary versions of Unix), two files are used, with the suffixes .dir and .pag. In this environment, the suffixes are added to the second argument of exim_dbmbuild, so it can be the same as the first. This is also the case when the Berkeley functions are used in compatibility mode (though this is not recommended), because in that case it adds a .db suffix to the file name.

If a duplicate key is encountered, the program outputs a warning, and when it finishes, its return code is 1 rather than zero, unless the -noduperr option is used. By default, only the first of a set of duplicates is used; this makes it compatible with lsearch lookups. There is an option -lastdup which causes it to use the data for the last duplicate instead. There is also an option -nowarn, which stops it listing duplicate keys to stderr. For other errors, where it doesn't actually make a new file, the return code is 2.

BUGS
This manual page needs a major rework. If somebody knows groff better than we do and has more experience in writing manual pages, any patches would be greatly appreciated.

SEE ALSO
exim(8), /usr/share/doc/exim4-base/

AUTHOR
This manual page was stitched together from spec.txt by Andreas Metzler <ametzler at downhill.at.eu.org>, for the Debian GNU/Linux system (but may be used by others).

March 26, 2003        EXIM_DBMBUILD(8)
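The duplicate-key rules in the exim_dbmbuild description (keys are lower-cased unless -nolc is given, and the first of a set of duplicates is kept by default) can be previewed before building the DBM file. The script below is a hypothetical pre-flight check, not part of Exim. It assumes a simplified lsearch file: unquoted keys that end at the first colon or whitespace, '#' comment lines, and no continuation lines. The sample aliases file is made up.

```shell
#!/bin/sh
# Hypothetical check (not an Exim tool): list keys that would trigger
# exim_dbmbuild's duplicate-key warning in a simplified lsearch-style file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample: "Postmaster" on line 4 collides with "postmaster" on line 2
# once keys are lower-cased.
printf '%s\n' \
    '# sample aliases file (made up)' \
    'postmaster: root' \
    'webmaster:  alice' \
    'Postmaster: bob' > aliases

awk '
/^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comment and blank lines
/^[ \t]/                 { next }   # skip continuation lines (not handled in this sketch)
{
    key = $0
    sub(/[: \t].*/, "", key)        # assumed: key ends at the first colon or whitespace
    key = tolower(key)              # mirror the default lower-casing (no -nolc)
    if (key in first)
        printf "duplicate key %s (line %d, first seen on line %d)\n", key, NR, first[key]
    else
        first[key] = NR             # first occurrence is the one kept by default
}' aliases > dups.txt

cat dups.txt
```

Reporting the line of the first occurrence matches the default first-wins behaviour; with -lastdup, the data from the later line would be used instead.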
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.