UNIX for Dummies Questions & Answers: post 302102755 by run_eim, Friday 12 January 2007, 12:51:10 PM
How to extract duplicate records with associated header record

All,

I need to search through several hundred files, extract the duplicate detail records, and keep them grouped under their header record. If a header has no duplicate detail records, the header should not be pulled either. For example, an input file could look like this:

input.txt
HA
D1
D2
D2
D3
D4
D4
HB
D1
D2
HC
D1
D1
D2
D3
D3

The output would be:

output.txt
HA
D2
D4
HC
D1
D3

Would it be possible to do this with AWK? I do not know Python.

Thank you for your time.
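
One possible way to do this in awk, sketched under a few assumptions that are not stated in the post: header records are the lines that start with "H", every other non-blank line is a detail record, each file begins with a header, and a duplicated detail should be printed once per header even if it occurs more than twice. The function name extract_dups is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print each header that has duplicated detail records,
# followed by each duplicated detail once.
# Assumption: header lines start with "H"; adjust the /^H/ pattern if not.

extract_dups() {
    awk '
        NF == 0 { next }              # skip blank lines
        /^H/ {                        # a header starts a new group
            header = $0
            printed = 0               # header not yet written for this group
            split("", seen)           # portable way to empty the array
            next
        }
        {
            seen[$0]++
            if (seen[$0] == 2) {      # second occurrence: report it once
                if (!printed) { print header; printed = 1 }
                print
            }
        }
    ' "$@"
}

# Sample run using the records from input.txt in the post
printf '%s\n' HA D1 D2 D2 D3 D4 D4 HB D1 D2 HC D1 D1 D2 D3 D3 | extract_dups
```

The counts are kept per header, so duplicates do not have to be adjacent, and the header is only written when the first duplicate in its group turns up. For several hundred files, the function could be called in a loop such as for f in *.txt; do extract_dups "$f" > "$f.dups"; done, where the .txt and .dups names are placeholders.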
 
