Sponsored Content
Top Forums Shell Programming and Scripting To remove duplicates from pipe delimited file Post 302865857 by drl on Sunday 20th of October 2013 07:35:17 AM
Old 10-20-2013
Hi.

We once needed a code that would run on a number of different systems, yet produce consistent results. We ran into the situation that utility uniq was not consistent among the systems. We introducing an option:
Code:
--last
allows over-writing, effectively keeping the most-recently
seen instance. Some versions of uniq on other *nix systems use
the most recent (Solaris), the default is compatibility with
GNU/Linux uniq, which keeps the first occurrence.

By substituting this idea for the system version of uniq, we were able to produce consistent results.

I think this problem can approached with the sort idea of danmero, but with the stable option set, and a "final filter" that eliminates duplicates. Because the file is already sorted, no additional storage is needed: in the final filter, if the fields of the incoming record differ from that in storage, then write out the saved line, and save the new line. If the fields are the same, then save the new instance of the line. Our code was in perl, but awk could be as easily used.

Best wishes ... cheers, drl

Last edited by drl; 10-20-2013 at 08:44 AM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to generate a pipe ( | ) delimited file?

:)Hi Friends, I have certain log files extracted. I want it to be converted in pipe ( | ) delimited file. How do i do it? E.g. Account Balance : 123456789 Rs O/P (Account Balance: | 123456789 Rs) Account Balance (Last) > 987654321 Rs O/P (Account Balance (Last) | 987654321 Rs) Last... (5 Replies)
Discussion started by: anushree.a
5 Replies

2. Shell Programming and Scripting

convert a pipe delimited file to a':" delimited file

i have a file whose data is like this:: osr_pe_assign|-120|wg000d@att.com|4| osr_evt|-21|wg000d@att.com|4| pe_avail|-21|wg000d@att.com|4| osr_svt|-11|wg000d@att.com|4| pe_mop|-13|wg000d@att.com|4| instar_ready|-35|wg000d@att.com|4| nsdnet_ready|-90|wg000d@att.com|4|... (6 Replies)
Discussion started by: priyanka3006
6 Replies

3. Shell Programming and Scripting

Remove SPACES between PIPE delimited file

This is my input file with extra information in the HEADER and leading & trailing SPACES between PIPE delimiter. 02/04/2010 Dynamic List Display 1 --------------------------------------------------------------------------------------... (6 Replies)
Discussion started by: srimitta
6 Replies

4. Shell Programming and Scripting

How to convert a space delimited file into a pipe delimited file using shellscript?

Hi All, I have space delimited file similar to the one as shown below.. I need to convert it as a pipe delimited, the values inside the pipe delimited file should be as highlighted... AA ATIU2345098809 009697 005374 BB ATIU2345097809 005445 006518 CC ATIU9685098809 003215 003571 DD... (7 Replies)
Discussion started by: nithins007
7 Replies

5. Shell Programming and Scripting

Help with converting Pipe delimited file to Tab Delimited

I have a file which was pipe delimited, I need to make it tab delimited. I tried with sed but no use cat file | sed 's/|//t/g' The above command substituted "/t" not tab in the place of pipe. Sample file: abc|123|2012-01-30|2012-04-28|xyz have to convert to: abc 123... (6 Replies)
Discussion started by: karumudi7
6 Replies

6. Shell Programming and Scripting

Remove few columns from pipe delimited file

I have file as below column1|column2|column3|column4|column5| fill1|fill2|fill3|fill4|fill5| abc1|abc2|abc3|abc4|abc5| . . . . i need to remove column2,3, from that file column1|column4|column5| fill1|fill4|fill5| abc1|abc4|abc5| . . . (3 Replies)
Discussion started by: greenworld123
3 Replies

7. Shell Programming and Scripting

How to ignore Pipe in Pipe delimited file?

Hi guys, I need to know how i can ignore Pipe '|' if Pipe is coming as a column in Pipe delimited file for eg: file 1: xx|yy|"xyz|zzz"|zzz|12... using below awk command awk 'BEGIN {FS=OFS="|" } print $3 i would get xyz But i want as : xyz|zzz to consider as whole column... (13 Replies)
Discussion started by: rohit_shinez
13 Replies

8. Shell Programming and Scripting

Removing duplicates from delimited file based on 2 columns

Hi guys,Got a bit of a bind I'm in. I'm looking to remove duplicates from a pipe delimited file, but do so based on 2 columns. Sounds easy enough, but here's the kicker... Column #1 is a simple ID, which is used to identify the duplicate. Once dups are identified, I need to only keep the one... (2 Replies)
Discussion started by: kevinprood
2 Replies

9. UNIX for Dummies Questions & Answers

Need to convert a pipe delimited text file to tab delimited

Hi, I have a rquirement in unix as below . I have a text file with me seperated by | symbol and i need to generate a excel file through unix commands/script so that each value will go to each column. ex: Input Text file: 1|A|apple 2|B|bottle excel file to be generated as output as... (9 Replies)
Discussion started by: raja kakitapall
9 Replies

10. Shell Programming and Scripting

How to remove new line characters from data rows in a Pipe delimited file?

I have a file as below Emp1|FirstName|MiddleName|LastName|Address|Pincode|PhoneNumber 1234|FirstName1|MiddleName2|LastName3| Add1 || ADD2|123|000000000 2345|FirstName2|MiddleName3|LastName4| Add1 || ADD2| 234|000000000 OUTPUT : ... (1 Reply)
Discussion started by: styris
1 Replies
uniq(1) 						      General Commands Manual							   uniq(1)

Name
       uniq - report repeated lines in a file

Syntax
       uniq [-udc[+n][-n]] [input[output]]

Description
       The  command  reads  the  input	file comparing adjacent lines.	In the normal case, the second and succeeding copies of repeated lines are
       removed; the remainder is written on the output file.  Note that repeated lines must be adjacent in order to be found.  For further  infor-
       mation, see

Options
       The n arguments specify skipping an initial portion of each line in the comparison:

       -n Skips specified number of fields.  A field is defined as a string of non-space, non-tab characters separated by tabs and spaces from its
	  neighbors.

       +n Skips specified number of characters in addition to fields.  Fields are skipped before characters.

       -c Displays number of repetitions, if any, for each line.

       -d Displays only lines that were repeated.

       -u Displays only unique (nonrepeated) lines.

       If the -u flag is used, just the lines that are not repeated in the original file are output.  The -d option specifies  that  one  copy	of
       just the repeated lines is to be written.  The normal mode output is the union of the -u and -d mode outputs.

       The  -c option supersedes -u and -d and generates an output report in default style but with each line preceded by a count of the number of
       times it occurred.

See Also
       comm(1), sort(1)

																	   uniq(1)
All times are GMT -4. The time now is 01:36 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy