Post 302965416 by Rufinofr on Thursday 28th of January 2016 03:03:15 PM
Removing duplicates on a single "column" (delimited file)

Hello!

I'm quite new to Linux and haven't found a script for this task; unfortunately, my knowledge of shell scripts is quite limited.

Could you help me remove the duplicate lines of a file, based only on a single "column"?

For example:

Code:
M202034357;01/2008;J30RJ021;Ciclo 01 de Faturamento;4000029579;01F010800017;270500591331;175130959;000074873-AB;9.9;RIO DE JANEIRO
M202034357;01/2008;J30AP096;Ciclo 01 de Faturamento;4000029579;01F010800017;270500589332;175123672;000001842-AB;9.9;MACAPA
M202034357;01/2008;J30RJ021;Ciclo 01 de Faturamento;4000043657;01F010800002;118000613348;175138146;000161122-AA;9.9;RIO DE JANEIRO
M202034357;01/2008;J30DF061;Ciclo 06 de Faturamento;4000034956;06F010800020;269800607228;173691920;000030011-AA;9.9;GUARA
M202034357;01/2008;J30RJ021;Ciclo 01 de Faturamento;4000029579;01F010800017;270500588743;175121705;000188224-AA;9.9;NITEROI
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000029579;01F010800017;270500589299;175123639;000241055-AB;9.9;SAO PAULO
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000029579;01F010800017;270500589787;175125437;000256241-AB;9.9;SAO PAULO
M202034357;01/2008;J30AM097;Ciclo 01 de Faturamento;4000043657;01F010800002;118000614870;175142866;000026153-AA;4.99;MANAUS
M202034357;01/2008;J30PA091;Ciclo 01 de Faturamento;4000043657;01F010800002;118000614087;175140485;000023707-AA;9.9;BELEM
M202034357;01/2008;J30PA091;Ciclo 01 de Faturamento;4000043785;01F010800027;270200624370;175114167;000011219-AB;9.9;BELÉM
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000029579;01F010800017;270500591956;175132948;000441734-AA;9.9;SAO BERNARDO DO CAMPO
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000029579;01F010800017;270500590036;175126399;000458131-AA;9.9;SAO CAETANO DO SUL
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000029579;01F010800017;270500591958;175132950;000441735-AA;9.9;SAO PAULO
M202034357;01/2008;J30SP011;Ciclo 01 de Faturamento;4000043657;01F010800002;118000612017;175130959;000469327-AA;9.9;GUARULHOS

So, the highlighted field (the 8th semicolon-delimited field) is duplicated on a few lines, such as the first and last ones, which both contain 175130959. The data around that field often differs between those lines.

I don't need the same value to appear twice, even if the information before and after it is different. So what I need is a script (maybe awk or cut) that recognizes the same string at position 8 and, if it has already been seen, deletes that whole line, while keeping every line whose string at position 8 has not appeared before.

Ideas?

Last edited by jim mcnamara; 01-28-2016 at 04:15 PM.. Reason: code tags
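
One possible approach, offered as a minimal sketch rather than a definitive answer: awk can remember each value of field 8 in an associative array and print a line only the first time its value shows up. The function name `dedupe_by_field` and the file names `input.txt` and `output.txt` are hypothetical, chosen just for illustration. The sketch assumes the delimiter is always `;` and that no field contains an embedded `;`.

```shell
# Hypothetical helper (name is an assumption, not from the original post).
# Keeps the first line seen for each distinct value of the chosen field
# in a ';'-delimited file and drops every later line with the same value.
#   $1 = field number to deduplicate on (8 for the sample data)
#   $2 = input file
dedupe_by_field() {
    # seen[$col]++ evaluates to 0 (false) the first time a value appears,
    # so !seen[$col]++ is true only then, and awk prints the line.
    awk -F';' -v col="$1" '!seen[$col]++' "$2"
}

# Example call (file names are placeholders):
#   dedupe_by_field 8 input.txt > output.txt
```

On the sample above, only the last line would be dropped, because its field 8 (175130959) already appeared on the first line. The original line order is preserved, and no sorting is needed.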
 
