Sponsored Content
Top Forums Shell Programming and Scripting Find duplicates in the first column of text file Post 302432822 by alister on Sunday 27th of June 2010 10:03:59 AM
Old 06-27-2010
A single-pass version (increased ram requirement since all lines of the file are stored for END use):
Code:
 awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data

Regards,
Alister
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to find the number of column in the text file...?

Hi, i have text file with ~ seperated columns. it is very huge size of file, in the file sompulsary supposed to has 20 columns with ~ seperated. so how can i find if the file has 20 column in the all rows...? Sample file: APA+VU~10~~~~~03~101~101~~~APA.N O 20081017 120.00... (1 Reply)
Discussion started by: psiva_arul
1 Replies

2. UNIX for Dummies Questions & Answers

Remove duplicates based on a column in fixed width file

Hi, How to output the duplicate record to another file. We say the record is duplicate based on a column whose position is from 2 and its length is 11 characters. The file is a fixed width file. ex of Record: DTYU12333567opert tjhi kkklTRG9012 The data in bold is the key on which... (1 Reply)
Discussion started by: Qwerty123
1 Replies

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ... (13 Replies)
Discussion started by: script_op2a
13 Replies

4. UNIX for Dummies Questions & Answers

CSV file:Find duplicates, save original and duplicate records in a new file

Hi Unix gurus, Maybe it is too much to ask for but please take a moment and help me out. A very humble request to you gurus. I'm new to Unix and I have started learning Unix. I have this project which is way to advanced for me. File format: CSV file File has four columns with no header... (8 Replies)
Discussion started by: arvindosu
8 Replies

5. Red Hat

How to find a garbage entry in a column wise text file in Linux?

Suppose I have a file containing :- 1 Apple $50 2 Orange $30 3 Banana $10 4 Guava $25 5 Pine@apple $12 6 Strawberry $21 7 Grapes $12 In the 5th row, @ character inserted. I want through sort command or by any other way this row should either on top or bottom. By sort command garbage... (1 Reply)
Discussion started by: Dipankar Mitra
1 Replies

6. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file: comp100002 aaa bbb ccc ddd eee fff ggg comp100003 aba aba aba aba aba aba aba comp100003 fff fff fff fff fff fff fff... (5 Replies)
Discussion started by: falcox
5 Replies

7. Shell Programming and Scripting

Find duplicates in 2 & 3rd column and their ID

with below given format, I have been trying to find out all IDs for those entries with duplicate names in 2nd and 3rd columns and their count like how many time duplication happened for any name if any, 0.237788 Aaban Aahva 0.291066 Aabheer Aahlaad 0.845814 Aabid Aahan 0.152208 Aadam... (6 Replies)
Discussion started by: busyboy
6 Replies

8. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies

9. UNIX for Beginners Questions & Answers

Find duplicates in file with line numbers

Hello All, This is a noob question. I tried searching for the answer but the answer found did not help me . I have a file that can have duplicates. 100 200 300 400 100 150 the number 100 is duplicated twice. I want to find the duplicate along with the line number. expected... (4 Replies)
Discussion started by: vatigers
4 Replies
Template::Plugin::Table(3)				User Contributed Perl Documentation				Template::Plugin::Table(3)

NAME
Template::Plugin::Table - Plugin to present data in a table SYNOPSIS
[% USE table(list, rows=n, cols=n, overlap=n, pad=0) %] [% FOREACH item IN table.row(n) %] [% item %] [% END %] [% FOREACH item IN table.col(n) %] [% item %] [% END %] [% FOREACH row IN table.rows %] [% FOREACH item IN row %] [% item %] [% END %] [% END %] [% FOREACH col IN table.cols %] [% col.first %] - [% col.last %] ([% col.size %] entries) [% END %] DESCRIPTION
The "Table" plugin allows you to format a list of data items into a virtual table. When you create a "Table" plugin via the "USE" directive, simply pass a list reference as the first parameter and then specify a fixed number of rows or columns. [% USE Table(list, rows=5) %] [% USE table(list, cols=5) %] The "Table" plugin name can also be specified in lower case as shown in the second example above. You can also specify an alternative variable name for the plugin as per regular Template Toolkit syntax. [% USE mydata = table(list, rows=5) %] The plugin then presents a table based view on the data set. The data isn't actually reorganised in any way but is available via the "row()", "col()", "rows()" and "cols()" as if formatted into a simple two dimensional table of "n" rows x "n" columns. So if we had a sample "alphabet" list contained the letters '"a"' to '"z"', the above "USE" directives would create plugins that represented the following views of the alphabet. [% USE table(alphabet, ... %] rows=5 cols=5 a f k p u z a g m s y b g l q v b h n t z c h m r w c i o u d i n s x d j p v e j o t y e k q w f l r x We can request a particular row or column using the "row()" and "col()" methods. [% USE table(alphabet, rows=5) %] [% FOREACH item = table.row(0) %] # [% item %] set to each of [ a f k p u z ] in turn [% END %] [% FOREACH item = table.col(2) %] # [% item %] set to each of [ m n o p q r ] in turn [% END %] Data in rows is returned from left to right, columns from top to bottom. The first row/column is 0. By default, rows or columns that contain empty values will be padded with the undefined value to fill it to the same size as all other rows or columns. For example, the last row (row 4) in the first example would contain the values "[ e j o t y undef ]". The Template Toolkit will safely accept these undefined values and print a empty string. You can also use the IF directive to test if the value is set. [% FOREACH item = table.row(4) %] [% IF item %] Item: [% item %] [% END %] [% END %] You can explicitly disable the "pad" option when creating the plugin to returned shortened rows/columns where the data is empty. [% USE table(alphabet, cols=5, pad=0) %] [% FOREACH item = table.col(4) %] # [% item %] set to each of 'y z' [% END %] The "rows()" method returns all rows/columns in the table as a reference to a list of rows (themselves list references). The "row()" methods when called without any arguments calls "rows()" to return all rows in the table. Ditto for "cols()" and "col()". [% USE table(alphabet, cols=5) %] [% FOREACH row = table.rows %] [% FOREACH item = row %] [% item %] [% END %] [% END %] The Template Toolkit provides the "first", "last" and "size" virtual methods that can be called on list references to return the first/last entry or the number of entries in a list. The following example shows how we might use this to provide an alphabetical index split into 3 even parts. [% USE table(alphabet, cols=3, pad=0) %] [% FOREACH group = table.col %] [ [% group.first %] - [% group.last %] ([% group.size %] letters) ] [% END %] This produces the following output: [ a - i (9 letters) ] [ j - r (9 letters) ] [ s - z (8 letters) ] We can also use the general purpose "join" virtual method which joins the items of the list using the connecting string specified. [% USE table(alphabet, cols=5) %] [% FOREACH row = table.rows %] [% row.join(' - ') %] [% END %] Data in the table is ordered downwards rather than across but can easily be transformed on output. For example, to format our data in 5 columns with data ordered across rather than down, we specify "rows=5" to order the data as such: a f . . b g . c h d i e j and then iterate down through each column (a-e, f-j, etc.) printing the data across. a b c d e f g h i j . . . Example code to do so would be much like the following: [% USE table(alphabet, rows=3) %] [% FOREACH cols = table.cols %] [% FOREACH item = cols %] [% item %] [% END %] [% END %] Output: a b c d e f g h i j . . . In addition to a list reference, the "Table" plugin constructor may be passed a reference to a Template::Iterator object or subclass thereof. The Template::Iterator get_all() method is first called on the iterator to return all remaining items. These are then available via the usual Table interface. [% USE DBI(dsn,user,pass) -%] # query() returns an iterator [% results = DBI.query('SELECT * FROM alphabet ORDER BY letter') %] # pass into Table plugin [% USE table(results, rows=8 overlap=1 pad=0) -%] [% FOREACH row = table.cols -%] [% row.first.letter %] - [% row.last.letter %]: [% row.join(', ') %] [% END %] AUTHOR
Andy Wardley <abw@wardley.org> <http://wardley.org/> COPYRIGHT
Copyright (C) 1996-2007 Andy Wardley. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. SEE ALSO
Template::Plugin perl v5.12.1 2009-05-20 Template::Plugin::Table(3)
All times are GMT -4. The time now is 12:58 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy