Post 302610771 by yale_work in Shell Programming and Scripting, Wednesday 21st of March 2012, 07:34:44 PM
Remove duplicate based on Group

Hi,

How can I remove duplicates from a file based on one column, grouped by another column? For example:

Test1|Test2|Test3|Test4|Test5
Test1|Test6|Test7|Test8|Test5
Test1|Test9|Test10|Test11|Test12
Test1|Test13|Test14|Test15|Test16
Test17|Test18|Test19|Test20|Test21
Test17|Test22|Test23|Test24|Test5



First, group the rows by column 1, then remove rows whose column 5 value is a duplicate within that group. Column 1 has two groups, Test1 and Test17, so duplicates in column 5 are detected separately for each group. The expected output is:

Test1|Test2|Test3|Test4|Test5
Test1|Test9|Test10|Test11|Test12
Test1|Test13|Test14|Test15|Test16
Test17|Test18|Test19|Test20|Test21
Test17|Test22|Test23|Test24|Test5
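A compact way to do this in awk (a minimal sketch, assuming the fields are pipe-separated, the first occurrence in each group should be kept, and the data is in a file named input.txt):

```shell
# Keep only the first row for each (column 1, column 5) combination.
# seen[] is keyed on the group column and the dedup column joined by
# the field separator, so duplicates are counted per group, not globally.
awk -F'|' '!seen[$1 FS $5]++' input.txt
```

The pattern `!seen[$1 FS $5]++` is true only the first time a given key appears, so awk prints the line then and suppresses every later line with the same group/column-5 pair.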
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Remove duplicate rows of a file based on a value of a column

Hi, I am processing a file and would like to delete duplicate records as indicated by one of its columns. e.g. COL1 COL2 COL3 A 1234 1234 B 3k32 2322 C Xk32 TTT A NEW XX22 B 3k32 ... (7 Replies)
Discussion started by: risk_sly

2. Shell Programming and Scripting

Remove duplicate files based on text string?

Hi I have been struggling with a script for removing duplicate messages from a shared mailbox. I would like to search for duplicate messages based on the “Message-ID” string within the messages files. I have managed to find the duplicate “Message-ID” strings and (if I would like) delete... (1 Reply)
Discussion started by: spangberg

3. UNIX for Dummies Questions & Answers

How to remove duplicates from a file based on many conditions

Hii Friends.. I have a huge set of data stored in a file.Which is as shown below a.dat: RAO 1869 12 19 0 0 0.00 17.9000 82.3000 10.0 0 0.00 0 3.70 0.00 0.00 0 0.00 3.70 4 NULL LEE 1870 4 11 1 0 0.00 30.0000 99.0000 0.0 0 0.00 0 0.00 0.00 0.00 0 ... (3 Replies)
Discussion started by: reva

4. UNIX for Dummies Questions & Answers

Remove duplicate rows when >10 based on single column value

Hello, I'm trying to delete duplicates when there are more than 10 duplicates, based on the value of the first column. e.g. a 1 a 2 a 3 b 1 c 1 gives b 1 c 1 but requires 11 duplicates before it deletes. Thanks for the help Video tutorial on how to use code tags in The UNIX... (11 Replies)
Discussion started by: informaticist

5. Shell Programming and Scripting

Remove duplicate lines based on field and sort

I have a csv file that I would like to remove duplicate lines from based on field 1 and then sort. I don't care about any of the other fields but I still want to keep their data intact. I was thinking I could do something like this but I have no idea how to print the full line with this. Please show any method... (8 Replies)
Discussion started by: cokedude
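For the case described above (deduplicate a CSV on field 1 while keeping the full line intact, then sort), one possible sketch; the comma separator, first-occurrence-wins behavior, and the file name data.csv are assumptions:

```shell
# Print only the first line seen for each value of field 1 (keeping
# the whole line), then sort the survivors by field 1.
awk -F',' '!seen[$1]++' data.csv | sort -t',' -k1,1
```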

6. Shell Programming and Scripting

Remove duplicate value based on two field $4 and $5

Hi All, i have input file like below... CA009156;20091003;M;AWBKCA72;123;;CANADIAN WESTERN BANK;EDMONTON;;2300, 10303, JASPER AVENUE;;T5J 3X6;; CA009156;20091003;M;AWBKCA72;321;;CANADIAN WESTERN BANK;EDMONTON;;2300, 10303, JASPER AVENUE;;T5J 3X6;; CA009156;20091003;M;AWBKCA72;231;;CANADIAN... (2 Replies)
Discussion started by: mohan sharma

7. Shell Programming and Scripting

How To Remove Duplicate Based on the Value?

Hi , Some time i got duplicated value in my files , bundle_identifier= B Sometext=ABC bundle_identifier= A bundle_unit=500 Sometext123=ABCD bundle_unit=400 i need to check if there is a duplicated values or not if yes , i need to check if the value is A or B when Bundle_Identified ,... (2 Replies)
Discussion started by: OTNA

8. Shell Programming and Scripting

Remove duplicate entries based on the range

I have file like this: chr start end chr15 99874874 99875874 chr15 99875173 99876173 aa1 chr15 99874923 99875923 chr15 99875173 99876173 aa1 chr15 99874962 99875962 chr15 99875173 99876173 aa1 chr1 ... (7 Replies)
Discussion started by: raj_k

9. Shell Programming and Scripting

Remove duplicate rows based on one column

Dear members, I need to filter a file based on the 8th column (the id); the other columns do not matter. I want just one line per id, removing the duplicate lines based on this id (8th column), and it does not matter which duplicate is removed. example of my file... (3 Replies)
Discussion started by: clarissab
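Since the post above says any one line per id is acceptable, a first-occurrence-wins awk sketch is enough; the pipe separator and the file name are assumptions here:

```shell
# Keep only the first line seen for each value of the 8th field.
awk -F'|' '!seen[$8]++' file.txt
```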

10. Shell Programming and Scripting

Remove sections based on duplicate first line

Hi, I have a file with many sections in it. Each section is separated by a blank line. The first line of each section would determine if the section is duplicate or not. if the section is duplicate then remove the entire section from the file. below is the example of input and output.... (5 Replies)
Discussion started by: ahmedwaseem2000
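The blank-line-separated sections described above map naturally onto awk's paragraph mode (RS set to the empty string makes each section one record). A hedged sketch, assuming the first copy of a duplicated section should be kept:

```shell
# RS= puts awk in paragraph mode: each blank-line-separated section
# is one record. Keep a section only if its first line is new.
awk -v RS= -v ORS='\n\n' '{ split($0, lines, "\n") } !seen[lines[1]]++' file.txt
```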
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.