Filtering duplicates based on lookup table and rules

Forum: Shell Programming and Scripting. Post 302920533 by ritakadm on Thursday 9th of October 2014 09:48:01 PM

Please help me solve the following. I have access to a Red Hat Linux cluster with 32 GB of RAM.

I have duplicate ids for variable names. In the lookup file below, ids 1 and 2 are duplicates; ids 3, 4 and 5 are duplicates; ids 6 and 7 are duplicates. My objective is to keep only the first occurrence of each set of duplicates.

Lookup file

Code:

varid varname
1 var1
2 var1
3 varx
4 varx
5 varx
6 vary
7 vary
8 varz
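One way to make the duplicate sets explicit is to group the ids by variable name. The following is a minimal Python sketch, assuming the lookup file is whitespace-separated with a single header row, as in the sample above. The function name `parse_lookup` and the in-memory `LOOKUP_TEXT` string are illustrative, not part of the original post.

```python
from collections import defaultdict

# Sample lookup table copied from the post (assumed whitespace-separated).
LOOKUP_TEXT = """\
varid varname
1 var1
2 var1
3 varx
4 varx
5 varx
6 vary
7 vary
8 varz
"""

def parse_lookup(text):
    """Return (id_to_name, name_to_ids) built from the lookup rows."""
    id_to_name = {}
    name_to_ids = defaultdict(list)
    for line in text.strip().splitlines()[1:]:  # skip the header row
        varid, varname = line.split()
        id_to_name[varid] = varname
        name_to_ids[varname].append(varid)  # file order is preserved
    return id_to_name, dict(name_to_ids)

id_to_name, name_to_ids = parse_lookup(LOOKUP_TEXT)
for name, ids in name_to_ids.items():
    print(name, ids)
```

Ids that share a variable name end up in the same list, so any list with more than one entry is a set of duplicates.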

I need to apply the following rules to filter the data file below, per category.
1) If all duplicate ids within a category have the same value, keep the first occurrence and print its value.

example input

Code:

3;cat1;val3
4;cat1;val3

example output

Code:

3;cat1;val3

2) If the duplicate ids within a category do not all have the same value, keep the first occurrence and print its value as ambiguous.

example input

Code:

3;cat1;val1
4;cat1;val2
5;cat1;val1

example output

Code:

3;cat1;ambiguous

3) If only a single id (out of a set of duplicate ids) is present in a category, print the row as it is.
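The three rules reduce to one check per group of rows that share a variable name and a category. The sketch below is a possible Python formulation, assuming the rows of one group are passed in file order as `(varid, category, value)` tuples. The helper name `resolve_group` is hypothetical.

```python
def resolve_group(rows):
    """Collapse the rows of one (variable name, category) group to one row.

    rows: non-empty list of (varid, category, value) tuples in file order.
    """
    first_id, category, first_value = rows[0]
    distinct_values = {value for _, _, value in rows}
    if len(distinct_values) == 1:
        # Rule 1 (all values agree) and rule 3 (only one id present).
        return (first_id, category, first_value)
    # Rule 2: the values disagree, so flag the first occurrence.
    return (first_id, category, "ambiguous")

print(resolve_group([("3", "cat1", "val3"), ("4", "cat1", "val3")]))
print(resolve_group([("3", "cat1", "val1"), ("4", "cat1", "val2"), ("5", "cat1", "val1")]))
print(resolve_group([("2", "cat2", "val2")]))
```

Rule 3 needs no separate branch here: a group with a single row has exactly one distinct value, so the row comes back unchanged.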

Data sample input

Code:

varid;category;value
1;cat1;val1
2;cat1;val2
3;cat1;val3
4;cat1;val3
5;cat1;val3
2;cat2;val2
3;cat2;val3
5;cat2;val3
6;cat2;val3
7;cat2;val4
8;cat2;val4

Filtered sample output

Code:

varid;category;value
1;cat1;ambiguous
3;cat1;val3
2;cat2;val2
3;cat2;val3
6;cat2;ambiguous
8;cat2;val4
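For reference, the sample input and output above can be reproduced with a single pass over the data. This is only a sketch in Python under a few assumptions: the lookup file is whitespace-separated, the data file is semicolon-separated, both have one header row, and output rows are ordered by the first appearance of each group. The function name `filter_duplicates` and the treatment of ids missing from the lookup (each handled as its own variable) are assumptions, not requirements stated in the post.

```python
LOOKUP = """\
varid varname
1 var1
2 var1
3 varx
4 varx
5 varx
6 vary
7 vary
8 varz
"""

DATA = """\
varid;category;value
1;cat1;val1
2;cat1;val2
3;cat1;val3
4;cat1;val3
5;cat1;val3
2;cat2;val2
3;cat2;val3
5;cat2;val3
6;cat2;val3
7;cat2;val4
8;cat2;val4
"""

def filter_duplicates(lookup_text, data_text):
    """Apply the three rules and return the filtered data as text."""
    id_to_name = {}
    for line in lookup_text.strip().splitlines()[1:]:  # skip header
        varid, varname = line.split()
        id_to_name[varid] = varname

    data_lines = data_text.strip().splitlines()
    header, records = data_lines[0], data_lines[1:]

    # (varname, category) -> [first_id, first_value, values_differ]
    # A dict keeps insertion order, so groups come out in file order.
    groups = {}
    for line in records:
        varid, category, value = line.split(";")
        # Assumption: an id absent from the lookup is its own variable.
        key = (id_to_name.get(varid, "id:" + varid), category)
        if key not in groups:
            groups[key] = [varid, value, False]
        elif value != groups[key][1]:
            groups[key][2] = True

    out = [header]
    for (_, category), (first_id, first_value, differ) in groups.items():
        out.append(f"{first_id};{category};{'ambiguous' if differ else first_value}")
    return "\n".join(out)

print(filter_duplicates(LOOKUP, DATA))
```

The sketch keeps one small entry per (variable name, category) group rather than every row, so memory use grows with the number of groups, not with the size of the data file. For a real file, the records could be read line by line from disk instead of from a string.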

Last edited by ritakadm; 10-10-2014 at 12:51 AM.. Reason: added code tags for clarity
