Filtering duplicates based on lookup table and rules
Please help me solve the following. I have access to a Red Hat Linux cluster with 32 GB of RAM.
I have duplicate IDs for variable names: in the file, IDs 1 and 2 are duplicates; 3, 4, and 5 are duplicates; and 6 and 7 are duplicates. My objective is to use only the first occurrence of each set of duplicates.
Lookup file
I need to apply the following rules to filter the file below, per category.
1) If all duplicate IDs within a category have the same value, use the first occurrence and print its value.
example input
example output
2) If the duplicate IDs within a category do not all have the same value, print the first occurrence and print the value as "ambiguous".
example input
example output
3) If only a single ID (out of a set of duplicate IDs) is present in a category, print the row as it is.
Data sample input
Filtered sample output
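Since the actual samples were not preserved in this excerpt, here is a minimal awk sketch of the three rules under an assumed layout: lookup.txt maps each ID to its duplicate group (e.g. "1 g1", "2 g1"), and data.txt holds whitespace-separated "id category value" rows. File names and column order are assumptions, not from the thread.

```shell
awk '
  NR == FNR { grp[$1] = $2; next }            # pass 1: id -> duplicate group
  {
    key = grp[$1] SUBSEP $2                   # one bucket per group+category
    if (!(key in first)) { first[key] = $0; val[key] = $3 }
    else if ($3 != val[key]) amb[key] = 1     # values within the group disagree
  }
  END {
    for (k in first) {
      if (amb[k]) { split(first[k], f); print f[1], f[2], "ambiguous" }  # rule 2
      else print first[k]                     # rules 1 and 3: first occurrence
    }
  }
' lookup.txt data.txt
```

Rules 1 and 3 collapse to the same action (print the first row seen for the group), so only disagreement needs extra state. Note that awk's `for (k in first)` iterates in unspecified order; pipe through `sort` if ordering matters.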
Last edited by ritakadm; 10-10-2014 at 12:51 AM..
Reason: added code tags for clarity
Hi,
I am very new to Perl and have a very basic question. :(
The requirement is as follows:
I have an input file (txt file) with fields for invoice number and customer number. Now I have to take this combination of invoice and customer number as input and check in a... (2 Replies)
Using AIX 5.2, Bourne and Korn Shell.
I have two flat text files. One is a main file and one is a lookup table that contains a number of letter codes and membership numbers as follows:
316707965EGM01
315672908ANM92
Whenever one of these records from the lookup appears in the main file... (6 Replies)
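What should happen when a lookup record appears in the main file was cut off in this excerpt, so the following only shows the matching step: printing every main-file line containing any code+membership record from the lookup table (file names are assumed).

```shell
# -F: treat each lookup line as a fixed string, not a regex
# -f: read the patterns from lookup.txt, one per line
grep -F -f lookup.txt main.txt
```

From there the matched lines can be piped into whatever action the original task required (deletion, tagging, extraction, etc.).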
Good Evening,
I started working on the 17x17 4-colouring challenge, and I ran into a bit of an I/O snag.
It was an enormous headache to detect the differences in very similar 289-char strings.
Eventually, it made more sense to associate a CRC-Digest with each colouring.
After learning... (0 Replies)
I have a file with the following format
--TABLEA_START--
field1=data1;field2=data2;field3=data3
--TABLEA_END--
--TABLEB_START--
field1=data1;field2=data2;field3=data3
--TABLEB_END--
--TABLEA_START--
field1=data1;field2=data2;field3=data3
... (0 Replies)
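The actual question for this layout was truncated, but a typical first step with marker-delimited blocks like these is extracting the data lines for one table, sketched here for every TABLEA block (the input file name is assumed):

```shell
awk '/^--TABLEA_START--$/ { inA = 1; next }   # marker opens a block
     /^--TABLEA_END--$/   { inA = 0; next }   # marker closes it
     inA' input.txt                           # print only lines inside a block
```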
Dear all, thanks in advance for your help. I know this should be fairly simple, but my searches failed to turn up an answer.
I have a file (replacement table) containing two columns, e.g.:
ACICJ ACIDIPHILIUM
ACIF2 ACIDITHIOBACILLUS
ACIF5 ACIDITHIOBACILLUS
ACIC5 ACIDOBACTERIUM
ACIC1 ACIDOTHERMUS... (10 Replies)
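A common way to apply such a two-column replacement table is the two-file awk idiom: load the table into an array on the first pass, then substitute in the data file. This sketch assumes the codes appear as whole fields (here field 1) in the data file; both file names are hypothetical.

```shell
awk 'NR == FNR { map[$1] = $2; next }   # first file: code -> replacement
     $1 in map { $1 = map[$1] }         # second file: swap the code if known
     { print }' replace.txt data.txt
```

If the codes can occur anywhere in a line rather than as a fixed field, a `gsub`-based loop over the map would be needed instead.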
Hello,
I want to filter all the duplicates of a record into one place. The sample input and output will give you a better idea.
I am new to Unix. Can someone help me with this?
Input:
7488 7389 chr1.fa chr1.fa
3546 9887 chr5.fa chr9.fa
7387 7898 chrX.fa chr3.fa
7488 7389 chr1.fa chr1.fa... (2 Replies)
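One reading of "filter all the duplicates to one place" is to sort so identical records become adjacent, then collapse each run into a single line with its occurrence count; the file name is assumed.

```shell
# sort groups identical lines together; uniq -c collapses each group
# and prefixes it with the number of times it occurred
sort input.txt | uniq -c
```

If the duplicates should merely be grouped (not collapsed), plain `sort input.txt` already does that.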
Hi,
I have a huge text file of filenames that look like the following, i.e. uniquenumber_version_filename:
e.g.
1234_1_xxxx
1234_2_vfvfdbb
343333_1_vfvfdvd
2222222_1_ggggg
55555_1_xxxxxx
55555_2_vrbgbgg
55555_3_grgrbr
What I need to do is examine the file, look for... (4 Replies)
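The stated goal was cut off mid-sentence, so this is only a guess at a common variant of the task: keeping the highest-version line per unique number. The file name is assumed.

```shell
# sort numerically by unique number (field 1), then by version (field 2)
# in reverse, so each number's highest version comes first; the awk
# "seen" filter then keeps only that first line per unique number
sort -t_ -k1,1n -k2,2nr files.txt | awk -F_ '!seen[$1]++'
```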
1. How do I add a filter option to the table so that users can enter whichever fields they want and print only those?
2. How do I print the full table, with all rows and columns in proper tabular form, even when some rows have no values? (2 Replies)
Hi folks,
I have a log file in the format below and am trying to extract the unique entries based on the mnemonic, in Perl.
Could anyone please share the code and the logic?
Severity Mnemonic Log Message
7 CLI_SCHEDULER Logfile for scheduled CLI... (3 Replies)
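Assuming the mnemonic is the second whitespace-separated field, deduplicating on it is a one-line "seen" filter. Shown in awk for brevity; the same idea ports directly to Perl with a `%seen` hash. The log file name is assumed.

```shell
# keep the header line (NR == 1) plus the first line seen per mnemonic ($2)
awk 'NR == 1 || !seen[$2]++' log.txt
```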
Hi All
I need to pass a country code into a pipe-delimited file for lookup.
It should search for the country code (column 3) in the file and, if the code matches, return values from the other columns.
Here is my mapping file.
#CountryName|CountryRegion|CountryCode-3|CountryCode-2... (5 Replies)
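Matching on column 3 of a pipe-delimited mapping file is a natural fit for awk with `-F'|'`. This sketch returns the country name (column 1) for a given 3-letter code; the file name and the choice of which columns to return are assumptions.

```shell
code="USA"
# -F'|' splits on pipes; -v passes the shell variable into awk;
# the header line is skipped automatically because its column 3
# ("CountryCode-3") never equals a real code
awk -F'|' -v c="$code" '$3 == c { print $1 }' mapping.txt
```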
Discussion started by: lafrance
MSGUNIQ(1) GNU MSGUNIQ(1)
NAME
msguniq - unify duplicate translations in message catalog
SYNOPSIS
msguniq [OPTION] [INPUTFILE]
DESCRIPTION
Unifies duplicate translations in a translation catalog. Finds duplicate translations of the same message ID. Such duplicates are invalid
input for other programs like msgfmt, msgmerge or msgcat. By default, duplicates are merged together. When using the --repeated option,
only duplicates are output, and all other messages are discarded. Comments and extracted comments will be cumulated, except that if
--use-first is specified, they will be taken from the first translation. File positions will be cumulated. When using the --unique
option, duplicates are discarded.
Mandatory arguments to long options are mandatory for short options too.
Input file location:
INPUTFILE
input PO file
-D, --directory=DIRECTORY
add DIRECTORY to list for input files search
If no input file is given or if it is -, standard input is read.
Output file location:
-o, --output-file=FILE
write output to specified file
The results are written to standard output if no output file is specified or if it is -.
Message selection:
-d, --repeated
print only duplicates
-u, --unique
print only unique messages, discard duplicates
Input file syntax:
-P, --properties-input
input file is in Java .properties syntax
--stringtable-input
input file is in NeXTstep/GNUstep .strings syntax
Output details:
-t, --to-code=NAME
encoding for output
--use-first
use first available translation for each message, don't merge several translations
-e, --no-escape
do not use C escapes in output (default)
-E, --escape
use C escapes in output, no extended chars
--force-po
write PO file even if empty
-i, --indent
write the .po file using indented style
--no-location
do not write '#: filename:line' lines
-n, --add-location
generate '#: filename:line' lines (default)
--strict
write out strict Uniforum conforming .po file
-p, --properties-output
write out a Java .properties file
--stringtable-output
write out a NeXTstep/GNUstep .strings file
-w, --width=NUMBER
set output page width
--no-wrap
do not break long message lines, longer than the output page width, into several lines
-s, --sort-output
generate sorted output
-F, --sort-by-file
sort output by file location
Informative output:
-h, --help
display this help and exit
-V, --version
output version information and exit
AUTHOR
Written by Bruno Haible.
REPORTING BUGS
Report bugs to <bug-gnu-gettext@gnu.org>.
COPYRIGHT
Copyright (C) 2001-2007 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
The full documentation for msguniq is maintained as a Texinfo manual. If the info and msguniq programs are properly installed at your
site, the command
info msguniq
should give you access to the complete manual.
GNU gettext-tools 0.17 November 2007 MSGUNIQ(1)